SLES 12 SP3 systemd's connections to /run/systemd/private?



SCFg4gyODe
01-Jul-2019, 18:21
At $JOB, our logs are getting swamped with messages saying:

"Too many concurrent connections, refusing"

It's hampering our ability to manage services, e.g.:


# systemctl status ntpd
Failed to get properties: Connection reset by peer

Near as I can tell from a quick read of the source of dbus.c, we're hitting a hard-coded limit of CONNECTIONS_MAX (set to 4096).

I think this is related to the number of connections systemd (pid 1) has to /run/systemd/private:


# ss -x | grep /run/systemd/private | wc -l
4015

But, despite the almost 4k connections, 'ss' shows that there are no connected peers:


# ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
0

- Are there any tunables that would help us mitigate the "Too many concurrent connections, refusing" messages?

- Is my guess about CONNECTIONS_MAX's relationship to /run/systemd/private correct?

I wanted to ask the systemd developers these questions directly, but their mail server rejected the confirmation email needed to sign up for their mailing list:


<systemd-devel-request@lists.freedesktop.org>:
131.252.210.177 does not like recipient.
Remote host said: 550 5.1.1 <systemd-devel-request@lists.freedesktop.org>: Recipient address rejected: User unknown in local recipient table
Giving up on 131.252.210.177.

malcolmlewis
01-Jul-2019, 19:41
Hi and welcome to the Forum :)
Sounds like an issue with ntpd?



systemctl status ntpd

● ntpd.service - NTP Server Daemon
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Drop-In: /run/systemd/generator/ntpd.service.d
└─50-insserv.conf-$time.conf
Active: active (running) since Sat 2019-06-29 16:28:56 CDT; 1 day 21h ago
Docs: man:ntpd(1)
Process: 19596 ExecStart=/usr/sbin/start-ntpd start (code=exited, status=0/SUCCESS)
Main PID: 19605 (ntpd)
Tasks: 2 (limit: 512)
CGroup: /system.slice/ntpd.service
├─19605 /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
└─19606 ntpd: asynchronous dns resolver

Jun 29 16:28:56 gekkota-suma systemd[1]: Starting NTP Server Daemon...
Jun 29 16:28:56 gekkota-suma ntpd[19604]: ntpd 4.2.8p13@1.3847-o Wed Mar 13 12:24:30 UTC 2019 (1): Starting
Jun 29 16:28:56 gekkota-suma ntpd[19604]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
Jun 29 16:28:56 gekkota-suma ntpd[19605]: proto: precision = 0.053 usec (-24)
Jun 29 16:28:56 gekkota-suma ntpd[19605]: basedate set to 2019-03-01
Jun 29 16:28:56 gekkota-suma ntpd[19605]: gps base set to 2019-03-03 (week 2043)
Jun 29 16:28:56 gekkota-suma ntpd[19605]: switching logging to file /var/log/ntp
Jun 29 16:28:56 gekkota-suma start-ntpd[19596]: Starting network time protocol daemon (NTPD)
Jun 29 16:28:56 gekkota-suma systemd[1]: Started NTP Server Daemon.

ntpq

ntpq> peers
remote refid st t when poll reach delay offset jitter
==============================================================================
*time-a-g.nist.g .NIST. 1 u 584 1024 205 168.297 -0.847 1.932
+time-b-g.nist.g .NIST. 1 u 289 1024 7 165.909 -0.151 0.442

gekkota-suma:~ # ss -x | grep /run/systemd/private | wc -l
0

systemctl status dbus
● dbus.service - D-Bus System Message Bus
Loaded: loaded (/usr/lib/systemd/system/dbus.service; static; vendor preset: disabled)
Active: active (running) since Sat 2019-06-29 15:16:13 CDT; 1 day 22h ago
Docs: man:dbus-daemon(1)
Main PID: 838 (dbus-daemon)
Tasks: 1 (limit: 512)
CGroup: /system.slice/dbus.service
└─838 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

Jun 29 15:16:13 gekkota-suma systemd[1]: Started D-Bus System Message Bus.
Jun 29 15:16:15 gekkota-suma dbus[838]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Jun 29 15:16:15 gekkota-suma dbus[838]: [system] Successfully activated service 'org.freedesktop.hostname1'

SCFg4gyODe
01-Jul-2019, 20:13
Nope, I can independently confirm the ntpd process is running and functioning.

The symptom is that, depending on system activity, systemd stops being able to process new requests. systemd accepts incoming requests (e.g. an invocation of 'systemctl'), but if I understand the source of dbus.c correctly, once there are too many connections on /run/systemd/private, systemd refuses new ones, apparently with no retry.
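If the refusals are transient, one client-side stop-gap is to retry calls such as `systemctl status ntpd` with a growing delay. This is a sketch under that assumption only: it does nothing about the connection build-up itself, and `retry_with_backoff` is a hypothetical helper, not part of systemd. The delay is in whole seconds because bash arithmetic is integer-only.

```shell
# Sketch: re-run a command with exponential backoff.
# ASSUMPTION: the "Too many concurrent connections" refusal is transient.
# Arguments: max attempts, initial delay in whole seconds, then the command.
retry_with_backoff() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Sleep only if another attempt follows, then double the delay.
    if ((i < attempts)); then
      sleep "$delay"
      delay=$((delay * 2))
    fi
  done
  return 1
}
```

Usage: `retry_with_backoff 4 1 systemctl status ntpd` tries up to four times, waiting 1, 2 and 4 seconds between attempts. Note that `systemctl status` also exits non-zero for a stopped unit, so the wrapper cannot tell that case apart from a refused connection.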

At $JOB, when we first spin up a new SLES 12 host with our custom services, the number of connections to /run/systemd/private is only in the hundreds. As workloads increase, it rises into the thousands.

Some hosts are plagued by the 'Too many concurrent connections' messages; some are not. Empirically, all I've been able to see is that the number of systemd's connections to /run/systemd/private tips over 4k.

- I can't demonstrate that there are any consumers of this stream.
- I can't explain why the connection count increases over time.
- The CONNECTIONS_MAX constant is hard-coded. It gets increased every few months/years, but never seems to be exposed as something you can set in a config file.
- I don't know what tunables affect the lifetime/culling of those connections.

I have a hypothesis that this may be some resource leak in systemd, but I've not found a way to test that.
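One rough way to put numbers on the leak hypothesis is to sample the connection count at intervals and see whether it keeps climbing while the workload is steady. The sketch below assumes samples stored as `epoch_seconds count` lines, oldest first, and computes a straight-line rate from the first and last samples only. Steady growth under a flat workload would be consistent with a leak but does not prove one. `estimate_growth` is a hypothetical helper, and 4096 is the CONNECTIONS_MAX value mentioned earlier in the thread.

```shell
# Sketch: estimate connection growth per hour and hours left before a limit.
# Reads "epoch_seconds count" lines on stdin, oldest first.
# ASSUMPTION: default limit of 4096, as reported for this systemd's dbus.c.
estimate_growth() {
  awk -v max="${1:-4096}" '
    NF >= 2 {
      # Remember the first sample and keep overwriting the last one.
      if (n == 0) { t0 = $1; c0 = $2 }
      t1 = $1; c1 = $2; n++
    }
    END {
      if (n < 2 || t1 == t0) { print "need at least two samples"; exit 1 }
      rate = (c1 - c0) / ((t1 - t0) / 3600)
      if (rate <= 0) { printf "rate=%.1f/h no growth\n", rate; exit 0 }
      printf "rate=%.1f/h hours_left=%.1f\n", rate, (max - c1) / rate
    }
  '
}
```

Usage: collect samples with something like `while sleep 300; do echo "$(date +%s) $(ss -x | grep -c /run/systemd/private)"; done >> /tmp/private-conns.txt` (the file name is arbitrary), then run `estimate_growth < /tmp/private-conns.txt`.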

malcolmlewis
01-Jul-2019, 21:57
Hi
So, my SLES 12 SP3 system just runs SuMA, and I see no private directory. I'm assuming this is a systemd service you're starting for your own service? It almost sounds like the process keeps respawning and maybe needs a KillMode= setting added to the service unit to clean them up.
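For reference, KillMode= belongs in the [Service] section of a unit, and a drop-in file is a common way to change it without editing the packaged unit. The sketch below assumes a hypothetical unit named myapp.service and a hypothetical helper `write_killmode_dropin`. control-group is systemd's default mode and stops every process remaining in the unit's cgroup, so the drop-in only changes anything if the unit currently overrides KillMode=. Whether this affects the growing connection count is the hypothesis above, not something established in this thread.

```shell
# Sketch: write a drop-in that sets KillMode= for one unit.
# ASSUMPTION: drop_in_root would normally be /etc/systemd/system; another
# directory can be passed for testing. "myapp.service" is hypothetical.
write_killmode_dropin() {
  local drop_in_root=$1 unit=$2 mode=${3:-control-group}
  local dir="$drop_in_root/$unit.d"
  mkdir -p "$dir" || return 1
  # Drop-in files must end in .conf to be read by systemd.
  printf '[Service]\nKillMode=%s\n' "$mode" > "$dir/50-killmode.conf"
}
```

Usage: check the current value with `systemctl show -p KillMode myapp.service`, then run `write_killmode_dropin /etc/systemd/system myapp.service && systemctl daemon-reload` and restart the unit.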