
kswapd0 takes 100% CPU



tojanov
28-Nov-2013, 13:34
Hi there,

after updating SLES 11 SP2 to SP3, I have several servers where kswapd0 takes 100% CPU (one core).
All servers are VMs running on the same Hyper-V cluster.
Some machines show this bug, some do not, and I don't know why.
After rebooting an affected system, everything runs fine for a few hours, in some cases for a few days.

All systems are up to date and use the same kernel.

jmozdzen
28-Nov-2013, 13:51
Hi tojanov,

a few things I'd like to know about the situation:

- How's memory usage when kswapd hits the roof? ("free -m")

- Do you have swap space configured?

- Does it go away after some time, all by itself?

- Does purging the cache help? ("sync;echo 3 > /proc/sys/vm/drop_caches")

- Could you please report the actual kernel version(s) of the affected systems? This would be helpful when forwarding this case to our SUSE backends...

- Do you have an active support subscription (not just updates, i.e. are you able to open support requests)? This may quickly turn into a case a support engineer ought to look at, since the actual cause can be one of several. (I'd rule out bad memory, as it has hit several servers after the update, so this may be a question of proper memory tuning or even a kernel problem.)
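
For reference, a quick way to gather most of this in one go could look like the following sketch (run as root; these are just the commands already mentioned above):

uname -r                                  # exact kernel version
free -m                                   # memory and swap usage in MB
sync; echo 3 > /proc/sys/vm/drop_caches   # flush dirty pages, then drop page cache, dentries and inodes
free -m                                   # usage after the caches have been dropped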

The usual conclusion for similar reports was that this can be caused by bad memory or a low memory situation.

Regards,
Jens

tojanov
28-Nov-2013, 14:15
Hi Jens,

thanks for your quick reply.

The systems run with 4 GB of memory and a 2 GB swap partition.
Swap does not seem to be needed, because memory usage is below 50% most of the time, including the cache.

             total       used       free     shared    buffers     cached
Mem:          3825       1907       1918          0        248       1532
-/+ buffers/cache:         126       3699
Swap:         2055          0       2055

There is no increased load or any special activity that triggers the problem.
I can only resolve it for some hours by rebooting; it never goes away by itself.

But thanks for your advice about purging the cache ... it works! But what does that mean for me?

I use kernel 3.0.93-0.8-default and yesterday updated one machine to 3.0.101-0.8-default, but the problem still exists with that kernel version.

Of course we have active support subscriptions, but first I wanted to check whether this is a known bug in the community, hoping to be able to fix it myself.

jmozdzen
28-Nov-2013, 14:24
Hi tojanov,


[...] But thanks for your advice about purging the cache ... it works! But what does that mean for me?

[...] Of course we have active support subscriptions, but first I wanted to check whether this is a known bug in the community, hoping to be able to fix it myself.

it might be an upstream kernel problem; the net is full of reports on this. I'll ask my SuSE contact for advice, since I cannot tell which fixes/improvements from later kernels were back-ported to the SLES kernels. That is something the developers will have to answer.

I'll get back to you once I have a proper reply, but due to the holiday season, this may take a few days...

Regards,
Jens

jmozdzen
13-Dec-2013, 12:36
Hi tojanov,

could you collect /proc/meminfo and /proc/vmstat while kswapd is at 100% CPU (sampled every second or so)? This would help with further diagnosis. Is this by chance a NUMA machine?
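
Something as simple as the following loop would be enough; the log file name /tmp/kswapd-stats.log is only an example:

while true; do
    date >> /tmp/kswapd-stats.log                              # timestamp for each sample
    cat /proc/meminfo /proc/vmstat >> /tmp/kswapd-stats.log    # append both snapshots
    sleep 1
done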

Regards,
Jens

tojanov
23-Dec-2013, 10:10
Hi and sorry for my late reply.

here are the logs:


cat /proc/meminfo
MemTotal: 3917640 kB
MemFree: 3018940 kB
Buffers: 223684 kB
Cached: 458328 kB
SwapCached: 0 kB
Active: 669160 kB
Inactive: 93120 kB
Active(anon): 68964 kB
Inactive(anon): 11496 kB
Active(file): 600196 kB
Inactive(file): 81624 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2104476 kB
SwapFree: 2104476 kB
Dirty: 280 kB
Writeback: 0 kB
AnonPages: 80304 kB
Mapped: 22688 kB
Shmem: 192 kB
Slab: 60184 kB
SReclaimable: 35048 kB
SUnreclaim: 25136 kB
KernelStack: 1608 kB
PageTables: 7308 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4063296 kB
Committed_AS: 464796 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 35544 kB
VmallocChunk: 34359695356 kB
HardwareCorrupted: 0 kB
AnonHugePages: 26624 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 44992 kB
DirectMap2M: 4149248 kB




cat /proc/vmstat
nr_free_pages 754750
nr_inactive_anon 2874
nr_active_anon 17193
nr_inactive_file 20408
nr_active_file 150051
nr_unevictable 0
nr_mlock 0
nr_anon_pages 13363
nr_mapped 5641
nr_file_pages 170504
nr_dirty 84
nr_writeback 0
nr_slab_reclaimable 8746
nr_slab_unreclaimable 6290
nr_page_table_pages 1775
nr_kernel_stack 197
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 48
nr_dirtied 7422473
nr_written 6878676
numa_hit 1563044047
numa_miss 0
numa_foreign 0
numa_interleave 4894
numa_local 1563044047
numa_other 0
nr_anon_transparent_hugepages 13
nr_dirty_threshold 378110
nr_dirty_background_threshold 94527
pgpgin 71768191
pgpgout 45388604
pswpin 0
pswpout 0
pgalloc_dma 9
pgalloc_dma32 1590204000
pgalloc_normal 16195
pgalloc_movable 0
pgfree 1590975539
pgactivate 4828774
pgdeactivate 2884
pgfault 5027834737
pgmajfault 7267
pgrefill_dma 0
pgrefill_dma32 2368
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 0
pgsteal_dma32 3461797
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 3464385
pgscan_kswapd_normal 599
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 0
pgscan_direct_normal 0
pgscan_direct_movable 0
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 24710144
kswapd_steal 3461797
kswapd_inodesteal 15784638
kswapd_low_wmark_hit_quickly 0
kswapd_high_wmark_hit_quickly 1115988083510
kswapd_skip_congestion_wait 0
pageoutrun 1115988083512
allocstall 0
pgrotated 62
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 1305
unevictable_pgs_scanned 0
unevictable_pgs_rescued 6429
unevictable_pgs_mlocked 6451
unevictable_pgs_munlocked 6451
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0
thp_fault_alloc 518
thp_fault_fallback 0
thp_collapse_alloc 29
thp_collapse_alloc_failed 0
thp_split 9


The Hyper-V hosts where the SLES VMs are running use NUMA. But the problem only exists on a few of them.
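
In case it helps, this is how the NUMA layout seen by a guest could be checked (numactl has to be installed for the first command):

numactl --hardware              # nodes and per-node memory as seen by the kernel
ls /sys/devices/system/node/    # lists a nodeN directory for each NUMA node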

tojanov
23-Jan-2014, 15:31
Your PM quota is full ;)

If I open a bug in the bug tracker, I can only select the openSUSE platform. There is no SLES entry; is that right, and should I choose openSUSE?