Thread: Memory not free, but no process ?!?

  1. Memory not free, but no process ?!?

    Hi,

    We have the following problem with some HPC nodes: after some time (7-30 days), large parts of the memory
    are occupied, but there are no active or inactive processes using that memory. Any idea how to find where the memory went? (ps, lsof, and /proc/.. did not help.)

    System: SLES 11 SP3, 64 GB RAM

    Code:
    BAD node:
    node22:~ # cat /proc/meminfo 
    MemTotal:       66066804 kB
    MemFree:        49673020 kB
    Buffers:        12298540 kB
    Cached:          1467596 kB
    SwapCached:            0 kB
    Active:         12875052 kB
    Inactive:         948312 kB
    Active(anon):      42288 kB
    Inactive(anon):       88 kB
    Active(file):   12832764 kB
    Inactive(file):   948224 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:      16779260 kB
    SwapFree:       16779260 kB
    Dirty:                52 kB
    Writeback:             0 kB
    AnonPages:         42236 kB
    Mapped:            13780 kB
    Shmem:               140 kB
    Slab:            1638204 kB
    SReclaimable:    1578768 kB
    SUnreclaim:        59436 kB
    KernelStack:        6400 kB
    PageTables:         2696 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:    49812660 kB
    Committed_AS:     115352 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:      383588 kB
    VmallocChunk:   34359352372 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:     14336 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    DirectMap4k:      227328 kB
    DirectMap2M:    11290624 kB
    DirectMap1G:    55574528 kB
    
    
    GOOD node:
    node08:~ # cat /proc/meminfo 
    MemTotal:       66066804 kB
    MemFree:        63056704 kB
    Buffers:          814328 kB
    Cached:           668740 kB
    SwapCached:         6528 kB
    Active:          1056692 kB
    Inactive:         511812 kB
    Active(anon):      32872 kB
    Inactive(anon):    47336 kB
    Active(file):    1023820 kB
    Inactive(file):   464476 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:      16779260 kB
    SwapFree:       16761860 kB
    Dirty:                88 kB
    Writeback:             0 kB
    AnonPages:         74692 kB
    Mapped:            12376 kB
    Shmem:                52 kB
    Slab:             507524 kB
    SReclaimable:     364540 kB
    SUnreclaim:       142984 kB
    KernelStack:        6416 kB
    PageTables:         2980 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:    49812660 kB
    Committed_AS:     150608 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:      383588 kB
    VmallocChunk:   34359352372 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:     32768 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    DirectMap4k:       71680 kB
    DirectMap2M:     4106240 kB
    DirectMap1G:    62914560 kB

  2. Re: Memory not free, but no process ?!?

    On 11/21/2018 01:44 AM, mpibgc wrote:
    >
    > we have the following problem with some HPC nodes. After some time
    > (7-30 days) large parts of the memory
    > are occupied. There are no active or inactive processes using the
    > memory. Any idea how do find the memory (ps,lsof , /proc/.. did not
    > help).
    >
    > System SLES 11SP3, 64GB RAM


    Just to be sure: other than the numbers shown in pseudo-files like
    /proc/meminfo, do you have any actual symptoms that make you classify this
    as a "problem" rather than a great feature which improves your system
    performance? Have you done any benchmarking, particularly of things using
    disk I/O for files, that might indicate which box performs better?
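
    One crude way to answer that question is a sequential read benchmark. This is a minimal sketch, not a proper tool: `read_throughput` is a hypothetical helper, and the file to read is passed on the command line. Drop the caches first (as root, with the `sync; echo 3 > /proc/sys/vm/drop_caches` command that comes up later in this thread), then run it against a large file: the first read has to come from disk, while the second is normally served from the cache, provided the file fits in RAM.

```python
import sys
import time


def read_throughput(path, chunk_size=1024 * 1024):
    """Read a file sequentially and return (bytes_read, MiB_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer, so timing reflects the OS reads
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return total, total / (1024 * 1024) / elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the same file twice: the second pass is normally served from the page cache
    for label in ("first read", "second read"):
        size, rate = read_throughput(sys.argv[1])
        print(f"{label}: {size} bytes at {rate:.0f} MiB/s")
```

    A large gap between the two lines is the cache at work; run the same test on both nodes to compare them.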

    > Code:
    > --------------------
    >
    > BAD node:
    > node22:~ # cat /proc/meminfo
    > MemTotal: 66066804 kB
    > MemFree: 49673020 kB
    > Buffers: 12298540 kB
    > Cached: 1467596 kB


    The MemFree, Buffers, and Cached lines above add up to almost all of the
    system memory (about 96% of MemTotal), so it looks to me like your memory
    is mostly available. This may seem like a "bad thing" at first, but it is
    actually the way that Linux (the kernel) improves the performance of your system.

    RAM that is not being used is essentially wasted, and RAM, used or not, is
    very fast, so wasting it is particularly unfortunate. Disks, on the other
    hand, have traditionally been slower than RAM, especially before SSDs,
    and they still are today. As a result, ideally we would have the system do
    an operation with RAM rather than with the disk. So far, hopefully this is
    all clear.

    Linux (the kernel) automatically keeps files it has used recently in RAM
    in a filesystem cache. This is why 'Active(file)' (and Buffers) on the
    "BAD" node show roughly 12 GiB in use. Any time you then ask the OS for
    that file, it first checks its cache and, if the cached copy is current,
    serves the file from there. Writes update the cache and are flushed to
    disk shortly afterwards, of course, but since many files are read far more
    often than they are written, it makes sense to serve reads from the cache
    and leave the slow disk out of the picture. As a result, the system's
    built-up cache of commonly used files improves system performance
    significantly.
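
    To see that those ~12 GiB are file cache rather than process memory, compare the file-backed LRU lists (Active(file) + Inactive(file)) with AnonPages, which is roughly the memory actually owned by processes (heap, stack). The sketch below only uses values copied from the BAD node's /proc/meminfo in this thread; `parse_meminfo` and `breakdown_gib` are hypothetical helper names.

```python
KIB_PER_GIB = 1024 * 1024  # /proc/meminfo reports "kB", which are really KiB


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into {field: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def breakdown_gib(info):
    """Split memory into process-owned (anonymous) pages and kernel caches, in GiB."""
    return {
        "anon (process memory)": info["AnonPages"] / KIB_PER_GIB,
        "file cache": (info["Active(file)"] + info["Inactive(file)"]) / KIB_PER_GIB,
        "reclaimable slab": info["SReclaimable"] / KIB_PER_GIB,
    }


# Values copied from the BAD node (node22) output in this thread
BAD_NODE = """\
Active(file):   12832764 kB
Inactive(file):   948224 kB
AnonPages:         42236 kB
SReclaimable:    1578768 kB
"""

for name, gib in breakdown_gib(parse_meminfo(BAD_NODE)).items():
    print(f"{name:22s} {gib:6.2f} GiB")
```

    Roughly 13 GiB of file cache against about 40 MiB of anonymous pages fits the observation that no process owns the memory: it is held by caches the kernel can reclaim.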

    Of course, the often-asked question (see Google) that follows is, "What
    happens when I need to load a process that takes 50+ GiB of RAM and my
    system only shows about 47 GiB of 'free' RAM?" The answer is that the
    system isn't stupid: it recognizes that a real process's need for RAM
    trumps the desire to cache things and, because RAM is very fast, it simply
    evicts cached data to make room for that process. You can easily test
    this with RAM-testing utilities, a simple script that allocates and fills
    memory, or real-life programs. When you do, you will see the cache amounts
    drop as the RAM is handed to the process that needs it.
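
    A minimal version of such a "simple script" might look like this: allocate a buffer, write one byte to every page so the kernel must back it with real RAM, and compare MemFree, Buffers, and Cached before and after. This is a sketch under assumptions: it runs on Linux (it reads /proc/meminfo), the size is given in MiB as the only argument, and `read_meminfo` and `touch_pages` are hypothetical names. Pick a size above MemFree but well below MemFree + Buffers + Cached, otherwise the node may start swapping or the OOM killer may step in.

```python
import mmap
import sys

PAGE = mmap.PAGESIZE  # size of one memory page on this system


def read_meminfo(path="/proc/meminfo"):
    """Return {field: value_in_kB} from /proc/meminfo (Linux only)."""
    values = {}
    with open(path) as f:
        for line in f:
            key, sep, rest = line.partition(":")
            if sep and rest.split():
                values[key.strip()] = int(rest.split()[0])
    return values


def touch_pages(size_bytes):
    """Allocate size_bytes and write one byte per page so every page is backed by RAM."""
    buf = bytearray(size_bytes)
    for offset in range(0, size_bytes, PAGE):
        buf[offset] = 1
    return buf


if __name__ == "__main__" and len(sys.argv) > 1:
    mib = int(sys.argv[1])  # how much memory to grab, in MiB
    before = read_meminfo()
    buf = touch_pages(mib * 1024 * 1024)  # keep a reference so the memory stays allocated
    after = read_meminfo()
    for key in ("MemFree", "Buffers", "Cached"):
        print(f"{key:8s} {before[key] // 1024:>8d} MiB -> {after[key] // 1024:>8d} MiB")
```

    On a node like node22, Buffers and Cached should shrink while the allocation succeeds; if it fails instead, that points to something other than the cache.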

    This question usually comes up (in this forum as well as elsewhere) with
    the output of the 'free' command, so perhaps review that too; its simpler
    dataset makes things a little clearer than the verbose output of
    /proc/meminfo. If you Google this question along with the 'free' command,
    you'll see this same kind of response all over the Internet.
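
    Older versions of `free` (such as the procps 3.2.x generation used on SLES 11, as far as I know) make the same point with their "-/+ buffers/cache" row, which subtracts Buffers and Cached from "used" and adds them to "free". The sketch below recomputes those rows from the BAD node's numbers; `free_summary` is a hypothetical helper, so compare its result with `free -k` on your own node.

```python
def parse_meminfo(text):
    """Turn /proc/meminfo-style text into {field: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def free_summary(info):
    """Compute the Mem: and -/+ buffers/cache: rows of the older procps `free` (kB)."""
    total, free = info["MemTotal"], info["MemFree"]
    buffers, cached = info["Buffers"], info["Cached"]
    used = total - free
    return {
        "Mem": {"total": total, "used": used, "free": free},
        # "used" without caches, and "free" counting caches as available
        "-/+ buffers/cache": {"used": used - buffers - cached,
                              "free": free + buffers + cached},
    }


# Values copied from the BAD node (node22) output in this thread
BAD_NODE = """\
MemTotal:       66066804 kB
MemFree:        49673020 kB
Buffers:        12298540 kB
Cached:          1467596 kB
"""

for row, cols in free_summary(parse_meminfo(BAD_NODE)).items():
    print(f"{row:18s}", "  ".join(f"{k}={v}" for k, v in cols.items()))
```

    For node22 this gives about 2.5 GiB really in use and about 60 GiB available once the caches are counted as free.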

    > SwapCached: 0 kB
    > Active: 12875052 kB
    > Inactive: 948312 kB
    > Active(anon): 42288 kB
    > Inactive(anon): 88 kB
    > Active(file): 12832764 kB
    > Inactive(file): 948224 kB
    >
    >
    > GOOD node:
    > node08:~ # cat /proc/meminfo
    > MemTotal: 66066804 kB
    > MemFree: 63056704 kB
    > Buffers: 814328 kB
    > Cached: 668740 kB


    By "good" I presume you mean wasting a lot of RAM by doing nothing
    useful with it. Yes, that describes this machine better than the
    other. :-)

    > --------------------


    --
    Good luck.

    If you want to send me a private message, please let me know in the
    forum as I do not use the web interface often.

  3. Re: Memory not free, but no process ?!?

    Hi Peer & all,

    This seems to be a duplicate of "https://forums.suse.com/showthread.php?12909-Memory-not-free-but-no-process-!", and the post there reports that processes won't start and report insufficient memory.

    Please also see my answer in that thread, which references a command to "flush" buffers and caches so you can test whether this is really what prevents the process from starting.

    Regards,
    J
    From the times when today's "old school" was "new school"

  4. Re: Memory not free, but no process ?!?

    You are right,

    running "sync; echo 3 > /proc/sys/vm/drop_caches" solved the "problem".

    I'll have to run it on all idle nodes so that the queueing system sees the
    correct memory values.
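
    For reference, the same "flush" can be done from Python instead of the shell. This is only a sketch: writing the real /proc file requires root, `drop_caches` is a hypothetical helper name, and the level meanings follow the kernel's documentation for /proc/sys/vm/drop_caches (1 = page cache, 2 = reclaimable slab objects such as dentries and inodes, 3 = both). Dropping caches does not lose data, but jobs then have to re-read their files from disk, and the cache refills as soon as files are read again.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(level=3, path=DROP_CACHES):
    """Rough Python equivalent of `sync; echo LEVEL > /proc/sys/vm/drop_caches`.

    Needs root when writing the real /proc file.
    level 1: page cache; 2: reclaimable slab (dentries, inodes); 3: both.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # flush dirty pages first; drop_caches only frees clean ones
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

    Called as root with no arguments, `drop_caches()` does the same as the shell command above.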

    Thanks, Peer
