Hi Peer,

Please keep in mind that the concept of "free memory" may be different from what you expect:

- MemFree is memory that is actually unused. From a kernel developer's point of view, that's *wasted memory*
- Buffers and Cached are memory dynamically used by the memory management subsystem to (generally speaking) cache data, i.e. the results of disk reads (see the quick check below)
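
You can look at these counters directly. The field names below come from /proc/meminfo; how "free" labels its columns ("buff/cache", "available") depends on your procps version:

  # raw counters (values are in kB)
  grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached):' /proc/meminfo
  # the same split, summarized
  free -h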

If processes request memory but no unused memory is available, the memory management subsystem will dynamically reduce the amount of memory used for buffers and caches.

If your system interacts with file systems / block devices (read and write operations) and unused memory is available, the buffers/caches will dynamically grow to avoid slow block device interaction in the future.
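
You can watch this happen. A minimal sketch - the file path is just a placeholder, pick any large file on the node:

  grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo   # snapshot before
  cat /path/to/some/large/file > /dev/null             # read it once; the pages land in the page cache
  grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo   # Cached grows, MemFree shrinks by roughly the file size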

The node you call "bad" has had a lot of block device interaction, so memory is being used for buffers and caches. The other node hasn't (yet).

If you monitor your system long-term, you'll see that right after boot you have a lot of "free" memory, which gradually decreases over time - it gets used either by applications or by buffers/caches, and that ratio will likely shift over time if you run different workloads. If you see significant amounts of MemFree all the time, you should reduce the amount of physical (or configured virtual) memory - it's a waste of resources. Coming from the other side, you have insufficient memory once you see significant swap activity, or insufficient I/O throughput because the buffer/cache allocations are too small.
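
A minimal sketch for that kind of long-term monitoring (the log file name is just an example; sar or your existing monitoring will do the same job):

  # append one line per minute with the relevant counters (values in kB)
  while true; do
      date +%s | tr '\n' ' '
      awk '/^(MemFree|MemAvailable|Buffers|Cached|SwapFree):/ {printf "%s %s ", $1, $2}' /proc/meminfo
      echo
      sleep 60
  done >> meminfo.log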

> Jobs looking for >50GB will not start

That's interesting. Who's reporting this, the OS? Or is some starter process looking at the memory stats and complaining because MemFree seems too low?
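
If it's the latter, the counter such a check should usually look at is MemAvailable rather than MemFree (assuming a reasonably recent kernel, 3.14 or later) - it is the kernel's estimate of how much memory could be claimed without swapping, reclaimable caches included:

  # compare what an overly strict check sees (MemFree) with the realistic value (MemAvailable)
  awk '/^(MemFree|MemAvailable):/ {printf "%-13s %7.1f GiB\n", $1, $2/1048576}' /proc/meminfo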

To clear buffers/caches, you could issue "sync; echo 3 > /proc/sys/vm/drop_caches", which is a one-shot operation. "cat /proc/meminfo" right after that command should show most of the memory reported as "free" again.
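
Spelled out step by step (this needs root; "sysctl -w vm.drop_caches=3" is an equivalent form):

  sync                                # write dirty pages to disk first; drop_caches only frees clean pages
  echo 3 > /proc/sys/vm/drop_caches   # 1 = page cache, 2 = dentries and inodes, 3 = both
  grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo   # MemFree should be large again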

Regards,
J