Salient features
-a. Enable control of both RSS (mapped) and Page Cache (unmapped) pages
+a. Enable control of Anonymous, Page Cache (mapped and unmapped) and
+ Swap Cache memory pages.
b. The infrastructure allows easy addition of other types of memory to control
c. Provides *zero overhead* for non memory controller users
d. Provides a double LRU: global memory pressure causes reclaim from the
global LRU; a cgroup on hitting a limit, reclaims from the per
cgroup LRU
-NOTE: Swap Cache (unmapped) is not accounted now.
-
Benefits and Purpose of the memory controller
The memory controller isolates the memory behaviour of a group of tasks
usage of mem+swap is limited by memsw.limit_in_bytes.
-Note: why 'mem+swap' rather than swap.
+* why 'mem+swap' rather than swap.
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
-mem+swap.
+mem+swap. In other words, when we want to limit the usage of swap without
+affecting global LRU, mem+swap limit is better than just limiting swap from
+OS point of view.
-In other words, when we want to limit the usage of swap without affecting
-global LRU, mem+swap limit is better than just limiting swap from OS point
-of view.
+* What happens when a cgroup hits memory.memsw.limit_in_bytes
+When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
+in this cgroup. Then, swap-out will not be done by cgroup routine and file
+caches are dropped. But as mentioned above, global LRU can do swapout memory
+from it for sanity of the system's memory management state. You can't forbid
+it by cgroup.
2.5 Reclaim
pages that are selected for reclaiming come from the per cgroup LRU
list.
+NOTE: Reclaim does not work for the root cgroup, since we cannot set any
+limits on the root cgroup.
+
2. Locking
The memory controller uses the following hierarchy
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes.
+NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
+NOTE: We cannot set limits on the root cgroup any more.
# cat /cgroups/0/memory.limit_in_bytes
4194304
moved to the parent. If you want to avoid that, force_empty will be useful.
5.2 stat file
- memory.stat file includes following statistics (now)
- cache - # of pages from page-cache and shmem.
- rss - # of pages from anonymous memory.
- pgpgin - # of event of charging
- pgpgout - # of event of uncharging
- active_anon - # of pages on active lru of anon, shmem.
- inactive_anon - # of pages on active lru of anon, shmem
- active_file - # of pages on active lru of file-cache
- inactive_file - # of pages on inactive lru of file cache
- unevictable - # of pages cannot be reclaimed.(mlocked etc)
-
- Below is depend on CONFIG_DEBUG_VM.
- inactive_ratio - VM inernal parameter. (see mm/page_alloc.c)
- recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
- recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
- recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)
- recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
-
- Memo:
+
+memory.stat file includes following statistics
+
+cache - # of bytes of page cache memory.
+rss - # of bytes of anonymous and swap cache memory.
+pgpgin - # of pages paged in (equivalent to # of charging events).
+pgpgout - # of pages paged out (equivalent to # of uncharging events).
+active_anon - # of bytes of anonymous and swap cache memory on active
+ lru list.
+inactive_anon - # of bytes of anonymous memory and swap cache memory on
+ inactive lru list.
+active_file - # of bytes of file-backed memory on active lru list.
+inactive_file - # of bytes of file-backed memory on inactive lru list.
+unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).
+
+The following additional stats are dependent on CONFIG_DEBUG_VM.
+
+inactive_ratio - VM internal parameter. (see mm/page_alloc.c)
+recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
+recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
+recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)
+recent_scanned_file - VM internal parameter. (see mm/vmscan.c)
+
+Memo:
recent_rotated means recent frequency of lru rotation.
recent_scanned means recent # of scans to lru.
showing for better debug please see the code for meanings.
+Note:
+ Only anonymous and swap cache memory is listed as part of 'rss' stat.
+ This should not be confused with the true 'resident set size' or the
+ amount of physical memory used by the cgroup. Per-cgroup rss
+ accounting is not done yet.
5.3 swappiness
Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
- Following cgroup's swapiness can't be changed.
+ Following cgroups' swapiness can't be changed.
- root cgroup (uses /proc/sys/vm/swappiness).
- a cgroup which uses hierarchy and it has child cgroup.
- a cgroup which uses hierarchy and not the root of hierarchy.
NOTE2: This feature can be enabled/disabled per subtree.
-7. TODO
+7. Soft limits
+
+Soft limits allow for greater sharing of memory. The idea behind soft limits
+is to allow control groups to use as much of the memory as needed, provided
+
+a. There is no memory contention
+b. They do not exceed their hard limit
+
+When the system detects memory contention or low memory control groups
+are pushed back to their soft limits. If the soft limit of each control
+group is very high, they are pushed back as much as possible to make
+sure that one control group does not starve the others of memory.
+
+Please note that soft limits is a best effort feature, it comes with
+no guarantees, but it does its best to make sure that when memory is
+heavily contended for, memory is allocated based on the soft limit
+hints/setup. Currently soft limit based reclaim is setup such that
+it gets invoked from balance_pgdat (kswapd).
+
+7.1 Interface
+
+Soft limits can be setup by using the following commands (in this example we
+assume a soft limit of 256 megabytes)
+
+# echo 256M > memory.soft_limit_in_bytes
+
+If we want to change this to 1G, we can at any time use
+
+# echo 1G > memory.soft_limit_in_bytes
+
+NOTE1: Soft limits take effect over a long period of time, since they involve
+ reclaiming memory for balancing between memory cgroups
+NOTE2: It is recommended to set the soft limit always below the hard limit,
+ otherwise the hard limit will take precedence.
+
+8. TODO
1. Add support for accounting huge pages (as a separate controller)
2. Make per-cgroup scanner reclaim not-shared pages first