They are used to assign limits to the resources (CPU, Memory, IO, Network, etc.) which can be used by processes

Each resource has its own hierarchy tree
Every process belongs to exactly one node in the hierarchy tree of each resource
Each hierarchy starts out with only one node (the root node), and every process starts out at this node
Every node in the hierarchy tree is a group of processes that share the same resource

PID 1 is in the root group of each hierarchy
A new process starts in its parent's group

Groups are materialized through a pseudo-filesystem mounted at /sys/fs/cgroup
A new cgroup can be created using mkdir in the pseudo-fs: mkdir /sys/fs/cgroup/memory/mycgroup
A process is moved into a group by writing its PID to the group's tasks file: echo $PID > /sys/fs/cgroup/../tasks
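
The two steps above can be sketched as small shell helpers. The function names (cg_create, cg_move, cg_tasks) and the CGROUP_ROOT variable are hypothetical, introduced only so the sketch can be pointed at a scratch directory; on a real machine it assumes a cgroup v1 layout and root privileges.

```shell
# Assumption: cgroup v1 pseudo-fs. Override CGROUP_ROOT to experiment safely.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# cg_create CONTROLLER GROUP: create a new group with mkdir.
cg_create() {
    mkdir -p "$CGROUP_ROOT/$1/$2"
}

# cg_move CONTROLLER GROUP PID: move one process by writing its PID to tasks.
cg_move() {
    echo "$3" > "$CGROUP_ROOT/$1/$2/tasks"
}

# cg_tasks CONTROLLER GROUP: list the PIDs currently in the group.
cg_tasks() {
    cat "$CGROUP_ROOT/$1/$2/tasks"
}

# Example (as root): cg_create memory mycgroup && cg_move memory mycgroup "$$"
```

Only one PID is written per call, which mirrors the single echo shown in the notes.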

Memory Cgroup

Accounting

Keeps track of the amount of memory used by each process or group of processes

Memory is accounted in terms of pages (4 KB on most systems)
Memory pages are of two types:

  • file (points to data present on disk)
  • anonymous (points to data not backed by a file on disk)

There are two pools: active (recently accessed) and inactive (candidates for eviction). Based on the memory available, the kernel decides which pages need to be evicted

Each page is charged/tagged to a group
If multiple groups access a page, only one of the groups is charged/tagged for it, i.e. the memory occupied by the page is counted against only one group
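
As a rough illustration of the accounting described above, the sketch below reads a group's usage and converts it to pages, and looks up one counter in memory.stat (for example active_anon or inactive_file). The helper names, CGROUP_ROOT and the 4096-byte PAGE_SIZE default are assumptions; the file names are those of the cgroup v1 memory controller.

```shell
# Assumptions: cgroup v1 layout, 4 KB pages (check with `getconf PAGESIZE`).
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"
PAGE_SIZE="${PAGE_SIZE:-4096}"

# mem_usage_pages GROUP: current usage of the group, expressed in pages.
mem_usage_pages() {
    usage=$(cat "$CGROUP_ROOT/memory/$1/memory.usage_in_bytes")
    echo $((usage / PAGE_SIZE))
}

# mem_stat GROUP KEY: print one counter from memory.stat,
# e.g. active_anon, inactive_anon, active_file, inactive_file.
mem_stat() {
    awk -v key="$2" '$1 == key { print $2 }' "$CGROUP_ROOT/memory/$1/memory.stat"
}

# Example: mem_usage_pages mycgroup; mem_stat mycgroup inactive_file
```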

Limits

Each group can be assigned an optional limit
Limits are of two types: hard and soft

When a hard limit is exceeded, the Out of Memory (OOM) killer is triggered and kills a process within the group
Soft limits are not enforced, but when the system as a whole runs low on memory the soft limits are checked: the further a group is above its soft limit, the more likely its pages are to be reclaimed

These limits can also be set for different types of memory: physical, kernel, and total (RAM + swap)
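
A minimal sketch of setting these limits through the cgroup v1 memory controller files follows. The helper name mem_set_limit, its "kind" keywords and CGROUP_ROOT are hypothetical; the kernel and total (memsw) files may be absent depending on kernel version and configuration.

```shell
# Assumption: cgroup v1 memory controller mounted under CGROUP_ROOT/memory.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# mem_set_limit GROUP KIND BYTES
# KIND is one of: hard, soft, kernel, total (RAM + swap).
mem_set_limit() {
    case "$2" in
        hard)   file=memory.limit_in_bytes ;;
        soft)   file=memory.soft_limit_in_bytes ;;
        kernel) file=memory.kmem.limit_in_bytes ;;
        total)  file=memory.memsw.limit_in_bytes ;;
        *)      echo "unknown limit kind: $2" >&2; return 1 ;;
    esac
    echo "$3" > "$CGROUP_ROOT/memory/$1/$file"
}

# Example (as root): mem_set_limit mycgroup hard 104857600   # 100 MB
```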

OOM notification can be set up, which causes a group whose hard limit is exceeded to be frozen instead of having a process killed. A notification is raised that can be acted upon by the user, and once the group's memory usage is lower than the limit the group can be unfrozen
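
In cgroup v1 this behaviour is driven by the memory.oom_control file. The sketch below only disables the OOM killer for a group and polls the under_oom flag; the real notification is delivered through an eventfd registered in cgroup.event_control, which plain shell cannot do, so polling is a stand-in here. Helper names and CGROUP_ROOT are hypothetical.

```shell
# Assumption: cgroup v1 memory controller mounted under CGROUP_ROOT/memory.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# mem_oom_disable GROUP: tasks that hit the hard limit are paused, not killed.
mem_oom_disable() {
    echo 1 > "$CGROUP_ROOT/memory/$1/memory.oom_control"
}

# mem_under_oom GROUP: prints 1 while the group is stuck at its limit, else 0.
mem_under_oom() {
    awk '$1 == "under_oom" { print $2 }' "$CGROUP_ROOT/memory/$1/memory.oom_control"
}

# Example (as root): mem_oom_disable mycgroup; mem_under_oom mycgroup
```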

Every time the kernel gives a page to a process or removes a page from it, it has to update a counter
There is a slight overhead cost associated with this. It cannot be enabled or disabled at the process level; it is set for the whole machine

HugeTLB cgroup

Controls the number of huge pages that can be used by a cgroup
By default, a process can use all the huge pages
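
A sketch of capping huge page usage for a group, assuming the cgroup v1 hugetlb controller, whose files are named after the huge page size (2MB in the example). The helper names and CGROUP_ROOT are hypothetical.

```shell
# Assumption: cgroup v1 hugetlb controller mounted under CGROUP_ROOT/hugetlb.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# hugetlb_set_limit GROUP PAGESIZE BYTES, e.g. PAGESIZE=2MB
hugetlb_set_limit() {
    echo "$3" > "$CGROUP_ROOT/hugetlb/$1/hugetlb.$2.limit_in_bytes"
}

# hugetlb_usage GROUP PAGESIZE: bytes of huge pages currently charged.
hugetlb_usage() {
    cat "$CGROUP_ROOT/hugetlb/$1/hugetlb.$2.usage_in_bytes"
}

# Example (as root): hugetlb_set_limit mycgroup 2MB 1073741824   # 1 GB
```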

CPU cgroup

Keeps track of user/system CPU time
Keeps track of usage per CPU
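Relative weights can be set per group (cpu.shares); a hard cap on CPU usage is only available on kernels built with CFS bandwidth control (cpu.cfs_quota_us and cpu.cfs_period_us)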
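
A sketch of reading the CPU accounting counters and setting a relative weight. In cgroup v1 the accounting files belong to the companion cpuacct controller, which is often mounted together with cpu; the separate cpuacct and cpu directories used here, the helper names and CGROUP_ROOT are assumptions.

```shell
# Assumption: cpuacct and cpu hierarchies under CGROUP_ROOT (often co-mounted).
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# cpu_time GROUP KIND: KIND is "user" or "system"; value is in USER_HZ ticks.
cpu_time() {
    awk -v key="$2" '$1 == key { print $2 }' "$CGROUP_ROOT/cpuacct/$1/cpuacct.stat"
}

# cpu_usage_percpu GROUP: nanoseconds consumed on each CPU, space separated.
cpu_usage_percpu() {
    cat "$CGROUP_ROOT/cpuacct/$1/cpuacct.usage_percpu"
}

# cpu_set_shares GROUP SHARES: relative weight, not an absolute limit.
cpu_set_shares() {
    echo "$2" > "$CGROUP_ROOT/cpu/$1/cpu.shares"
}

# Example (as root): cpu_set_shares mycgroup 512; cpu_time mycgroup user
```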

CPUset cgroup

Allows groups to be pinned to certain CPU(s)
Reserves CPUs for certain processes/apps
Avoids processes bouncing between CPUs
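
Pinning could look roughly like this under cgroup v1. Both cpuset.cpus and cpuset.mems need a value before tasks can be attached to the group. The helper name cpuset_pin and CGROUP_ROOT are hypothetical.

```shell
# Assumption: cgroup v1 cpuset controller mounted under CGROUP_ROOT/cpuset.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# cpuset_pin GROUP CPUS MEMS
# CPUS is a list such as "0-1" or "0,2"; MEMS is a list of NUMA nodes, e.g. "0".
cpuset_pin() {
    dir="$CGROUP_ROOT/cpuset/$1"
    mkdir -p "$dir"
    echo "$2" > "$dir/cpuset.cpus"
    echo "$3" > "$dir/cpuset.mems"
}

# Example (as root): cpuset_pin pinned 0-1 0
```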

Block IO cgroup

Keeps track of I/O for each group for each block device
I/O can be tracked by the number of read and write operations performed by a group as well as by the type of operation (sync and async)

Allows limits to be set for each group for each block device
Limits can be expressed for reads and writes, either in operations per second or in bytes per second
Allows weights to be assigned to each group as well
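
The per-device limits and weights can be sketched with the cgroup v1 blkio files. Devices are identified by major:minor number (8:0 in the example is an assumption about the machine), and blkio.weight is only available with an I/O scheduler that supports proportional weights. Helper names and CGROUP_ROOT are hypothetical.

```shell
# Assumption: cgroup v1 blkio controller mounted under CGROUP_ROOT/blkio.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# blkio_limit GROUP KIND DEVICE VALUE
# KIND: read_bps, write_bps (bytes/s) or read_iops, write_iops (operations/s).
# DEVICE: major:minor of the block device.
blkio_limit() {
    case "$2" in
        read_bps|write_bps|read_iops|write_iops) ;;
        *) echo "unknown limit kind: $2" >&2; return 1 ;;
    esac
    echo "$3 $4" > "$CGROUP_ROOT/blkio/$1/blkio.throttle.${2}_device"
}

# blkio_weight GROUP WEIGHT: relative share of I/O time for the group.
blkio_weight() {
    echo "$2" > "$CGROUP_ROOT/blkio/$1/blkio.weight"
}

# Example (as root): blkio_limit mycgroup read_bps 8:0 1048576   # 1 MB/s
```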

Net_cls & Net_prio cgroup

Automatically sets a class or priority on the traffic generated by a group
It only applies to egress traffic
Net_cls adds a tag (class ID) to the traffic generated by a group, which can then be shaped as needed using tools like tc/iptables
Net_prio assigns a priority to the traffic, which is used by queuing algorithms
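
A sketch of tagging and prioritising a group's traffic with the cgroup v1 files. The class ID is a hexadecimal value of the form 0xAAAABBBB, so 0x100001 corresponds to the tc handle 10:1; the interface name eth0 is an assumption. Helper names and CGROUP_ROOT are hypothetical, and the tc/iptables rules that act on the tag are not shown.

```shell
# Assumption: net_cls and net_prio hierarchies mounted under CGROUP_ROOT.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# net_set_classid GROUP CLASSID: tag egress traffic, e.g. 0x100001 for 10:1.
net_set_classid() {
    echo "$2" > "$CGROUP_ROOT/net_cls/$1/net_cls.classid"
}

# net_set_prio GROUP IFACE PRIO: priority of the group's traffic on one interface.
net_set_prio() {
    echo "$2 $3" > "$CGROUP_ROOT/net_prio/$1/net_prio.ifpriomap"
}

# Example (as root): net_set_classid mycgroup 0x100001; net_set_prio mycgroup eth0 5
```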

Devices cgroup

Controls what a group can do on a device node
Controls permissions such as read, write, and mknod
Used to prevent containers from having access to all the device nodes on the system
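
A typical pattern, sketched under cgroup v1 assumptions, is to deny everything and then allow individual device nodes. Each rule has the form "type major:minor access", where type is a, b or c and access is any combination of r, w and m (mknod). Helper names and CGROUP_ROOT are hypothetical.

```shell
# Assumption: cgroup v1 devices controller mounted under CGROUP_ROOT/devices.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# dev_deny_all GROUP: remove access to every device node.
dev_deny_all() {
    echo a > "$CGROUP_ROOT/devices/$1/devices.deny"
}

# dev_allow GROUP TYPE MAJOR:MINOR ACCESS
dev_allow() {
    echo "$2 $3 $4" > "$CGROUP_ROOT/devices/$1/devices.allow"
}

# Example (as root): dev_deny_all mycgroup; dev_allow mycgroup c 1:3 rwm   # /dev/null
```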

Freezer cgroup

Allows a group of processes to be frozen/thawed
Similar in functionality to SIGSTOP/SIGCONT
Freezing cannot be detected by the processes (no signal is delivered to them) and hence will not impede ptrace/debugging
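
Freezing and thawing a group can be sketched by writing to freezer.state, as exposed by the cgroup v1 freezer controller. Helper names and CGROUP_ROOT are hypothetical.

```shell
# Assumption: cgroup v1 freezer controller mounted under CGROUP_ROOT/freezer.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# freezer_freeze GROUP: stop every process in the group.
freezer_freeze() {
    echo FROZEN > "$CGROUP_ROOT/freezer/$1/freezer.state"
}

# freezer_thaw GROUP: resume every process in the group.
freezer_thaw() {
    echo THAWED > "$CGROUP_ROOT/freezer/$1/freezer.state"
}

# freezer_state GROUP: print the current state of the group.
freezer_state() {
    cat "$CGROUP_ROOT/freezer/$1/freezer.state"
}

# Example (as root): freezer_freeze mycgroup; freezer_state mycgroup; freezer_thaw mycgroup
```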