Lines Matching +full:echo +full:- +full:active +full:- +full:ms
1 .. _cgroup-v2:
11 conventions of cgroup v2. It describes all userland-visible aspects
14 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
22 1-1. Terminology
23 1-2. What is cgroup?
25 2-1. Mounting
26 2-2. Organizing Processes and Threads
27 2-2-1. Processes
28 2-2-2. Threads
29 2-3. [Un]populated Notification
30 2-4. Controlling Controllers
31 2-4-1. Availability
32 2-4-2. Enabling and Disabling
33 2-4-3. Top-down Constraint
34 2-4-4. No Internal Process Constraint
35 2-5. Delegation
36 2-5-1. Model of Delegation
37 2-5-2. Delegation Containment
38 2-6. Guidelines
39 2-6-1. Organize Once and Control
40 2-6-2. Avoid Name Collisions
42 3-1. Weights
43 3-2. Limits
44 3-3. Protections
45 3-4. Allocations
47 4-1. Format
48 4-2. Conventions
49 4-3. Core Interface Files
51 5-1. CPU
52 5-1-1. CPU Interface Files
53 5-2. Memory
54 5-2-1. Memory Interface Files
55 5-2-2. Usage Guidelines
56 5-2-3. Memory Ownership
57 5-3. IO
58 5-3-1. IO Interface Files
59 5-3-2. Writeback
60 5-3-3. IO Latency
61 5-3-3-1. How IO Latency Throttling Works
62 5-3-3-2. IO Latency Interface Files
63 5-3-4. IO Priority
64 5-4. PID
65 5-4-1. PID Interface Files
66 5-5. Cpuset
67 5-5-1. Cpuset Interface Files
68 5-6. Device controller
69 5-7. RDMA
70 5-7-1. RDMA Interface Files
71 5-8. DMEM
72 5-8-1. DMEM Interface Files
73 5-9. HugeTLB
74 5-9-1. HugeTLB Interface Files
75 5-10. Misc
76 5-10-1. Misc Interface Files
77 5-10-2. Migration and Ownership
78 5-11. Others
79 5-11-1. perf_event
80 5-N. Non-normative information
81 5-N-1. CPU controller root cgroup process behaviour
82 5-N-2. IO controller root cgroup process behaviour
84 6-1. Basics
85 6-2. The Root and Views
86 6-3. Migration and setns(2)
87 6-4. Interaction with Other Namespaces
89 P-1. Filesystem Support for Writeback
92 R-1. Multiple Hierarchies
93 R-2. Thread Granularity
94 R-3. Competition Between Inner Nodes and Threads
95 R-4. Other Interface Issues
96 R-5. Controller Issues and Remedies
97 R-5-1. Memory
104 -----------
113 ---------------
119 cgroup is largely composed of two parts - the core and controllers.
135 hierarchical - if a controller is enabled on a cgroup, it affects all
137 sub-hierarchy of the cgroup. When a controller is enabled on a nested
147 --------
152 # mount -t cgroup2 none $MOUNT_POINT
157 Controllers which are not in active use in the v2 hierarchy can be
162 is no longer referenced in its current hierarchy. Because per-cgroup
169 to inter-controller dependencies, other controllers may need to be
190 ignored on non-init namespace mounts. Please refer to the
207 option is ignored on non-init namespace mounts.
215 behavior but is a mount-option to avoid regressing setups
229 controller. The pre-allocated pool does not belong to anyone.
249 The option restores v1-like behavior of pids.events:max, that is only
257 --------------------------------
263 A child cgroup can be created by creating a sub-directory::
268 structure. Each cgroup has a read-writable interface file
270 belong to the cgroup one-per-line. The PIDs are not ordered and the
301 0::/test-cgroup/test-cgroup-nested
308 0::/test-cgroup/test-cgroup-nested (deleted)
334 constraint - threaded controllers can be enabled on non-leaf cgroups
353 # echo threaded > cgroup.type
358 - As the cgroup will join the parent's resource domain. The parent
361 - When the parent is an unthreaded domain, it must not have any domain
365 Topology-wise, a cgroup can be in an invalid state. Please consider
368 A (threaded domain) - B (threaded) - C (domain, just created)
383 threads in the cgroup. Except that the operations are per-thread
384 instead of per-process, "cgroup.threads" has the same format and
406 between threads in a non-leaf cgroup and its child cgroups. Each
412 - cpu
413 - cpuset
414 - perf_event
415 - pids
418 --------------------------
420 Each non-root cgroup has a "cgroup.events" file which contains
421 "populated" field indicating whether the cgroup's sub-hierarchy has
425 example, to start a clean-up operation after all processes of a given
426 sub-hierarchy have exited. The populated state updates and
427 notifications are recursive. Consider the following sub-hierarchy
431 A(4) - B(0) - C(1)
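The "populated" field mentioned above can be read with a few lines of shell. This is a minimal sketch, assuming the file follows the flat-keyed layout (one "key value" pair per line); ``read_populated`` is a hypothetical helper name, and the path is whatever cgroup.events file the caller passes in.

```shell
# Print the value of the "populated" key from a cgroup.events-style file.
# read_populated is a hypothetical helper, not part of any cgroup tooling.
# Returns non-zero if the key is absent.
read_populated() {
    while read -r key value; do
        if [ "$key" = "populated" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$1"
    return 1
}
```

A caller might run ``read_populated "$CGROUP_DIR/cgroup.events"``, where ``$CGROUP_DIR`` is an assumed variable holding the cgroup's directory, and start clean-up once the value is 0.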
441 -----------------------
464 # echo "+cpu +memory -io" > cgroup.subtree_control
473 Consider the following sub-hierarchy. The enabled controllers are
476 A(cpu,memory) - B(memory) - C()
490 controller interface files - anything which doesn't start with
494 Top-down Constraint
497 Resources are distributed top-down and a cgroup can further distribute
499 parent. This means that all non-root "cgroup.subtree_control" files
509 Non-root cgroups can distribute domain resources to their children
524 refer to the Non-normative information section in the Controllers
537 ----------
559 delegated, the user can build sub-hierarchy under the directory,
563 happens in the delegated sub-hierarchy, nothing can escape the
567 cgroups in or nesting depth of a delegated sub-hierarchy; however,
574 A delegated sub-hierarchy is contained in the sense that processes
575 can't be moved into or out of the sub-hierarchy by the delegatee.
578 requiring the following conditions for a process with a non-root euid
582 - The writer must have write access to the "cgroup.procs" file.
584 - The writer must have write access to the "cgroup.procs" file of the
588 processes around freely in the delegated sub-hierarchy it can't pull
589 in from or push out to outside the sub-hierarchy.
595 ~~~~~~~~~~~~~ - C0 - C00
598 ~~~~~~~~~~~~~ - C1 - C10
605 will be denied with -EACCES.
610 is not reachable, the migration is rejected with -ENOENT.
614 ----------
622 inherent trade-offs between migration and various hot paths in terms
628 resource structure once on start-up. Dynamic adjustments to resource
661 -------
664 active children and giving each the fraction matching the ratio of its
667 work-conserving. Due to the dynamic nature, this model is usually
678 "cpu.weight" proportionally distributes CPU cycles to active children
682 .. _cgroupv2-limits-distributor:
685 ------
688 Limits can be over-committed - the sum of the limits of children can
693 As limits can be over-committed, all configuration combinations are
700 .. _cgroupv2-protections-distributor:
703 -----------
708 soft boundaries. Protections can also be over-committed in which case
715 As protections can be over-committed, all configuration combinations
719 "memory.low" implements best-effort memory protection and is an
724 -----------
727 resource. Allocations can't be over-committed - the sum of the
734 As allocations can't be over-committed, some configuration
739 "cpu.rt.max" hard-allocates realtime slices and is an example of this
747 ------
752 New-line separated values
760 (when read-only or multiple values can be written at once)
786 -----------
788 - Settings for a single feature should be contained in a single file.
790 - The root cgroup should be exempt from resource control and thus
793 - The default time unit is microseconds. If a different unit is ever
796 - A parts-per quantity should use a percentage decimal with at least
797 two digit fractional part - e.g. 13.40.
799 - If a controller implements weight based resource distribution, its
805 - If a controller implements an absolute resource guarantee and/or
814 - If a setting has a configurable default value and keyed specific
828 # cat cgroup-example-interface-file
834 # echo 125 > cgroup-example-interface-file
838 # echo "default 125" > cgroup-example-interface-file
842 # echo "8:16 170" > cgroup-example-interface-file
846 # echo "8:0 default" > cgroup-example-interface-file
847 # cat cgroup-example-interface-file
851 - For events which are not very high frequency, an interface file
858 --------------------
863 A read-write single value file which exists on non-root
869 - "domain" : A normal valid domain cgroup.
871 - "domain threaded" : A threaded domain cgroup which is
874 - "domain invalid" : A cgroup which is in an invalid state.
878 - "threaded" : A threaded cgroup which is a member of a
885 A read-write new-line separated values file which exists on
889 the cgroup one-per-line. The PIDs are not ordered and the
898 - It must have write access to the "cgroup.procs" file.
900 - It must have write access to the "cgroup.procs" file of the
903 When delegating a sub-hierarchy, write access to this file
911 A read-write new-line separated values file which exists on
915 the cgroup one-per-line. The TIDs are not ordered and the
924 - It must have write access to the "cgroup.threads" file.
926 - The cgroup that the thread is currently in must be in the
929 - It must have write access to the "cgroup.procs" file of the
932 When delegating a sub-hierarchy, write access to this file
936 A read-only space separated values file which exists on all
943 A read-write space separated values file which exists on all
950 Space separated list of controllers prefixed with '+' or '-'
952 name prefixed with '+' enables the controller and '-'
958 A read-only flat-keyed file which exists on non-root cgroups.
970 A read-write single value file. The default is "max".
977 A read-write single value file. The default is "max".
984 A read-only flat-keyed file with the following entries:
999 limits, which were active at the moment of cgroup deletion.
1010 A read-only flat-keyed file which exists in non-root cgroups.
1028 A read-write single value file which exists on non-root cgroups.
1051 create new sub-cgroups.
1054 A write-only single value file which exists in non-root cgroups.
1066 the whole thread-group.
1069 A read-write single value file whose allowed values are "0" and "1".
1073 Writing "1" to the file will re-enable the cgroup PSI accounting.
1081 This may cause non-negligible overhead for some workloads when under
1083 be used to disable PSI accounting in the non-leaf cgroups.
1086 A read-write nested-keyed file.
1094 .. _cgroup-v2-cpu:
1097 ---
1115 management software may already have placed RT processes into non-root cgroups
1134 * Processes under the fair-class scheduler
1139 For details on when a process is under the fair-class scheduler or a BPF scheduler,
1140 check out :ref:`Documentation/scheduler/sched-ext.rst <sched-ext>`.
1146 A read-only flat-keyed file.
1152 - usage_usec
1153 - user_usec
1154 - system_usec
1157 only the processes under the fair-class scheduler:
1159 - nr_periods
1160 - nr_throttled
1161 - throttled_usec
1162 - nr_bursts
1163 - burst_usec
1166 A read-write single value file which exists on non-root
1175 This file affects only processes under the fair-class scheduler and a BPF
1180 A read-write single value file which exists on non-root
1183 The nice value is in the range [-20, 19].
1191 This file affects only processes under the fair-class scheduler and a BPF
1196 A read-write two value file which exists on non-root cgroups.
1207 This file affects only processes under the fair-class scheduler.
1210 A read-write single value file which exists on non-root
1215 This file affects only processes under the fair-class scheduler.
1218 A read-write nested-keyed file.
1226 A read-write single value file which exists on non-root cgroups.
1244 A read-write single value file which exists on non-root cgroups.
1258 A read-write single value file which exists on non-root cgroups.
1261 This is the cgroup analog of the per-task SCHED_IDLE sched policy.
1267 This file affects only processes under the fair-class scheduler.
1270 ------
1278 While not completely water-tight, all major memory usages by a given
1283 - Userland memory - page cache and anonymous memory.
1285 - Kernel data structures such as dentries and inodes.
1287 - TCP socket buffers.
1300 A read-only single value file which exists on non-root
1307 A read-write single value file which exists on non-root
1333 A read-write single value file which exists on non-root
1336 Best-effort memory protection. If the memory usage of a
1356 A read-write single value file which exists on non-root
1379 busy-hitting its memory to slow down reclaim.
1382 A read-write single value file which exists on non-root
1391 In default configuration regular 0-order allocations always
1396 as -ENOMEM or silently ignore in cases like disk readahead.
1399 reclaim and oom-kill are bypassed. This is useful for admin
1402 The job will trigger the reclaim and/or oom-kill on its next
1408 busy-hitting its memory to slow down reclaim.
1411 A write-only nested-keyed file which exists for all cgroups.
1418 echo "1G" > memory.reclaim
1422 specified amount, -EAGAIN is returned.
1442 The valid range for swappiness is [0-200, max], setting
1446 A read-write single value file which exists on non-root cgroups.
1451 A write of any non-empty string to this file resets it to the
1456 A read-write single value file which exists on non-root
1466 Tasks with the OOM protection (oom_score_adj set to -1000)
1474 A read-only flat-keyed file which exists on non-root cgroups.
1488 boundary is over-committed.
1508 considered as an option, e.g. for failed high-order
1524 A read-only flat-keyed file which exists on non-root cgroups.
1527 types of memory, type-specific details, and other information
1536 If the entry has no per-node counter (or is not shown in the
1537 memory.numa_stat). We use 'npn' (non-per-node) as the tag
1568 Amount of memory used for storing per-cpu kernel
1578 Amount of cached filesystem data that is swap-backed,
1618 Amount of memory, swap-backed and filesystem-backed,
1624 the value for the foo counter, since the foo counter is type-based, not
1625 list-based.
1636 Amount of memory used for storing in-kernel data
1654 an active workingset before they got reclaimed.
1658 active workingset before they got reclaimed.
1706 Amount of scanned pages (in an active LRU list)
1709 Amount of pages moved to the active LRU list
1726 Number of zero-filled pages swapped out with I/O skipped due to the
1785 A read-only nested-keyed file which exists on non-root cgroups.
1788 types of memory, type-specific details, and other information
1810 A read-only single value file which exists on non-root
1817 A read-write single value file which exists on non-root
1822 allow userspace to implement custom out-of-memory procedures.
1833 A read-write single value file which exists on non-root cgroups.
1838 A write of any non-empty string to this file resets it to the
1843 A read-write single value file which exists on non-root
1850 A read-only flat-keyed file which exists on non-root cgroups.
1866 because of running out of swap system-wide or max
1875 A read-only single value file which exists on non-root
1882 A read-write single value file which exists on non-root
1890 A read-write single value file. The default value is "1".
1908 A read-only nested-keyed file.
1918 Over-committing on high limit (sum of high limits > available memory)
1932 pressure - how much the workload is being impacted due to lack of
1933 memory - is necessary to determine whether a workload needs more
1947 To which cgroup the area will be charged is indeterminate; however,
1958 --
1963 only if cfq-iosched is in use and neither scheme is available for
1964 blk-mq devices.
1971 A read-only nested-keyed file.
1991 A read-write nested-keyed file which exists only on the root
2003 enable Weight-based control enable
2025 latencies is above 75ms or write 150ms, and adjust the overall
2035 devices which show wide temporary behavior changes - e.g. a
2046 A read-write nested-keyed file which exists only on the root
2059 model The cost model in use - "linear"
2085 generate device-specific coefficients.
2088 A read-write flat-keyed file which exists on non-root cgroups.
2108 A read-write nested-keyed file which exists on non-root
2122 When writing, any number of nested key-value pairs can be
2132 echo "8:16 rbps=2097152 wiops=120" > io.max
2140 echo "8:16 wiops=max" > io.max
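The nested-keyed lines written to io.max above follow a "device key=value ..." layout, so a single setting can be picked out with plain shell. This is a sketch under that assumption only; ``nested_key_value`` is a hypothetical helper name, and it works on a line passed as a string rather than reading any real interface file.

```shell
# Extract one key's value from a nested-keyed line such as
# "8:16 rbps=2097152 wiops=120". nested_key_value is a hypothetical helper.
# The body runs in a subshell so its variables do not leak to the caller.
nested_key_value() (
    line=$1 want=$2
    set -- $line      # split on whitespace: first word is the device
    shift             # drop the device, leaving the key=value pairs
    for pair in "$@"; do
        if [ "${pair%%=*}" = "$want" ]; then
            printf '%s\n' "${pair#*=}"
            exit 0
        fi
    done
    exit 1            # key not present on this line
)
```

For example, ``nested_key_value "8:16 rbps=2097152 wiops=120" wiops`` prints the wiops value from that line and returns non-zero when the requested key is missing.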
2147 A read-only nested-keyed file.
2166 writes out dirty pages for the memory domain. Both system-wide and
2167 per-cgroup dirty memory states are examined and the more restrictive
2205 memory controller and system-wide clean memory.
2238 your real setting, setting at 10-15% higher than the value in io.stat.
2248 - Queue depth throttling. This is the number of outstanding IOs a group is
2252 - Artificial delay induction. There are certain types of IO that cannot be
2299 no-change
2302 promote-to-rt
2303 For requests that have a non-RT I/O priority class, change it into RT.
2307 restrict-to-be
2317 none-to-rt
2318 Deprecated. Just an alias for promote-to-rt.
2322 +----------------+---+
2323 | no-change | 0 |
2324 +----------------+---+
2325 | promote-to-rt | 1 |
2326 +----------------+---+
2327 | restrict-to-be | 2 |
2328 +----------------+---+
2330 +----------------+---+
2334 +-------------------------------+---+
2336 +-------------------------------+---+
2337 | IOPRIO_CLASS_RT (real-time) | 1 |
2338 +-------------------------------+---+
2340 +-------------------------------+---+
2342 +-------------------------------+---+
2346 - If I/O priority class policy is promote-to-rt, change the request I/O
2349 - If I/O priority class policy is not promote-to-rt, translate the I/O priority
2355 ---
2374 A read-write single value file which exists on non-root
2380 A read-only single value file which exists on non-root cgroups.
2386 A read-only single value file which exists on non-root cgroups.
2392 A read-only flat-keyed file which exists on non-root cgroups. Unless
2410 through fork() or clone(). These will return -EAGAIN if the creation
2415 ------
2422 memory placement to reduce cross-node memory access and contention
2433 A read-write multiple values file which exists on non-root
2434 cpuset-enabled cgroups.
2441 The CPU numbers are comma-separated numbers or ranges.
2445 0-4,6,8-10
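The list format above (comma-separated numbers or ranges) can be expanded into individual CPU numbers in shell. This is a minimal sketch, assuming well-formed input with ascending ranges; ``expand_cpu_list`` is a hypothetical helper name and does no validation.

```shell
# Expand a list such as "0-4,6,8-10" into space-separated CPU numbers.
# expand_cpu_list is a hypothetical helper, not part of any cgroup tooling.
# The body runs in a subshell so the IFS change stays local.
expand_cpu_list() (
    list=$1
    out=""
    IFS=,
    for part in $list; do
        case $part in
            *-*) start=${part%-*}; end=${part#*-} ;;   # a range "a-b"
            *)   start=$part; end=$part ;;             # a single number
        esac
        while [ "$start" -le "$end" ]; do
            out="$out${out:+ }$start"
            start=$((start + 1))
        done
    done
    printf '%s\n' "$out"
)
```

The same helper applies to the memory node lists, which use the same number-or-range syntax.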
2448 setting as the nearest cgroup ancestor with a non-empty
2455 A read-only multiple values file which exists on all
2456 cpuset-enabled cgroups.
2472 A read-write multiple values file which exists on non-root
2473 cpuset-enabled cgroups.
2480 The memory node numbers are comma-separated numbers or ranges.
2484 0-1,3
2487 setting as the nearest cgroup ancestor with a non-empty
2494 Setting a non-empty value to "cpuset.mems" causes memory of
2502 a need to change "cpuset.mems" with active tasks, it shouldn't
2506 A read-only multiple values file which exists on all
2507 cpuset-enabled cgroups.
2522 A read-write multiple values file which exists on non-root
2523 cpuset-enabled cgroups.
2556 A read-only multiple values file which exists on all non-root
2557 cpuset-enabled cgroups.
2569 A read-only multiple values file which exists only on the root cgroup.
2576 A read-write single value file which exists on non-root
2577 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2583 "member" Non-root member of a partition
2588 A cpuset partition is a collection of cpuset-enabled cgroups with
2595 There are two types of partitions - local and remote. A local
2611 be changed. All other non-root cgroups start out as "member".
2624 two possible states - valid or invalid. An invalid partition
2635 "member" Non-root member of a partition
2662 A valid non-root parent partition may distribute out all its CPUs
2681 A user can pre-configure certain CPUs to an isolated state
2688 -----------------
2699 on the return value the attempt will succeed or fail with -EPERM.
2704 If the program returns 0, the attempt fails with -EPERM, otherwise it
2712 ----
2721 A read-write nested-keyed file that exists for all the cgroups
2742 A read-only file that describes current resource usage.
2751 ----
2761 A read-write nested-keyed file that exists for all the cgroups
2774 A read-only file that describes maximum region capacity.
2785 A read-only file that describes current resource usage.
2794 -------
2811 A read-only flat-keyed file which exists on non-root cgroups.
2823 hugetlb pages of <hugepagesize> in this cgroup. Only hugetlb pages
2824 in active use are included. The per-node values are in bytes.
2827 ----
2849 A read-only flat-keyed file shown only in the root cgroup. It shows
2858 A read-only flat-keyed file shown in all cgroups. It shows
2866 A read-only flat-keyed file shown in all cgroups. It shows the
2875 A read-write flat-keyed file shown in non-root cgroups. Allowed
2884 # echo res_a 1 > misc.max
2888 # echo res_a max > misc.max
2894 A read-only flat-keyed file which exists on non-root cgroups. The
2917 ------
2928 Non-normative information
2929 -------------------------
2945 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2961 ------
2980 The path '/batchjobs/container_id1' can be considered as system-data
2985 # ls -l /proc/self/ns/cgroup
2986 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2992 # ls -l /proc/self/ns/cgroup
2993 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2997 When some thread from a multi-threaded process unshares its cgroup
3009 ------------------
3020 # ~/unshare -c # unshare cgroupns in some cgroup
3024 # echo 0 > sub_cgrp_1/cgroup.procs
3028 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
3036 # echo 7353 > sub_cgrp_1/cgroup.procs
3059 ----------------------
3069 # echo 7353 > batchjobs/container_id2/cgroup.procs
3088 ---------------------------------
3091 running inside a non-init cgroup namespace::
3093 # mount -t cgroup2 none $MOUNT_POINT
3100 the view of cgroup hierarchy by namespace-private cgroupfs mount
3113 --------------------------------
3116 address_space_operations->writepages() to annotate bio's using the
3133 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
3150 - Multiple hierarchies including named ones are not supported.
3152 - None of the v1 mount options are supported.
3154 - The "tasks" file is removed and "cgroup.procs" is not sorted.
3156 - "cgroup.clone_children" is removed.
3158 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
3166 --------------------
3219 ------------------
3227 Generally, in-process knowledge is available only to the process
3228 itself; thus, unlike service-level organization of processes,
3229 categorizing threads of a process requires active participation from
3235 sub-hierarchies and control resource distributions along them. This
3236 effectively raised cgroup to the status of a syscall-like API exposed
3246 that the process would actually be operating on its own sub-hierarchy.
3250 system-management pseudo filesystem. cgroup ended up with interface
3253 individual applications through the ill-defined delegation mechanism
3263 -------------------------------------------
3274 cycles and the number of internal threads fluctuated - the ratios
3290 clearly defined. There were attempts to add ad-hoc behaviors and
3304 ----------------------
3308 was how an empty cgroup was notified - a userland helper binary was
3311 to in-kernel event delivery filtering mechanism further complicating
3333 ------------------------------
3340 global reclaim prefers is opt-in, rather than opt-out. The costs for
3350 becomes self-defeating.
3352 The memory.low boundary on the other hand is a top-down allocated
3390 new limit is met - or the task writing to memory.max is killed.
3399 groups can sabotage swapping by other means - such as referencing its
3400 anonymous memory in a tight loop - and an admin cannot assume full