.. _overcommit_accounting:

=====================
Overcommit Accounting
=====================

The Linux kernel supports the following overcommit handling modes:

0
	Heuristic overcommit handling. Obvious overcommits of address
	space are refused. Suitable for typical systems: it ensures that a
	seriously wild allocation fails while still allowing overcommit
	to reduce swap usage. root is allowed to allocate slightly more
	memory in this mode. This is the default.

1
	Always overcommit. Appropriate for some scientific
	applications. The classic example is code that uses sparse arrays
	and relies on the virtual memory consisting almost entirely
	of zero pages.

2
	Don't overcommit. The total address space commit for the
	system is not permitted to exceed swap plus a configurable amount
	(default is 50%) of physical RAM. Depending on the amount you
	use, in most situations this means a process will not be
	killed while accessing pages but will receive errors on memory
	allocation as appropriate.

	Useful for applications that want to guarantee their memory
	allocations will be available in the future without having to
	initialize every page.

The overcommit policy is set via the sysctl ``vm.overcommit_memory``.
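
A minimal sketch of reading the active mode from C, assuming the sysctl is exposed at ``/proc/sys/vm/overcommit_memory`` as it is on current kernels. The names returned by the hypothetical helper ``overcommit_mode_name()`` follow the kernel's ``OVERCOMMIT_GUESS``, ``OVERCOMMIT_ALWAYS`` and ``OVERCOMMIT_NEVER`` constants; both helpers are illustrative, not a kernel or libc API.

```c
#include <stdio.h>

/* Hypothetical helper: return the current vm.overcommit_memory mode
 * (0, 1 or 2), or -1 if the sysctl file cannot be read. */
static int overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Hypothetical helper: short names for the three modes, modelled on the
 * kernel's OVERCOMMIT_GUESS / _ALWAYS / _NEVER constants. */
static const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:
        return "guess";
    case 1:
        return "always";
    case 2:
        return "never";
    default:
        return "unknown";
    }
}
```

Changing the mode requires root, for example ``sysctl -w vm.overcommit_memory=2``.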

The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
or ``vm.overcommit_kbytes`` (absolute value). These only have an effect
when ``vm.overcommit_memory`` is set to 2.
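
The resulting mode 2 limit can be sketched as simple arithmetic. This assumes the behaviour of ``vm_commit_limit()`` in ``mm/util.c`` on recent kernels: a nonzero ``vm.overcommit_kbytes`` replaces the ratio, pages reserved for hugetlbfs are not counted as RAM, and swap is always added in full. The helper ``commit_limit_kb()`` is hypothetical and works in kB, whereas the kernel works in pages, so results can differ slightly through rounding.

```c
#include <stdint.h>

/* Hypothetical helper: estimate the mode 2 commit limit in kB.
 * Assumption: mirrors vm_commit_limit() in mm/util.c, where a nonzero
 * overcommit_kbytes overrides overcommit_ratio and huge pages are
 * excluded from the RAM the ratio applies to. */
static uint64_t commit_limit_kb(uint64_t ram_kb, uint64_t hugetlb_kb,
                                uint64_t swap_kb, unsigned ratio_pct,
                                uint64_t overcommit_kb)
{
    uint64_t allowed;

    if (overcommit_kb != 0)
        allowed = overcommit_kb;                  /* absolute value wins */
    else
        allowed = (ram_kb - hugetlb_kb) * ratio_pct / 100;

    return allowed + swap_kb;                     /* swap always counts in full */
}
```

With 16 GiB of RAM, no huge pages, 2 GiB of swap and the default ratio of 50, ``commit_limit_kb(16777216, 0, 2097152, 50, 0)`` gives 10485760 kB, i.e. 10 GiB.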

The current overcommit limit and amount committed are viewable in
``/proc/meminfo`` as CommitLimit and Committed_AS respectively.
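
Both values are reported in kB. A small sketch of reading them from C; ``meminfo_kb()`` is a hypothetical helper that assumes the usual ``Key:   value kB`` line format of ``/proc/meminfo``.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one "Key:   value kB" field from /proc/meminfo.
 * Returns the value in kB, or -1 if the file or the key is missing. */
static long long meminfo_kb(const char *key)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    long long value = -1;
    size_t keylen = strlen(key);

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Match the key exactly, including the trailing colon. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%lld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

In mode 2, ``meminfo_kb("CommitLimit") - meminfo_kb("Committed_AS")`` approximates the remaining headroom; in modes 0 and 1 the limit is not enforced, so Committed_AS may exceed CommitLimit.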

Gotchas
=======

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge, you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case if you really care.
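
For work that can run on a separate thread, one way to follow this advice is to map the stack yourself at its full size and hand it to ``pthread_attr_setstack()``. The mapping is private and writable, so its whole size is charged when ``mmap()`` is called, and in mode 2 a shortfall is reported there as an error. This is a sketch under assumptions: ``run_on_preallocated_stack()`` and ``deep_worker()`` are hypothetical, and the main thread's stack, which the kernel sets up at exec time, cannot be replaced this way.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical workload: touches a 256 KiB buffer on the thread's stack,
 * one byte per 4 KiB page. */
static void *deep_worker(void *arg)
{
    volatile char buf[256 * 1024];
    size_t i;

    (void)arg;
    for (i = 0; i < sizeof(buf); i += 4096)
        buf[i] = (char)i;
    return NULL;
}

/* Hypothetical helper: run fn(arg) on a new thread whose stack is mapped
 * up front at stack_size bytes. Returns 0 on success, -1 on failure. */
static int run_on_preallocated_stack(void *(*fn)(void *), void *arg,
                                     size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ret = -1;
    /* Private writable anonymous mapping: charged in full right here. */
    void *stack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

    if (stack == MAP_FAILED)
        return -1;              /* in mode 2, ENOMEM may be the commit check */

    if (pthread_attr_init(&attr) == 0) {
        if (pthread_attr_setstack(&attr, stack, stack_size) == 0 &&
            pthread_create(&tid, &attr, fn, arg) == 0 &&
            pthread_join(tid, NULL) == 0)
            ret = 0;
        pthread_attr_destroy(&attr);
    }

    /* The thread has been joined (or was never created), so unmap. */
    munmap(stack, stack_size);
    return ret;
}
```

For example, ``run_on_preallocated_stack(deep_worker, NULL, 1024 * 1024)`` runs the worker on a 1 MiB stack. On older glibc versions, compile with ``-pthread``.
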

In mode 2, the ``MAP_NORESERVE`` flag is ignored.
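
A minimal sketch of where the difference shows up: in modes 0 and 1 a private writable mapping created with ``MAP_NORESERVE`` is not charged to Committed_AS, while in mode 2 it is charged like any other, so a large request can fail with ``ENOMEM``. ``map_noreserve()`` is a hypothetical helper written for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: request len bytes of private, writable anonymous
 * memory without reserving commit. In modes 0 and 1 the mapping is not
 * charged; in mode 2 MAP_NORESERVE is ignored, the full len is charged,
 * and mmap() may fail with ENOMEM. Returns NULL on failure. */
static void *map_noreserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```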


How It Works
============

Overcommit accounting is based on the following rules:

For a file-backed map
	| SHARED or READ-only	-	0 cost (the file is the map, not swap)
	| PRIVATE WRITABLE	-	size of mapping per instance

For an anonymous or ``/dev/zero`` map
	| SHARED			-	size of mapping
	| PRIVATE READ-only	-	0 cost (but of little use)
	| PRIVATE WRITABLE	-	size of mapping per instance
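
The rules can be restated as a small function. This is a hypothetical model for illustration, not kernel code: ``commit_cost()`` collapses the mmap flags into three booleans and returns the charge for one instance of the mapping, so a PRIVATE WRITABLE mapping held by two processes (for example after ``fork()``) is charged twice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the rules above (not kernel code): the number of
 * bytes one instance of a mapping adds to the committed total. */
static size_t commit_cost(bool file_backed, bool shared, bool writable,
                          size_t len)
{
    if (file_backed) {
        /* SHARED or read-only: the file itself backs the pages. */
        if (shared || !writable)
            return 0;
        /* PRIVATE WRITABLE: modified pages become private copies. */
        return len;
    }

    /* Anonymous or /dev/zero mapping. */
    if (shared)
        return len;             /* SHARED: charged at full size */
    if (!writable)
        return 0;               /* PRIVATE read-only: nothing can be dirtied */
    return len;                 /* PRIVATE WRITABLE */
}
```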

Additional accounting
	| Pages made writable copies by mmap
	| shmfs memory drawn from the same pool

Status
======

*	We account mmap memory mappings
*	We account mprotect changes in commit
*	We account mremap changes in size
*	We account brk
*	We account munmap
*	We report the commit status in /proc
*	Account and check on fork
*	Review stack handling/building on exec
*	SHMfs accounting
*	Implement actual limit enforcement
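
The mprotect case can be illustrated with a private anonymous mapping: created read-only it costs nothing, and it is charged only when ``mprotect()`` makes it writable, so in mode 2 that call, rather than ``mmap()``, is where the commit check can refuse with ``ENOMEM``. ``map_then_make_writable()`` is a hypothetical helper written for this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes read-only (0 commit cost for private
 * anonymous memory), then make them writable, which is when the kernel
 * charges len bytes. Returns the mapping, or NULL with errno set. */
static void *map_then_make_writable(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0) {
        int saved = errno;      /* ENOMEM typically means the commit check refused */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```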

To Do
=====

*	Account ptrace pages (this is hard)