.. SPDX-License-Identifier: GPL-2.0

====================
The /proc Filesystem
====================

=====================  =======================================  ================
/proc/sys              Terrehon Bowden <terrehon@pacbell.net>,  October 7 1999
                       Bodo Bauer <bb@ricochet.net>
2.4.x update           Jorge Nerin <comandante@zaralinux.com>   November 14 2000
move /proc/sys         Shen Feng <shen@cn.fujitsu.com>          April 1 2009
fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>    June 9 2009
=====================  =======================================  ================



.. Table of Contents

  0     Preface
  0.1   Introduction/Credits
  0.2   Legal Stuff

  1     Collecting System Information
  1.1   Process-Specific Subdirectories
  1.2   Kernel data
  1.3   Networking info in /proc/net
  1.4   SCSI info
  1.5   Parallel port info in /proc/parport
  1.6   TTY info in /proc/tty
  1.7   Miscellaneous kernel statistics in /proc/stat
  1.8   Ext4 file system parameters
  1.9   /proc/consoles - Shows registered system consoles

  2     Modifying System Parameters

  3     Per-Process Parameters
  3.1   /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
        score
  3.2   /proc/<pid>/oom_score - Display current oom-killer score
  3.3   /proc/<pid>/io - Display the IO accounting fields
  3.4   /proc/<pid>/coredump_filter - Core dump filtering settings
  3.5   /proc/<pid>/mountinfo - Information about mounts
  3.6   /proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
  3.7   /proc/<pid>/task/<tid>/children - Information about task children
  3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
  3.9   /proc/<pid>/map_files - Information about memory mapped files
  3.10  /proc/<pid>/timerslack_ns - Task timerslack value
  3.11  /proc/<pid>/patch_state - Livepatch patch operation state
  3.12  /proc/<pid>/arch_status - Task architecture specific information
  3.13  /proc/<pid>/fd - List of symlinks to open files
  3.14  /proc/<pid>/ksm_stat - Information about the process's ksm status.

  4     Configuring procfs
  4.1   Mount options
  4.2   Mount restrictions

  5     Filesystem behavior

Preface
=======

0.1 Introduction/Credits
------------------------

We'd like to thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of
other people for help compiling this documentation. We'd also like to extend a
special thank you to Andi Kleen for documentation, which we relied on heavily
to create this document, as well as the additional information he provided.
Thanks to everybody else who contributed source or docs to the Linux kernel
and helped create a great piece of software... :)

The latest version of this document is available online at
https://www.kernel.org/doc/html/latest/filesystems/proc.html

0.2 Legal Stuff
---------------

We don't guarantee the correctness of this document, and if you come to us
complaining about how you screwed up your system because of incorrect
documentation, we won't feel responsible...

Chapter 1: Collecting System Information
========================================

In This Chapter
---------------
* Investigating the properties of the pseudo file system /proc and its
  ability to provide information on the running Linux system
* Examining /proc's structure
* Uncovering various information about the kernel and the processes running
  on the system

------------------------------------------------------------------------------

The proc file system acts as an interface to internal data structures in the
kernel. It can be used to obtain information about the system and to change
certain kernel parameters at runtime (sysctl).

First, we'll take a look at the read-only parts of /proc. In Chapter 2, we
show you how you can use /proc/sys to change settings.

1.1 Process-Specific Subdirectories
-----------------------------------

The directory /proc contains (among other things) one subdirectory for each
process running on the system, which is named after the process ID (PID).

The link 'self' points to the process reading the file system. Each process
subdirectory has the entries listed in Table 1-1.

A process can read its own information from /proc/PID/* with no extra
permissions. When reading /proc/PID/* information for other processes, the
reading process is required to have either the CAP_SYS_PTRACE capability with
PTRACE_MODE_READ access permissions, or, alternatively, the CAP_PERFMON
capability. This applies to all read-only information like `maps`, `environ`,
`pagemap`, etc. The only exception is the `mem` file which, due to its
read-write nature, requires CAP_SYS_PTRACE capabilities with the more elevated
PTRACE_MODE_ATTACH permissions; the CAP_PERFMON capability does not grant
access to /proc/PID/mem for other processes.

Note that an open file descriptor to /proc/<pid> or to any of its
contained files or subdirectories does not prevent <pid> from being reused
for some other process in the event that <pid> exits. Operations on
open /proc/<pid> file descriptors corresponding to dead processes
never act on any new process that the kernel may, through chance, have
also assigned the process ID <pid>. Instead, operations on these FDs
usually fail with ESRCH.

.. table:: Table 1-1: Process specific entries in /proc

 =============  ===============================================================
 File           Content
 =============  ===============================================================
 clear_refs     Clears page referenced bits shown in smaps output
 cmdline        Command line arguments
 cpu            Current and last cpu in which it was executed (2.4)(smp)
 cwd            Link to the current working directory
 environ        Values of environment variables
 exe            Link to the executable of this process
 fd             Directory, which contains all file descriptors
 maps           Memory maps to executables and library files (2.4)
 mem            Memory held by this process
 root           Link to the root directory of this process
 stat           Process status
 statm          Process memory status information
 status         Process status in human readable form
 wchan          Present with CONFIG_KALLSYMS=y: it shows the kernel function
                symbol the task is blocked in - or "0" if not blocked.
 pagemap        Page table
 stack          Report full stack trace, enable via CONFIG_STACKTRACE
 smaps          An extension based on maps, showing the memory consumption of
                each mapping and flags associated with it
 smaps_rollup   Accumulated smaps stats for all mappings of the process. This
                can be derived from smaps, but is faster and more convenient
 numa_maps      An extension based on maps, showing the memory locality and
                binding policy as well as mem usage (in pages) of each mapping.
 =============  ===============================================================

For example, to get the status information of a process, all you have to do is
read the file /proc/PID/status::

  >cat /proc/self/status
  Name:   cat
  State:  R (running)
  Tgid:   5452
  Pid:    5452
  PPid:   743
  TracerPid:      0                                               (2.4)
  Uid:    501     501     501     501
  Gid:    100     100     100     100
  FDSize: 256
  Groups: 100 14 16
  Kthread:    0
  VmPeak:     5004 kB
  VmSize:     5004 kB
  VmLck:         0 kB
  VmHWM:       476 kB
  VmRSS:       476 kB
  RssAnon:             352 kB
  RssFile:             120 kB
  RssShmem:              4 kB
  VmData:      156 kB
  VmStk:        88 kB
  VmExe:        68 kB
  VmLib:      1412 kB
  VmPTE:        20 kB
  VmSwap:        0 kB
  HugetlbPages:          0 kB
  CoreDumping:    0
  THP_enabled:    1
  Threads:        1
  SigQ:   0/28578
  SigPnd: 0000000000000000
  ShdPnd: 0000000000000000
  SigBlk: 0000000000000000
  SigIgn: 0000000000000000
  SigCgt: 0000000000000000
  CapInh: 00000000fffffeff
  CapPrm: 0000000000000000
  CapEff: 0000000000000000
  CapBnd: ffffffffffffffff
  CapAmb: 0000000000000000
  NoNewPrivs:     0
  Seccomp:        0
  Speculation_Store_Bypass:       thread vulnerable
  SpeculationIndirectBranch:      conditional enabled
  voluntary_ctxt_switches:        0
  nonvoluntary_ctxt_switches:     1

This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
information. But you get a more detailed view of the process by reading the
file /proc/PID/status. Its fields are described in Table 1-2.

The statm file contains more detailed information about the process
memory usage. Its seven fields are explained in Table 1-3. The stat file
contains detailed information about the process itself. Its fields are
explained in Table 1-4.
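
As a minimal sketch of how the ``Key: value`` layout of the status file can be
consumed programmatically, the following Python snippet parses status text
into a dictionary and checks the relation VmRSS = RssAnon + RssFile + RssShmem
on an excerpt of the sample output above. The helper names ``parse_status``
and ``kb_value`` are illustrative assumptions and not part of any kernel or
library API.

```python
def parse_status(text):
    """Parse the text of a /proc/PID/status file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Each line looks like "Key:<whitespace>value"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def kb_value(fields, key):
    """Return a 'NNN kB' field as an integer number of kB."""
    number, unit = fields[key].split()
    if unit != "kB":
        raise ValueError(f"unexpected unit in {key}: {unit!r}")
    return int(number)


# Excerpt of the sample output shown above (hypothetical process).
SAMPLE = """\
Name:\tcat
State:\tR (running)
Pid:\t5452
Uid:\t501\t501\t501\t501
VmRSS:\t     476 kB
RssAnon:\t     352 kB
RssFile:\t     120 kB
RssShmem:\t       4 kB
"""

status = parse_status(SAMPLE)
rss_parts = sum(kb_value(status, k) for k in ("RssAnon", "RssFile", "RssShmem"))
print(status["Name"], status["State"], rss_parts)
```

On a live Linux system the same function could be fed the contents of
/proc/self/status instead of the embedded sample. Keep in mind that the set of
fields depends on the kernel version and configuration, so a consumer should
not assume that every key is present.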

(for SMP CONFIG users)

To keep accounting scalable, RSS-related information is handled in an
asynchronous manner, so the values may not be very precise. To get a precise
snapshot of a given moment, read the /proc/<pid>/smaps file, which scans the
page table. It's slow but very precise.

.. table:: Table 1-2: Contents of the status fields (as of 4.19)

 ==========================  ===================================================
 Field                       Content
 ==========================  ===================================================
 Name                        filename of the executable
 Umask                       file mode creation mask
 State                       state (R is running, S is sleeping, D is sleeping
                             in an uninterruptible wait, Z is zombie,
                             T is traced or stopped)
 Tgid                        thread group ID
 Ngid                        NUMA group ID (0 if none)
 Pid                         process id
 PPid                        process id of the parent process
 TracerPid                   PID of process tracing this process (0 if not, or
                             the tracer is outside of the current pid namespace)
 Uid                         Real, effective, saved set, and file system UIDs
 Gid                         Real, effective, saved set, and file system GIDs
 FDSize                      number of file descriptor slots currently allocated
 Groups                      supplementary group list
 NStgid                      descendant namespace thread group ID hierarchy
 NSpid                       descendant namespace process ID hierarchy
 NSpgid                      descendant namespace process group ID hierarchy
 NSsid                       descendant namespace session ID hierarchy
 Kthread                     kernel thread flag, 1 is yes, 0 is no
 VmPeak                      peak virtual memory size
 VmSize                      total program size
 VmLck                       locked memory size
 VmPin                       pinned memory size
 VmHWM                       peak resident set size ("high water mark")
 VmRSS                       size of memory portions. It contains the three
                             following parts
                             (VmRSS = RssAnon + RssFile + RssShmem)
 RssAnon                     size of resident anonymous memory
 RssFile                     size of resident file mappings
 RssShmem                    size of resident shmem memory (includes SysV shm,
                             mapping of tmpfs and shared anonymous mappings)
 VmData                      size of private data segments
 VmStk                       size of stack segments
 VmExe                       size of text segment
 VmLib                       size of shared library code
 VmPTE                       size of page table entries
 VmSwap                      amount of swap used by anonymous private data
                             (shmem swap usage is not included)
 HugetlbPages                size of hugetlb memory portions
 CoreDumping                 process's memory is currently being dumped
                             (killing the process may lead to a corrupted core)
 THP_enabled                 process is allowed to use THP (returns 0 when
                             PR_SET_THP_DISABLE is set on the process to disable
                             THP completely, not just partially)
 Threads                     number of threads
 SigQ                        number of signals queued/max. number for queue
 SigPnd                      bitmap of pending signals for the thread
 ShdPnd                      bitmap of shared pending signals for the process
 SigBlk                      bitmap of blocked signals
 SigIgn                      bitmap of ignored signals
 SigCgt                      bitmap of caught signals
 CapInh                      bitmap of inheritable capabilities
 CapPrm                      bitmap of permitted capabilities
 CapEff                      bitmap of effective capabilities
 CapBnd                      bitmap of capabilities bounding set
 CapAmb                      bitmap of ambient capabilities
 NoNewPrivs                  no_new_privs, like prctl(PR_GET_NO_NEW_PRIVS, ...)
 Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
 Speculation_Store_Bypass    speculative store bypass mitigation status
 SpeculationIndirectBranch   indirect branch speculation mode
 Cpus_allowed                mask of CPUs on which this process may run
 Cpus_allowed_list           Same as previous, but in "list format"
 Mems_allowed                mask of memory nodes allowed to this process
 Mems_allowed_list           Same as previous, but in "list format"
 voluntary_ctxt_switches     number of voluntary context switches
 nonvoluntary_ctxt_switches  number of non voluntary context switches
 ==========================  ===================================================


.. table:: Table 1-3: Contents of the statm fields (as of 2.6.8-rc3)

 ======== =============================== ==============================
 Field    Content                         Notes
 ======== =============================== ==============================
 size     total program size (pages)      (same as VmSize in status)
 resident size of memory portions (pages) (same as VmRSS in status)
 shared   number of pages that are shared (i.e. backed by a file, same
                                          as RssFile+RssShmem in status)
 trs      number of pages that are 'code' (not including libs; broken,
                                          includes data segment)
 lrs      number of pages of library      (always 0 on 2.6)
 drs      number of pages of data/stack   (including libs; broken,
                                          includes library text)
 dt       number of dirty pages           (always 0 on 2.6)
 ======== =============================== ==============================


.. table:: Table 1-4: Contents of the stat fields (as of 2.6.30-rc7)

 =============  ===============================================================
 Field          Content
 =============  ===============================================================
 pid            process id
 tcomm          filename of the executable
 state          state (R is running, S is sleeping, D is sleeping in an
                uninterruptible wait, Z is zombie, T is traced or stopped)
 ppid           process id of the parent process
 pgrp           pgrp of the process
 sid            session id
 tty_nr         tty the process uses
 tty_pgrp       pgrp of the tty
 flags          task flags
 min_flt        number of minor faults
 cmin_flt       number of minor faults with child's
 maj_flt        number of major faults
 cmaj_flt       number of major faults with child's
 utime          user mode jiffies
 stime          kernel mode jiffies
 cutime         user mode jiffies with child's
 cstime         kernel mode jiffies with child's
 priority       priority level
 nice           nice level
 num_threads    number of threads
 it_real_value  (obsolete, always 0)
 start_time     time the process started after system boot
 vsize          virtual memory size
 rss            resident set memory size
 rsslim         current limit in bytes on the rss
 start_code     address above which program text can run
 end_code       address below which program text can run
 start_stack    address of the start of the main process stack
 esp            current value of ESP
 eip            current value of EIP
 pending        bitmap of pending signals
 blocked        bitmap of blocked signals
 sigign         bitmap of ignored signals
 sigcatch       bitmap of caught signals
 0              (place holder, used to be the wchan address,
                use /proc/PID/wchan instead)
 0              (place holder)
 0              (place holder)
 exit_signal    signal to send to parent thread on exit
 task_cpu       which CPU the task is scheduled on
 rt_priority    realtime priority
 policy         scheduling policy (man sched_setscheduler)
 blkio_ticks    time spent waiting for block IO
 gtime          guest time of the task in jiffies
 cgtime         guest time of the task children in jiffies
 start_data     address above which program data+bss is placed
 end_data       address below which program data+bss is placed
 start_brk      address above which program heap can be expanded with brk()
 arg_start      address above which program command line is placed
 arg_end        address below which program command line is placed
 env_start      address above which program environment is placed
 env_end        address below which program environment is placed
 exit_code      the thread's exit_code in the form reported by the waitpid
                system call
 =============  ===============================================================

The /proc/PID/maps file contains the currently mapped memory regions and
their access permissions.

The format is::

    address           perms offset  dev   inode      pathname

    08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
    08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
    0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
    a7cb1000-a7cb2000 ---p 00000000 00:00 0
    a7cb2000-a7eb2000 rw-p 00000000 00:00 0
    a7eb2000-a7eb3000 ---p 00000000 00:00 0
    a7eb3000-a7ed5000 rw-p 00000000 00:00 0
    a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
    a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
    a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
    a800b000-a800e000 rw-p 00000000 00:00 0
    a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
    a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
    a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
    a8024000-a8027000 rw-p 00000000 00:00 0
    a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
    a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
    a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
    aff35000-aff4a000 rw-p 00000000 00:00 0          [stack]
    ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]

where "address" is the address space in the process that it occupies, "perms"
is a set of permissions::

 r = read
 w = write
 x = execute
 s = shared
 p = private (copy on write)

"offset" is the offset into the mapping, "dev" is the device (major:minor), and
"inode" is the inode on that device. 0 indicates that no inode is associated
with the memory region, as would be the case with BSS (uninitialized data).
The "pathname" shows the name of the file associated with this mapping. If the
mapping is not associated with a file, it is one of:

 ===================  ===========================================
 [heap]               the heap of the program
 [stack]              the stack of the main process
 [vdso]               the "virtual dynamic shared object",
                      the kernel system call handler
 [anon:<name>]        a private anonymous mapping that has been
                      named by userspace
 [anon_shmem:<name>]  an anonymous shared memory mapping that has
                      been named by userspace
 ===================  ===========================================

or, if empty, the mapping is anonymous.

Starting with the 6.11 kernel, /proc/PID/maps provides an alternative
ioctl()-based API that makes it possible to flexibly and efficiently query and
filter individual VMAs. This interface is binary and is meant for more
efficient and easy programmatic use. `struct procmap_query`, defined in the
linux/fs.h UAPI header, serves as an input/output argument to the
`PROCMAP_QUERY` ioctl() command. See the comments in the linux/fs.h UAPI header
for details on query semantics, supported flags, data returned, and general API
usage information.

The /proc/PID/smaps is an extension based on maps, showing the memory
consumption for each of the process's mappings.
For each mapping (aka Virtual
Memory Area, or VMA) there is a series of lines such as the following::

    08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash

    Size:               1084 kB
    KernelPageSize:        4 kB
    MMUPageSize:           4 kB
    Rss:                 892 kB
    Pss:                 374 kB
    Pss_Dirty:             0 kB
    Shared_Clean:        892 kB
    Shared_Dirty:          0 kB
    Private_Clean:         0 kB
    Private_Dirty:         0 kB
    Referenced:          892 kB
    Anonymous:             0 kB
    KSM:                   0 kB
    LazyFree:              0 kB
    AnonHugePages:         0 kB
    FilePmdMapped:         0 kB
    ShmemPmdMapped:        0 kB
    Shared_Hugetlb:        0 kB
    Private_Hugetlb:       0 kB
    Swap:                  0 kB
    SwapPss:               0 kB
    Locked:                0 kB
    THPeligible:           0
    VmFlags: rd ex mr mw me dw

The first of these lines shows the same information as is displayed for
the mapping in /proc/PID/maps. Following lines show the size of the
mapping (Size); the smallest possible page size allocated when backing a
VMA (KernelPageSize), which is the granularity in which VMA modifications
can be performed; the smallest possible page size that could be used by the
MMU (MMUPageSize) when backing a VMA; the amount of the mapping that is
currently resident in RAM (Rss); the process's proportional share of this
mapping (Pss); and the number of clean and dirty shared and private pages
in the mapping.

"KernelPageSize" always corresponds to "MMUPageSize", except when a larger
kernel page size is emulated on a system with a smaller page size used by the
MMU, which is the case for some PPC64 setups with hugetlb. Furthermore,
"KernelPageSize" and "MMUPageSize" always correspond to the smallest
possible granularity (fallback) that can be encountered in a VMA throughout
its lifetime. These values are not affected by Transparent Huge Pages
being in effect, or any usage of larger MMU page sizes (either through
architectural huge-page mappings or other explicit/implicit coalescing of
virtual ranges performed by the MMU). "AnonHugePages", "ShmemPmdMapped" and
"FilePmdMapped" provide insight into the usage of PMD-level architectural
huge-page mappings.

The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing it.
So if a process has 1000 pages all to itself, and 1000 shared with one other
process, its PSS will be 1500 (1000 + 1000/2). "Pss_Dirty" is the portion of
PSS which consists of dirty pages. ("Pss_Clean" is not included, but it can be
calculated by subtracting "Pss_Dirty" from "Pss".)

Traditionally, a page is accounted as "private" if it is mapped exactly once,
and a page is accounted as "shared" when mapped multiple times, even when
mapped in the same process multiple times. Note that this accounting is
independent of MAP_SHARED.

In some kernel configurations, the semantics of pages that are part of a larger
allocation (e.g., THP) can differ: a page is accounted as "private" if all
pages that are part of the corresponding large allocation are *certainly*
mapped in the same process, even if the page is mapped multiple times in that
process. A page is accounted as "shared" if any page of the larger allocation
is *maybe* mapped in a different process. In some cases, a large allocation
might be treated as "maybe mapped by multiple processes" even though this
is no longer the case.

Some kernel configurations do not track the precise number of times a page that
is part of a larger allocation is mapped. In this case, when calculating the
PSS, the average number of mappings per page in this larger allocation might be
used as an approximation for the number of mappings of a page. The PSS
calculation will be imprecise in this case.

"Referenced" indicates the amount of memory currently marked as referenced or
accessed.

"Anonymous" shows the amount of memory that does not belong to any file.
Even
a mapping associated with a file may contain anonymous pages: when the mapping
is MAP_PRIVATE and a page is modified, the file page is replaced by a private
anonymous copy.

"KSM" reports how many of the pages are KSM pages. Note that KSM-placed
zeropages are not included, only actual KSM pages.

"LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
The memory isn't freed immediately with madvise(). It's freed under memory
pressure if the memory is clean. Please note that the printed value might
be lower than the real value due to optimizations used in the current
implementation. If this is not desirable please file a bug report.

"AnonHugePages", "ShmemPmdMapped" and "FilePmdMapped" show the amount of
memory backed by Transparent Huge Pages that are currently mapped by
architectural huge-page mappings at the PMD level. "AnonHugePages"
corresponds to memory that does not belong to a file, "ShmemPmdMapped" to
shared memory (shmem/tmpfs) and "FilePmdMapped" to file-backed memory
(excluding shmem/tmpfs).

There are no dedicated entries for Transparent Huge Pages (or similar concepts)
that are not mapped by architectural huge-page mappings at the PMD level.

"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
hugetlbfs pages, which are *not* counted in the "RSS" or "PSS" fields for
historical reasons. They are also not included in the
{Shared,Private}_{Clean,Dirty} fields.

"Swap" shows how much would-be-anonymous memory is also used, but out on swap.

For shmem mappings, "Swap" also includes the size of the mapped (and not
replaced by copy-on-write) part of the underlying shmem object out on swap.
"SwapPss" shows the proportional swap share of this mapping. Unlike "Swap", this
does not take into account swapped out pages of underlying shmem objects.
"Locked" indicates whether the mapping is locked in memory or not.
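
The per-mapping layout described above (one maps-style header line followed by
``Key: value`` lines) can be read with a short parser. The following Python
snippet is a minimal sketch under that assumption: it splits smaps text into
one record per VMA and sums the "Pss" values. The function name
``parse_smaps`` is illustrative, the first mapping is an excerpt of the
example above, and the values of the second ([heap]) mapping are made up for
the sake of the example.

```python
import re

# Header line of each smaps entry; it has the same layout as a /proc/PID/maps line.
HEADER = re.compile(
    r"^(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\s+(?P<perms>\S{4})\s+"
    r"(?P<offset>[0-9a-f]+)\s+(?P<dev>\S+)\s+(?P<inode>\d+)\s*(?P<pathname>.*)$"
)


def parse_smaps(text):
    """Split smaps text into a list of per-VMA dicts."""
    mappings = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            entry = header.groupdict()
            entry["fields"] = {}
            mappings.append(entry)
            continue
        key, sep, value = line.partition(":")
        if sep and mappings:
            parts = value.split()
            if len(parts) == 2 and parts[1] == "kB":
                # Sizes are reported as "<number> kB".
                mappings[-1]["fields"][key.strip()] = int(parts[0])
            else:
                # Other values (VmFlags, THPeligible, ...) are kept as text.
                mappings[-1]["fields"][key.strip()] = value.strip()
    return mappings


SAMPLE = """\
08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
Size:               1084 kB
Rss:                 892 kB
Pss:                 374 kB
Shared_Clean:        892 kB
THPeligible:           0
VmFlags: rd ex mr mw me dw
0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
Size:                132 kB
Rss:                  40 kB
Pss:                  40 kB
Private_Dirty:        40 kB
VmFlags: rd wr mr mw me ac
"""

mappings = parse_smaps(SAMPLE)
total_pss_kb = sum(m["fields"].get("Pss", 0) for m in mappings)
print(len(mappings), total_pss_kb)
```

Summing "Pss" over all mappings this way yields the same kind of total that
/proc/PID/smaps_rollup reports directly. The set of fields varies between
kernel versions and configurations, which is why the sketch uses ``get`` with
a default instead of assuming that a field is present.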

"THPeligible" indicates whether the mapping is eligible for allocating
naturally aligned THP pages of any currently enabled size. 1 if true, 0
otherwise.

If both the kernel and the CPU support protection keys (pkeys),
"ProtectionKey" indicates the memory protection key associated with the
virtual memory area.

The "VmFlags" field deserves a separate description. This member represents the
kernel flags associated with the particular virtual memory area, encoded as
two-letter codes. The codes are the following:

    ==    =============================================================
    rd    readable
    wr    writeable
    ex    executable
    sh    shared
    mr    may read
    mw    may write
    me    may execute
    ms    may share
    gd    stack segment grows down
    pf    pure PFN range
    lo    pages are locked in memory
    io    memory mapped I/O area
    sr    sequential read advice provided
    rr    random read advice provided
    dc    do not copy area on fork
    de    do not expand area on remapping
    ac    area is accountable
    nr    swap space is not reserved for the area
    ht    area uses huge tlb pages
    sf    synchronous page fault
    ar    architecture specific flag
    wf    wipe on fork
    dd    do not include area into core dump
    sd    soft dirty flag
    mm    mixed map area
    hg    huge page advice flag
    nh    no huge page advice flag
    mg    mergeable advice flag
    bt    arm64 BTI guarded page
    mt    arm64 MTE allocation tags are enabled
    um    userfaultfd missing tracking
    uw    userfaultfd wr-protect tracking
    ui    userfaultfd minor fault
    ss    shadow/guarded control stack page
    sl    sealed
    lf    lock on fault pages
    dp    always lazily freeable mapping
    gu    maybe contains guard regions (if not set, definitely doesn't)
    ==    =============================================================

Note that there is no guarantee that every flag and associated mnemonic will
be present in all further kernel releases.
Things change: flags may vanish or, conversely, new ones may be added.
The interpretation of their meaning might change in the future as well. So each
consumer of these flags has to follow each specific kernel version for the
exact semantics.

This file is only present if the CONFIG_MMU kernel configuration option is
enabled.

Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent
output can be achieved only within a single read call).

This typically manifests when doing partial reads of these files while the
memory map is being modified. Despite the races, we do provide the following
guarantees:

1) The mapped addresses never go backwards, which implies no two
   regions will ever overlap.
2) If there is something at a given vaddr during the entirety of the
   life of the smaps/maps walk, there will be some output for it.

The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps,
but their values are the sums of the corresponding values for all mappings of
the process. Additionally, it contains these fields:

- Pss_Anon
- Pss_File
- Pss_Shmem

They represent the proportional shares of anonymous, file, and shmem pages, as
described for smaps above. These fields are omitted in smaps since each
mapping identifies the type (anon, file, or shmem) of all pages it contains.
Thus all information in smaps_rollup can be derived from smaps, but at a
significantly higher cost.

The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
bits on both physical and virtual pages associated with a process, and the
soft-dirty bit on pte (see Documentation/admin-guide/mm/soft-dirty.rst
for details).

To clear the bits for all the pages associated with the process::

    > echo 1 > /proc/PID/clear_refs

To clear the bits for the anonymous pages associated with the process::

    > echo 2 > /proc/PID/clear_refs

To clear the bits for the file mapped pages associated with the process::

    > echo 3 > /proc/PID/clear_refs

To clear the soft-dirty bit::

    > echo 4 > /proc/PID/clear_refs

To reset the peak resident set size ("high water mark") to the process's
current value::

    > echo 5 > /proc/PID/clear_refs

Any other value written to /proc/PID/clear_refs will have no effect.

The /proc/PID/pagemap gives the PFN, which can be used to find the pageflags
using /proc/kpageflags and the number of times a page is mapped using
/proc/kpagecount. For a detailed explanation, see
Documentation/admin-guide/mm/pagemap.rst.

The /proc/PID/numa_maps is an extension based on maps, showing the memory
locality and binding policy, as well as the memory usage (in pages) of
each mapping.
The output follows a general format where the mapping details are
summarized, separated by blank spaces, one mapping per line::

    address   policy    mapping details

    00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4
    00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4
    3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4
    320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
    3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
    3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4
    3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4
    320698b000 default file=/lib64/libc-2.12.so
    3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4
    3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
    3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4
    7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4
    7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4
    7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048
    7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4
    7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4

Where:

"address" is the starting address for the mapping;

"policy" reports the NUMA memory policy set for the mapping
(see Documentation/admin-guide/mm/numa_memory_policy.rst);

"mapping details" summarizes mapping data such as mapping type, page usage
counters, node locality page counters (N0 == node0, N1 == node1, ...) and the
kernel page size, in KB, that is backing the mapping up.
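
To illustrate the line format described above, the following Python snippet is
a minimal sketch that splits one numa_maps line into its address, policy,
bare-word flags, ``key=value`` details and per-node page counters. The
function name ``parse_numa_maps_line`` and the layout of the returned
dictionary are assumptions made for this example; the input lines are taken
from the sample output above. Splitting on whitespace relies on spaces in file
names being escaped (as ``\040`` in the sample).

```python
def parse_numa_maps_line(line):
    """Split one /proc/PID/numa_maps line into its components."""
    tokens = line.split()
    entry = {
        "address": int(tokens[0], 16),  # starting address of the mapping
        "policy": tokens[1],            # NUMA memory policy, e.g. "default"
        "flags": [],                    # bare words such as "stack" or "huge"
        "details": {},                  # key=value items, e.g. file=, anon=, dirty=
        "nodes": {},                    # node number -> page count (N0=24 -> {0: 24})
    }
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if not sep:
            entry["flags"].append(token)
        elif key.startswith("N") and key[1:].isdigit():
            entry["nodes"][int(key[1:])] = int(value)
        elif value.isdigit():
            entry["details"][key] = int(value)
        else:
            entry["details"][key] = value
    return entry


lib = parse_numa_maps_line(
    "3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 "
    "N0=24 N3=2 kernelpagesize_kB=4"
)
stack = parse_numa_maps_line(
    "7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4"
)
print(lib["nodes"], stack["flags"])
```

In the first sample line the per-node counters (24 on node 0 and 2 on node 3)
add up to the "mapped" count of 26 pages, which is a handy sanity check when
reading this file.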

Note that some kernel configurations do not track the precise number of times
a page part of a larger allocation (e.g., THP) is mapped. In these
configurations, "mapmax" might correspond to the average number of mappings
per page in such a larger allocation instead.

1.2 Kernel data
---------------

Similar to the process entries, the kernel data files give information about
the running kernel. The files used to obtain this information are contained in
/proc and are listed in Table 1-5. Not all of these will be present in your
system. Which files are present depends on the kernel configuration and the
loaded modules.

.. table:: Table 1-5: Kernel info in /proc

 ============ ===============================================================
 File         Content
 ============ ===============================================================
 allocinfo    Memory allocations profiling information
 apm          Advanced power management info
 bootconfig   Kernel command line obtained from boot config,
              and, if there were kernel parameters from the
              boot loader, a "# Parameters from bootloader:"
              line followed by a line containing those
              parameters prefixed by "# ".                       (5.5)
 buddyinfo    Kernel memory allocator information (see text)     (2.5)
 bus          Directory containing bus specific information
 cmdline      Kernel command line, both from bootloader and embedded
              in the kernel image
 cpuinfo      Info about the CPU
 devices      Available devices (block and character)
 dma          Used DMA channels
 filesystems  Supported filesystems
 driver       Various drivers grouped here, currently rtc        (2.4)
 execdomains  Execdomains, related to security                   (2.4)
 fb           Frame Buffer devices                               (2.4)
 fs           File system parameters, currently nfs/exports      (2.4)
 ide          Directory containing info about the IDE subsystem
 interrupts   Interrupt usage
 iomem        Memory map                                         (2.4)
 ioports      I/O port usage
 irq          Masks for irq to cpu affinity                      (2.4)(smp?)
 isapnp       ISA PnP (Plug&Play) Info                           (2.4)
 kcore        Kernel core image (can be ELF or A.OUT(deprecated in 2.4))
 kmsg         Kernel messages
 ksyms        Kernel symbol table
 loadavg      Load average of last 1, 5 & 15 minutes;
              number of processes currently runnable (running or on ready queue);
              total number of processes in system;
              last pid created.
              All fields are separated by one space except "number of
              processes currently runnable" and "total number of processes
              in system", which are separated by a slash ('/'). Example:
              0.61 0.61 0.55 3/828 22084
 locks        Kernel locks
 meminfo      Memory info
 misc         Miscellaneous
 modules      List of loaded modules
 mounts       Mounted filesystems
 net          Networking info (see text)
 pagetypeinfo Additional page allocator information (see text)   (2.5)
 partitions   Table of partitions known to the system
 pci          Deprecated info of PCI bus (new way -> /proc/bus/pci/,
              decoupled by lspci)                                (2.4)
 rtc          Real time clock
 scsi         SCSI info (see text)
 slabinfo     Slab pool info
 softirqs     softirq usage
 stat         Overall statistics
 swaps        Swap space utilization
 sys          See chapter 2
 sysvipc      Info of SysVIPC Resources (msg, sem, shm)          (2.4)
 tty          Info of tty drivers
 uptime       Wall clock since boot, combined idle time of all cpus
 version      Kernel version
 video        bttv info of video resources                       (2.4)
 vmallocinfo  Show vmalloced areas
 ============ ===============================================================

You can, for example, check which interrupts are currently in use and what
they are used for by looking in the file /proc/interrupts::

  > cat /proc/interrupts
             CPU0
    0:    8728810          XT-PIC  timer
    1:        895          XT-PIC  keyboard
    2:          0          XT-PIC  cascade
    3:     531695          XT-PIC  aha152x
    4:    2014133          XT-PIC  serial
    5:      44401          XT-PIC  pcnet_cs
    8:          2          XT-PIC  rtc
   11:          8          XT-PIC  i82365
   12:     182918          XT-PIC  PS/2 Mouse
   13:          1          XT-PIC  fpu
   14:    1232265          XT-PIC  ide0
   15:          7          XT-PIC  ide1
  NMI:          0

In 2.4.* a couple of lines were added to this file, LOC & ERR (this time it is
the output of an SMP machine)::

  > cat /proc/interrupts

             CPU0       CPU1
    0:    1243498    1214548    IO-APIC-edge  timer
    1:       8949       8958    IO-APIC-edge  keyboard
    2:          0          0          XT-PIC  cascade
    5:      11286      10161    IO-APIC-edge  soundblaster
    8:          1          0    IO-APIC-edge  rtc
    9:      27422      27407    IO-APIC-edge  3c503
   12:     113645     113873    IO-APIC-edge  PS/2 Mouse
   13:          0          0          XT-PIC  fpu
   14:      22491      24012    IO-APIC-edge  ide0
   15:       2183       2415    IO-APIC-edge  ide1
   17:      30564      30414   IO-APIC-level  eth0
   18:        177        164   IO-APIC-level  bttv
  NMI:    2457961    2457959
  LOC:    2457882    2457881
  ERR:       2155

NMI is incremented in this case because every timer interrupt generates an NMI
(Non Maskable Interrupt) which is used by the NMI Watchdog to detect lockups.

LOC is the local interrupt counter of the internal APIC of every CPU.

ERR is incremented in the case of errors in the IO-APIC bus (the bus that
connects the CPUs in an SMP system). This means that an error has been
detected; the IO-APIC automatically retries the transmission, so it should not
be a big problem, but you should read the SMP-FAQ.

In 2.6.2* /proc/interrupts was expanded again. This time the goal was for
/proc/interrupts to display every IRQ vector in use by the system, not
just those considered 'most important'. The new vectors are:

THR
  interrupt raised when a machine check threshold counter
  (typically counting ECC corrected errors of memory or cache) exceeds
  a configurable threshold. Only available on some systems.

TRM
  a thermal event interrupt occurs when a temperature threshold
  has been exceeded for the CPU. This interrupt may also be generated
  when the temperature drops back to normal.

SPU
  a spurious interrupt is some interrupt that was raised then lowered
  by some IO device before it could be fully processed by the APIC. Hence
  the APIC sees the interrupt but does not know what device it came from.
  For this case the APIC will generate the interrupt with an IRQ vector
  of 0xff. This might also be generated by chipset bugs.

RES, CAL, TLB
  rescheduling, call and TLB flush interrupts are
  sent from one CPU to another per the needs of the OS. Typically,
  their statistics are used by kernel developers and interested users to
  determine the occurrence of interrupts of the given type.
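
The per-CPU layout of /proc/interrupts described above lends itself to simple
scripted totals. The sketch below is an illustration only; the function name
``parse_interrupts`` is hypothetical. It assumes the first non-empty line
names the CPUs and that each following line is a label, a colon, up to one
counter per CPU, and an optional description. Rows such as ERR carry fewer
counters than there are CPUs, so the parser stops at the first non-numeric
field.

```python
def parse_interrupts(text):
    """Map each row label of /proc/interrupts to its list of per-CPU counts.

    Hypothetical helper for illustration; not a kernel or library API.
    """
    lines = [l for l in text.splitlines() if l.strip()]
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():        # reached the description column
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return table


# Rows taken from the SMP example above.
sample = """\
           CPU0       CPU1
  0:    1243498    1214548    IO-APIC-edge  timer
  9:      27422      27407    IO-APIC-edge  3c503
NMI:    2457961    2457959
ERR:       2155
"""
for label, counts in parse_interrupts(sample).items():
    print(label, sum(counts))
```

Summing a row gives the system-wide count for that interrupt source; comparing
two snapshots taken some time apart gives a rate.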

The above IRQ vectors are displayed only when relevant. For example,
the threshold vector does not exist on x86_64 platforms. Others are
suppressed when the system is a uniprocessor. As of this writing, only
i386 and x86_64 platforms support the new IRQ vector displays.

Of some interest is the introduction of the /proc/irq directory to 2.4.
It can be used to set IRQ to CPU affinity. This means that you can "hook" an
IRQ to only one CPU, or exclude a CPU from handling IRQs. The irq subdir
contains one subdir for each IRQ, and default_smp_affinity.

For example::

  > ls /proc/irq/
  0  10  12  14  16  18  2  4  6  8  default_smp_affinity
  1  11  13  15  17  19  3  5  7  9
  > ls /proc/irq/0/
  smp_affinity

smp_affinity is a bitmask, in which you can specify which CPUs can handle the
IRQ. You can set it by doing::

  > echo 1 > /proc/irq/10/smp_affinity

This means that only the first CPU will handle the IRQ, but you can also echo
5, which means that only the first and third CPU can handle the IRQ.

The contents of each smp_affinity file are the same by default::

  > cat /proc/irq/0/smp_affinity
  ffffffff

There is an alternate interface, smp_affinity_list, which allows specifying
a CPU range instead of a bitmask::

  > cat /proc/irq/0/smp_affinity_list
  1024-1031

The default_smp_affinity mask applies to all non-active IRQs, which are the
IRQs which have not yet been allocated/activated, and hence which lack a
/proc/irq/[0-9]* directory.

The node file on an SMP system shows the node to which the device using the IRQ
reports itself as being attached. This hardware locality information does not
include information about any possible driver locality preference.

The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
between all the CPUs which are allowed to handle it. As usual the kernel has
more info than you and does a better job than you, so the defaults are the
best choice for almost everyone. [Note this applies only to those IO-APICs
that support "Round Robin" interrupt distribution.]

There are three more important subdirectories in /proc: net, scsi, and sys.
The general rule is that the contents, or even the existence of these
directories, depend on your kernel configuration. If SCSI is not enabled, the
directory scsi may not exist. The same is true of the net directory, which is
there only when networking support is present in the running kernel.

The slabinfo file gives information about memory usage at the slab level.
Linux uses slab pools for memory management above page level in version 2.2.
Commonly used objects have their own slab pool (such as network buffers,
directory cache, and so on).

::

    > cat /proc/buddyinfo

    Node 0, zone      DMA      0      4      5      4      4      3 ...
    Node 0, zone   Normal      1      0      0      1    101      8 ...
    Node 0, zone  HighMem      2      0      0      1      1      0 ...

External fragmentation is a problem under some workloads, and buddyinfo is a
useful tool for helping diagnose these problems. Buddyinfo will give you a
clue as to how big an area you can safely allocate, or why a previous
allocation failed.

Each column represents the number of pages of a certain order which are
available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
available in ZONE_NORMAL, etc...
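
The column arithmetic described above (column N counts free chunks of
2^N pages) can be turned into a per-zone summary. This is a minimal sketch
under the assumption that each line has the form
``Node <n>, zone <name> <count> <count> ...``; the helper names
``parse_buddyinfo``, ``free_pages`` and ``largest_free_order`` are
hypothetical and chosen for this illustration.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free chunk count per order]}.

    Hypothetical helper for illustration; not a kernel or library API.
    """
    zones = {}
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if len(fields) < 5 or fields[0] != "Node":
            continue
        node, zone = int(fields[1]), fields[3]
        zones[(node, zone)] = [int(n) for n in fields[4:]]
    return zones


def free_pages(counts):
    """Total free pages: each chunk of order N holds 2^N pages."""
    return sum(n << order for order, n in enumerate(counts))


def largest_free_order(counts):
    """Highest order with at least one free chunk, or -1 if none."""
    return max((o for o, n in enumerate(counts) if n > 0), default=-1)


# The first six columns of the example above (the "..." is elided there).
sample = """\
Node 0, zone      DMA      0      4      5      4      4      3
Node 0, zone   Normal      1      0      0      1    101      8
"""
for (node, zone), counts in parse_buddyinfo(sample).items():
    print(node, zone, free_pages(counts), largest_free_order(counts))
```

A zone can report many free pages while ``largest_free_order`` stays low,
which is the external fragmentation case buddyinfo helps to diagnose.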

More information relevant to external fragmentation can be found in
pagetypeinfo::

    > cat /proc/pagetypeinfo
    Page block order: 9
    Pages per block:  512

    Free pages count per migrate type at order        0      1      2      3      4      5      6      7      8      9     10
    Node    0, zone      DMA, type    Unmovable       0      0      0      1      1      1      1      1      1      1      0
    Node    0, zone      DMA, type  Reclaimable       0      0      0      0      0      0      0      0      0      0      0
    Node    0, zone      DMA, type      Movable       1      1      2      1      2      1      1      0      1      0      2
    Node    0, zone      DMA, type      Reserve       0      0      0      0      0      0      0      0      0      1      0
    Node    0, zone      DMA, type      Isolate       0      0      0      0      0      0      0      0      0      0      0
    Node    0, zone    DMA32, type    Unmovable     103     54     77      1      1      1     11      8      7      1      9
    Node    0, zone    DMA32, type  Reclaimable       0      0      2      1      0      0      0      0      1      0      0
    Node    0, zone    DMA32, type      Movable     169    152    113     91     77     54     39     13      6      1    452
    Node    0, zone    DMA32, type      Reserve       1      2      2      2      2      0      1      1      1      1      0
    Node    0, zone    DMA32, type      Isolate       0      0      0      0      0      0      0      0      0      0      0

    Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
    Node 0, zone      DMA            2            0            5            1            0
    Node 0, zone    DMA32           41            6          967            2            0

Fragmentation avoidance in the kernel works by grouping pages of different
migrate types into the same contiguous regions of memory called page blocks.
A page block is typically the size of the default hugepage size, e.g. 2MB on
X86-64. By keeping pages grouped based on their ability to move, the kernel
can reclaim pages within a page block to satisfy a high-order allocation.

The pagetypeinfo file begins with information on the size of a page block. It
then gives the same type of information as buddyinfo except broken down
by migrate-type and finishes with details on how many page blocks of each
type exist.

If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
from libhugetlbfs https://github.com/libhugetlbfs/libhugetlbfs/), one can
make an estimate of the likely number of huge pages that can be allocated
at a given point in time. All the "Movable" blocks should be allocatable
unless memory has been mlock()'d. Some of the Reclaimable blocks should
also be allocatable although a lot of filesystem metadata may have to be
reclaimed to achieve this.


allocinfo
~~~~~~~~~

Provides information about memory allocations at all locations in the code
base. Each allocation in the code is identified by its source file, line
number, module (if it originates from a loadable module) and the function
calling the allocation. The number of bytes allocated and number of calls at
each location are reported. The first line indicates the version of the file,
the second line is the header listing fields in the file.
If the file version is 2.0 or higher then each line may contain additional
<key>:<value> pairs representing extra information about the call site.
For example, if the counters are not accurate, the line will be appended with
an "accurate:no" pair.

Supported markers in v2:

accurate:no
    Absolute values of the counters in this line are not accurate
    because of the failure to allocate memory to track some of the
    allocations made at this location. Deltas in these counters are
    accurate, therefore counters can be used to track allocation size
    and count changes.

Example output.

::

    > tail -n +3 /proc/allocinfo | sort -rn
       127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
        56373248     4737 mm/slub.c:2259 func:alloc_slab_page
        14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
        14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
        13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
        11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
         9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
         4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
         4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
         3940352      962 mm/memory.c:4214 func:alloc_anon_folio
         2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
         ...


meminfo
~~~~~~~

Provides information about distribution and utilization of memory. This
varies by architecture and compile options. Some of the counters reported
here overlap. The memory reported by the non-overlapping counters may not
add up to the overall memory usage and the difference for some workloads
can be substantial. In many cases there are other means to find out
additional memory using subsystem specific interfaces, for instance
/proc/net/sockstat for TCP memory allocations.

Example output. You may not have all of these fields.

::

    > cat /proc/meminfo

    MemTotal:       32858820 kB
    MemFree:        21001236 kB
    MemAvailable:   27214312 kB
    Buffers:          581092 kB
    Cached:          5587612 kB
    SwapCached:            0 kB
    Active:          3237152 kB
    Inactive:        7586256 kB
    Active(anon):      94064 kB
    Inactive(anon):  4570616 kB
    Active(file):    3143088 kB
    Inactive(file):  3015640 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:             0 kB
    SwapFree:              0 kB
    Zswap:              1904 kB
    Zswapped:           7792 kB
    Dirty:                12 kB
    Writeback:             0 kB
    AnonPages:       4654780 kB
    Mapped:           266244 kB
    Shmem:              9976 kB
    KReclaimable:     517708 kB
    Slab:             660044 kB
    SReclaimable:     517708 kB
    SUnreclaim:       142336 kB
    KernelStack:       11168 kB
    PageTables:        20540 kB
    SecPageTables:         0 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:    16429408 kB
    Committed_AS:    7715148 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:       40444 kB
    VmallocChunk:          0 kB
    Percpu:            29312 kB
    EarlyMemtestBad:       0 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:   4149248 kB
    ShmemHugePages:        0 kB
    ShmemPmdMapped:        0 kB
    FileHugePages:         0 kB
    FilePmdMapped:         0 kB
    CmaTotal:              0 kB
    CmaFree:               0 kB
    Unaccepted:            0 kB
    Balloon:               0 kB
    GPUActive:             0 kB
    GPUReclaim:            0 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    Hugetlb:               0 kB
    DirectMap4k:      401152 kB
    DirectMap2M:    10008576 kB
    DirectMap1G:    24117248 kB

MemTotal
              Total usable RAM (i.e. physical RAM minus a few reserved
              bits and the kernel binary code)
MemFree
              Total free RAM. On highmem systems, the sum of LowFree+HighFree
MemAvailable
              An estimate of how much memory is available for starting new
              applications, without swapping. Calculated from MemFree,
              SReclaimable, the size of the file LRU lists, and the low
              watermarks in each zone.
              The estimate takes into account that the system needs some
              page cache to function well, and that not all reclaimable
              slab will be reclaimable, due to items being in use. The
              impact of those factors will vary from system to system.
Buffers
              Relatively temporary storage for raw disk blocks;
              shouldn't get tremendously large (20MB or so)
Cached
              In-memory cache for files read from the disk (the
              pagecache) as well as tmpfs & shmem.
              Doesn't include SwapCached.
SwapCached
              Memory that once was swapped out, is swapped back in but
              still also is in the swapfile (if memory is needed it
              doesn't need to be swapped out AGAIN because it is already
              in the swapfile. This saves I/O)
Active
              Memory that has been used more recently and usually not
              reclaimed unless absolutely necessary.
Inactive
              Memory which has been less recently used. It is more
              eligible to be reclaimed for other purposes
Unevictable
              Memory allocated for userspace which cannot be reclaimed, such
              as mlocked pages, ramfs backing pages, secret memfd pages etc.
Mlocked
              Memory locked with mlock().
HighTotal, HighFree
              Highmem is all memory above ~860MB of physical memory.
              Highmem areas are for use by userspace programs, or
              for the pagecache. The kernel must use tricks to access
              this memory, making it slower to access than lowmem.
LowTotal, LowFree
              Lowmem is memory which can be used for everything that
              highmem can be used for, but it is also available for the
              kernel's use for its own data structures. Among many
              other things, it is where everything from the Slab is
              allocated. Bad things happen when you're out of lowmem.
SwapTotal
              total amount of swap space available
SwapFree
              Swap space that is currently unused (swap holds memory
              which has been evicted from RAM, and is temporarily
              on the disk)
Zswap
              Memory consumed by the zswap backend (compressed size)
Zswapped
              Amount of anonymous memory stored in zswap (original size)
Dirty
              Memory which is waiting to get written back to the disk
Writeback
              Memory which is actively being written back to the disk
AnonPages
              Non-file backed pages mapped into userspace page tables. Note that
              some kernel configurations might consider all pages part of a
              larger allocation (e.g., THP) as "mapped", as soon as a single
              page is mapped.
Mapped
              files which have been mmapped, such as libraries. Note that some
              kernel configurations might consider all pages part of a larger
              allocation (e.g., THP) as "mapped", as soon as a single page is
              mapped.
Shmem
              Total memory used by shared memory (shmem) and tmpfs
KReclaimable
              Kernel allocations that the kernel will attempt to reclaim
              under memory pressure. Includes SReclaimable (below), and other
              direct allocations with a shrinker.
Slab
              in-kernel data structures cache
SReclaimable
              Part of Slab, that might be reclaimed, such as caches
SUnreclaim
              Part of Slab, that cannot be reclaimed on memory pressure
KernelStack
              Memory consumed by the kernel stacks of all tasks
PageTables
              Memory consumed by userspace page tables
SecPageTables
              Memory consumed by secondary page tables, this currently includes
              KVM mmu and IOMMU allocations on x86 and arm64.
NFS_Unstable
              Always zero. Previously counted pages which had been written to
              the server, but had not been committed to stable storage.
Bounce
              Always zero. Previously memory used for block device
              "bounce buffers".
WritebackTmp
              Always zero. Previously memory used by FUSE for temporary
              writeback buffers.
CommitLimit
              Based on the overcommit ratio ('vm.overcommit_ratio'),
              this is the total amount of memory currently available to
              be allocated on the system. This limit is only adhered to
              if strict overcommit accounting is enabled (mode 2 in
              'vm.overcommit_memory').

              The CommitLimit is calculated with the following formula::

                CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
                               overcommit_ratio / 100 + [total swap pages]

              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.

              For more details, see the memory overcommit documentation
              in mm/overcommit-accounting.
Committed_AS
              The amount of memory presently allocated on the system.
              The committed memory is a sum of all of the memory which
              has been allocated by processes, even if it has not been
              "used" by them as of yet. A process which malloc()'s 1G
              of memory, but only touches 300M of it will show up as
              using 1G. This 1G is memory which has been "committed" to
              by the VM and can be used at any time by the allocating
              application. With strict overcommit enabled on the system
              (mode 2 in 'vm.overcommit_memory'), allocations which would
              exceed the CommitLimit (detailed above) will not be permitted.
              This is useful if one needs to guarantee that processes will
              not fail due to lack of memory once that memory has been
              successfully allocated.
VmallocTotal
              total size of vmalloc virtual address space
VmallocUsed
              amount of vmalloc area which is used
VmallocChunk
              largest contiguous block of vmalloc area which is free
Percpu
              Memory allocated to the percpu allocator used to back percpu
              allocations. This stat excludes the cost of metadata.
EarlyMemtestBad
              The amount of RAM/memory in kB, that was identified as corrupted
              by early memtest. If memtest was not run, this field will not
              be displayed at all. Size is never rounded down to 0 kB.
              That means if 0 kB is reported, you can safely assume
              there was at least one pass of memtest and none of the passes
              found a single faulty byte of RAM.
HardwareCorrupted
              The amount of RAM/memory in KB, the kernel identifies as
              corrupted.
AnonHugePages
              Non-file backed huge pages mapped into userspace page tables
ShmemHugePages
              Memory used by shared memory (shmem) and tmpfs allocated
              with huge pages
ShmemPmdMapped
              Shared memory mapped into userspace with huge pages
FileHugePages
              Memory used for filesystem data (page cache) allocated
              with huge pages
FilePmdMapped
              Page cache mapped into userspace with huge pages
CmaTotal
              Memory reserved for the Contiguous Memory Allocator (CMA)
CmaFree
              Free remaining memory in the CMA reserves
Unaccepted
              Memory that has not been accepted by the guest
Balloon
              Memory returned to Host by VM Balloon Drivers
GPUActive
              System memory allocated to active GPU objects
GPUReclaim
              System memory stored in GPU pools for reuse. This memory is not
              counted in GPUActive. It is shrinker reclaimable memory kept in a reuse
              pool because it has non-standard page table attributes, like WC or UC.
HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
              See Documentation/admin-guide/mm/hugetlbpage.rst.
DirectMap4k, DirectMap2M, DirectMap1G
              Breakdown of page table sizes used in the kernel's
              identity mapping of RAM

vmallocinfo
~~~~~~~~~~~

Provides information about vmalloced/vmapped areas. One line per area,
containing the virtual address range of the area, size in bytes,
caller information of the creator, and optional information depending
on the kind of area:

 ==========  ===================================================
 pages=nr    number of pages
 phys=addr   if a physical address was specified
 ioremap     I/O mapping (ioremap() and friends)
 vmalloc     vmalloc() area
 vmap        vmap()ed pages
 user        VM_USERMAP area
 vpages      buffer for pages pointers was vmalloced (huge area)
 N<node>=nr  (Only on NUMA kernels)
             Number of pages allocated on memory node <node>
 ==========  ===================================================

::

    > cat /proc/vmallocinfo
    0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
      /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
    0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
      /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
    0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
      phys=7fee8000 ioremap
    0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
      phys=7fee7000 ioremap
    0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
    0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
      /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
    0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
      pages=2 vmalloc N1=2
    0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
      /0x130 [x_tables] pages=4 vmalloc N0=4
    0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
      pages=14 vmalloc N2=14
    0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
      pages=4 vmalloc N1=4
    0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
      pages=2 vmalloc N1=2
    0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
      pages=10 vmalloc N0=10


softirqs
~~~~~~~~

Provides counts of softirq handlers serviced since boot time, for each CPU.

::

    > cat /proc/softirqs
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:      27166      27120      27097      27034
      NET_TX:          0          0          0         17
      NET_RX:         42          0          0         39
       BLOCK:          0          0        107       1121
     TASKLET:          0          0          0        290
       SCHED:      27035      26983      26971      26746
     HRTIMER:          0          0          0          0
         RCU:       1678       1769       2178       2250

1.3 Networking info in /proc/net
--------------------------------

The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the
additional values you get for IP version 6 if you configure the kernel to
support this. Table 1-9 lists the files and their meaning.


.. table:: Table 1-8: IPv6 info in /proc/net

 ========== =====================================================
 File       Content
 ========== =====================================================
 udp6       UDP sockets (IPv6)
 tcp6       TCP sockets (IPv6)
 raw6       Raw device statistics (IPv6)
 igmp6      IP multicast addresses, which this host joined (IPv6)
 if_inet6   List of IPv6 interface addresses
 ipv6_route Kernel routing table for IPv6
 rt6_stats  Global IPv6 routing tables statistics
 sockstat6  Socket statistics (IPv6)
 snmp6      Snmp data (IPv6)
 ========== =====================================================

.. table:: Table 1-9: Network info in /proc/net

 ============= ================================================================
 File          Content
 ============= ================================================================
 arp           Kernel ARP table
 dev           network devices with statistics
 dev_mcast     the Layer2 multicast groups a device is listening to
               (interface index, label, number of references, number of bound
               addresses).
 dev_stat      network device status
 ip_fwchains   Firewall chain linkage
 ip_fwnames    Firewall chain names
 ip_masq       Directory containing the masquerading tables
 ip_masquerade Major masquerading table
 netstat       Network statistics
 raw           raw device statistics
 route         Kernel routing table
 rpc           Directory containing rpc info
 rt_cache      Routing cache
 snmp          SNMP data
 sockstat      Socket statistics
 softnet_stat  Per-CPU incoming packets queues statistics of online CPUs
 tcp           TCP sockets
 udp           UDP sockets
 unix          UNIX domain sockets
 wireless      Wireless interface data (Wavelan etc)
 igmp          IP multicast addresses, which this host joined
 psched        Global packet scheduler parameters.
 netlink       List of PF_NETLINK sockets
 ip_mr_vifs    List of multicast virtual interfaces
 ip_mr_cache   List of multicast routing cache
 ============= ================================================================

You can use this information to see which network devices are available in
your system and how much traffic was routed over those devices::

  > cat /proc/net/dev
  Inter-|Receive                                                   |[...
   face |bytes    packets errs drop fifo frame compressed multicast|[...
      lo:  908188   5596     0    0    0     0          0         0 [...
    ppp0:15475140  20721   410    0    0   410          0         0 [...
    eth0:  614530   7085     0    0    0     0          0         1 [...

  ...] Transmit
  ...] bytes    packets errs drop fifo colls carrier compressed
  ...]  908188     5596    0    0    0     0       0          0
  ...] 1375103    17405    0    0    0     0       0          0
  ...] 1703981     5535    0    0    0     3       0          0

In addition, each Channel Bond interface has its own directory. For
example, the bond0 device will have a directory called /proc/net/bond0/.
It will contain information that is specific to that bond, such as the
current slaves of the bond, the link status of the slaves, and how
many times each slave's link has failed.
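
Returning to /proc/net/dev: its layout (two header lines, then one line per
interface with eight receive and eight transmit counters) can be read
programmatically. The sketch below is an illustration only; the function name
``parse_net_dev`` is hypothetical, and the field names are copied from the
header shown in the example above, on the assumption that a line carries
exactly sixteen counters.

```python
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed")


def parse_net_dev(text):
    """Return {interface: {"rx": {...}, "tx": {...}}} from /proc/net/dev text.

    Hypothetical helper for illustration; not a kernel or library API.
    """
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        values = [int(v) for v in rest.split()]
        if len(values) != len(RX_FIELDS) + len(TX_FIELDS):
            continue                        # unexpected layout, skip the line
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:])),
        }
    return stats


# The lo and ppp0 rows from the example above, with both halves joined.
sample = """\
Inter-|   Receive                                     |  Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
    lo:  908188 5596 0 0 0 0 0 0 908188 5596 0 0 0 0 0 0
  ppp0:15475140 20721 410 0 0 410 0 0 1375103 17405 0 0 0 0 0 0
"""
for name, counters in parse_net_dev(sample).items():
    print(name, counters["rx"]["bytes"], counters["tx"]["bytes"])
```

Splitting on the first colon rather than on whitespace matters here, because a
large byte count can sit directly against the interface name, as in the ppp0
row.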
1449 14501.4 SCSI info 1451------------- 1452 1453If you have a SCSI or ATA host adapter in your system, you'll find a 1454subdirectory named after the driver for this adapter in /proc/scsi. 1455You'll also see a list of all recognized SCSI devices in /proc/scsi:: 1456 1457 >cat /proc/scsi/scsi 1458 Attached devices: 1459 Host: scsi0 Channel: 00 Id: 00 Lun: 00 1460 Vendor: IBM Model: DGHS09U Rev: 03E0 1461 Type: Direct-Access ANSI SCSI revision: 03 1462 Host: scsi0 Channel: 00 Id: 06 Lun: 00 1463 Vendor: PIONEER Model: CD-ROM DR-U06S Rev: 1.04 1464 Type: CD-ROM ANSI SCSI revision: 02 1465 1466 1467The directory named after the driver has one file for each adapter found in 1468the system. These files contain information about the controller, including 1469the used IRQ and the IO address range. The amount of information shown is 1470dependent on the adapter you use. The example shows the output for an Adaptec 1471AHA-2940 SCSI adapter:: 1472 1473 > cat /proc/scsi/aic7xxx/0 1474 1475 Adaptec AIC7xxx driver version: 5.1.19/3.2.4 1476 Compile Options: 1477 TCQ Enabled By Default : Disabled 1478 AIC7XXX_PROC_STATS : Disabled 1479 AIC7XXX_RESET_DELAY : 5 1480 Adapter Configuration: 1481 SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter 1482 Ultra Wide Controller 1483 PCI MMAPed I/O Base: 0xeb001000 1484 Adapter SEEPROM Config: SEEPROM found and used. 
1485 Adaptec SCSI BIOS: Enabled 1486 IRQ: 10 1487 SCBs: Active 0, Max Active 2, 1488 Allocated 15, HW 16, Page 255 1489 Interrupts: 160328 1490 BIOS Control Word: 0x18b6 1491 Adapter Control Word: 0x005b 1492 Extended Translation: Enabled 1493 Disconnect Enable Flags: 0xffff 1494 Ultra Enable Flags: 0x0001 1495 Tag Queue Enable Flags: 0x0000 1496 Ordered Queue Tag Flags: 0x0000 1497 Default Tag Queue Depth: 8 1498 Tagged Queue By Device array for aic7xxx host instance 0: 1499 {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255} 1500 Actual queue depth per device for aic7xxx host instance 0: 1501 {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} 1502 Statistics: 1503 (scsi0:0:0:0) 1504 Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8 1505 Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0) 1506 Total transfers 160151 (74577 reads and 85574 writes) 1507 (scsi0:0:6:0) 1508 Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15 1509 Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0) 1510 Total transfers 0 (0 reads and 0 writes) 1511 1512 15131.5 Parallel port info in /proc/parport 1514--------------------------------------- 1515 1516The directory /proc/parport contains information about the parallel ports of 1517your system. It has one subdirectory for each port, named after the port 1518number (0,1,2,...). 1519 1520These directories contain the four files shown in Table 1-10. 1521 1522 1523.. table:: Table 1-10: Files in /proc/parport 1524 1525 ========= ==================================================================== 1526 File Content 1527 ========= ==================================================================== 1528 autoprobe Any IEEE-1284 device ID information that has been acquired. 1529 devices list of the device drivers using that port. A + will appear by the 1530 name of the device currently using the port (it might not appear 1531 against any). 
 hardware  Parallel port's base address, IRQ line and DMA channel.
 irq       IRQ that parport is using for that port. This is in a separate
           file to allow you to alter it by writing a new value in (IRQ
           number or none).
 ========= ====================================================================

1.6 TTY info in /proc/tty
-------------------------

Information about the available and actually used tty's can be found in the
directory /proc/tty. You'll find entries for drivers and line disciplines in
this directory, as shown in Table 1-11.


.. table:: Table 1-11: Files in /proc/tty

 ============= ==============================================
 File          Content
 ============= ==============================================
 drivers       list of drivers and their usage
 ldiscs        registered line disciplines
 driver/serial usage statistic and status of single tty lines
 ============= ==============================================

To see which tty's are currently in use, you can simply look into the file
/proc/tty/drivers::

  > cat /proc/tty/drivers
  pty_slave            /dev/pts      136   0-255 pty:slave
  pty_master           /dev/ptm      128   0-255 pty:master
  pty_slave            /dev/ttyp       3   0-255 pty:slave
  pty_master           /dev/pty        2   0-255 pty:master
  serial               /dev/cua        5   64-67 serial:callout
  serial               /dev/ttyS       4   64-67 serial
  /dev/tty0            /dev/tty0       4       0 system:vtmaster
  /dev/ptmx            /dev/ptmx       5       2 system
  /dev/console         /dev/console    5       1 system:console
  /dev/tty             /dev/tty        5       0 system:/dev/tty
  unknown              /dev/tty        4    1-63 console


1.7 Miscellaneous kernel statistics in /proc/stat
-------------------------------------------------

Various pieces of information about kernel activity are available in the
/proc/stat file.  All of the numbers reported in this file are aggregates
since the system first booted.
For a quick look, simply cat the file::

  > cat /proc/stat
  cpu  237902850 368826709 106375398 1873517540 1135548 0 14507935 0 0 0
  cpu0 60045249 91891769 26331539 468411416 495718 0 5739640 0 0 0
  cpu1 59746288 91759249 26609887 468860630 312281 0 4384817 0 0 0
  cpu2 59489247 92985423 26904446 467808813 171668 0 2268998 0 0 0
  cpu3 58622065 92190267 26529524 468436680 155879 0 2114478 0 0 0
  intr 8688370575 8 3373 0 0 0 0 0 0 1 40791 0 0 353317 0 0 0 0 224789828 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 190974333 41958554 123983334 43 0 224593 0 0 0 <more 0's deleted>
  ctxt 22848221062
  btime 1605316999
  processes 746787147
  procs_running 2
  procs_blocked 0
  softirq 12121874454 100099120 3938138295 127375644 2795979 187870761 0 173808342 3072582055 52608 224184354

The very first "cpu" line aggregates the numbers in all of the other "cpuN"
lines.  These numbers identify the amount of time the CPU has spent performing
different kinds of work.  Time units are in USER_HZ (typically hundredths of a
second).  The meanings of the columns are as follows, from left to right:

- user: normal processes executing in user mode
- nice: niced processes executing in user mode
- system: processes executing in kernel mode
- idle: twiddling thumbs
- iowait: In a word, iowait stands for waiting for I/O to complete. But there
  are several problems:

  1. The CPU does not wait for I/O to complete; iowait is the time that a
     task is waiting for I/O to complete. When a CPU goes idle because of
     outstanding task I/O, another task will be scheduled on this CPU.
  2. In a multi-core CPU, the task waiting for I/O to complete is not running
     on any CPU, so the iowait of each CPU is difficult to calculate.
  3. The value of the iowait field in /proc/stat can decrease under certain
     conditions.

  As a result, the iowait value read from /proc/stat is not reliable.
- irq: servicing interrupts
- softirq: servicing softirqs
- steal: involuntary wait
- guest: running a normal guest
- guest_nice: running a niced guest

The "intr" line gives counts of interrupts serviced since boot time, for each
of the possible system interrupts.  The first column is the total of all
interrupts serviced, including unnumbered architecture specific interrupts;
each subsequent column is the total for that particular numbered interrupt.
Unnumbered interrupts are not shown, only summed into the total.

The "ctxt" line gives the total number of context switches across all CPUs.

The "btime" line gives the time at which the system booted, in seconds since
the Unix epoch.

The "processes" line gives the number of processes and threads created, which
includes (but is not limited to) those created by calls to the fork() and
clone() system calls.

The "procs_running" line gives the total number of threads that are
running or ready to run (i.e., the total number of runnable threads).

The "procs_blocked" line gives the number of processes currently blocked,
waiting for I/O to complete.

The "softirq" line gives counts of softirqs serviced since boot time, for each
of the possible system softirqs. The first column is the total of all
softirqs serviced; each subsequent column is the total for that particular
softirq.


1.8 Ext4 file system parameters
-------------------------------

Information about mounted ext4 file systems can be found in
/proc/fs/ext4.  Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
/proc/fs/ext4/sda9 or /proc/fs/ext4/dm-0).
The files in each per-device
directory are shown in Table 1-12, below.

.. table:: Table 1-12: Files in /proc/fs/ext4/<devname>

 ============== ==========================================================
 File           Content
 ============== ==========================================================
 mb_groups      details of multiblock allocator buddy cache of free blocks
 ============== ==========================================================

1.9 /proc/consoles
-------------------
Shows registered system console lines.

To see which character device lines are currently used for the system console
/dev/console, you may simply look into the file /proc/consoles::

  > cat /proc/consoles
  tty0                 -WU (ECp)       4:7
  ttyS0                -W- (Ep)        4:64

The columns are:

+--------------------+-------------------------------------------------------+
| device             | name of the device                                    |
+====================+=======================================================+
| operations         | * R = can do read operations                          |
|                    | * W = can do write operations                         |
|                    | * U = can do unblank                                  |
+--------------------+-------------------------------------------------------+
| flags              | * E = it is enabled                                   |
|                    | * C = it is preferred console                         |
|                    | * B = it is primary boot console                      |
|                    | * p = it is used for printk buffer                    |
|                    | * b = it is not a TTY but a Braille device            |
|                    | * a = it is safe to use when cpu is offline           |
+--------------------+-------------------------------------------------------+
| major:minor        | major and minor number of the device separated by a   |
|                    | colon                                                 |
+--------------------+-------------------------------------------------------+

Summary
-------

The /proc file system serves information about the running system. It not only
allows access to process data but also allows you to request the kernel status
by reading files in the hierarchy.
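
As an illustration of requesting kernel status by reading a file in the
hierarchy, the "cpu" lines of /proc/stat described in section 1.7 could be
parsed roughly as follows. This is only a sketch: the function name, the
``CPU_FIELDS`` tuple and the ``SAMPLE`` text (two lines taken from the example
output above) are illustrative and not part of any kernel interface.

```python
# Column names in the order listed in section 1.7.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_lines(stat_text):
    """Return {"cpu": {...}, "cpu0": {...}, ...}; values are USER_HZ ticks."""
    result = {}
    for line in stat_text.splitlines():
        parts = line.split()
        # Only the aggregate "cpu" line and the per-CPU "cpuN" lines qualify.
        if not parts or not parts[0].startswith("cpu"):
            continue
        values = [int(v) for v in parts[1:]]
        # zip() stops at the shorter sequence, so a line with fewer
        # columns simply yields fewer keys.
        result[parts[0]] = dict(zip(CPU_FIELDS, values))
    return result


# Hypothetical sample input, copied from the /proc/stat example above.
SAMPLE = """\
cpu  237902850 368826709 106375398 1873517540 1135548 0 14507935 0 0 0
cpu0 60045249 91891769 26331539 468411416 495718 0 5739640 0 0 0
ctxt 22848221062
"""
```

On a running system the same function would be given the contents of
/proc/stat instead of ``SAMPLE``. Keep in mind that the values are aggregates
since boot, so a utilization figure needs the difference between two samples.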

The directory structure of /proc reflects the types of information and makes
it easy, if not obvious, where to look for specific data.

Chapter 2: Modifying System Parameters
======================================

In This Chapter
---------------

* Modifying kernel parameters by writing into files found in /proc/sys
* Exploring the files which modify certain parameters
* Reviewing the /proc/sys file tree

------------------------------------------------------------------------------

A very interesting part of /proc is the directory /proc/sys. This is not only
a source of information, it also allows you to change parameters within the
kernel. Be very careful when attempting this. You can optimize your system,
but you can also cause it to crash. Never alter kernel parameters on a
production system. Set up a development machine and test to make sure that
everything works the way you want it to. You may have no alternative but to
reboot the machine once an error has been made.

To change a value, simply echo the new value into the file.
You need to be root to do this. You can create your own boot script
to perform this every time your system boots.

The files in /proc/sys can be used to fine tune and monitor miscellaneous and
general things in the operation of the Linux kernel. Since some of the files
can inadvertently disrupt your system, it is advisable to read both
documentation and source before actually making adjustments. In any case, be
very careful when writing to any of these files. The entries in /proc may
change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
review the kernel documentation in the directory linux/Documentation.
This chapter is heavily based on the documentation included in the pre 2.2
kernels, and became part of it in version 2.2.1 of the Linux kernel.
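
The "echo a value into the file" step described in this chapter amounts to an
ordinary file write. A minimal sketch of reading and writing a parameter is
shown here; the helper names and the ``root`` argument are hypothetical and
exist only so that the code can be tried against a scratch directory instead
of the real /proc/sys. The mapping of dotted names to paths follows the
sysctl naming convention and is an assumption of this sketch.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted name such as "kernel.hostname" to its file under root."""
    return os.path.join(root, *name.split("."))


def read_param(name, root="/proc/sys"):
    """Return the current value of a parameter as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


def write_param(name, value, root="/proc/sys"):
    """Write a new value; under the real /proc/sys this requires root."""
    with open(sysctl_path(name, root), "w") as f:
        f.write(f"{value}\n")
```

In line with the warning above, it is safer to experiment with ``root``
pointing at a temporary directory, or on a development machine, before
touching a live system.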

Please see: Documentation/admin-guide/sysctl/ directory for descriptions of
these entries.

Summary
-------

Certain aspects of kernel behavior can be modified at runtime, without the
need to recompile the kernel, or even to reboot the system. The files in the
/proc/sys tree can not only be read, but also modified. You can use the echo
command to write values into these files, thereby changing the default
settings of the kernel.


Chapter 3: Per-process Parameters
=================================

3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
---------------------------------------------------------------------------------

These files can be used to adjust the badness heuristic used to select which
process gets killed in out of memory (oom) conditions.

The badness heuristic assigns a value to each candidate task ranging from 0
(never kill) to 1000 (always kill) to determine which process is targeted.  The
units are roughly a proportion along that range of allowed memory the process
may allocate from based on an estimation of its current memory and swap use.
For example, if a task is using all allowed memory, its badness score will be
1000.  If it is using half of its allowed memory, its score will be 500.

The amount of "allowed" memory depends on the context in which the oom killer
was called.  If it is due to the memory assigned to the allocating task's cpuset
being exhausted, the allowed memory represents the set of mems assigned to that
cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
memory represents the set of mempolicy nodes.  If it is due to a memory
limit (or swap limit) being reached, the allowed memory is that configured
limit.  Finally, if it is due to the entire system being out of memory, the
allowed memory represents all allocatable resources.

The value of /proc/<pid>/oom_score_adj is added to the badness score before it
is used to determine which task to kill.  Acceptable values range from -1000
(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
polarize the preference for oom killing either by always preferring a certain
task or completely disabling it.  The lowest possible value, -1000, is
equivalent to disabling oom killing entirely for that task since it will always
report a badness score of 0.

Consequently, it is very simple for userspace to define the amount of memory to
consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
example, is roughly equivalent to allowing the remainder of tasks sharing the
same system, cpuset, mempolicy, or memory controller resources to use at least
50% more memory.  A value of -500, on the other hand, would be roughly
equivalent to discounting 50% of the task's allowed memory from being considered
as scoring against the task.

For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
be used to tune the badness score.  Its acceptable values range from -16
(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
scaled linearly with /proc/<pid>/oom_score_adj.

The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
requires CAP_SYS_RESOURCE.


3.2 /proc/<pid>/oom_score - Display current oom-killer score
-------------------------------------------------------------

This file can be used to check the current score used by the oom-killer for
any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
process should be killed in an out-of-memory situation.
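
The arithmetic described in section 3.1 can be modelled with a few lines of
code. The sketch below is a simplified model of that description only, not
the kernel's implementation; the function name and its arguments are
hypothetical, and the memory figures may be in any unit as long as both use
the same one.

```python
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def estimated_badness(used, allowed, oom_score_adj=0):
    """Model of the text above: proportion of allowed memory, plus the adjustment."""
    if allowed <= 0:
        raise ValueError("allowed memory must be positive")
    if not OOM_SCORE_ADJ_MIN <= oom_score_adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj out of range")
    # -1000 disables oom killing for the task: the score is always 0.
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0
    # Using all allowed memory scores 1000, half of it scores 500.
    base = 1000 * used // allowed
    # The adjustment is added before the score is used; never go below 0.
    return max(0, base + oom_score_adj)
```

For instance, a task using half of its allowed memory with an adjustment of
+500 ends up with the same modelled score as an unadjusted task using all of
it, which matches the "50% more memory" reading given in section 3.1.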

Please note that the exported value includes oom_score_adj so it is
effectively in range [0,2000].


3.3 /proc/<pid>/io - Display the IO accounting fields
-------------------------------------------------------

This file contains IO statistics for each running process.

Example
~~~~~~~

::

    test:/tmp # dd if=/dev/zero of=/tmp/test.dat &
    [1] 3828

    test:/tmp # cat /proc/3828/io
    rchar: 323934931
    wchar: 323929600
    syscr: 632687
    syscw: 632675
    read_bytes: 0
    write_bytes: 323932160
    cancelled_write_bytes: 0


Description
~~~~~~~~~~~

rchar
^^^^^

I/O counter: chars read
The number of bytes which this task has caused to be read from storage. This
is simply the sum of bytes which this process passed to read() and pread().
It includes things like tty IO and it is unaffected by whether or not actual
physical disk IO was required (the read might have been satisfied from
pagecache).


wchar
^^^^^

I/O counter: chars written
The number of bytes which this task has caused, or shall cause to be written
to disk. Similar caveats apply here as with rchar.


syscr
^^^^^

I/O counter: read syscalls
Attempt to count the number of read I/O operations, i.e. syscalls like read()
and pread().


syscw
^^^^^

I/O counter: write syscalls
Attempt to count the number of write I/O operations, i.e. syscalls like
write() and pwrite().


read_bytes
^^^^^^^^^^

I/O counter: bytes read
Attempt to count the number of bytes which this process really did cause to
be fetched from the storage layer. Done at the submit_bio() level, so it is
accurate for block-backed filesystems.
<please add status regarding NFS and
CIFS at a later time>


write_bytes
^^^^^^^^^^^

I/O counter: bytes written
Attempt to count the number of bytes which this process caused to be sent to
the storage layer. This is done at page-dirtying time.


cancelled_write_bytes
^^^^^^^^^^^^^^^^^^^^^

The big inaccuracy here is truncate. If a process writes 1MB to a file and
then deletes the file, it will in fact perform no writeout. But it will have
been accounted as having caused 1MB of write.
In other words: The number of bytes which this process caused to not happen,
by truncating pagecache. A task can cause "negative" IO too. If this task
truncates some dirty pagecache, some IO which another task has been accounted
for (in its write_bytes) will not be happening. We _could_ just subtract that
from the truncating task's write_bytes, but there is information loss in doing
that.


.. Note::

   At its current implementation state, this is a bit racy on 32-bit machines:
   if process A reads process B's /proc/pid/io while process B is updating one
   of those 64-bit counters, process A could see an intermediate result.


More information about this can be found within the taskstats documentation in
Documentation/accounting.

3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
---------------------------------------------------------------
When a process is dumped, all anonymous memory is written to a core file as
long as the size of the core file isn't limited. But sometimes we don't want
to dump some memory segments, for example, huge shared memory or DAX.
Conversely, sometimes we want to save file-backed memory segments into a core
file, not only the individual files.

/proc/<pid>/coredump_filter allows you to customize which memory segments
will be dumped when the <pid> process is dumped.
coredump_filter is a bitmask
of memory types. If a bit of the bitmask is set, memory segments of the
corresponding memory type are dumped, otherwise they are not dumped.

The following 9 memory types are supported:

  - (bit 0) anonymous private memory
  - (bit 1) anonymous shared memory
  - (bit 2) file-backed private memory
  - (bit 3) file-backed shared memory
  - (bit 4) ELF header pages in file-backed private memory areas (it is
    effective only if the bit 2 is cleared)
  - (bit 5) hugetlb private memory
  - (bit 6) hugetlb shared memory
  - (bit 7) DAX private memory
  - (bit 8) DAX shared memory

  Note that MMIO pages such as frame buffer are never dumped and vDSO pages
  are always dumped regardless of the bitmask status.

  Note that bits 0-4 don't affect hugetlb or DAX memory. hugetlb memory is
  only affected by bits 5-6, and DAX is only affected by bits 7-8.

The default value of coredump_filter is 0x33; this means all anonymous memory
segments, ELF header pages and hugetlb private memory are dumped.

If you don't want to dump all shared memory segments attached to pid 1234,
write 0x31 to the process's proc file::

  $ echo 0x31 > /proc/1234/coredump_filter

When a new process is created, the process inherits the bitmask status from its
parent. It is useful to set up coredump_filter before the program runs.
For example::

  $ echo 0x7 > /proc/self/coredump_filter
  $ ./some_program

3.5 /proc/<pid>/mountinfo - Information about mounts
--------------------------------------------------------

This file contains lines of the form::

    36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
    (1)(2)(3)   (4)   (5)      (6)     (n…m) (m+1)(m+2) (m+3)         (m+4)

    (1)   mount ID:        unique identifier of the mount (may be reused after umount)
    (2)   parent ID:       ID of parent (or of self for the top of the mount tree)
    (3)   major:minor:     value of st_dev for files on filesystem
    (4)   root:            root of the mount within the filesystem
    (5)   mount point:     mount point relative to the process's root
    (6)   mount options:   per mount options
    (n…m) optional fields: zero or more fields of the form "tag[:value]"
    (m+1) separator:       marks the end of the optional fields
    (m+2) filesystem type: name of filesystem of the form "type[.subtype]"
    (m+3) mount source:    filesystem specific information or "none"
    (m+4) super options:   per super block options

Parsers should ignore all unrecognised optional fields.  Currently the
possible optional fields are:

================ ==============================================================
shared:X         mount is shared in peer group X
master:X         mount is slave to peer group X
propagate_from:X mount is slave and receives propagation from peer group X [#]_
unbindable       mount is unbindable
================ ==============================================================

.. [#] X is the closest dominant peer group under the process's root.  If
       X is the immediate master of the mount, or if there's no dominant peer
       group under the same root, then only the "master:X" field is present
       and not the "propagate_from:X" field.
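
The field layout above lends itself to a small parser. The following sketch
splits one mountinfo line into the numbered fields; the function name, the
dictionary keys and ``SAMPLE`` (the example line from above) are illustrative
choices, not part of the interface. Optional fields are collected without
interpreting them, so unrecognised tags are carried along rather than
rejected.

```python
def parse_mountinfo_line(line):
    """Split one /proc/<pid>/mountinfo line into its documented fields."""
    fields = line.split()
    # The separator "-" ends the optional fields; start searching after
    # field (6) so that fields (1) to (6) are never mistaken for it.
    sep = fields.index("-", 6)
    major, minor = fields[2].split(":")
    optional = {}
    for item in fields[6:sep]:
        # Optional fields have the form "tag[:value]".
        tag, _, value = item.partition(":")
        optional[tag] = value or None
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "dev": (int(major), int(minor)),
        "root": fields[3],
        "mount_point": fields[4],
        "mount_options": fields[5].split(","),
        "optional": optional,
        "fs_type": fields[sep + 1],
        "source": fields[sep + 2],
        "super_options": fields[sep + 3].split(","),
    }


# Hypothetical sample input: the example line shown above.
SAMPLE = ("36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - "
          "ext3 /dev/root rw,errors=continue")
```

A caller interested only in propagation could look at
``parse_mountinfo_line(SAMPLE)["optional"]`` and check for the ``shared``,
``master``, ``propagate_from`` and ``unbindable`` tags.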

For more information on mount propagation see:

  Documentation/filesystems/sharedsubtree.rst


3.6 /proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
--------------------------------------------------------
These files provide a method to access a task's comm value. They also allow
a task to set its own comm value or that of one of its thread siblings. The
comm value is limited in size compared to the cmdline value, so writing
anything longer than the kernel's TASK_COMM_LEN (currently 16 chars, including
the NUL terminator) will result in a truncated comm value.


3.7 /proc/<pid>/task/<tid>/children - Information about task children
-------------------------------------------------------------------------
This file provides a fast way to retrieve the first-level child pids of the
task identified by the <pid>/<tid> pair. The format is a space-separated
stream of pids.

Note the "first level" here -- if a child has its own children they will
not be listed here; one needs to read /proc/<children-pid>/task/<tid>/children
to obtain the descendants.

Since this interface is intended to be fast and cheap it doesn't
guarantee to provide precise results and some children might be
skipped, especially if they've exited right after we printed their
pids, so one needs to either stop or freeze processes being inspected
if precise results are needed.


3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
---------------------------------------------------------------
This file provides information associated with an opened file. The regular
files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
The 'pos' represents the current offset of the opened file in decimal
form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
file has been created with [see open(2) for details] and 'mnt_id' represents
mount ID of the file system containing the opened file [see 3.5
/proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
the file.

A typical output is::

	pos:	0
	flags:	0100002
	mnt_id:	19
	ino:	63107

All locks associated with a file descriptor are shown in its fdinfo too::

    lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF

Files such as eventfd, fsnotify, signalfd and epoll provide, in addition to
the regular pos/flags pair, information particular to the objects they
represent.

Eventfd files
~~~~~~~~~~~~~

::

	pos:	0
	flags:	04002
	mnt_id:	9
	ino:	63107
	eventfd-count:	5a

where 'eventfd-count' is hex value of a counter.

Signalfd files
~~~~~~~~~~~~~~

::

	pos:	0
	flags:	04002
	mnt_id:	9
	ino:	63107
	sigmask:	0000000000000200

where 'sigmask' is hex value of the signal mask associated
with a file.

Epoll files
~~~~~~~~~~~

::

	pos:	0
	flags:	02
	mnt_id:	9
	ino:	63107
	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7

where 'tfd' is a target file descriptor number in decimal form,
'events' is events mask being watched and the 'data' is data
associated with a target [see epoll(7) for more details].

The 'pos' is current offset of the target file in decimal form
[see lseek(2)], 'ino' and 'sdev' are inode and device numbers
where target file resides, all in hex format.
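
The common part of an fdinfo file can be decoded along the lines described
above. The sketch below is one possible reading, with hypothetical names
throughout: it converts 'pos' from decimal and 'flags' from octal, treats
'mnt_id' and 'ino' as decimal (an assumption based on the sample output), and
keeps every other line, such as 'lock' or 'eventfd-count', as an
uninterpreted string pair.

```python
def parse_fdinfo(text):
    """Decode the regular fdinfo fields; keep type-specific lines as strings."""
    info = {"extra": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "flags":
            info[key] = int(value, 8)   # octal O_xxx mask
        elif key in ("pos", "mnt_id", "ino"):
            info[key] = int(value)      # decimal
        else:
            info["extra"].append((key, value))
    return info


# Hypothetical sample input, combining two of the examples shown above.
SAMPLE = (
    "pos:\t0\n"
    "flags:\t0100002\n"
    "mnt_id:\t19\n"
    "ino:\t63107\n"
    "lock:\t1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF\n"
)
```

Once 'flags' is an integer, the access mode can be tested by masking the low
two bits (``info["flags"] & 0o3``) and comparing against the O_xxx constants
from open(2).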

Fsnotify files
~~~~~~~~~~~~~~
For inotify files the format is the following::

	pos:	0
	flags:	02000000
	mnt_id:	9
	ino:	63107
	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d

where 'wd' is a watch descriptor in decimal form, i.e. a target file
descriptor number, 'ino' and 'sdev' are inode and device where the
target file resides and the 'mask' is the mask of events, all in hex
form [see inotify(7) for more details].

If the kernel was built with exportfs support, the path to the target
file is encoded as a file handle.  The file handle is provided by three
fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex
format.

If the kernel is built without exportfs support the file handle won't be
printed out.

If there is no inotify mark attached yet the 'inotify' line will be omitted.

For fanotify files the format is::

	pos:	0
	flags:	02
	mnt_id:	9
	ino:	63107
	fanotify flags:10 event-flags:0
	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4

where fanotify 'flags' and 'event-flags' are values used in fanotify_init
call, 'mnt_id' is the mount point identifier, 'mflags' is the value of
flags associated with mark which are tracked separately from events
mask. 'ino' and 'sdev' are target inode and device, 'mask' is the events
mask and 'ignored_mask' is the mask of events which are to be ignored.
All are in hex format. Together, 'mflags', 'mask' and 'ignored_mask'
provide information about flags and mask used in fanotify_mark
call [see fsnotify manpage for details].
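
The 'inotify' and 'fanotify' lines share a simple "key:value" layout, so a
single helper can decode both. The sketch below follows the description given
here: 'wd' is read as decimal, 'f_handle' is kept as raw bytes, and every
other value is read as hex. The function name is hypothetical and the numeric
bases are taken from the text above rather than verified against the kernel
source.

```python
def parse_fsnotify_line(line):
    """Return (kind, fields) for one 'inotify ...' or 'fanotify ...' line."""
    words = line.split()
    if not words or words[0] not in ("inotify", "fanotify"):
        raise ValueError("not an inotify/fanotify line")
    fields = {}
    for word in words[1:]:
        key, _, value = word.partition(":")
        if key == "wd":
            fields[key] = int(value)            # decimal, per the text above
        elif key == "f_handle":
            fields[key] = bytes.fromhex(value)  # opaque file handle bytes
        else:
            fields[key] = int(value, 16)        # everything else is hex
    return words[0], fields
```

When a file handle is present, ``len(fields["f_handle"])`` can be compared
with ``fields["fhandle-bytes"]`` as a basic consistency check.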

While the first three lines are mandatory and always printed, the rest is
optional and may be omitted if no marks have been created yet.

Timerfd files
~~~~~~~~~~~~~

::

	pos:	0
	flags:	02
	mnt_id:	9
	ino:	63107
	clockid: 0
	ticks: 0
	settime flags: 01
	it_value: (0, 49406829)
	it_interval: (1, 0)

where 'clockid' is the clock type and 'ticks' is the number of the timer expirations
that have occurred [see timerfd_create(2) for details]. 'settime flags' are
the flags, in octal form, that were used to set up the timer [see
timerfd_settime(2) for details]. 'it_value' is the remaining time until the
timer expiration. 'it_interval' is the interval for the timer. Note the timer
might be set up with TIMER_ABSTIME option which will be shown in
'settime flags', but 'it_value' still exhibits timer's remaining time.

DMA Buffer files
~~~~~~~~~~~~~~~~

::

	pos:	0
	flags:	04002
	mnt_id:	9
	ino:	63107
	size:   32768
	count:  2
	exp_name:  system-heap

where 'size' is the size of the DMA buffer in bytes. 'count' is the file count of
the DMA buffer file. 'exp_name' is the name of the DMA buffer exporter.

VFIO Device files
~~~~~~~~~~~~~~~~~

::

	pos:    0
	flags:  02000002
	mnt_id: 17
	ino:    5122
	vfio-device-syspath: /sys/devices/pci0000:e0/0000:e0:01.1/0000:e1:00.0/0000:e2:05.0/0000:e8:00.0

where 'vfio-device-syspath' is the sysfs path corresponding to the VFIO device
file.

3.9 /proc/<pid>/map_files - Information about memory mapped files
---------------------------------------------------------------------
This directory contains symbolic links which represent memory mapped files
the process is maintaining.
Example output::

     | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
     | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
     | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
     | ...
     | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
     | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls

The name of a link represents the virtual memory bounds of a mapping, i.e.
vm_area_struct::vm_start - vm_area_struct::vm_end.

The main purpose of the map_files is to retrieve a set of memory mapped
files in a fast way instead of parsing /proc/<pid>/maps or
/proc/<pid>/smaps, both of which contain many more records.  At the same
time one can open(2) mappings from the listings of two processes and
compare their inode numbers to figure out which anonymous memory areas
are actually shared.

3.10 /proc/<pid>/timerslack_ns - Task timerslack value
---------------------------------------------------------
This file provides the value of the task's timerslack value in nanoseconds.
This value specifies an amount of time that normal timers may be deferred
in order to coalesce timers and avoid unnecessary wakeups.

This allows a task's interactivity vs power consumption tradeoff to be
adjusted.

Writing 0 to the file will set the task's timerslack to the default value.

Valid values range from 0 to ULLONG_MAX.

An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level
permissions on the task specified to change its timerslack_ns value.

3.11 /proc/<pid>/patch_state - Livepatch patch operation state
-----------------------------------------------------------------
When CONFIG_LIVEPATCH is enabled, this file displays the value of the
patch state for the task.

A value of '-1' indicates that no patch is in transition.

A value of '0' indicates that a patch is in transition and the task is
unpatched.  If the patch is being enabled, then the task hasn't been
patched yet.  If the patch is being disabled, then the task has already
been unpatched.

A value of '1' indicates that a patch is in transition and the task is
patched.  If the patch is being enabled, then the task has already been
patched.  If the patch is being disabled, then the task hasn't been
unpatched yet.

3.12 /proc/<pid>/arch_status - Task architecture specific status
-------------------------------------------------------------------
When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the
architecture specific status of the task.

Example
~~~~~~~

::

 $ cat /proc/6753/arch_status
 AVX512_elapsed_ms:      8

Description
~~~~~~~~~~~

x86 specific entries
~~~~~~~~~~~~~~~~~~~~~

AVX512_elapsed_ms
^^^^^^^^^^^^^^^^^^

  If AVX512 is supported on the machine, this entry shows the milliseconds
  elapsed since the last time AVX512 usage was recorded. The recording
  happens on a best effort basis when a task is scheduled out. This means
  that the value depends on two factors:

    1) The time which the task spent on the CPU without being scheduled
       out. With CPU isolation and a single runnable task this can take
       several seconds.

    2) The time since the task was scheduled out last. Depending on the
       reason for being scheduled out (time slice exhausted, syscall ...)
       this can be an arbitrarily long time.

  As a consequence the value cannot be considered precise and authoritative
  information. The application which uses this information has to be aware
  of the overall scenario on the system in order to determine whether a
  task is a real AVX512 user or not.
  Precise information can be obtained
  with performance counters.

  A special value of '-1' indicates that no AVX512 usage was recorded, so
  the task is unlikely to be an AVX512 user. Depending on the workload and
  the scheduling scenario, however, it could also be a false negative as
  mentioned above.

3.13 /proc/<pid>/fd - List of symlinks to open files
-------------------------------------------------------
This directory contains symbolic links which represent the files that
the process has open. Example output::

    lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null
    l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null
    lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]'
    lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]'
    lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]'

The number of open files for the process is stored in the 'size' member
of the stat() output for /proc/<pid>/fd for fast access.

3.14 /proc/<pid>/ksm_stat - Information about the process's ksm status
----------------------------------------------------------------------
When CONFIG_KSM is enabled, each process has this file, which displays
information about its KSM merging status.

Example
~~~~~~~

::

    / # cat /proc/self/ksm_stat
    ksm_rmap_items 0
    ksm_zero_pages 0
    ksm_merging_pages 0
    ksm_process_profit 0
    ksm_merge_any: no
    ksm_mergeable: no

Description
~~~~~~~~~~~

ksm_rmap_items
^^^^^^^^^^^^^^

The number of ksm_rmap_item structures in use. The structure
ksm_rmap_item stores the reverse mapping information for virtual
addresses. KSM will generate a ksm_rmap_item for each ksm-scanned page of
the process.
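
As a rough illustration of how a user-space tool might consume this file,
the following Python sketch parses text in the format of the example output
above. It is not part of the kernel tree: the helper name ``parse_ksm_stat``
is hypothetical, and the parser assumes only the "name value" and
"name: value" line shapes that appear in that example.

```python
def parse_ksm_stat(text):
    """Parse ksm_stat-style text into a dict (hypothetical helper).

    Integer fields become int, "yes"/"no" fields become bool, and
    anything else is kept as a string. A trailing colon after the
    field name is accepted because the example output shows both forms.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        key, value = parts[0].rstrip(":"), parts[1]
        if value in ("yes", "no"):
            stats[key] = (value == "yes")
        else:
            try:
                stats[key] = int(value)
            except ValueError:
                stats[key] = value  # unknown format, keep the raw text
    return stats


# Sample text copied from the example output in this section.
SAMPLE = """\
ksm_rmap_items 0
ksm_zero_pages 0
ksm_merging_pages 0
ksm_process_profit 0
ksm_merge_any: no
ksm_mergeable: no
"""

if __name__ == "__main__":
    for name, value in parse_ksm_stat(SAMPLE).items():
        print(name, value)
```

On a kernel built with CONFIG_KSM, the same function could be given the
contents of /proc/self/ksm_stat instead of the sample string.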

ksm_zero_pages
^^^^^^^^^^^^^^

When /sys/kernel/mm/ksm/use_zero_pages is enabled, this represents how many
empty pages are merged with kernel zero pages by KSM.

ksm_merging_pages
^^^^^^^^^^^^^^^^^

It represents how many pages of this process are involved in KSM merging
(not including ksm_zero_pages). It is the same value that
/proc/<pid>/ksm_merging_pages shows.

ksm_process_profit
^^^^^^^^^^^^^^^^^^

The profit that KSM brings (saved bytes). KSM can save memory by merging
identical pages, but it can also consume additional memory, because it needs
to generate a number of rmap_items to save each scanned page's brief rmap
information. Some of these pages may be merged, but some may not be able
to be merged even after being checked several times; the memory consumed
for those pages is unprofitable.

ksm_merge_any
^^^^^^^^^^^^^

It specifies whether the process's mm has been added by prctl() to the
candidate list of KSM, and whether KSM scanning is fully enabled at
process level.

ksm_mergeable
^^^^^^^^^^^^^

It specifies whether any VMAs of the process's mm are currently
applicable to KSM.

More information about KSM can be found in
Documentation/admin-guide/mm/ksm.rst.


Chapter 4: Configuring procfs
=============================

4.1 Mount options
---------------------

The following mount options are supported:

    =========  ========================================================
    hidepid=   Set /proc/<pid>/ access mode.
    gid=       Set the group authorized to learn process information.
    subset=    Show only the specified subset of procfs.
    pidns=     Specify the pid namespace used by this procfs.
    =========  ========================================================

hidepid=off or hidepid=0 means classic mode - everybody may access all
/proc/<pid>/ directories (default).

hidepid=noaccess or hidepid=1 means users may not access any /proc/<pid>/
directories but their own. Sensitive files like cmdline, sched*, status are now
protected against other users. This makes it impossible to learn whether any
user runs a specific program (given the program doesn't reveal itself by its
behaviour). As an additional bonus, as /proc/<pid>/cmdline is inaccessible to
other users, poorly written programs passing sensitive information via program
arguments are now protected against local eavesdroppers.

hidepid=invisible or hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be
fully invisible to other users. This does not hide the fact that a process
with a specific pid value exists (that can be learned by other means, e.g.
by "kill -0 $PID"), but it hides the process's uid and gid, which could
otherwise be learned by stat()'ing /proc/<pid>/. It greatly complicates an
intruder's task of gathering information about running processes, whether
some daemon runs with elevated privileges, whether another user runs some
sensitive program, whether other users run any program at all, etc.

hidepid=ptraceable or hidepid=4 means that procfs should only contain
/proc/<pid>/ directories that the caller can ptrace.

gid= defines a group authorized to learn process information otherwise
prohibited by hidepid=. If you use some daemon like identd which needs to learn
information about processes, just add identd to this group.

subset=pid hides all top level files and directories in the procfs that
are not related to tasks. This option cannot be changed on an existing
procfs instance because overmounts that existed before the change could
otherwise remain reachable after the top level procfs entries are hidden.
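
Each hidepid= mode described above has both a symbolic and a numeric
spelling. The following Python sketch, which is only an illustration and not
kernel code, normalizes either spelling to a (name, number) pair. The names
``HIDEPID_MODES`` and ``normalize_hidepid`` are hypothetical; the table holds
only the four name/number pairs listed in this section.

```python
# Name/number pairs as listed in this section.
HIDEPID_MODES = {"off": 0, "noaccess": 1, "invisible": 2, "ptraceable": 4}


def normalize_hidepid(value):
    """Return (name, number) for a hidepid= value given as a string.

    Accepts either the symbolic name ("invisible") or the numeric
    form ("2"). Raises ValueError for anything not in the table.
    """
    if value in HIDEPID_MODES:
        return value, HIDEPID_MODES[value]
    try:
        number = int(value)
    except ValueError:
        raise ValueError(f"unknown hidepid value: {value!r}") from None
    for name, num in HIDEPID_MODES.items():
        if num == number:
            return name, number
    raise ValueError(f"unknown hidepid value: {value!r}")


if __name__ == "__main__":
    for spelling in ("0", "noaccess", "2", "ptraceable"):
        print(spelling, "->", normalize_hidepid(spelling))
```

Note that 3 is not a listed value, so the sketch rejects it rather than
guessing a meaning.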

pidns= specifies a pid namespace (either as a string path to something like
`/proc/$pid/ns/pid`, or a file descriptor when using `FSCONFIG_SET_FD`) that
will be used by the procfs instance when translating pids. By default, procfs
will use the calling process's active pid namespace. Note that the pid
namespace of an existing procfs instance cannot be modified (attempting to do
so will give an `-EBUSY` error).

4.2 Mount restrictions
--------------------------

If user namespaces are in use, the kernel additionally checks the instances of
procfs available to the mounter and will not allow procfs to be mounted if:

  1. This mount is not fully visible, unless the new procfs is going to be
     mounted with the subset=pid option.

     a. Its root directory is not the root directory of the filesystem.
     b. Any file or non-empty procfs directory is hidden by another mount.

  2. A new mount overrides the readonly option or any option from the atime
     family.

Chapter 5: Filesystem behavior
==============================

Originally, before the advent of pid namespaces, procfs was a global file
system. It means that there was only one procfs instance in the system.

When pid namespaces were added, a separate procfs instance was mounted in
each pid namespace.
So, procfs mount options are global among all
mountpoints within the same namespace::

    # grep ^proc /proc/mounts
    proc /proc proc rw,relatime,hidepid=2 0 0

    # strace -e mount mount -o hidepid=1 -t proc proc /tmp/proc
    mount("proc", "/tmp/proc", "proc", 0, "hidepid=1") = 0
    +++ exited with 0 +++

    # grep ^proc /proc/mounts
    proc /proc proc rw,relatime,hidepid=2 0 0
    proc /tmp/proc proc rw,relatime,hidepid=2 0 0

and only after remounting procfs will the mount options change at all
mountpoints::

    # mount -o remount,hidepid=1 -t proc proc /tmp/proc

    # grep ^proc /proc/mounts
    proc /proc proc rw,relatime,hidepid=1 0 0
    proc /tmp/proc proc rw,relatime,hidepid=1 0 0

This behavior is different from the behavior of other filesystems.

The new procfs behavior is more like other filesystems. Each procfs mount
creates a new procfs instance. Mount options affect only their own procfs
instance. This makes it possible to have several procfs instances
displaying tasks with different filtering options in one pid namespace::

    # mount -o hidepid=invisible -t proc proc /proc
    # mount -o hidepid=noaccess -t proc proc /tmp/proc
    # grep ^proc /proc/mounts
    proc /proc proc rw,relatime,hidepid=invisible 0 0
    proc /tmp/proc proc rw,relatime,hidepid=noaccess 0 0
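
To check which options each procfs instance ended up with, the grep output
above can also be inspected programmatically. The following Python sketch is
only an illustration, with the hypothetical helper name ``procfs_mounts``.
It assumes the usual whitespace-separated /proc/mounts layout (device,
mountpoint, filesystem type, options, and two trailing fields) and returns
the options of every entry whose filesystem type is "proc".

```python
def procfs_mounts(mounts_text):
    """Return [(mountpoint, options)] for every procfs entry.

    `options` is a dict: "key=value" options map key to value, and
    flag options such as "rw" map to True.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "proc":
            continue  # not a procfs entry, or a malformed line
        options = {}
        for opt in fields[3].split(","):
            key, sep, val = opt.partition("=")
            options[key] = val if sep else True
        result.append((fields[1], options))
    return result


# Sample text copied from the last example in this chapter.
SAMPLE = """\
proc /proc proc rw,relatime,hidepid=invisible 0 0
proc /tmp/proc proc rw,relatime,hidepid=noaccess 0 0
"""

if __name__ == "__main__":
    for mountpoint, options in procfs_mounts(SAMPLE):
        print(mountpoint, options.get("hidepid", "(not shown)"))
```

With the sample input, the two instances report different hidepid values,
which matches the per-instance behavior described above. On a live system the
function could be given the contents of /proc/mounts instead.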