================
Delay accounting
================

Tasks encounter delays in execution when they wait
for some kernel resource to become available, e.g. a
runnable task may wait for a free CPU to run on.

The per-task delay accounting functionality measures
the delays experienced by a task while

a) waiting for a CPU (while being runnable)
b) waiting for completion of synchronous block I/O initiated by the task
c) swapping in pages
d) memory reclaim
e) thrashing
f) direct compact
g) write-protect copy
h) IRQ/SOFTIRQ

and makes these statistics available to userspace through
the taskstats interface.

Such delays provide feedback for setting a task's cpu priority,
io priority and rss limit values appropriately. Long delays for
an important task could be a trigger for raising its corresponding priority.

The functionality, through its use of the taskstats interface, also provides
delay statistics aggregated for all tasks (or threads) belonging to a
thread group (corresponding to a traditional Unix process).
This is a commonly
needed aggregation that is more efficiently done by the kernel.

Userspace utilities, particularly resource management applications, can also
aggregate delay statistics into arbitrary groups. To enable this, delay
statistics of a task are available both during its lifetime and on its
exit, ensuring continuous and complete monitoring can be done.


Interface
---------

Delay accounting uses the taskstats interface which is described
in detail in a separate document in this directory. Taskstats returns a
generic data structure to userspace corresponding to per-pid and per-tgid
statistics. The delay accounting functionality populates specific fields of
this structure. See

     include/uapi/linux/taskstats.h

for a description of the fields pertaining to delay accounting.
These will generally be in the form of counters returning the cumulative
delay seen for cpu, sync block I/O, swapin, memory reclaim, thrash page
cache, direct compact, write-protect copy, IRQ/SOFTIRQ etc.
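
Because the counters are cumulative, a single reading says little about
recent behaviour; a monitor would normally keep the previous sample and
subtract it from the current one. The following is a minimal sketch of that
subtraction, not code taken from the kernel or from getdelays.c. The
``struct delay_sample`` type and the ``delay_in_interval`` and
``events_in_interval`` helpers are hypothetical names, and the sketch assumes
the totals are 64-bit values in nanoseconds.

```c
#include <stdint.h>

/*
 * Hypothetical holder for one reading of a delay counter pair.
 * Illustrative only; this is not part of the taskstats ABI.
 */
struct delay_sample {
    uint64_t count;          /* number of delay events recorded so far */
    uint64_t delay_total_ns; /* cumulative delay (assumed nanoseconds) */
};

/*
 * Delay accumulated between two readings of the same counter.
 * Unsigned arithmetic is modular, so the result is still right if the
 * 64-bit total wrapped once between the two readings.
 */
static inline uint64_t delay_in_interval(const struct delay_sample *prev,
                                         const struct delay_sample *curr)
{
    return curr->delay_total_ns - prev->delay_total_ns;
}

/* Number of delay events recorded between the two readings. */
static inline uint64_t events_in_interval(const struct delay_sample *prev,
                                          const struct delay_sample *curr)
{
    return curr->count - prev->count;
}
```

A caller would fill two ``struct delay_sample`` values from successive
taskstats replies for the same pid or tgid and pass them in oldest first.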

Taking the difference of two successive readings of a given
counter (say cpu_delay_total) for a task will give the delay
experienced by the task waiting for the corresponding resource
in that interval.

When a task exits, records containing the per-task statistics
are sent to userspace without requiring a command. If it is the last exiting
task of a thread group, the per-tgid statistics are also sent. More details
are given in the taskstats interface description.

The getdelays.c userspace utility in tools/accounting directory allows simple
commands to be run and the corresponding delay statistics to be displayed. It
also serves as an example of using the taskstats interface.

Usage
-----

Compile the kernel with::

        CONFIG_TASK_DELAY_ACCT=y
        CONFIG_TASKSTATS=y

Delay accounting is disabled by default at boot up.
To enable, add::

   delayacct

to the kernel boot options. The rest of the instructions below assume this has
been done. Alternatively, use sysctl kernel.task_delayacct to switch the state
at runtime.
Note however that only tasks started after enabling it will have
delayacct information.

After the system has booted up, use a utility
similar to getdelays.c to access the delays
seen by a given task or a task group (tgid).
The utility also allows a given command to be
executed and the corresponding delays to be
seen.

General format of the getdelays command::

        getdelays [-dilv] [-t tgid] [-p pid]

Get delays, since system boot, for pid 10::

        # ./getdelays -d -p 10
        (output similar to next case)

Get sum of delays, since system boot, for all pids with tgid 5::

        # ./getdelays -d -t 5
        print delayacct stats ON
        TGID    5


        CPU             count     real total  virtual total    delay total  delay average
                            8        7000000        6872122        3382277          0.423ms
        IO              count    delay total  delay average
                            0              0          0.000ms
        SWAP            count    delay total  delay average
                            0              0          0.000ms
        RECLAIM         count    delay total  delay average
                            0              0          0.000ms
        THRASHING       count    delay total  delay average
                            0              0          0.000ms
        COMPACT         count    delay total  delay average
                            0              0          0.000ms
        WPCOPY          count    delay total  delay average
                            0              0          0.000ms
        IRQ             count    delay total  delay average
                            0              0          0.000ms

Get IO accounting for pid 1 (this works only with -p)::

        # ./getdelays -i -p 1
        printing IO accounting
        linuxrc: read=65536, write=0, cancelled_write=0

The above command can be used with -v to get more debug information.
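
The "delay average" column in the -d output above is consistent with the
delay total divided by the count, expressed in milliseconds: for the CPU row,
3382277 / 8 gives roughly 0.423ms if the total is taken to be in nanoseconds.
The sketch below shows that arithmetic. ``average_delay_ms`` is a hypothetical
helper written for illustration, not a function copied from getdelays.c, and
the nanosecond unit is an assumption of the sketch.

```c
#include <stdint.h>

/*
 * Average delay per recorded event, in milliseconds.
 * Hypothetical helper: assumes delay_total_ns is in nanoseconds and
 * returns 0.0 when no events were recorded, to avoid dividing by zero.
 */
static inline double average_delay_ms(uint64_t delay_total_ns, uint64_t count)
{
    if (count == 0)
        return 0.0;
    return (double)delay_total_ns / 1e6 / (double)count;
}
```

Applied to the difference between two readings rather than to the raw
totals, the same calculation gives the average delay per event over that
interval instead of over the lifetime of the task.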