====================
Scheduler Statistics
====================

Version 17 of schedstats removed the 'lb_imbalance' field, which no longer
has any significance, and added the more relevant fields
'lb_imbalance_load', 'lb_imbalance_util', 'lb_imbalance_task' and
'lb_imbalance_misfit'. From this version onwards, the domain field also
prints the name of the corresponding sched domain.

Version 16 of schedstats changed the order of definitions within
'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
columns in show_schedstat(). In particular, CPU_IDLE and __CPU_NOT_IDLE
swapped positions. The size of the array is unchanged.

Version 15 of schedstats dropped some sched_yield() counters:
yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
identical to version 14. Details are available at

        https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-stats.txt?id=1e1dbb259c79b

Version 14 of schedstats includes support for sched_domains, which hit the
mainline kernel in 2.6.20, although it is identical to the stats from version
12, which was in the kernel from 2.6.13 to 2.6.19 (version 13 never saw a
kernel release). Some counters make more sense to be per-runqueue; others to
be per-domain. Note that domains (and their associated information) will only
be pertinent and available on machines utilizing CONFIG_SMP.

In version 14 of schedstat, there is at least one level of domain
statistics for each cpu listed, and there may well be more than one
domain. Domains have no particular names in this implementation, but
the highest numbered one typically arbitrates balancing across all the
cpus on the machine, while domain0 is the most tightly focused domain,
sometimes balancing only between pairs of cpus. At this time, there
are no architectures which need more than three domain levels. The first
field in the domain stats is a bit map indicating which cpus are affected
by that domain. Details are available at

        https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/sched-stats.txt?id=b762f3ffb797c

The schedstat documentation has been maintained from version 10 onwards, but
was not updated for versions 11 and 12. The details for version 10 are
available at

        https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/sched-stats.txt?id=1da177e4c3f4

These fields are counters, and only increment. Programs which make use
of these will need to start with a baseline observation and then calculate
the change in the counters at each subsequent observation. A perl script
which does this for many of the fields is available at

        http://eaglet.pdxhosts.com/rick/linux/schedstat/

Note that any such script will necessarily be version-specific, as the main
reason to change versions is changes in the output format. For those wishing
to write their own scripts, the fields are described here.

CPU statistics
--------------
cpu<N> 1 2 3 4 5 6 7 8 9

First field is a sched_yield() statistic:

     1) # of times sched_yield() was called

Next three are schedule() statistics:

     2) This field is a legacy array expiration count field used in the O(1)
        scheduler. It is kept for ABI compatibility, but is always set to zero.
     3) # of times schedule() was called
     4) # of times schedule() left the processor idle

Next two are try_to_wake_up() statistics:

     5) # of times try_to_wake_up() was called
     6) # of times try_to_wake_up() was called to wake up the local cpu

Next three are statistics describing scheduling latency:

     7) sum of all time spent running by tasks on this processor (in nanoseconds)
     8) sum of all time spent waiting to run by tasks on this processor (in
        nanoseconds)
     9) # of timeslices run on this cpu


Domain statistics
-----------------
One of these is produced per domain for each cpu described. (Note that if
CONFIG_SMP is not defined, *no* domains are utilized and these lines
will not appear in the output. <name> is an extension to the domain field
that prints the name of the corresponding sched domain. It can appear in
schedstat version 17 and above, and requires CONFIG_SCHED_DEBUG.)

domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45

The first field is a bit mask indicating what cpus this domain operates over.
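
As an illustration of how a script might decode that bit mask, here is a
minimal Python sketch. It assumes the mask is printed as hexadecimal digits,
optionally split into comma-separated groups with the most significant group
first (for example "00000000,0000000f"); that layout is an assumption about
the output, not something this document specifies. The helper name
cpus_in_mask() is hypothetical.

```python
from typing import List


def cpus_in_mask(mask: str) -> List[int]:
    """Return the cpu numbers whose bits are set in a hex cpumask string.

    Assumption: the mask is hexadecimal, possibly split into
    comma-separated groups, most significant group first.
    """
    # Drop the group separators and read the whole mask as one integer.
    value = int(mask.replace(",", ""), 16)
    cpus = []
    bit = 0
    while value:
        if value & 1:          # bit N set means cpu N is in the domain
            cpus.append(bit)
        value >>= 1
        bit += 1
    return cpus
```

Under that assumption, cpus_in_mask("0f") returns [0, 1, 2, 3], meaning the
domain spans cpus 0 through 3.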

The next 33 are a variety of sched_balance_rq() statistics, grouped by type
of idleness (busy, idle and newly idle):

    1)  # of times in this domain sched_balance_rq() was called when the
        cpu was busy
    2)  # of times in this domain sched_balance_rq() checked but found the
        load did not require balancing when busy
    3)  # of times in this domain sched_balance_rq() tried to move one or
        more tasks and failed, when the cpu was busy
    4)  Total imbalance in load when the cpu was busy
    5)  Total imbalance in utilization when the cpu was busy
    6)  Total imbalance in number of tasks when the cpu was busy
    7)  Total imbalance due to misfit tasks when the cpu was busy
    8)  # of times in this domain pull_task() was called when busy
    9)  # of times in this domain pull_task() was called even though the
        target task was cache-hot when busy
    10) # of times in this domain sched_balance_rq() was called but did not
        find a busier queue while the cpu was busy
    11) # of times in this domain a busier queue was found while the cpu
        was busy but no busier group was found

    12) # of times in this domain sched_balance_rq() was called when the
        cpu was idle
    13) # of times in this domain sched_balance_rq() checked but found
        the load did not require balancing when the cpu was idle
    14) # of times in this domain sched_balance_rq() tried to move one or
        more tasks and failed, when the cpu was idle
    15) Total imbalance in load when the cpu was idle
    16) Total imbalance in utilization when the cpu was idle
    17) Total imbalance in number of tasks when the cpu was idle
    18) Total imbalance due to misfit tasks when the cpu was idle
    19) # of times in this domain pull_task() was called when the cpu
        was idle
    20) # of times in this domain pull_task() was called even though
        the target task was cache-hot when idle
    21) # of times in this domain sched_balance_rq() was called but did
        not find a busier queue while the cpu was idle
    22) # of times in this domain a busier queue was found while the
        cpu was idle but no busier group was found

    23) # of times in this domain sched_balance_rq() was called when the
        cpu was just becoming idle
    24) # of times in this domain sched_balance_rq() checked but found the
        load did not require balancing when the cpu was just becoming idle
    25) # of times in this domain sched_balance_rq() tried to move one or more
        tasks and failed, when the cpu was just becoming idle
    26) Total imbalance in load when the cpu was just becoming idle
    27) Total imbalance in utilization when the cpu was just becoming idle
    28) Total imbalance in number of tasks when the cpu was just becoming idle
    29) Total imbalance due to misfit tasks when the cpu was just becoming idle
    30) # of times in this domain pull_task() was called when newly idle
    31) # of times in this domain pull_task() was called even though the
        target task was cache-hot when just becoming idle
    32) # of times in this domain sched_balance_rq() was called but did not
        find a busier queue while the cpu was just becoming idle
    33) # of times in this domain a busier queue was found while the cpu
        was just becoming idle but no busier group was found

Next three are active_load_balance() statistics:

    34) # of times active_load_balance() was called
    35) # of times active_load_balance() tried to move a task and failed
    36) # of times active_load_balance() successfully moved a task

Next three are sched_balance_exec() statistics:

    37) sbe_cnt is not used
    38) sbe_balanced is not used
    39) sbe_pushed is not used

Next three are sched_balance_fork() statistics:

    40) sbf_cnt is not used
    41) sbf_balanced is not used
    42) sbf_pushed is not used

Next three are try_to_wake_up() statistics:

    43) # of times in this domain try_to_wake_up() awoke a task that
        last ran on a different cpu in this domain
    44) # of times in this domain try_to_wake_up() moved a task to the
        waking cpu because it was cache-cold on its own cpu anyway
    45) # of times in this domain try_to_wake_up() started passive balancing

/proc/<pid>/schedstat
---------------------
schedstats also adds a new /proc/<pid>/schedstat file to include some of
the same information on a per-process level. There are three fields in
this file, which report for that process:

     1) time spent on the cpu (in nanoseconds)
     2) time spent waiting on a runqueue (in nanoseconds)
     3) # of timeslices run on this cpu

A program could easily be written to make use of these extra fields to
report on how well a particular process or set of processes is faring
under the scheduler's policies. A simple version of such a program is
available at

        http://eaglet.pdxhosts.com/rick/linux/schedstat/v12/latency.c
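
For readers who prefer not to start from the C program, the following Python
sketch shows one possible shape for such a tool. It is not the program linked
above, and every name in it (TaskSchedstat, parse_task_schedstat(),
read_task_schedstat(), delta(), avg_wait_ns()) is hypothetical. It relies only
on the three fields described in this section and on the fact that they are
counters, so two observations are subtracted to obtain the activity in
between.

```python
from typing import NamedTuple


class TaskSchedstat(NamedTuple):
    run_ns: int      # field 1: time spent on the cpu (ns)
    wait_ns: int     # field 2: time spent waiting on a runqueue (ns)
    timeslices: int  # field 3: # of timeslices run


def parse_task_schedstat(text: str) -> TaskSchedstat:
    """Parse the three whitespace-separated fields of the file."""
    run_ns, wait_ns, timeslices = (int(field) for field in text.split())
    return TaskSchedstat(run_ns, wait_ns, timeslices)


def read_task_schedstat(pid: int) -> TaskSchedstat:
    """Take one observation for the given pid."""
    with open(f"/proc/{pid}/schedstat") as f:
        return parse_task_schedstat(f.read())


def delta(before: TaskSchedstat, after: TaskSchedstat) -> TaskSchedstat:
    """Change in each counter between a baseline and a later observation."""
    return TaskSchedstat(*(a - b for a, b in zip(after, before)))


def avg_wait_ns(change: TaskSchedstat) -> float:
    """Average runqueue wait per timeslice over the observed interval."""
    if change.timeslices == 0:
        return 0.0
    return change.wait_ns / change.timeslices
```

A caller would take a baseline with read_task_schedstat(pid), wait for the
interval of interest, take a second observation, and pass both to delta().
The average in avg_wait_ns() is just one illustrative way to summarize the
result.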