.\" Copyright (c) 2000-2001 John H. Baldwin <jhb@FreeBSD.org>
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE DEVELOPERS ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
.\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
.\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd November 3, 2000
.Dt SCHEDULER 9
.Os
.Sh NAME
.Nm curpriority_cmp ,
.Nm maybe_resched ,
.Nm propagate_priority ,
.Nm resetpriority ,
.Nm roundrobin ,
.Nm roundrobin_interval ,
.Nm sched_setup ,
.Nm schedclock ,
.Nm schedcpu ,
.Nm setrunnable ,
.Nm updatepri
.Nd perform round-robin scheduling of runnable processes
.Sh SYNOPSIS
.In sys/param.h
.In sys/proc.h
.Ft int
.Fn curpriority_cmp "struct proc *p"
.Ft void
.Fn maybe_resched "struct thread *td"
.Ft void
.Fn propagate_priority "struct proc *p"
.Ft void
.Fn resetpriority "struct ksegrp *kg"
.Ft void
.Fn roundrobin "void *arg"
.Ft int
.Fn roundrobin_interval "void"
.Ft void
.Fn sched_setup "void *dummy"
.Ft void
.Fn schedclock "struct thread *td"
.Ft void
.Fn schedcpu "void *arg"
.Ft void
.Fn setrunnable "struct thread *td"
.Ft void
.Fn updatepri "struct thread *td"
.Sh DESCRIPTION
Each process has three priorities stored in
.Vt "struct proc" :
.Va p_usrpri ,
.Va p_nativepri ,
and
.Va p_priority .
.Pp
The
.Va p_usrpri
member is the user priority of the process, calculated from the process's
estimated CPU time and nice level.
.Pp
The
.Va p_nativepri
member is the saved priority used by
.Fn propagate_priority .
When a process obtains a mutex, its priority is saved in
.Va p_nativepri .
While it holds the mutex, the process's priority may be bumped by another
process that blocks on the mutex.
When the process releases the mutex, its priority is restored to the
priority saved in
.Va p_nativepri .
.Pp
The
.Va p_priority
member is the actual priority of the process and is used, for example, to
determine which
.Xr runqueue 9
it runs on.
.Pp
The
.Fn curpriority_cmp
function compares the cached priority of the currently running process with
that of process
.Fa p .
If the currently running process has a higher priority, then it will return
a value less than zero.
If the current process has a lower priority, then it will return a value
greater than zero.
If the current process has the same priority as
.Fa p ,
then
.Fn curpriority_cmp
will return zero.
The cached priority of the currently running process is stored in the
private variable
.Va curpriority
and is updated when a process resumes from
.Xr tsleep 9
or returns to userland in
.Fn userret .
.Pp
The
.Fn maybe_resched
function compares the priorities of the current thread and
.Fa td .
If
.Fa td
has a higher priority than the current thread, then a context switch is
needed, and the
.Dv KEF_NEEDRESCHED
flag is set.
.Pp
The
.Fn propagate_priority
function looks at the process that owns the mutex that
.Fa p
is blocked on.
That process's priority is bumped to the priority of
.Fa p
if needed.
If the process is currently running, then the function returns.
If the process is on a
.Xr runqueue 9 ,
then the process is moved to the appropriate
.Xr runqueue 9
for its new priority.
If the process is blocked on a mutex, its position in the list of
processes blocked on that mutex is updated to reflect its new
priority.
The function then repeats the procedure using the process that owns the
mutex just encountered.
Note that each process's priority is bumped only to the priority of the
original process
.Fa p ,
not to the priority of the previously encountered process.
.Pp
The
.Fn resetpriority
function recomputes the user priority of the ksegrp
.Fa kg
(stored in
.Va kg_user_pri )
and calls
.Fn maybe_resched
to force a reschedule of each thread in the group if needed.
.Pp
The
.Fn roundrobin
function is used as a
.Xr timeout 9
function to force a reschedule every
.Va sched_quantum
ticks.
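.Pp
The owner-chain walk performed by
.Fn propagate_priority
can be illustrated with a small user-space model in C.
This is a minimal sketch, not the kernel implementation:
the structures
.Vt struct sketch_proc
and
.Vt struct sketch_mtx
and the function
.Fn propagate_priority_sketch
are hypothetical, and the sketch assumes, as in
.Fx ,
that numerically lower values denote higher priorities.
Moving a process between run queues and reordering a mutex's list of
blocked processes are left out.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_mtx;

/* Hypothetical stand-in for a process; a lower value is a higher priority. */
struct sketch_proc {
	int priority;                   /* models p_priority */
	int running;                    /* nonzero if currently on a CPU */
	struct sketch_mtx *blocked_on;  /* mutex being waited for, or NULL */
};

/* Hypothetical stand-in for a sleep mutex. */
struct sketch_mtx {
	struct sketch_proc *owner;      /* current owner, or NULL if unowned */
};

/*
 * Lend the priority of p to the owner of the mutex p is blocked on,
 * then to the owner of the mutex that owner is blocked on, and so on.
 * Every owner is raised to p's priority, never to an intermediate one.
 * Run-queue moves and blocked-list reordering are omitted.
 */
static void
propagate_priority_sketch(struct sketch_proc *p)
{
	int pri;
	struct sketch_mtx *m;

	assert(p != NULL);
	pri = p->priority;
	m = p->blocked_on;
	while (m != NULL && m->owner != NULL) {
		struct sketch_proc *owner = m->owner;

		/* Owner already has an equal or higher priority: stop. */
		if (owner->priority <= pri)
			return;
		owner->priority = pri;
		/* A running owner needs no further work. */
		if (owner->running)
			return;
		/* Follow the mutex this owner is blocked on, if any. */
		m = owner->blocked_on;
	}
}
```

For example, if a process at priority 20 is blocked on a mutex owned by a
process at priority 80, which is in turn blocked on a mutex owned by a
runnable process at priority 100, both owners end up at priority 20.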
.Pp
The
.Fn roundrobin_interval
function returns the number of clock ticks between reschedules
triggered by
.Fn roundrobin ,
that is, the current value of
.Va sched_quantum .
.Pp
The
.Fn sched_setup
function is a
.Xr SYSINIT 9
that starts the callout-driven scheduler functions.
It calls
.Fn roundrobin
and
.Fn schedcpu
once.
After this initial call, each of the two functions keeps itself running by
registering its callout again when it completes.
.Pp
The
.Fn schedclock
function is called by
.Fn statclock
to adjust the priority of the currently running thread's ksegrp.
It updates the group's estimated CPU time and then adjusts the priority via
.Fn resetpriority .
.Pp
The
.Fn schedcpu
function updates all process priorities.
First, it updates statistics that track how long processes have been in various
process states.
Second, it updates the estimated CPU time of each process such
that about 90% of the CPU usage is forgotten in 5 * load average seconds.
For example, if the load average is 2.00,
then at least 90% of the estimated CPU time for the process should be based
on the amount of CPU time the process has had in the last 10 seconds.
It then recomputes the priority of the process and moves it to the
appropriate
.Xr runqueue 9
if necessary.
Third, it updates the %CPU estimate used by utilities such as
.Xr ps 1
and
.Xr top 1
so that 95% of the CPU usage is forgotten in 60 seconds.
Once all process priorities have been updated,
.Fn schedcpu
calls
.Fn vmmeter
to update various other statistics, including the load average.
Finally, it schedules itself to run again in
.Va hz
clock ticks, that is, once per second.
.Pp
The
.Fn setrunnable
function is used to change a process's state to runnable.
The process is placed on a
.Xr runqueue 9
if needed, and if the process is swapped out, the swapper process is woken
up and told to swap it in.
If the process has been asleep for at least one run of
.Fn schedcpu ,
then
.Fn updatepri
is used to adjust the priority of the process.
.Pp
The
.Fn updatepri
function is used to adjust the priority of a process that has been asleep.
It retroactively decays the estimated CPU time of the process for each
.Fn schedcpu
event that the process was asleep.
It then calls
.Fn resetpriority
to adjust the priority of the process.
.Sh SEE ALSO
.Xr mi_switch 9 ,
.Xr runqueue 9 ,
.Xr sleepqueue 9 ,
.Xr tsleep 9
.Sh BUGS
The
.Va curpriority
variable really should be per-CPU.
In addition,
.Fn maybe_resched
should compare the priority of
.Fa td
with that of each CPU, and then send an IPI to the processor with the lowest
priority to trigger a reschedule if needed.
.Pp
Priority propagation is broken and is thus disabled by default.
The
.Va p_nativepri
variable is only updated if a process does not obtain a sleep mutex on the
first try.
Also, if a process obtains more than one sleep mutex in this manner and
had its priority bumped in between, then
.Va p_nativepri
will be clobbered.
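.Pp
The clobbering of
.Va p_nativepri
described above can be reproduced with a small model in C.
This is a hypothetical sketch that follows only the behavior described in
this manual page, not the actual mutex code:
.Vt struct nativepri_model
and its helper functions are illustrative names, a single integer stands in for
.Va p_nativepri ,
and numerically lower values are assumed to denote higher priorities.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one process's priority fields. */
struct nativepri_model {
	int priority;   /* models p_priority */
	int nativepri;  /* models p_nativepri: a single saved slot */
};

/* Acquire a sleep mutex; the priority is saved only if the first try failed. */
static void
model_acquire(struct nativepri_model *p, int contested)
{
	assert(p != NULL);
	if (contested)
		p->nativepri = p->priority;
}

/* A waiter on a held mutex lends p its priority if that priority is higher. */
static void
model_lend(struct nativepri_model *p, int waiter_pri)
{
	assert(p != NULL);
	if (waiter_pri < p->priority)
		p->priority = waiter_pri;
}

/* Release a mutex: restore whatever the single saved slot holds. */
static void
model_release(struct nativepri_model *p)
{
	assert(p != NULL);
	p->priority = p->nativepri;
}
```

Starting at priority 100, a process that obtains mutex A after contention,
is bumped to 50 by a waiter, and then obtains mutex B after contention saves
50 over the original 100; releasing both mutexes leaves it at 50 instead
of 100.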