.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 6, 2007
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of the CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Fx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
With device polling, device interrupts are disabled and devices are
polled at appropriate times, i.e., on clock interrupts and within the
idle loop.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work is done
in handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never a risk of livelock caused by
packets that are not processed to completion.
.Ss Enabling Polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the
.Xr ifconfig 8
command.
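.Pp
As a minimal sketch, assuming a hypothetical interface named
.Li em0
that is attached to one of the drivers supporting
.Nm ,
the feature can be toggled on that interface alone:
.Bd -literal
# em0 is a hypothetical interface name; substitute a real one.
ifconfig em0 polling     # enable polling on em0
ifconfig em0 -polling    # disable polling on em0
.Ed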
.Pp
The historic
.Va kern.polling.enable ,
which enabled polling for all interfaces, can be replaced with the following
code:
.Bd -literal
for i in $(ifconfig -l); do
    ifconfig "$i" polling    # use -polling to disable
done
.Ed
.Ss MIB Variables
The operation of
.Nm
is controlled by the following
.Xr sysctl 8
MIB variables:
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.burst_max
Upper bound for
.Va kern.polling.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va HZ No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100 Mbit network and HZ=1000
(1000 * 150 = 150000 packets per second, just above the roughly 148800
minimum-size frames per second that 100 Mbit Ethernet can carry).
.Pp
.It Va kern.polling.idle_poll
Controls whether
.Nm
is enabled in the idle loop.
There is no reason (other than power saving or bugs in the scheduler's
handling of idle priority kernel threads) to disable this.
.Pp
.It Va kern.polling.reg_frac
Controls how often (every
.Va reg_frac No / Va HZ
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
the error detection.
Default is 20.
.Pp
.It Va kern.polling.handlers
How many active devices have registered for
.Nm .
.Pp
.It Va kern.polling.short_ticks
.It Va kern.polling.lost_polls
.It Va kern.polling.pending_polls
.It Va kern.polling.residual_burst
.It Va kern.polling.phase
.It Va kern.polling.suspect
.It Va kern.polling.stalled
Debugging variables.
.El
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bge 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fwip 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sf 4 ,
.Xr sis 4 ,
.Xr ste 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported, with others in the works.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the devices mentioned above
for more details.)
.Pp
Because in the worst case the devices are polled only on clock interrupts,
it is not advisable to decrease the frequency of the clock below 1000 Hz,
as doing so increases the latency in processing packets.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6
and
.Fx 5.0 .
.Sh AUTHORS
Device polling was written by
.An Luigi Rizzo Aq Mt luigi@iet.unipi.it .