.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 6, 2007
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of the CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Fx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling avoids device interrupts by polling devices at appropriate
times, i.e., on clock interrupts and within the idle loop.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work to spend
in handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never a risk of livelock caused by
packets not being processed to completion.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the help of the
.Xr ifconfig 8
command.
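.Pp
As a minimal sketch, assuming a hypothetical interface named
.Li em0
whose driver supports
.Nm ,
the feature could be toggled with the
.Cm polling
and
.Cm \-polling
parameters of
.Xr ifconfig 8 :
.Bd -literal -offset indent
# em0 is an assumed interface name; substitute a polling-capable one.
ifconfig em0 polling
# Return the interface to interrupt-driven operation.
ifconfig em0 -polling
.Ed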
.Ss MIB Variables
The operation of
.Nm
is controlled by the following
.Xr sysctl 8
MIB variables:
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.burst_max
Upper bound for
.Va kern.polling.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va HZ No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100Mbit network and HZ=1000.
.Pp
.It Va kern.polling.idle_poll
Controls whether
.Nm
is enabled in the idle loop.
There are no reasons (other than power saving or bugs in the scheduler's
handling of idle priority kernel threads) to disable this.
.Pp
.It Va kern.polling.reg_frac
Controls how often (every
.Va reg_frac No / Va HZ
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
the error detection.
Default is 20.
.Pp
.It Va kern.polling.handlers
How many active devices have registered for
.Nm .
.Pp
.It Va kern.polling.enable
Legacy MIB that was used to enable or disable polling globally.
Currently, if set to 1,
.Nm
is enabled on all capable interfaces.
If set to 0,
.Nm
is disabled on all interfaces.
.Pp
.It Va kern.polling.short_ticks
.It Va kern.polling.lost_polls
.It Va kern.polling.pending_polls
.It Va kern.polling.residual_burst
.It Va kern.polling.phase
.It Va kern.polling.suspect
.It Va kern.polling.stalled
Debugging variables.
.El
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bge 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fwip 4 ,
.Xr fxp 4 ,
.Xr ixgb 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sf 4 ,
.Xr sis 4 ,
.Xr ste 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported, with others in the works.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the drivers for the devices mentioned above
for more details.)
.Pp
Because in the worst case the devices are polled only on clock interrupts,
it is not advisable to decrease the frequency of the clock below 1000 Hz,
so as to limit the latency in processing packets.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6
and
.Fx 5.0 .
.Sh AUTHORS
Device polling was written by
.An Luigi Rizzo Aq luigi@iet.unipi.it .