.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 5, 2004
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Cd "options HZ=1000"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique for
handling devices that does not rely on them to generate
interrupts when they need attention, but instead lets the CPU poll
the devices to service their needs.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of the CPU among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Fx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
With device polling, device interrupts are disabled and the devices
are instead polled at appropriate
times, i.e., on clock interrupts, system calls and within the idle loop.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work is spent
handling device events, and can thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is no longer any risk of a livelock in which
packets are never processed to completion.
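.Pp
As a rough illustration of how polling bounds the work done per clock
tick, the following user-space C sketch gives every device a fixed
packet budget on each tick and serves the devices round-robin in small
chunks.
It is not the kernel implementation: the names
.Vt "struct poll_dev" ,
.Fn dev_poll
and
.Fn poll_tick ,
as well as the
.Fa budget
and
.Fa chunk
parameters, are hypothetical and were chosen only for this example.
.Bd -literal -offset indent
/*
 * Hypothetical user-space sketch of budgeted polling.
 * This is NOT the kernel code; all names are invented.
 */
#include <stddef.h>

struct poll_dev {
    int pending;    /* packets waiting in the device */
    int handled;    /* packets processed so far */
};

/* Process up to "count" packets from one device; return how many. */
static int
dev_poll(struct poll_dev *d, int count)
{
    int n = d->pending < count ? d->pending : count;

    d->pending -= n;
    d->handled += n;
    return (n);
}

/*
 * One clock tick: every device may have up to "budget" packets
 * processed, handed out round-robin in pieces of at most "chunk"
 * packets so that no single device monopolizes the tick.
 * Returns the total number of packets processed in this tick.
 */
static int
poll_tick(struct poll_dev *devs, size_t ndevs, int budget, int chunk)
{
    int total = 0;
    int left, piece;
    size_t i;

    if (chunk <= 0)
        return (0);
    for (left = budget; left > 0; left -= piece) {
        piece = left < chunk ? left : chunk;
        for (i = 0; i < ndevs; i++)
            total += dev_poll(&devs[i], piece);
    }
    return (total);
}
.Ed
.Pp
Because a call to
.Fn poll_tick
never handles more than
.Fa budget
packets per device, the time spent on device handling in one tick stays
bounded regardless of the offered load, and the remaining CPU time is
left to other tasks.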
.Ss MIB Variables
The operation of
.Nm
is controlled by the following
.Xr sysctl 8
MIB variables:
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.enable
If set to non-zero,
.Nm
is enabled.
Default is disabled.
.Pp
.It Va kern.polling.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.burst_max
Upper bound for
.Va kern.polling.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va HZ No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100Mbit network and HZ=1000.
.Pp
.It Va kern.polling.idle_poll
Controls whether
.Nm
is enabled in the idle loop.
There are no reasons (other than power saving or bugs in the scheduler's
handling of idle priority kernel threads) to disable this.
Note that -CURRENT apparently has some problems in this respect now,
so the default is disabled.
.Pp
.It Va kern.polling.poll_in_trap
Controls whether
.Nm
is enabled during hardware traps.
Enabling this can be useful to improve the network responsiveness
of boxes with 100% CPU usage.
Default is disabled.
.Pp
.It Va kern.polling.reg_frac
Controls how often (every
.Va reg_frac No / Va HZ
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
error detection.
Default is 20.
.Pp
.It Va kern.polling.handlers
Number of active devices that have registered for
.Nm .
.Pp
.It Va kern.polling.short_ticks
.It Va kern.polling.lost_polls
.It Va kern.polling.pending_polls
.It Va kern.polling.residual_burst
.It Va kern.polling.phase
.It Va kern.polling.suspect
.It Va kern.polling.stalled
Debugging variables.
.El
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr ste 4 ,
and
.Xr vr 4
devices are supported, with others in the works.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the drivers for the devices
mentioned above for more details.)
.Pp
Because in the worst case the devices are only polled on
clock interrupts, it is advisable to increase the frequency of the clock
to at least 1000 Hz in order to reduce the latency in processing
packets.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6
and
.Fx 5.0 .
.Sh AUTHORS
Device polling was written by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
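.Sh EXAMPLES
The variables listed in
.Sx MIB Variables
can be set at boot time from
.Xr sysctl.conf 5 .
The fragment below is only a sketch for a kernel built with the options
shown in the SYNOPSIS: the values simply repeat the documented defaults
with
.Nm
turned on, and are assumptions made for illustration rather than tuning
recommendations.
.Bd -literal -offset indent
# Illustrative values only; adjust for the actual hardware and load.
# Turn polling on (the default is disabled).
kern.polling.enable=1
# Reserve up to 50% of the CPU cycles for userland tasks.
kern.polling.user_frac=50
# Upper bound for kern.polling.burst.
kern.polling.burst_max=150
.Ed
.Pp
With these values and HZ=1000, polling on clock ticks alone can take at
most HZ * burst_max = 1000 * 150 = 150000 packets per second from each
interface.
This is slightly above the roughly 148810 minimum-size frames per second
that a 100Mbit Ethernet link can carry, which is why the default is
adequate for such a network.
A GigE link can carry about ten times as many frames, so a
correspondingly larger
.Va kern.polling.burst_max
may be needed there.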