.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 26, 2005
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Cd "options HZ=1000"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context-switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of the CPU among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chance of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Fx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables interrupts from the devices and instead polls
them at appropriate
times, i.e., on clock interrupts, system calls and within the idle loop.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work is spent
handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
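.Pp
The idea of bounding the work done per poll can be illustrated with a
toy Python model.
It is only a sketch under stated assumptions and not kernel code: the
function name, the queue names and the budget value are all hypothetical.
.Bd -literal -offset indent
```python
# Toy model (hypothetical names, not kernel code): on each clock tick
# every device is polled, but at most `budget` packets are taken from it.
def poll_tick(queues, budget):
    handled = 0
    for name in queues:
        take = min(queues[name], budget)  # never exceed the per-device budget
        queues[name] -= take
        handled += take
    return handled

# One busy device and one nearly idle device (made-up backlog sizes).
queues = {"nic0": 1000, "nic1": 3}
work = poll_tick(queues, budget=5)
print(work, queues)
```
.Ed
However large the backlog on a device, the work done in one tick stays
bounded by the budget multiplied by the number of devices, which leaves
CPU time for other tasks.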
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so that there is never a risk of livelock caused by
packets not being processed to completion.
.Ss MIB Variables
The operation of
.Nm
is controlled by the following
.Xr sysctl 8
MIB variables:
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.enable
If set to non-zero,
.Nm
is enabled.
Default is disabled.
.Pp
.It Va kern.polling.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.burst_max
Upper bound for
.Va kern.polling.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va HZ No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100Mbit network with HZ=1000.
.Pp
.It Va kern.polling.idle_poll
Controls whether
.Nm
is enabled in the idle loop.
There are no reasons (other than power saving or bugs in the scheduler's
handling of idle priority kernel threads) to disable this.
Note that -CURRENT apparently has some problems in this respect now,
so the default is disabled.
.Pp
.It Va kern.polling.poll_in_trap
Controls whether
.Nm
is enabled during hardware traps.
Enabling this can be useful to improve the network responsiveness
of machines with 100% CPU usage.
Default is disabled.
.Pp
.It Va kern.polling.reg_frac
Controls how often (every
.Va reg_frac No / Va HZ
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
error detection.
Default is 20.
.Pp
.It Va kern.polling.handlers
Number of active devices that have registered for
.Nm .
.Pp
.It Va kern.polling.short_ticks
.It Va kern.polling.lost_polls
.It Va kern.polling.pending_polls
.It Va kern.polling.residual_burst
.It Va kern.polling.phase
.It Va kern.polling.suspect
.It Va kern.polling.stalled
Debugging variables.
.El
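.Pp
As a worked illustration of two formulas from the list above, the
following Python sketch evaluates the per-interface receive ceiling
.Pq Va HZ No * Va burst_max
and the status-check interval
.Pq Va reg_frac No / Va HZ
for the documented defaults.
It models the arithmetic only and is not kernel code.
The Python names are illustrative, and the 84-byte on-wire size of a
minimum Ethernet frame is an assumption brought in from outside this page.
.Bd -literal -offset indent
```python
# Worked arithmetic for the documented defaults, with HZ=1000 as in SYNOPSIS.
HZ = 1000         # clock ticks per second (options HZ=1000)
BURST_MAX = 150   # default kern.polling.burst_max
REG_FRAC = 20     # default kern.polling.reg_frac

# Per-interface receive ceiling when polling happens only on clock ticks.
max_pps = HZ * BURST_MAX

# Assumption (not from this page): a minimum-size Ethernet frame occupies
# 84 bytes on the wire (64-byte frame + 8-byte preamble + 12-byte gap).
WIRE_BITS_PER_MIN_FRAME = 84 * 8
line_rate_pps = 100_000_000 // WIRE_BITS_PER_MIN_FRAME

# Interval between status-register checks, in seconds.
reg_check_interval = REG_FRAC / HZ

print(max_pps, line_rate_pps, reg_check_interval)
```
.Ed
With these values the ceiling is 150000 packets per second, slightly
above the 148809 minimum-size frames per second that a 100Mbit link can
carry under the stated assumption, and the status registers are checked
every 0.02 seconds.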
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fwip 4 ,
.Xr fxp 4 ,
.Xr ixgb 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sf 4 ,
.Xr sis 4 ,
.Xr ste 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported, with others in the works.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the devices mentioned above
for more details.)
.Pp
Because in the worst case the devices are polled only on
clock interrupts, it is advisable to increase the frequency of the
clock to at least 1000 HZ in order to reduce the latency in
processing packets.
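.Pp
The effect of the clock frequency on latency can be estimated with a
short Python sketch.
It assumes that a packet arriving just after a poll waits one full
clock tick before being seen; the function name is hypothetical, and
the value HZ=100 is included only for comparison.
.Bd -literal -offset indent
```python
# Estimate of the worst-case wait before a packet is noticed, assuming
# the device is polled only on clock interrupts (once per tick).
def worst_case_poll_delay_ms(hz):
    return 1000 / hz  # length of one clock tick in milliseconds

for hz in (100, 1000):
    print(hz, worst_case_poll_delay_ms(hz))
```
.Ed
Under this assumption the wait is up to 10 ms at HZ=100 and up to
1 ms at HZ=1000.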
.Sh HISTORY
Device polling first appeared in
.Fx 4.6
and
.Fx 5.0 .
.Sh AUTHORS
Device polling was written by
.An Luigi Rizzo Aq luigi@iet.unipi.it .