.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 5, 2004
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Cd "options HZ=1000"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique for handling devices that does not
rely on the devices generating interrupts when they need attention;
instead, the CPU polls the devices to service their needs.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of the CPU among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chance of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler,
which performs whatever processing the device needs.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Fx
drivers).
Furthermore, under heavy traffic, the system might be
continuously processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls devices
at appropriate times, i.e., on clock interrupts, on system calls and
within the idle loop.
This removes the context switch overhead caused by device interrupts.
Furthermore,
the operating system can accurately control how much time it spends
handling device events, and can thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so that there is never a risk of livelock caused by
packets not being processed to completion.
.Ss MIB Variables
The operation of
.Nm
is controlled by the following
.Xr sysctl 8
MIB variables:
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.enable
If set to non-zero,
.Nm
is enabled.
Default is disabled.
.Pp
.It Va kern.polling.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel
according to the configured
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface from
saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.burst_max
Upper bound for
.Va kern.polling.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va HZ No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100Mbit network and HZ=1000.
.Pp
.It Va kern.polling.idle_poll
Controls whether
.Nm
is enabled in the idle loop.
There is no reason (other than power saving or bugs in the scheduler's
handling of idle priority kernel threads) to disable this.
Note that, at the time of writing, -CURRENT apparently has some problems
in this respect, so the default is disabled.
.Pp
.It Va kern.polling.poll_in_trap
Controls whether
.Nm
is enabled during hardware traps.
Enabling this can improve the network responsiveness
of systems running at 100% CPU usage.
Default is disabled.
.Pp
.It Va kern.polling.reg_frac
Controls how often (every
.Va reg_frac No / Va HZ
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
error detection.
Default is 20.
.Pp
.It Va kern.polling.handlers
Number of active devices registered for
.Nm .
.Pp
.It Va kern.polling.short_ticks
.It Va kern.polling.lost_polls
.It Va kern.polling.pending_polls
.It Va kern.polling.residual_burst
.It Va kern.polling.phase
.It Va kern.polling.suspect
.It Va kern.polling.stalled
Debugging variables.
.El
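.Pp
As an illustration only, on a kernel built with options DEVICE_POLLING, the variables above could be set at boot from
.Xr sysctl.conf 5
with a fragment like the one below.
The same assignments can also be given to
.Xr sysctl 8
at run time.
The
.Va kern.polling.burst_max
value shown is an arbitrary example, not a recommendation, and should be tuned to the expected load.
.Bd -literal -offset indent
```
kern.polling.enable=1
kern.polling.user_frac=50
kern.polling.burst_max=300
```
.Ed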
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr ste 4 ,
and
.Xr vr 4
devices are supported, with others in the works.
The modifications are rather straightforward: they consist of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the drivers mentioned above
for more details.)
.Pp
Since, in the worst case, devices are polled only on
clock interrupts, it is advisable to increase the clock frequency
to at least 1000 HZ in order to reduce packet-processing latency.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6
and
.Fx 5.0 .
.Sh AUTHORS
Device polling was written by
.An Luigi Rizzo Aq luigi@iet.unipi.it .