1.\" Copyright (c) 2007 Seccuris Inc.
2.\" All rights reserved.
3.\"
.\" This software was developed by Robert N. M. Watson under contract to
5.\" Seccuris Inc.
6.\"
7.\" Redistribution and use in source and binary forms, with or without
8.\" modification, are permitted provided that the following conditions
9.\" are met:
10.\" 1. Redistributions of source code must retain the above copyright
11.\"    notice, this list of conditions and the following disclaimer.
12.\" 2. Redistributions in binary form must reproduce the above copyright
13.\"    notice, this list of conditions and the following disclaimer in the
14.\"    documentation and/or other materials provided with the distribution.
15.\"
16.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
17.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
19.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
20.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
22.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
23.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
24.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
25.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
26.\" SUCH DAMAGE.
27.\"
28.\" Copyright (c) 1990 The Regents of the University of California.
29.\" All rights reserved.
30.\"
31.\" Redistribution and use in source and binary forms, with or without
32.\" modification, are permitted provided that: (1) source code distributions
33.\" retain the above copyright notice and this paragraph in its entirety, (2)
34.\" distributions including binary code include the above copyright notice and
35.\" this paragraph in its entirety in the documentation or other materials
36.\" provided with the distribution, and (3) all advertising materials mentioning
37.\" features or use of this software display the following acknowledgement:
38.\" ``This product includes software developed by the University of California,
39.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
40.\" the University nor the names of its contributors may be used to endorse
41.\" or promote products derived from this software without specific prior
42.\" written permission.
43.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
44.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
45.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
46.\"
47.\" This document is derived in part from the enet man page (enet.4)
48.\" distributed with 4.3BSD Unix.
49.\"
50.\" $FreeBSD$
51.\"
52.Dd February 26, 2007
53.Dt BPF 4
54.Os
55.Sh NAME
56.Nm bpf
57.Nd Berkeley Packet Filter
58.Sh SYNOPSIS
59.Cd device bpf
60.Sh DESCRIPTION
61The Berkeley Packet Filter
62provides a raw interface to data link layers in a protocol
63independent fashion.
64All packets on the network, even those destined for other hosts,
65are accessible through this mechanism.
66.Pp
67The packet filter appears as a character special device,
68.Pa /dev/bpf0 ,
69.Pa /dev/bpf1 ,
70etc.
71After opening the device, the file descriptor must be bound to a
72specific network interface with the
73.Dv BIOCSETIF
74ioctl.
75A given interface can be shared by multiple listeners, and the filter
76underlying each descriptor will see an identical packet stream.
77.Pp
78A separate device file is required for each minor device.
79If a file is in use, the open will fail and
80.Va errno
81will be set to
82.Er EBUSY .
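.Pp
Because a busy unit fails with
.Er EBUSY ,
programs commonly probe the units in order until one opens.
The following is a minimal sketch of such a loop, not a definitive
implementation; the function name
.Fn open_first_free ,
its
.Fa prefix
argument, and the unit limit are assumptions introduced here for illustration.
.Bd -literal
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: try <prefix>0, <prefix>1, ... and return the
 * first descriptor that opens.  A unit that fails with EBUSY is in
 * use, so move on; any other error ends the search.
 * Returns -1 with errno set if no unit could be opened.
 */
static int
open_first_free(const char *prefix, int max_units)
{
	char path[64];
	int fd, unit;

	for (unit = 0; unit < max_units; unit++) {
		snprintf(path, sizeof(path), "%s%d", prefix, unit);
		fd = open(path, O_RDWR);
		if (fd >= 0)
			return (fd);
		if (errno != EBUSY)
			return (-1);
	}
	errno = EBUSY;
	return (-1);
}
```
.Ed
.Pp
A caller might use
.Li open_first_free("/dev/bpf", 16)
and then bind the returned descriptor to an interface.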
83.Pp
84Associated with each open instance of a
85.Nm
86file is a user-settable packet filter.
87Whenever a packet is received by an interface,
88all file descriptors listening on that interface apply their filter.
89Each descriptor that accepts the packet receives its own copy.
90.Pp
91The packet filter will support any link level protocol that has fixed length
92headers.
93Currently, only Ethernet,
94.Tn SLIP ,
95and
96.Tn PPP
97drivers have been modified to interact with
98.Nm .
99.Pp
100Since packet data is in network byte order, applications should use the
101.Xr byteorder 3
102macros to extract multi-byte values.
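.Pp
As an illustration of the byte order rule above, the sketch below reads the
16-bit type field of an Ethernet frame and converts it to host order with
.Fn ntohs .
The helper name
.Fn frame_ethertype
is hypothetical, and the sketch assumes the frame holds at least 14 bytes.
.Bd -literal
```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the EtherType of an Ethernet frame in
 * host byte order.  The field occupies bytes 12 and 13 of the frame.
 * memcpy() avoids an unaligned 16-bit load.
 */
static uint16_t
frame_ethertype(const unsigned char *frame)
{
	uint16_t type;

	memcpy(&type, frame + 12, sizeof(type));
	return (ntohs(type));
}
```
.Ed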
103.Pp
104A packet can be sent out on the network by writing to a
105.Nm
106file descriptor.
107The writes are unbuffered, meaning only one packet can be processed per write.
108Currently, only writes to Ethernets and
109.Tn SLIP
110links are supported.
111.Sh BUFFER MODES
112.Nm
113devices deliver packet data to the application via memory buffers provided by
114the application.
115The buffer mode is set using the
116.Dv BIOCSETBUFMODE
117ioctl, and read using the
118.Dv BIOCGETBUFMODE
119ioctl.
120.Ss Buffered read mode
121By default,
122.Nm
123devices operate in the
124.Dv BPF_BUFMODE_BUFFER
125mode, in which packet data is copied explicitly from kernel to user memory
126using the
127.Xr read 2
128system call.
129The user process will declare a fixed buffer size that will be used both for
130sizing internal buffers and for all
131.Xr read 2
132operations on the file.
133This size is queried using the
134.Dv BIOCGBLEN
135ioctl, and is set using the
136.Dv BIOCSBLEN
137ioctl.
138Note that an individual packet larger than the buffer size is necessarily
139truncated.
140.Ss Zero-copy buffer mode
.Nm
devices may also operate in the
.Dv BPF_BUFMODE_ZBUF
mode, in which packet data is written directly into two user memory buffers
by the kernel, avoiding both system call and copying overhead.
Buffers are of fixed (and equal) size, page-aligned, and an integer multiple
of the page size.
148The maximum zero-copy buffer size is returned by the
149.Dv BIOCGETZMAX
150ioctl.
151Note that an individual packet larger than the buffer size is necessarily
152truncated.
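.Pp
The size constraints above can be sketched as a small calculation.
This is an illustration under stated assumptions rather than part of the
.Nm
interface: the function
.Fn zbuf_size
and its parameters are hypothetical, the page size is assumed to be a power
of two, and
.Fa zmax
stands for the value reported by
.Dv BIOCGETZMAX .
.Bd -literal
```c
#include <stddef.h>

/*
 * Hypothetical helper: round a desired buffer size up to a whole
 * number of pages, then clamp it to the reported maximum.
 * pagesz must be a power of two.  Returns 0 if no valid size exists.
 */
static size_t
zbuf_size(size_t want, size_t pagesz, size_t zmax)
{
	size_t len;

	if (pagesz == 0 || zmax < pagesz)
		return (0);
	if (want > zmax)
		return (zmax - (zmax % pagesz));
	len = (want + pagesz - 1) & ~(pagesz - 1);
	if (len == 0)
		len = pagesz;
	if (len > zmax)
		len = zmax - (zmax % pagesz);
	return (len);
}
```
.Ed
.Pp
Memory obtained from
.Xr mmap 2
is page-aligned, which makes it one plausible way to allocate the two buffers.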
153.Pp
154The user process registers two memory buffers using the
155.Dv BIOCSETZBUF
156ioctl, which accepts a
157.Vt struct bpf_zbuf
158pointer as an argument:
159.Bd -literal
160struct bpf_zbuf {
161	void *bz_bufa;
162	void *bz_bufb;
163	size_t bz_buflen;
164};
165.Ed
166.Pp
167.Vt bz_bufa
168is a pointer to the userspace address of the first buffer that will be
169filled, and
170.Vt bz_bufb
171is a pointer to the second buffer.
172.Nm
173will then cycle between the two buffers as they fill and are acknowledged.
174.Pp
175Each buffer begins with a fixed-length header to hold synchronization and
176data length information for the buffer:
177.Bd -literal
178struct bpf_zbuf_header {
179	volatile u_int  bzh_kernel_gen;	/* Kernel generation number. */
180	volatile u_int  bzh_kernel_len;	/* Length of data in the buffer. */
181	volatile u_int  bzh_user_gen;	/* User generation number. */
182	/* ...padding for future use... */
183};
184.Ed
185.Pp
186The header structure of each buffer, including all padding, should be zeroed
187before it is configured using
188.Dv BIOCSETZBUF .
189Remaining space in the buffer will be used by the kernel to store packet
190data, laid out in the same format as with buffered read mode.
191.Pp
192The kernel and the user process follow a simple acknowledgement protocol via
193the buffer header to synchronize access to the buffer: when the header
194generation numbers,
195.Vt bzh_kernel_gen
196and
197.Vt bzh_user_gen ,
198hold the same value, the kernel owns the buffer, and when they differ,
199userspace owns the buffer.
200.Pp
201While the kernel owns the buffer, the contents are unstable and may change
202asynchronously; while the user process owns the buffer, its contents are
203stable and will not be changed until the buffer has been acknowledged.
204.Pp
205Initializing the buffer headers to all 0's before registering the buffer has
206the effect of assigning initial ownership of both buffers to the kernel.
207The kernel signals that a buffer has been assigned to userspace by modifying
208.Vt bzh_kernel_gen ,
209and userspace acknowledges the buffer and returns it to the kernel by setting
210the value of
211.Vt bzh_user_gen
212to the value of
213.Vt bzh_kernel_gen .
214.Pp
215In order to avoid caching and memory re-ordering effects, the user process
216must use atomic operations and memory barriers when checking for and
217acknowledging buffers:
218.Bd -literal
219#include <machine/atomic.h>
220
221/*
222 * Return ownership of a buffer to the kernel for reuse.
223 */
224static void
225buffer_acknowledge(struct bpf_zbuf_header *bzh)
226{
227
228	atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
229}
230
231/*
232 * Check whether a buffer has been assigned to userspace by the kernel.
233 * Return true if userspace owns the buffer, and false otherwise.
234 */
235static int
236buffer_check(struct bpf_zbuf_header *bzh)
237{
238
239	return (bzh->bzh_user_gen !=
240	    atomic_load_acq_int(&bzh->bzh_kernel_gen));
241}
242.Ed
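.Pp
The ownership rule itself does not depend on the FreeBSD atomic primitives.
The sketch below restates it with C11
.In stdatomic.h
on a stand-in structure, so that the rule can be exercised outside the kernel
interface.
The names
.Vt struct zhdr_sketch ,
.Fn user_owns ,
and
.Fn user_ack
are hypothetical, and the sketch makes no claim about how C11 atomics
interact with kernel stores on any given platform.
.Bd -literal
```c
#include <stdatomic.h>

/* Stand-in for the three documented header fields (sketch only). */
struct zhdr_sketch {
	atomic_uint kernel_gen;	/* advanced by the producer */
	atomic_uint kernel_len;	/* bytes of valid data */
	atomic_uint user_gen;	/* set by the consumer to acknowledge */
};

/* The consumer owns the buffer when the generation numbers differ. */
static int
user_owns(struct zhdr_sketch *h)
{
	return (atomic_load_explicit(&h->user_gen, memory_order_relaxed) !=
	    atomic_load_explicit(&h->kernel_gen, memory_order_acquire));
}

/* Hand the buffer back by copying kernel_gen into user_gen. */
static void
user_ack(struct zhdr_sketch *h)
{
	unsigned int gen;

	gen = atomic_load_explicit(&h->kernel_gen, memory_order_relaxed);
	atomic_store_explicit(&h->user_gen, gen, memory_order_release);
}
```
.Ed
.Pp
With all fields zero the producer owns the buffer; once
.Va kernel_gen
advances the consumer owns it until
.Fn user_ack
is called.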
243.Pp
244The user process may force the assignment of the next buffer, if any data
245is pending, to userspace using the
246.Dv BIOCROTZBUF
247ioctl.
248This allows the user process to retrieve data in a partially filled buffer
249before the buffer is full, such as following a timeout; the process must
250recheck for buffer ownership using the header generation numbers, as the
251buffer will not be assigned to userspace if no data was present.
252.Pp
253As in the buffered read mode,
254.Xr kqueue 2 ,
255.Xr poll 2 ,
256and
257.Xr select 2
may be used to sleep awaiting the availability of a completed buffer.
259They will return a readable file descriptor when ownership of the next buffer
260is assigned to user space.
261.Pp
262In the current implementation, the kernel may assign zero, one, or both
263buffers to the user process; however, an earlier implementation maintained
264the invariant that at most one buffer could be assigned to the user process
265at a time.
To ensure both progress and high performance, user processes should
267acknowledge a completely processed buffer as quickly as possible, returning
268it for reuse, and not block waiting on a second buffer while holding another
269buffer.
270.Sh IOCTLS
271The
272.Xr ioctl 2
273command codes below are defined in
274.In net/bpf.h .
275All commands require
276these includes:
277.Bd -literal
278	#include <sys/types.h>
279	#include <sys/time.h>
280	#include <sys/ioctl.h>
281	#include <net/bpf.h>
282.Ed
283.Pp
284Additionally,
285.Dv BIOCGETIF
286and
287.Dv BIOCSETIF
288require
289.In sys/socket.h
290and
291.In net/if.h .
292.Pp
293In addition to
294.Dv FIONREAD
295and
296.Dv SIOCGIFADDR ,
297the following commands may be applied to any open
298.Nm
299file.
300The (third) argument to
301.Xr ioctl 2
302should be a pointer to the type indicated.
303.Bl -tag -width BIOCGETBUFMODE
304.It Dv BIOCGBLEN
305.Pq Li u_int
306Returns the required buffer length for reads on
307.Nm
308files.
309.It Dv BIOCSBLEN
310.Pq Li u_int
311Sets the buffer length for reads on
312.Nm
313files.
314The buffer must be set before the file is attached to an interface
315with
316.Dv BIOCSETIF .
317If the requested buffer size cannot be accommodated, the closest
318allowable size will be set and returned in the argument.
319A read call will result in
.Er EINVAL
321if it is passed a buffer that is not this size.
322.It Dv BIOCGDLT
323.Pq Li u_int
324Returns the type of the data link layer underlying the attached interface.
325.Er EINVAL
326is returned if no interface has been specified.
327The device types, prefixed with
328.Dq Li DLT_ ,
329are defined in
330.In net/bpf.h .
331.It Dv BIOCPROMISC
332Forces the interface into promiscuous mode.
333All packets, not just those destined for the local host, are processed.
334Since more than one file can be listening on a given interface,
335a listener that opened its interface non-promiscuously may receive
336packets promiscuously.
337This problem can be remedied with an appropriate filter.
338.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
341.It Dv BIOCGETIF
342.Pq Li "struct ifreq"
343Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Li ifr_name
field of the
.Li ifreq
structure.
348All other fields are undefined.
349.It Dv BIOCSETIF
350.Pq Li "struct ifreq"
Sets the hardware interface associated with the file.
352This
353command must be performed before any packets can be read.
354The device is indicated by name using the
355.Li ifr_name
356field of the
357.Li ifreq
358structure.
359Additionally, performs the actions of
360.Dv BIOCFLUSH .
361.It Dv BIOCSRTIMEOUT
362.It Dv BIOCGRTIMEOUT
363.Pq Li "struct timeval"
364Set or get the read timeout parameter.
365The argument
366specifies the length of time to wait before timing
367out on a read request.
368This parameter is initialized to zero by
369.Xr open 2 ,
370indicating no timeout.
371.It Dv BIOCGSTATS
372.Pq Li "struct bpf_stat"
373Returns the following structure of packet statistics:
374.Bd -literal
375struct bpf_stat {
376	u_int bs_recv;    /* number of packets received */
377	u_int bs_drop;    /* number of packets dropped */
378};
379.Ed
380.Pp
381The fields are:
382.Bl -hang -offset indent
383.It Li bs_recv
384the number of packets received by the descriptor since opened or reset
385(including any buffered since the last read call);
386and
387.It Li bs_drop
388the number of packets which were accepted by the filter but dropped by the
389kernel because of buffer overflows
390(i.e., the application's reads are not keeping up with the packet traffic).
391.El
392.It Dv BIOCIMMEDIATE
393.Pq Li u_int
394Enable or disable
395.Dq immediate mode ,
396based on the truth value of the argument.
397When immediate mode is enabled, reads return immediately upon packet
398reception.
399Otherwise, a read will block until either the kernel buffer
400becomes full or a timeout occurs.
401This is useful for programs like
402.Xr rarpd 8
403which must respond to messages in real time.
404The default for a new file is off.
405.It Dv BIOCSETF
406.Pq Li "struct bpf_program"
407Sets the read filter program used by the kernel to discard uninteresting
408packets.
409An array of instructions and its length is passed in using
410the following structure:
411.Bd -literal
412struct bpf_program {
413	int bf_len;
414	struct bpf_insn *bf_insns;
415};
416.Ed
417.Pp
418The filter program is pointed to by the
419.Li bf_insns
420field while its length in units of
421.Sq Li struct bpf_insn
422is given by the
423.Li bf_len
424field.
425Also, the actions of
426.Dv BIOCFLUSH
427are performed.
428See section
429.Sx "FILTER MACHINE"
430for an explanation of the filter language.
431.It Dv BIOCSETWF
432.Pq Li "struct bpf_program"
433Sets the write filter program used by the kernel to control what type of
434packets can be written to the interface.
435See the
436.Dv BIOCSETF
437command for more
438information on the
439.Nm
440filter program.
441.It Dv BIOCVERSION
442.Pq Li "struct bpf_version"
443Returns the major and minor version numbers of the filter language currently
444recognized by the kernel.
445Before installing a filter, applications must check
446that the current version is compatible with the running kernel.
447Version numbers are compatible if the major numbers match and the application minor
448is less than or equal to the kernel minor.
449The kernel version number is returned in the following structure:
450.Bd -literal
451struct bpf_version {
452        u_short bv_major;
453        u_short bv_minor;
454};
455.Ed
456.Pp
457The current version numbers are given by
458.Dv BPF_MAJOR_VERSION
459and
460.Dv BPF_MINOR_VERSION
461from
462.In net/bpf.h .
463An incompatible filter
464may result in undefined behavior (most likely, an error returned by
465.Fn ioctl
466or haphazard packet matching).
467.It Dv BIOCSHDRCMPLT
468.It Dv BIOCGHDRCMPLT
469.Pq Li u_int
470Set or get the status of the
471.Dq header complete
472flag.
473Set to zero if the link level source address should be filled in automatically
474by the interface output routine.
475Set to one if the link level source
476address will be written, as provided, to the wire.
477This flag is initialized to zero by default.
478.It Dv BIOCSSEESENT
479.It Dv BIOCGSEESENT
480.Pq Li u_int
481These commands are obsolete but left for compatibility.
482Use
483.Dv BIOCSDIRECTION
484and
485.Dv BIOCGDIRECTION
486instead.
487Set or get the flag determining whether locally generated packets on the
488interface should be returned by BPF.
489Set to zero to see only incoming packets on the interface.
490Set to one to see packets originating locally and remotely on the interface.
491This flag is initialized to one by default.
492.It Dv BIOCSDIRECTION
493.It Dv BIOCGDIRECTION
494.Pq Li u_int
495Set or get the setting determining whether incoming, outgoing, or all packets
496on the interface should be returned by BPF.
497Set to
498.Dv BPF_D_IN
499to see only incoming packets on the interface.
500Set to
501.Dv BPF_D_INOUT
502to see packets originating locally and remotely on the interface.
503Set to
504.Dv BPF_D_OUT
505to see only outgoing packets on the interface.
506This setting is initialized to
507.Dv BPF_D_INOUT
508by default.
509.It Dv BIOCFEEDBACK
510.Pq Li u_int
511Set packet feedback mode.
512This allows injected packets to be fed back as input to the interface when
513output via the interface is successful.
When the
.Dv BPF_D_INOUT
direction is set, an injected outgoing packet is not returned by BPF, to
avoid duplication.
This flag is initialized to zero by default.
518.It Dv BIOCLOCK
519Set the locked flag on the
520.Nm
521descriptor.
522This prevents the execution of
523ioctl commands which could change the underlying operating parameters of
524the device.
525.It Dv BIOCGETBUFMODE
526.It Dv BIOCSETBUFMODE
527.Pq Li u_int
528Get or set the current
529.Nm
530buffering mode; possible values are
531.Dv BPF_BUFMODE_BUFFER ,
532buffered read mode, and
533.Dv BPF_BUFMODE_ZBUF ,
534zero-copy buffer mode.
535.It Dv BIOCSETZBUF
536.Pq Li struct bpf_zbuf
537Set the current zero-copy buffer locations; buffer locations may be
538set only once zero-copy buffer mode has been selected, and prior to attaching
539to an interface.
540Buffers must be of identical size, page-aligned, and an integer multiple of
541pages in size.
542The three fields
543.Vt bz_bufa ,
544.Vt bz_bufb ,
545and
546.Vt bz_buflen
547must be filled out.
548If buffers have already been set for this device, the ioctl will fail.
549.It Dv BIOCGETZMAX
550.Pq Li size_t
551Get the largest individual zero-copy buffer size allowed.
552As two buffers are used in zero-copy buffer mode, the limit (in practice) is
553twice the returned size.
554As zero-copy buffers consume kernel address space, conservative selection of
555buffer size is suggested, especially when there are multiple
556.Nm
557descriptors in use on 32-bit systems.
558.It Dv BIOCROTZBUF
Force ownership of the next buffer to be assigned to userspace, if any data is
present in the buffer.
561If no data is present, the buffer will remain owned by the kernel.
562This allows consumers of zero-copy buffering to implement timeouts and
563retrieve partially filled buffers.
564In order to handle the case where no data is present in the buffer and
565therefore ownership is not assigned, the user process must check
566.Vt bzh_kernel_gen
567against
568.Vt bzh_user_gen .
569.El
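.Pp
The compatibility rule given under
.Dv BIOCVERSION
can be written as a single predicate.
The function below is a sketch with a hypothetical name; it encodes only the
rule as stated in this page.
.Bd -literal
```c
/*
 * Hypothetical helper implementing the BIOCVERSION rule: majors must
 * match and the application's minor must not exceed the kernel's.
 */
static int
filter_version_ok(unsigned int app_major, unsigned int app_minor,
    unsigned int kern_major, unsigned int kern_minor)
{
	return (app_major == kern_major && app_minor <= kern_minor);
}
```
.Ed
.Pp
An application would pass
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
as the first two arguments, and the
.Li bv_major
and
.Li bv_minor
fields returned by the ioctl as the last two.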
570.Sh BPF HEADER
571The following structure is prepended to each packet returned by
572.Xr read 2
573or via a zero-copy buffer:
574.Bd -literal
575struct bpf_hdr {
576        struct timeval bh_tstamp;     /* time stamp */
577        u_long bh_caplen;             /* length of captured portion */
578        u_long bh_datalen;            /* original length of packet */
        u_short bh_hdrlen;            /* length of bpf header (this struct
					 plus alignment padding) */
581};
582.Ed
583.Pp
The fields, whose values are stored in host order, are:
585.Pp
586.Bl -tag -compact -width bh_datalen
587.It Li bh_tstamp
588The time at which the packet was processed by the packet filter.
589.It Li bh_caplen
590The length of the captured portion of the packet.
591This is the minimum of
592the truncation amount specified by the filter and the length of the packet.
593.It Li bh_datalen
594The length of the packet off the wire.
595This value is independent of the truncation amount specified by the filter.
596.It Li bh_hdrlen
597The length of the
598.Nm
599header, which may not be equal to
600.\" XXX - not really a function call
601.Fn sizeof "struct bpf_hdr" .
602.El
603.Pp
604The
605.Li bh_hdrlen
606field exists to account for
607padding between the header and the link level protocol.
608The purpose here is to guarantee proper alignment of the packet
609data structures, which is required on alignment sensitive
610architectures and improves performance on many other architectures.
The packet filter ensures that the
612.Li bpf_hdr
613and the network layer
614header will be word aligned.
615Suitable precautions
616must be taken when accessing the link layer protocol fields on alignment
617restricted machines.
618(This is not a problem on an Ethernet, since
619the type field is a short falling on an even offset,
620and the addresses are probably accessed in a bytewise fashion).
621.Pp
622Additionally, individual packets are padded so that each starts
623on a word boundary.
624This requires that an application
625has some knowledge of how to get from packet to packet.
626The macro
627.Dv BPF_WORDALIGN
628is defined in
629.In net/bpf.h
630to facilitate
631this process.
632It rounds up its argument to the nearest word aligned value (where a word is
633.Dv BPF_ALIGNMENT
634bytes wide).
635.Pp
636For example, if
637.Sq Li p
638points to the start of a packet, this expression
639will advance it to the next packet:
640.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
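.Pp
The same step can be expressed on byte offsets, which is convenient when
bounding a loop by the number of bytes returned from
.Xr read 2 .
The following is a sketch only:
.Fn wordalign_sketch
and
.Fn next_record
are hypothetical names, and the alignment is passed as a parameter here,
whereas real programs would rely on
.Dv BPF_WORDALIGN
and
.Dv BPF_ALIGNMENT .
.Bd -literal
```c
#include <stddef.h>

/* Round x up to a multiple of align, which must be a power of two. */
static size_t
wordalign_sketch(size_t x, size_t align)
{
	return ((x + (align - 1)) & ~(align - 1));
}

/*
 * Offset of the record that follows the one at off, given that
 * record's bh_hdrlen and bh_caplen values.
 */
static size_t
next_record(size_t off, size_t hdrlen, size_t caplen, size_t align)
{
	return (off + wordalign_sketch(hdrlen + caplen, align));
}
```
.Ed
.Pp
For instance, a record with an 18-byte header and 60 captured bytes occupies
78 bytes, so with 4-byte words the next record starts 80 bytes later.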
641.Pp
642For the alignment mechanisms to work properly, the
643buffer passed to
644.Xr read 2
645must itself be word aligned.
646The
647.Xr malloc 3
648function
649will always return an aligned buffer.
650.Sh FILTER MACHINE
651A filter program is an array of instructions, with all branches forwardly
652directed, terminated by a
653.Em return
654instruction.
655Each instruction performs some action on the pseudo-machine state,
656which consists of an accumulator, index register, scratch memory store,
657and implicit program counter.
658.Pp
659The following structure defines the instruction format:
660.Bd -literal
661struct bpf_insn {
662	u_short	code;
663	u_char 	jt;
664	u_char 	jf;
665	u_long k;
666};
667.Ed
668.Pp
669The
670.Li k
671field is used in different ways by different instructions,
672and the
673.Li jt
674and
675.Li jf
676fields are used as offsets
677by the branch instructions.
678The opcodes are encoded in a semi-hierarchical fashion.
679There are eight classes of instructions:
680.Dv BPF_LD ,
681.Dv BPF_LDX ,
682.Dv BPF_ST ,
683.Dv BPF_STX ,
684.Dv BPF_ALU ,
685.Dv BPF_JMP ,
686.Dv BPF_RET ,
687and
688.Dv BPF_MISC .
689Various other mode and
690operator bits are or'd into the class to give the actual instructions.
691The classes and modes are defined in
692.In net/bpf.h .
693.Pp
694Below are the semantics for each defined
695.Nm
696instruction.
697We use the convention that A is the accumulator, X is the index register,
698P[] packet data, and M[] scratch memory store.
699P[i:n] gives the data at byte offset
700.Dq i
701in the packet,
702interpreted as a word (n=4),
703unsigned halfword (n=2), or unsigned byte (n=1).
704M[i] gives the i'th word in the scratch memory store, which is only
705addressed in word units.
706The memory store is indexed from 0 to
707.Dv BPF_MEMWORDS
708- 1.
709.Li k ,
710.Li jt ,
711and
712.Li jf
713are the corresponding fields in the
714instruction definition.
715.Dq len
716refers to the length of the packet.
717.Pp
718.Bl -tag -width BPF_STXx
719.It Dv BPF_LD
720These instructions copy a value into the accumulator.
721The type of the source operand is specified by an
722.Dq addressing mode
723and can be a constant
724.Pq Dv BPF_IMM ,
725packet data at a fixed offset
726.Pq Dv BPF_ABS ,
727packet data at a variable offset
728.Pq Dv BPF_IND ,
729the packet length
730.Pq Dv BPF_LEN ,
731or a word in the scratch memory store
732.Pq Dv BPF_MEM .
733For
734.Dv BPF_IND
735and
736.Dv BPF_ABS ,
737the data size must be specified as a word
738.Pq Dv BPF_W ,
739halfword
740.Pq Dv BPF_H ,
741or byte
742.Pq Dv BPF_B .
743The semantics of all the recognized
744.Dv BPF_LD
745instructions follow.
746.Pp
747.Bd -literal
748BPF_LD+BPF_W+BPF_ABS	A <- P[k:4]
749BPF_LD+BPF_H+BPF_ABS	A <- P[k:2]
750BPF_LD+BPF_B+BPF_ABS	A <- P[k:1]
751BPF_LD+BPF_W+BPF_IND	A <- P[X+k:4]
752BPF_LD+BPF_H+BPF_IND	A <- P[X+k:2]
753BPF_LD+BPF_B+BPF_IND	A <- P[X+k:1]
754BPF_LD+BPF_W+BPF_LEN	A <- len
755BPF_LD+BPF_IMM		A <- k
756BPF_LD+BPF_MEM		A <- M[k]
757.Ed
758.It Dv BPF_LDX
759These instructions load a value into the index register.
760Note that
761the addressing modes are more restrictive than those of the accumulator loads,
762but they include
763.Dv BPF_MSH ,
764a hack for efficiently loading the IP header length.
765.Pp
766.Bd -literal
767BPF_LDX+BPF_W+BPF_IMM	X <- k
768BPF_LDX+BPF_W+BPF_MEM	X <- M[k]
769BPF_LDX+BPF_W+BPF_LEN	X <- len
770BPF_LDX+BPF_B+BPF_MSH	X <- 4*(P[k:1]&0xf)
771.Ed
772.It Dv BPF_ST
773This instruction stores the accumulator into the scratch memory.
774We do not need an addressing mode since there is only one possibility
775for the destination.
776.Pp
777.Bd -literal
778BPF_ST			M[k] <- A
779.Ed
780.It Dv BPF_STX
781This instruction stores the index register in the scratch memory store.
782.Pp
783.Bd -literal
784BPF_STX			M[k] <- X
785.Ed
786.It Dv BPF_ALU
787The alu instructions perform operations between the accumulator and
788index register or constant, and store the result back in the accumulator.
789For binary operations, a source mode is required
790.Dv ( BPF_K
791or
792.Dv BPF_X ) .
793.Pp
794.Bd -literal
795BPF_ALU+BPF_ADD+BPF_K	A <- A + k
796BPF_ALU+BPF_SUB+BPF_K	A <- A - k
797BPF_ALU+BPF_MUL+BPF_K	A <- A * k
798BPF_ALU+BPF_DIV+BPF_K	A <- A / k
799BPF_ALU+BPF_AND+BPF_K	A <- A & k
800BPF_ALU+BPF_OR+BPF_K	A <- A | k
801BPF_ALU+BPF_LSH+BPF_K	A <- A << k
802BPF_ALU+BPF_RSH+BPF_K	A <- A >> k
803BPF_ALU+BPF_ADD+BPF_X	A <- A + X
804BPF_ALU+BPF_SUB+BPF_X	A <- A - X
805BPF_ALU+BPF_MUL+BPF_X	A <- A * X
806BPF_ALU+BPF_DIV+BPF_X	A <- A / X
807BPF_ALU+BPF_AND+BPF_X	A <- A & X
808BPF_ALU+BPF_OR+BPF_X	A <- A | X
809BPF_ALU+BPF_LSH+BPF_X	A <- A << X
810BPF_ALU+BPF_RSH+BPF_X	A <- A >> X
811BPF_ALU+BPF_NEG		A <- -A
812.Ed
813.It Dv BPF_JMP
814The jump instructions alter flow of control.
815Conditional jumps
816compare the accumulator against a constant
817.Pq Dv BPF_K
818or the index register
819.Pq Dv BPF_X .
820If the result is true (or non-zero),
821the true branch is taken, otherwise the false branch is taken.
822Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
823However, the jump always
824.Pq Dv BPF_JA
825opcode uses the 32 bit
826.Li k
827field as the offset, allowing arbitrarily distant destinations.
828All conditionals use unsigned comparison conventions.
829.Pp
830.Bd -literal
831BPF_JMP+BPF_JA		pc += k
832BPF_JMP+BPF_JGT+BPF_K	pc += (A > k) ? jt : jf
833BPF_JMP+BPF_JGE+BPF_K	pc += (A >= k) ? jt : jf
834BPF_JMP+BPF_JEQ+BPF_K	pc += (A == k) ? jt : jf
835BPF_JMP+BPF_JSET+BPF_K	pc += (A & k) ? jt : jf
836BPF_JMP+BPF_JGT+BPF_X	pc += (A > X) ? jt : jf
837BPF_JMP+BPF_JGE+BPF_X	pc += (A >= X) ? jt : jf
838BPF_JMP+BPF_JEQ+BPF_X	pc += (A == X) ? jt : jf
839BPF_JMP+BPF_JSET+BPF_X	pc += (A & X) ? jt : jf
840.Ed
841.It Dv BPF_RET
842The return instructions terminate the filter program and specify the amount
843of packet to accept (i.e., they return the truncation amount).
844A return value of zero indicates that the packet should be ignored.
845The return value is either a constant
846.Pq Dv BPF_K
847or the accumulator
848.Pq Dv BPF_A .
849.Pp
850.Bd -literal
851BPF_RET+BPF_A		accept A bytes
852BPF_RET+BPF_K		accept k bytes
853.Ed
854.It Dv BPF_MISC
855The miscellaneous category was created for anything that does not
856fit into the above classes, and for any new instructions that might need to
857be added.
858Currently, these are the register transfer instructions
859that copy the index register to the accumulator or vice versa.
860.Pp
861.Bd -literal
862BPF_MISC+BPF_TAX	X <- A
863BPF_MISC+BPF_TXA	A <- X
864.Ed
865.El
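.Pp
To make the
.Dv BPF_MSH
load concrete: the first byte of an IPv4 header carries the version in its
high four bits and the header length, counted in 32-bit words, in its low
four bits.
The sketch below computes the value the instruction places in X; the function
name
.Fn msh_value
is hypothetical.
.Bd -literal
```c
#include <stdint.h>

/*
 * X <- 4*(P[k:1]&0xf): the low nibble of the first IP header byte is
 * the header length in 32-bit words, so this yields it in bytes.
 */
static uint32_t
msh_value(uint8_t first_ip_byte)
{
	return (4u * (first_ip_byte & 0x0fu));
}
```
.Ed
.Pp
A header without options begins with the byte 0x45, for which the result is
20 bytes.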
866.Pp
867The
868.Nm
869interface provides the following macros to facilitate
870array initializers:
871.Fn BPF_STMT opcode operand
872and
873.Fn BPF_JUMP opcode operand true_offset false_offset .
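.Pp
The semantics listed above can be illustrated with a deliberately tiny
interpreter covering three instructions: a halfword absolute load, an
equality jump against a constant, and a constant return.
This is a sketch for exposition, not the kernel's implementation.
The
.Li sk_
names and opcode values are local to the sketch and do not correspond to the
encodings in
.In net/bpf.h .
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

/* Sketch-local opcodes; real programs use the net/bpf.h constants. */
enum sk_op { SK_LDH_ABS, SK_JEQ_K, SK_RET_K };

struct sk_insn {
	enum sk_op code;
	uint8_t jt, jf;
	uint32_t k;
};

/*
 * Run prog over pkt and return the number of bytes to accept
 * (0 means reject).  Jump offsets are relative to the following
 * instruction.  Out-of-range loads and running off the end of the
 * program reject the packet in this sketch.
 */
static uint32_t
sk_run(const struct sk_insn *prog, size_t ninsn,
    const unsigned char *pkt, size_t len)
{
	uint32_t A = 0;
	size_t pc = 0;

	while (pc < ninsn) {
		const struct sk_insn *in = &prog[pc++];

		switch (in->code) {
		case SK_LDH_ABS:	/* A <- P[k:2] */
			if (len < 2 || in->k > len - 2)
				return (0);
			A = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
			break;
		case SK_JEQ_K:		/* pc += (A == k) ? jt : jf */
			pc += (A == in->k) ? in->jt : in->jf;
			break;
		case SK_RET_K:		/* accept k bytes */
			return (in->k);
		}
	}
	return (0);
}
```
.Ed
.Pp
A four-instruction program that loads the halfword at offset 12, compares it
with a constant, and then returns either a length or zero mirrors the shape
of the first filter in
.Sx EXAMPLES .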
874.Sh FILES
875.Bl -tag -compact -width /dev/bpfXXX
876.It Pa /dev/bpf Ns Sy n
877the packet filter device
878.El
879.Sh EXAMPLES
880The following filter is taken from the Reverse ARP Daemon.
881It accepts only Reverse ARP requests.
882.Bd -literal
883struct bpf_insn insns[] = {
884	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
885	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
886	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
887	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
888	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
889		 sizeof(struct ether_header)),
890	BPF_STMT(BPF_RET+BPF_K, 0),
891};
892.Ed
893.Pp
894This filter accepts only IP packets between host 128.3.112.15 and
895128.3.112.35.
896.Bd -literal
897struct bpf_insn insns[] = {
898	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
899	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
900	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
901	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
902	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
903	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
904	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
905	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
906	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
907	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
908	BPF_STMT(BPF_RET+BPF_K, 0),
909};
910.Ed
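.Pp
The constants 0x8003700f and 0x80037023 in the filter above are the two host
addresses with their octets packed most significant first.
The sketch below shows the packing; the function name
.Fn ipv4_word
is hypothetical.
.Bd -literal
```c
#include <stdint.h>

/* Pack a dotted quad a.b.c.d into the 32-bit value a filter compares. */
static uint32_t
ipv4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return (((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d);
}
```
.Ed
.Pp
Here
.Li ipv4_word(128, 3, 112, 15)
yields 0x8003700f, the first constant compared in the filter.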
911.Pp
912Finally, this filter returns only TCP finger packets.
913We must parse the IP header to reach the TCP header.
914The
915.Dv BPF_JSET
916instruction
917checks that the IP fragment offset is 0 so we are sure
918that we have a TCP header.
919.Bd -literal
920struct bpf_insn insns[] = {
921	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
922	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
923	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
924	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
925	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
926	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
927	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
928	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
929	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
930	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
931	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
932	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
933	BPF_STMT(BPF_RET+BPF_K, 0),
934};
935.Ed
936.Sh SEE ALSO
937.Xr tcpdump 1 ,
938.Xr ioctl 2 ,
939.Xr kqueue 2 ,
940.Xr poll 2 ,
941.Xr select 2 ,
942.Xr byteorder 3 ,
943.Xr ng_bpf 4 ,
944.Xr bpf 9
945.Rs
946.%A McCanne, S.
.%A Jacobson, V.
948.%T "An efficient, extensible, and portable network monitor"
949.Re
950.Sh HISTORY
951The Enet packet filter was created in 1980 by Mike Accetta and
952Rick Rashid at Carnegie-Mellon University.
953Jeffrey Mogul, at
954Stanford, ported the code to
955.Bx
956and continued its development from
9571983 on.
958Since then, it has evolved into the Ultrix Packet Filter at
959.Tn DEC ,
960a
961.Tn STREAMS
962.Tn NIT
963module under
964.Tn SunOS 4.1 ,
965and
966.Tn BPF .
967.Sh AUTHORS
968.An -nosplit
969.An Steven McCanne ,
970of Lawrence Berkeley Laboratory, implemented BPF in
971Summer 1990.
972Much of the design is due to
973.An Van Jacobson .
974.Pp
975Support for zero-copy buffers was added by
976.An Robert N. M. Watson
977under contract to Seccuris Inc.
978.Sh BUGS
979The read buffer must be of a fixed size (returned by the
980.Dv BIOCGBLEN
981ioctl).
982.Pp
983A file that does not request promiscuous mode may receive promiscuously
984received packets as a side effect of another file requesting this
985mode on the same hardware interface.
986This could be fixed in the kernel with additional processing overhead.
987However, we favor the model where
988all files must assume that the interface is promiscuous, and if
989so desired, must utilize a filter to reject foreign packets.
990.Pp
991Data link protocols with variable length headers are not currently supported.
992.Pp
993The
994.Dv SEESENT ,
995.Dv DIRECTION ,
996and
997.Dv FEEDBACK
998settings have been observed to work incorrectly on some interface
999types, including those with hardware loopback rather than software loopback,
1000and point-to-point interfaces.
1001They appear to function correctly on a
1002broad range of Ethernet-style interfaces.
1003