1.\" Copyright (c) 2007 Seccuris Inc.
2.\" All rights reserved.
3.\"
.\" This software was developed by Robert N. M. Watson under contract to
5.\" Seccuris Inc.
6.\"
7.\" Redistribution and use in source and binary forms, with or without
8.\" modification, are permitted provided that the following conditions
9.\" are met:
10.\" 1. Redistributions of source code must retain the above copyright
11.\"    notice, this list of conditions and the following disclaimer.
12.\" 2. Redistributions in binary form must reproduce the above copyright
13.\"    notice, this list of conditions and the following disclaimer in the
14.\"    documentation and/or other materials provided with the distribution.
15.\"
16.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
17.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
19.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
20.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
22.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
23.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
24.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
25.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
26.\" SUCH DAMAGE.
27.\"
28.\" Copyright (c) 1990 The Regents of the University of California.
29.\" All rights reserved.
30.\"
31.\" Redistribution and use in source and binary forms, with or without
32.\" modification, are permitted provided that: (1) source code distributions
33.\" retain the above copyright notice and this paragraph in its entirety, (2)
34.\" distributions including binary code include the above copyright notice and
35.\" this paragraph in its entirety in the documentation or other materials
36.\" provided with the distribution, and (3) all advertising materials mentioning
37.\" features or use of this software display the following acknowledgement:
38.\" ``This product includes software developed by the University of California,
39.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
40.\" the University nor the names of its contributors may be used to endorse
41.\" or promote products derived from this software without specific prior
42.\" written permission.
43.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
44.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
45.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
46.\"
47.\" This document is derived in part from the enet man page (enet.4)
48.\" distributed with 4.3BSD Unix.
49.\"
50.\" $FreeBSD$
51.\"
52.Dd February 26, 2007
53.Dt BPF 4
54.Os
55.Sh NAME
56.Nm bpf
57.Nd Berkeley Packet Filter
58.Sh SYNOPSIS
59.Cd device bpf
60.Sh DESCRIPTION
61The Berkeley Packet Filter
62provides a raw interface to data link layers in a protocol
63independent fashion.
64All packets on the network, even those destined for other hosts,
65are accessible through this mechanism.
66.Pp
67The packet filter appears as a character special device,
68.Pa /dev/bpf0 ,
69.Pa /dev/bpf1 ,
70etc.
71After opening the device, the file descriptor must be bound to a
72specific network interface with the
73.Dv BIOCSETIF
74ioctl.
75A given interface can be shared by multiple listeners, and the filter
76underlying each descriptor will see an identical packet stream.
77.Pp
78A separate device file is required for each minor device.
79If a file is in use, the open will fail and
80.Va errno
81will be set to
82.Er EBUSY .
83.Pp
84Associated with each open instance of a
85.Nm
86file is a user-settable packet filter.
87Whenever a packet is received by an interface,
88all file descriptors listening on that interface apply their filter.
89Each descriptor that accepts the packet receives its own copy.
90.Pp
91The packet filter will support any link level protocol that has fixed length
92headers.
93Currently, only Ethernet,
94.Tn SLIP ,
95and
96.Tn PPP
97drivers have been modified to interact with
98.Nm .
99.Pp
100Since packet data is in network byte order, applications should use the
101.Xr byteorder 3
102macros to extract multi-byte values.
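.Pp
As an illustration, the following sketch reads a 16-bit field, such as the
Ethernet type, from a captured frame.
The helper name
.Fn pkt_get16
is hypothetical and not part of
.Nm ;
it relies only on
.Xr memcpy 3
and
.Fn ntohs
from
.Xr byteorder 3 .
.Bd -literal
```c
#include <arpa/inet.h>	/* ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read the 16-bit network-order field at byte
 * offset `off` of a captured frame holding `len` valid bytes.
 * memcpy() avoids an unaligned access; ntohs() converts to host order.
 */
static uint16_t
pkt_get16(const unsigned char *pkt, size_t len, size_t off)
{
	uint16_t v;

	assert(off + sizeof(v) <= len);	/* caller must stay in bounds */
	memcpy(&v, pkt + off, sizeof(v));
	return (ntohs(v));
}
```
.Ed
.Pp
For an Ethernet frame, offset 12 holds the type field, so the result can be
compared directly against host-order constants such as
.Dv ETHERTYPE_IP .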
103.Pp
104A packet can be sent out on the network by writing to a
105.Nm
106file descriptor.
107The writes are unbuffered, meaning only one packet can be processed per write.
108Currently, only writes to Ethernets and
109.Tn SLIP
110links are supported.
111.Sh BUFFER MODES
112.Nm
113devices deliver packet data to the application via memory buffers provided by
114the application.
115The buffer mode is set using the
116.Dv BIOCSETBUFMODE
117ioctl, and read using the
118.Dv BIOCGETBUFMODE
119ioctl.
120.Ss Buffered read mode
121By default,
122.Nm
123devices operate in the
124.Dv BPF_BUFMODE_BUFFER
125mode, in which packet data is copied explicitly from kernel to user memory
126using the
127.Xr read 2
128system call.
129The user process will declare a fixed buffer size that will be used both for
130sizing internal buffers and for all
131.Xr read 2
132operations on the file.
133This size is queried using the
134.Dv BIOCGBLEN
135ioctl, and is set using the
136.Dv BIOCSBLEN
137ioctl.
138Note that an individual packet larger than the buffer size is necessarily
139truncated.
140.Ss Zero-copy buffer mode
141.Nm
142devices may also operate in the
.Dv BPF_BUFMODE_ZBUF
144mode, in which packet data is written directly into two user memory buffers
145by the kernel, avoiding both system call and copying overhead.
Buffers are of fixed (and equal) size, page-aligned, and an integer multiple of
the page size.
148The maximum zero-copy buffer size is returned by the
149.Dv BIOCGETZMAX
150ioctl.
151Note that an individual packet larger than the buffer size is necessarily
152truncated.
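.Pp
A buffer length satisfying these constraints might be chosen as in the
following sketch.
The function
.Fn zbuf_pick_len
is a hypothetical helper; the page size and the
.Dv BIOCGETZMAX
result are passed in as parameters so that the arithmetic stands alone.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: pick a zero-copy buffer length that is no larger
 * than `want` or `zmax` (the value reported by BIOCGETZMAX) and is a
 * whole number of pages.  Returns 0 if not even one page fits.
 */
static size_t
zbuf_pick_len(size_t want, size_t zmax, size_t pagesize)
{
	size_t len;

	assert(pagesize != 0);
	len = want < zmax ? want : zmax;
	return (len - len % pagesize);
}
```
.Ed
.Pp
In a real program the page size would typically come from
.Xr getpagesize 3 ,
and page-aligned memory of the chosen length from
.Xr mmap 2 .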
153.Pp
154The user process registers two memory buffers using the
155.Dv BIOCSETZBUF
156ioctl, which accepts a
157.Vt struct bpf_zbuf
158pointer as an argument:
159.Bd -literal
160struct bpf_zbuf {
161	void *bz_bufa;
162	void *bz_bufb;
163	size_t bz_buflen;
164};
165.Ed
166.Pp
167.Vt bz_bufa
168is a pointer to the userspace address of the first buffer that will be
169filled, and
170.Vt bz_bufb
171is a pointer to the second buffer.
172.Nm
173will then cycle between the two buffers as they fill and are acknowledged.
174.Pp
175Each buffer begins with a fixed-length header to hold synchronization and
176data length information for the buffer:
177.Bd -literal
178struct bpf_zbuf_header {
179	volatile u_int  bzh_kernel_gen;	/* Kernel generation number. */
180	volatile u_int  bzh_kernel_len;	/* Length of data in the buffer. */
181	volatile u_int  bzh_user_gen;	/* User generation number. */
182	/* ...padding for future use... */
183};
184.Ed
185.Pp
186The header structure of each buffer, including all padding, should be zeroed
187before it is configured using
188.Dv BIOCSETZBUF .
189Remaining space in the buffer will be used by the kernel to store packet
190data, laid out in the same format as with buffered read mode.
191.Pp
192The kernel and the user process follow a simple acknowledgement protocol via
193the buffer header to synchronize access to the buffer: when the header
194generation numbers,
195.Vt bzh_kernel_gen
196and
197.Vt bzh_user_gen ,
198hold the same value, the kernel owns the buffer, and when they differ,
199userspace owns the buffer.
200.Pp
201While the kernel owns the buffer, the contents are unstable and may change
202asynchronously; while the user process owns the buffer, its contents are
203stable and will not be changed until the buffer has been acknowledged.
204.Pp
205Initializing the buffer headers to all 0's before registering the buffer has
206the effect of assigning initial ownership of both buffers to the kernel.
207The kernel signals that a buffer has been assigned to userspace by modifying
208.Vt bzh_kernel_gen ,
209and userspace acknowledges the buffer and returns it to the kernel by setting
210the value of
211.Vt bzh_user_gen
212to the value of
213.Vt bzh_kernel_gen .
214.Pp
215In order to avoid caching and memory re-ordering effects, the user process
216must use atomic operations and memory barriers when checking for and
217acknowledging buffers:
218.Bd -literal
219#include <machine/atomic.h>
220
221/*
222 * Return ownership of a buffer to the kernel for reuse.
223 */
224static void
225buffer_acknowledge(struct bpf_zbuf_header *bzh)
226{
227
228	atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
229}
230
231/*
232 * Check whether a buffer has been assigned to userspace by the kernel.
233 * Return true if userspace owns the buffer, and false otherwise.
234 */
235static int
236buffer_check(struct bpf_zbuf_header *bzh)
237{
238
239	return (bzh->bzh_user_gen !=
240	    atomic_load_acq_int(&bzh->bzh_kernel_gen));
241}
242.Ed
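.Pp
The handshake can also be modelled in a standalone program, which may help
when reasoning about ownership.
The sketch below is a simulation under stated assumptions:
.Vt struct zhdr
is a local stand-in for the buffer header, C11 atomics replace the
.In machine/atomic.h
primitives, and
.Fn zhdr_sim_assign
plays the part of the kernel.
None of these names exist in
.Nm .
.Bd -literal
```c
#include <assert.h>
#include <stdatomic.h>

/* Local stand-in for the three fields of struct bpf_zbuf_header. */
struct zhdr {
	atomic_uint kernel_gen;	/* advanced by the producer (kernel) */
	atomic_uint kernel_len;	/* bytes of data in the buffer */
	atomic_uint user_gen;	/* set by the consumer to acknowledge */
};

/* Consumer side: nonzero if the buffer has been handed to userspace. */
static int
zhdr_user_owns(struct zhdr *h)
{
	return (atomic_load_explicit(&h->user_gen, memory_order_relaxed) !=
	    atomic_load_explicit(&h->kernel_gen, memory_order_acquire));
}

/* Consumer side: hand the buffer back by matching the generations. */
static void
zhdr_acknowledge(struct zhdr *h)
{
	atomic_store_explicit(&h->user_gen,
	    atomic_load_explicit(&h->kernel_gen, memory_order_relaxed),
	    memory_order_release);
}

/* Simulated producer: publish `len` bytes, then bump the generation. */
static void
zhdr_sim_assign(struct zhdr *h, unsigned int len)
{
	assert(!zhdr_user_owns(h));	/* producer must own the buffer */
	atomic_store_explicit(&h->kernel_len, len, memory_order_relaxed);
	atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}
```
.Ed
.Pp
Starting from a zeroed header, the producer owns the buffer; after
.Fn zhdr_sim_assign
the generations differ and the consumer owns it until it calls
.Fn zhdr_acknowledge .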
243.Pp
244The user process may force the assignment of the next buffer, if any data
245is pending, to userspace using the
246.Dv BIOCROTZBUF
247ioctl.
248This allows the user process to retrieve data in a partially filled buffer
249before the buffer is full, such as following a timeout; the process must
250recheck for buffer ownership using the header generation numbers, as the
251buffer will not be assigned to userspace if no data was present.
252.Pp
253As in the buffered read mode,
254.Xr kqueue 2 ,
255.Xr poll 2 ,
256and
257.Xr select 2
may be used to sleep awaiting the availability of a completed buffer.
259They will return a readable file descriptor when ownership of the next buffer
260is assigned to user space.
261.Pp
262In the current implementation, the kernel may assign zero, one, or both
263buffers to the user process; however, an earlier implementation maintained
264the invariant that at most one buffer could be assigned to the user process
265at a time.
266In order to both ensure progress and high performance, user processes should
267acknowledge a completely processed buffer as quickly as possible, returning
268it for reuse, and not block waiting on a second buffer while holding another
269buffer.
270.Sh IOCTLS
271The
272.Xr ioctl 2
273command codes below are defined in
274.In net/bpf.h .
275All commands require
276these includes:
277.Bd -literal
278	#include <sys/types.h>
279	#include <sys/time.h>
280	#include <sys/ioctl.h>
281	#include <net/bpf.h>
282.Ed
283.Pp
284Additionally,
285.Dv BIOCGETIF
286and
287.Dv BIOCSETIF
288require
289.In sys/socket.h
290and
291.In net/if.h .
292.Pp
293In addition to
294.Dv FIONREAD
295and
296.Dv SIOCGIFADDR ,
297the following commands may be applied to any open
298.Nm
299file.
300The (third) argument to
301.Xr ioctl 2
302should be a pointer to the type indicated.
303.Bl -tag -width BIOCGETBUFMODE
304.It Dv BIOCGBLEN
305.Pq Li u_int
306Returns the required buffer length for reads on
307.Nm
308files.
309.It Dv BIOCSBLEN
310.Pq Li u_int
311Sets the buffer length for reads on
312.Nm
313files.
314The buffer must be set before the file is attached to an interface
315with
316.Dv BIOCSETIF .
317If the requested buffer size cannot be accommodated, the closest
318allowable size will be set and returned in the argument.
319A read call will result in
320.Er EIO
321if it is passed a buffer that is not this size.
322.It Dv BIOCGDLT
323.Pq Li u_int
324Returns the type of the data link layer underlying the attached interface.
325.Er EINVAL
326is returned if no interface has been specified.
327The device types, prefixed with
328.Dq Li DLT_ ,
329are defined in
330.In net/bpf.h .
331.It Dv BIOCPROMISC
332Forces the interface into promiscuous mode.
333All packets, not just those destined for the local host, are processed.
334Since more than one file can be listening on a given interface,
335a listener that opened its interface non-promiscuously may receive
336packets promiscuously.
337This problem can be remedied with an appropriate filter.
338.It Dv BIOCFLUSH
339Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
341.It Dv BIOCGETIF
342.Pq Li "struct ifreq"
343Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Li ifr_name
field of the
.Li ifreq
structure.
348All other fields are undefined.
349.It Dv BIOCSETIF
350.Pq Li "struct ifreq"
Sets the hardware interface associated with the file.
352This
353command must be performed before any packets can be read.
354The device is indicated by name using the
355.Li ifr_name
356field of the
357.Li ifreq
358structure.
359Additionally, performs the actions of
360.Dv BIOCFLUSH .
361.It Dv BIOCSRTIMEOUT
362.It Dv BIOCGRTIMEOUT
363.Pq Li "struct timeval"
364Set or get the read timeout parameter.
365The argument
366specifies the length of time to wait before timing
367out on a read request.
368This parameter is initialized to zero by
369.Xr open 2 ,
370indicating no timeout.
371.It Dv BIOCGSTATS
372.Pq Li "struct bpf_stat"
373Returns the following structure of packet statistics:
374.Bd -literal
375struct bpf_stat {
376	u_int bs_recv;    /* number of packets received */
377	u_int bs_drop;    /* number of packets dropped */
378};
379.Ed
380.Pp
381The fields are:
382.Bl -hang -offset indent
383.It Li bs_recv
384the number of packets received by the descriptor since opened or reset
385(including any buffered since the last read call);
386and
387.It Li bs_drop
388the number of packets which were accepted by the filter but dropped by the
389kernel because of buffer overflows
390(i.e., the application's reads are not keeping up with the packet traffic).
391.El
392.It Dv BIOCIMMEDIATE
393.Pq Li u_int
394Enable or disable
395.Dq immediate mode ,
396based on the truth value of the argument.
397When immediate mode is enabled, reads return immediately upon packet
398reception.
399Otherwise, a read will block until either the kernel buffer
400becomes full or a timeout occurs.
401This is useful for programs like
402.Xr rarpd 8
403which must respond to messages in real time.
404The default for a new file is off.
405.It Dv BIOCSETF
406.It Dv BIOCSETFNR
407.Pq Li "struct bpf_program"
408Sets the read filter program used by the kernel to discard uninteresting
409packets.
410An array of instructions and its length is passed in using
411the following structure:
412.Bd -literal
413struct bpf_program {
414	int bf_len;
415	struct bpf_insn *bf_insns;
416};
417.Ed
418.Pp
419The filter program is pointed to by the
420.Li bf_insns
421field while its length in units of
422.Sq Li struct bpf_insn
423is given by the
424.Li bf_len
425field.
426See section
427.Sx "FILTER MACHINE"
428for an explanation of the filter language.
429The only difference between
430.Dv BIOCSETF
431and
432.Dv BIOCSETFNR
433is
434.Dv BIOCSETF
435performs the actions of
436.Dv BIOCFLUSH
437while
438.Dv BIOCSETFNR
439does not.
440.It Dv BIOCSETWF
441.Pq Li "struct bpf_program"
442Sets the write filter program used by the kernel to control what type of
443packets can be written to the interface.
444See the
445.Dv BIOCSETF
446command for more
447information on the
448.Nm
449filter program.
450.It Dv BIOCVERSION
451.Pq Li "struct bpf_version"
452Returns the major and minor version numbers of the filter language currently
453recognized by the kernel.
454Before installing a filter, applications must check
455that the current version is compatible with the running kernel.
456Version numbers are compatible if the major numbers match and the application minor
457is less than or equal to the kernel minor.
458The kernel version number is returned in the following structure:
459.Bd -literal
460struct bpf_version {
461        u_short bv_major;
462        u_short bv_minor;
463};
464.Ed
465.Pp
466The current version numbers are given by
467.Dv BPF_MAJOR_VERSION
468and
469.Dv BPF_MINOR_VERSION
470from
471.In net/bpf.h .
472An incompatible filter
473may result in undefined behavior (most likely, an error returned by
474.Fn ioctl
475or haphazard packet matching).
476.It Dv BIOCSHDRCMPLT
477.It Dv BIOCGHDRCMPLT
478.Pq Li u_int
479Set or get the status of the
480.Dq header complete
481flag.
482Set to zero if the link level source address should be filled in automatically
483by the interface output routine.
484Set to one if the link level source
485address will be written, as provided, to the wire.
486This flag is initialized to zero by default.
487.It Dv BIOCSSEESENT
488.It Dv BIOCGSEESENT
489.Pq Li u_int
490These commands are obsolete but left for compatibility.
491Use
492.Dv BIOCSDIRECTION
493and
494.Dv BIOCGDIRECTION
495instead.
496Set or get the flag determining whether locally generated packets on the
497interface should be returned by BPF.
498Set to zero to see only incoming packets on the interface.
499Set to one to see packets originating locally and remotely on the interface.
500This flag is initialized to one by default.
501.It Dv BIOCSDIRECTION
502.It Dv BIOCGDIRECTION
503.Pq Li u_int
504Set or get the setting determining whether incoming, outgoing, or all packets
505on the interface should be returned by BPF.
506Set to
507.Dv BPF_D_IN
508to see only incoming packets on the interface.
509Set to
510.Dv BPF_D_INOUT
511to see packets originating locally and remotely on the interface.
512Set to
513.Dv BPF_D_OUT
514to see only outgoing packets on the interface.
515This setting is initialized to
516.Dv BPF_D_INOUT
517by default.
518.It Dv BIOCFEEDBACK
519.Pq Li u_int
520Set packet feedback mode.
521This allows injected packets to be fed back as input to the interface when
522output via the interface is successful.
When the
.Dv BPF_D_INOUT
direction is set, an injected outgoing packet is not returned by BPF, to avoid
duplication.
This flag is initialized to zero by default.
527.It Dv BIOCLOCK
528Set the locked flag on the
529.Nm
530descriptor.
531This prevents the execution of
532ioctl commands which could change the underlying operating parameters of
533the device.
534.It Dv BIOCGETBUFMODE
535.It Dv BIOCSETBUFMODE
536.Pq Li u_int
537Get or set the current
538.Nm
539buffering mode; possible values are
540.Dv BPF_BUFMODE_BUFFER ,
541buffered read mode, and
542.Dv BPF_BUFMODE_ZBUF ,
543zero-copy buffer mode.
544.It Dv BIOCSETZBUF
.Pq Li "struct bpf_zbuf"
546Set the current zero-copy buffer locations; buffer locations may be
547set only once zero-copy buffer mode has been selected, and prior to attaching
548to an interface.
549Buffers must be of identical size, page-aligned, and an integer multiple of
550pages in size.
551The three fields
552.Vt bz_bufa ,
553.Vt bz_bufb ,
554and
555.Vt bz_buflen
556must be filled out.
557If buffers have already been set for this device, the ioctl will fail.
558.It Dv BIOCGETZMAX
559.Pq Li size_t
560Get the largest individual zero-copy buffer size allowed.
561As two buffers are used in zero-copy buffer mode, the limit (in practice) is
562twice the returned size.
563As zero-copy buffers consume kernel address space, conservative selection of
564buffer size is suggested, especially when there are multiple
565.Nm
566descriptors in use on 32-bit systems.
567.It Dv BIOCROTZBUF
Force ownership of the next buffer to be assigned to userspace, if any data
is present in the buffer.
570If no data is present, the buffer will remain owned by the kernel.
571This allows consumers of zero-copy buffering to implement timeouts and
572retrieve partially filled buffers.
573In order to handle the case where no data is present in the buffer and
574therefore ownership is not assigned, the user process must check
575.Vt bzh_kernel_gen
576against
577.Vt bzh_user_gen .
578.El
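.Pp
The compatibility rule stated under
.Dv BIOCVERSION
can be written as a small predicate.
In this sketch
.Vt struct version
is a local stand-in for
.Vt struct bpf_version ,
and
.Fn version_compatible
is a hypothetical name.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for struct bpf_version from <net/bpf.h>. */
struct version {
	unsigned short major;
	unsigned short minor;
};

/*
 * Apply the BIOCVERSION rule: the majors must match and the
 * application's minor must not exceed the kernel's.  `app` would be
 * built from BPF_MAJOR_VERSION and BPF_MINOR_VERSION, and `kern` filled
 * in by the ioctl.  Returns nonzero when a filter may be installed.
 */
static int
version_compatible(const struct version *app, const struct version *kern)
{
	assert(app != NULL && kern != NULL);
	return (app->major == kern->major && app->minor <= kern->minor);
}
```
.Ed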
579.Sh BPF HEADER
580The following structure is prepended to each packet returned by
581.Xr read 2
582or via a zero-copy buffer:
583.Bd -literal
584struct bpf_hdr {
585        struct timeval bh_tstamp;     /* time stamp */
586        u_long bh_caplen;             /* length of captured portion */
587        u_long bh_datalen;            /* original length of packet */
        u_short bh_hdrlen;            /* length of bpf header (this struct
					 plus alignment padding) */
590};
591.Ed
592.Pp
The fields, whose values are stored in host order, are:
594.Pp
595.Bl -tag -compact -width bh_datalen
596.It Li bh_tstamp
597The time at which the packet was processed by the packet filter.
598.It Li bh_caplen
599The length of the captured portion of the packet.
600This is the minimum of
601the truncation amount specified by the filter and the length of the packet.
602.It Li bh_datalen
603The length of the packet off the wire.
604This value is independent of the truncation amount specified by the filter.
605.It Li bh_hdrlen
606The length of the
607.Nm
608header, which may not be equal to
609.\" XXX - not really a function call
610.Fn sizeof "struct bpf_hdr" .
611.El
612.Pp
613The
614.Li bh_hdrlen
615field exists to account for
616padding between the header and the link level protocol.
617The purpose here is to guarantee proper alignment of the packet
618data structures, which is required on alignment sensitive
619architectures and improves performance on many other architectures.
The packet filter ensures that the
621.Li bpf_hdr
622and the network layer
623header will be word aligned.
624Suitable precautions
625must be taken when accessing the link layer protocol fields on alignment
626restricted machines.
627(This is not a problem on an Ethernet, since
628the type field is a short falling on an even offset,
629and the addresses are probably accessed in a bytewise fashion).
630.Pp
631Additionally, individual packets are padded so that each starts
632on a word boundary.
633This requires that an application
634has some knowledge of how to get from packet to packet.
635The macro
636.Dv BPF_WORDALIGN
637is defined in
638.In net/bpf.h
639to facilitate
640this process.
641It rounds up its argument to the nearest word aligned value (where a word is
642.Dv BPF_ALIGNMENT
643bytes wide).
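.Pp
The rounding itself is ordinary mask arithmetic.
The sketch below shows it with the alignment passed as a parameter;
.Fn round_up_pow2
is a hypothetical name, and the assumption that the alignment is a power of
two is checked at run time.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/*
 * Round `x` up to the next multiple of `align`, which must be a power
 * of two.  BPF_WORDALIGN is understood to perform the same computation
 * with BPF_ALIGNMENT, whose value is platform dependent.
 */
static size_t
round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + (align - 1)) & ~(align - 1));
}
```
.Ed
.Pp
For instance, an 18-byte header followed by 60 captured bytes occupies 78
bytes, which rounds up to 80 when the alignment is 4 or 8.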
644.Pp
645For example, if
646.Sq Li p
647points to the start of a packet, this expression
648will advance it to the next packet:
649.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
650.Pp
651For the alignment mechanisms to work properly, the
652buffer passed to
653.Xr read 2
654must itself be word aligned.
655The
656.Xr malloc 3
657function
658will always return an aligned buffer.
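.Pp
Stepping through a buffer of several packets can be sketched as follows.
To keep the example standalone it uses a reduced stand-in header,
.Vt struct cap_hdr ,
and local
.Dv ALIGNMENT
and
.Dv WORDALIGN
macros in place of the definitions in
.In net/bpf.h ;
.Fn walk_records
and
.Fn put_record
are hypothetical helpers.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGNMENT	sizeof(long)	/* assumed stand-in for BPF_ALIGNMENT */
#define WORDALIGN(x)	(((x) + (ALIGNMENT - 1)) & ~(ALIGNMENT - 1))

/* Reduced stand-in for struct bpf_hdr; the timestamp is omitted. */
struct cap_hdr {
	uint32_t caplen;	/* captured bytes following the header */
	uint32_t datalen;	/* original length on the wire */
	uint16_t hdrlen;	/* header length including padding */
};

/*
 * Count the records in buf[0..len), which must start word aligned, and
 * add up their captured bytes.  Stops early at a truncated record.
 */
static size_t
walk_records(const unsigned char *buf, size_t len, size_t *captured)
{
	struct cap_hdr h;
	size_t n = 0, off = 0;

	assert(captured != NULL);
	*captured = 0;
	while (len - off >= sizeof(h)) {
		memcpy(&h, buf + off, sizeof(h));
		if (h.hdrlen < sizeof(h) ||
		    (size_t)h.hdrlen + h.caplen > len - off)
			break;
		/* Packet data lives at buf + off + h.hdrlen. */
		*captured += h.caplen;
		n++;
		off += WORDALIGN((size_t)h.hdrlen + h.caplen);
		if (off > len)
			break;
	}
	return (n);
}

/* Append one synthetic record at `off`; returns the next record offset. */
static size_t
put_record(unsigned char *buf, size_t off, uint32_t caplen)
{
	struct cap_hdr h = { caplen, caplen,
	    (uint16_t)WORDALIGN(sizeof(struct cap_hdr)) };

	memcpy(buf + off, &h, sizeof(h));
	return (off + WORDALIGN((size_t)h.hdrlen + caplen));
}
```
.Ed
.Pp
The step from one record to the next is the same expression as shown
earlier: the header length plus the captured length, rounded up to a word
boundary.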
659.Sh FILTER MACHINE
660A filter program is an array of instructions, with all branches forwardly
661directed, terminated by a
662.Em return
663instruction.
664Each instruction performs some action on the pseudo-machine state,
665which consists of an accumulator, index register, scratch memory store,
666and implicit program counter.
667.Pp
668The following structure defines the instruction format:
669.Bd -literal
670struct bpf_insn {
671	u_short	code;
672	u_char 	jt;
673	u_char 	jf;
674	u_long k;
675};
676.Ed
677.Pp
678The
679.Li k
680field is used in different ways by different instructions,
681and the
682.Li jt
683and
684.Li jf
685fields are used as offsets
686by the branch instructions.
687The opcodes are encoded in a semi-hierarchical fashion.
688There are eight classes of instructions:
689.Dv BPF_LD ,
690.Dv BPF_LDX ,
691.Dv BPF_ST ,
692.Dv BPF_STX ,
693.Dv BPF_ALU ,
694.Dv BPF_JMP ,
695.Dv BPF_RET ,
696and
697.Dv BPF_MISC .
698Various other mode and
operator bits are ORed into the class to give the actual instructions.
700The classes and modes are defined in
701.In net/bpf.h .
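.Pp
As a sketch of this encoding, the following fragment builds and takes apart
a load opcode.
The
.Dv OP_
constants are local copies whose values are believed to match
.In net/bpf.h ,
and the function names are hypothetical; real programs should use the
definitions from the header.
.Bd -literal
```c
#include <assert.h>

/*
 * Local copies of a few opcode bits.  The values are believed to match
 * <net/bpf.h> and are repeated only to keep the sketch standalone.
 */
enum {
	OP_LD = 0x00, OP_LDX = 0x01,			/* class */
	OP_W = 0x00, OP_H = 0x08, OP_B = 0x10,		/* size */
	OP_ABS = 0x20, OP_IND = 0x40, OP_MSH = 0xa0	/* mode */
};

/* For loads: class in bits 0-2, size in bits 3-4, mode in bits 5-7. */
static unsigned int op_class(unsigned int code) { return (code & 0x07); }
static unsigned int op_size(unsigned int code)  { return (code & 0x18); }
static unsigned int op_mode(unsigned int code)  { return (code & 0xe0); }

/* Build a load opcode by OR-ing its three parts together. */
static unsigned int
op_load(unsigned int cls, unsigned int size, unsigned int mode)
{
	assert(cls == OP_LD || cls == OP_LDX);
	return (cls | size | mode);
}
```
.Ed
.Pp
Under these assumed values, a halfword load at a fixed offset encodes as
0x28.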
702.Pp
703Below are the semantics for each defined
704.Nm
705instruction.
706We use the convention that A is the accumulator, X is the index register,
707P[] packet data, and M[] scratch memory store.
708P[i:n] gives the data at byte offset
709.Dq i
710in the packet,
711interpreted as a word (n=4),
712unsigned halfword (n=2), or unsigned byte (n=1).
713M[i] gives the i'th word in the scratch memory store, which is only
714addressed in word units.
715The memory store is indexed from 0 to
716.Dv BPF_MEMWORDS
717- 1.
718.Li k ,
719.Li jt ,
720and
721.Li jf
722are the corresponding fields in the
723instruction definition.
724.Dq len
725refers to the length of the packet.
726.Pp
727.Bl -tag -width BPF_STXx
728.It Dv BPF_LD
729These instructions copy a value into the accumulator.
730The type of the source operand is specified by an
731.Dq addressing mode
732and can be a constant
733.Pq Dv BPF_IMM ,
734packet data at a fixed offset
735.Pq Dv BPF_ABS ,
736packet data at a variable offset
737.Pq Dv BPF_IND ,
738the packet length
739.Pq Dv BPF_LEN ,
740or a word in the scratch memory store
741.Pq Dv BPF_MEM .
742For
743.Dv BPF_IND
744and
745.Dv BPF_ABS ,
746the data size must be specified as a word
747.Pq Dv BPF_W ,
748halfword
749.Pq Dv BPF_H ,
750or byte
751.Pq Dv BPF_B .
752The semantics of all the recognized
753.Dv BPF_LD
754instructions follow.
755.Pp
756.Bd -literal
757BPF_LD+BPF_W+BPF_ABS	A <- P[k:4]
758BPF_LD+BPF_H+BPF_ABS	A <- P[k:2]
759BPF_LD+BPF_B+BPF_ABS	A <- P[k:1]
760BPF_LD+BPF_W+BPF_IND	A <- P[X+k:4]
761BPF_LD+BPF_H+BPF_IND	A <- P[X+k:2]
762BPF_LD+BPF_B+BPF_IND	A <- P[X+k:1]
763BPF_LD+BPF_W+BPF_LEN	A <- len
764BPF_LD+BPF_IMM		A <- k
765BPF_LD+BPF_MEM		A <- M[k]
766.Ed
767.It Dv BPF_LDX
768These instructions load a value into the index register.
769Note that
770the addressing modes are more restrictive than those of the accumulator loads,
771but they include
772.Dv BPF_MSH ,
773a hack for efficiently loading the IP header length.
774.Pp
775.Bd -literal
776BPF_LDX+BPF_W+BPF_IMM	X <- k
777BPF_LDX+BPF_W+BPF_MEM	X <- M[k]
778BPF_LDX+BPF_W+BPF_LEN	X <- len
779BPF_LDX+BPF_B+BPF_MSH	X <- 4*(P[k:1]&0xf)
780.Ed
781.It Dv BPF_ST
782This instruction stores the accumulator into the scratch memory.
783We do not need an addressing mode since there is only one possibility
784for the destination.
785.Pp
786.Bd -literal
787BPF_ST			M[k] <- A
788.Ed
789.It Dv BPF_STX
790This instruction stores the index register in the scratch memory store.
791.Pp
792.Bd -literal
793BPF_STX			M[k] <- X
794.Ed
795.It Dv BPF_ALU
796The alu instructions perform operations between the accumulator and
797index register or constant, and store the result back in the accumulator.
798For binary operations, a source mode is required
799.Dv ( BPF_K
800or
801.Dv BPF_X ) .
802.Pp
803.Bd -literal
804BPF_ALU+BPF_ADD+BPF_K	A <- A + k
805BPF_ALU+BPF_SUB+BPF_K	A <- A - k
806BPF_ALU+BPF_MUL+BPF_K	A <- A * k
807BPF_ALU+BPF_DIV+BPF_K	A <- A / k
808BPF_ALU+BPF_AND+BPF_K	A <- A & k
809BPF_ALU+BPF_OR+BPF_K	A <- A | k
810BPF_ALU+BPF_LSH+BPF_K	A <- A << k
811BPF_ALU+BPF_RSH+BPF_K	A <- A >> k
812BPF_ALU+BPF_ADD+BPF_X	A <- A + X
813BPF_ALU+BPF_SUB+BPF_X	A <- A - X
814BPF_ALU+BPF_MUL+BPF_X	A <- A * X
815BPF_ALU+BPF_DIV+BPF_X	A <- A / X
816BPF_ALU+BPF_AND+BPF_X	A <- A & X
817BPF_ALU+BPF_OR+BPF_X	A <- A | X
818BPF_ALU+BPF_LSH+BPF_X	A <- A << X
819BPF_ALU+BPF_RSH+BPF_X	A <- A >> X
820BPF_ALU+BPF_NEG		A <- -A
821.Ed
822.It Dv BPF_JMP
823The jump instructions alter flow of control.
824Conditional jumps
825compare the accumulator against a constant
826.Pq Dv BPF_K
827or the index register
828.Pq Dv BPF_X .
829If the result is true (or non-zero),
830the true branch is taken, otherwise the false branch is taken.
831Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
832However, the jump always
833.Pq Dv BPF_JA
834opcode uses the 32 bit
835.Li k
836field as the offset, allowing arbitrarily distant destinations.
837All conditionals use unsigned comparison conventions.
838.Pp
839.Bd -literal
840BPF_JMP+BPF_JA		pc += k
841BPF_JMP+BPF_JGT+BPF_K	pc += (A > k) ? jt : jf
842BPF_JMP+BPF_JGE+BPF_K	pc += (A >= k) ? jt : jf
843BPF_JMP+BPF_JEQ+BPF_K	pc += (A == k) ? jt : jf
844BPF_JMP+BPF_JSET+BPF_K	pc += (A & k) ? jt : jf
845BPF_JMP+BPF_JGT+BPF_X	pc += (A > X) ? jt : jf
846BPF_JMP+BPF_JGE+BPF_X	pc += (A >= X) ? jt : jf
847BPF_JMP+BPF_JEQ+BPF_X	pc += (A == X) ? jt : jf
848BPF_JMP+BPF_JSET+BPF_X	pc += (A & X) ? jt : jf
849.Ed
850.It Dv BPF_RET
851The return instructions terminate the filter program and specify the amount
852of packet to accept (i.e., they return the truncation amount).
853A return value of zero indicates that the packet should be ignored.
854The return value is either a constant
855.Pq Dv BPF_K
856or the accumulator
857.Pq Dv BPF_A .
858.Pp
859.Bd -literal
860BPF_RET+BPF_A		accept A bytes
861BPF_RET+BPF_K		accept k bytes
862.Ed
863.It Dv BPF_MISC
864The miscellaneous category was created for anything that does not
865fit into the above classes, and for any new instructions that might need to
866be added.
867Currently, these are the register transfer instructions
868that copy the index register to the accumulator or vice versa.
869.Pp
870.Bd -literal
871BPF_MISC+BPF_TAX	X <- A
872BPF_MISC+BPF_TXA	A <- X
873.Ed
874.El
875.Pp
876The
877.Nm
878interface provides the following macros to facilitate
879array initializers:
880.Fn BPF_STMT opcode operand
881and
882.Fn BPF_JUMP opcode operand true_offset false_offset .
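.Pp
To make the semantics above concrete, here is a minimal interpreter for a
small subset of the instruction set.
It is a teaching sketch, not the kernel's implementation:
.Vt struct insn ,
the
.Dv STMT
and
.Dv JUMP
macros, and
.Fn run_filter
are local stand-ins, and the opcode values are assumed to match
.In net/bpf.h .
Note that branch offsets are applied after the program counter has already
advanced past the current instruction.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; names and layout are for this sketch only. */
struct insn {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};
#define STMT(c, k)		{ (uint16_t)(c), 0, 0, (k) }
#define JUMP(c, k, jt, jf)	{ (uint16_t)(c), (jt), (jf), (k) }

enum {	/* opcode values believed to match <net/bpf.h> */
	LD_H_ABS = 0x28, LD_B_ABS = 0x30, LD_H_IND = 0x48, LDX_B_MSH = 0xb1,
	JEQ_K = 0x15, JSET_K = 0x45, RET_K = 0x06
};

/* Fetch n (1 or 2) bytes at offset off as a big-endian value. */
static int
fetch(const uint8_t *p, size_t len, uint64_t off, unsigned int n, uint32_t *v)
{
	if (off + n > len)
		return (0);
	*v = (n == 1) ? p[off] : ((uint32_t)p[off] << 8 | p[off + 1]);
	return (1);
}

/*
 * Run prog[0..n) over packet p of length len.  Returns the number of
 * bytes to accept; 0 rejects, including on any out-of-range access.
 */
static uint32_t
run_filter(const struct insn *prog, size_t n, const uint8_t *p, size_t len)
{
	uint32_t A = 0, X = 0;
	size_t pc = 0;

	assert(prog != NULL && p != NULL);
	while (pc < n) {
		const struct insn *i = &prog[pc++];	/* pc is now the next insn */
		int ok = 1;

		switch (i->code) {
		case LD_H_ABS:	ok = fetch(p, len, i->k, 2, &A); break;
		case LD_B_ABS:	ok = fetch(p, len, i->k, 1, &A); break;
		case LD_H_IND:
			ok = fetch(p, len, (uint64_t)X + i->k, 2, &A);
			break;
		case LDX_B_MSH:
			ok = fetch(p, len, i->k, 1, &X);
			X = (X & 0xf) * 4;
			break;
		case JEQ_K:	pc += (A == i->k) ? i->jt : i->jf; break;
		case JSET_K:	pc += (A & i->k) ? i->jt : i->jf; break;
		case RET_K:	return (i->k);
		default:	return (0);	/* not handled by this sketch */
		}
		if (!ok)
			return (0);
	}
	return (0);	/* ran off the end without a return */
}
```
.Ed
.Pp
A program is supplied as an array initialized with the
.Dv STMT
and
.Dv JUMP
stand-ins, in the same shape as the filters in the
.Sx EXAMPLES
section.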
883.Sh FILES
884.Bl -tag -compact -width /dev/bpfXXX
885.It Pa /dev/bpf Ns Sy n
886the packet filter device
887.El
888.Sh EXAMPLES
889The following filter is taken from the Reverse ARP Daemon.
890It accepts only Reverse ARP requests.
891.Bd -literal
892struct bpf_insn insns[] = {
893	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
894	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
895	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
896	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
897	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
898		 sizeof(struct ether_header)),
899	BPF_STMT(BPF_RET+BPF_K, 0),
900};
901.Ed
902.Pp
903This filter accepts only IP packets between host 128.3.112.15 and
904128.3.112.35.
905.Bd -literal
906struct bpf_insn insns[] = {
907	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
908	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
909	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
910	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
911	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
912	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
913	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
914	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
915	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
916	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
917	BPF_STMT(BPF_RET+BPF_K, 0),
918};
919.Ed
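.Pp
The hexadecimal constants in the filter above are simply the two addresses
packed one octet per byte.
The hypothetical helper
.Fn ip4_const
below shows the arithmetic; it assumes, as described for
.Dv BPF_LD ,
that a word load presents packet bytes to the filter with the first octet
most significant.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the dotted-quad IPv4 address a.b.c.d into the 32-bit value a
 * filter compares against after a word load: the first octet on the
 * wire ends up in the most significant byte.
 */
static uint32_t
ip4_const(unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
	assert(a < 256 && b < 256 && c < 256 && d < 256);
	return ((uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d);
}
```
.Ed
.Pp
Thus 128.3.112.15 becomes 0x8003700f and 128.3.112.35 becomes 0x80037023.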
920.Pp
921Finally, this filter returns only TCP finger packets.
922We must parse the IP header to reach the TCP header.
923The
924.Dv BPF_JSET
925instruction
926checks that the IP fragment offset is 0 so we are sure
927that we have a TCP header.
928.Bd -literal
929struct bpf_insn insns[] = {
930	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
931	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
932	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
933	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
934	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
935	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
936	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
937	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
938	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
939	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
940	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
941	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
942	BPF_STMT(BPF_RET+BPF_K, 0),
943};
944.Ed
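.Pp
The offsets used by the indexed loads in this filter can be checked by hand.
The sketch below mirrors the
.Dv BPF_MSH
computation in plain C; the helper names and the
.Dv ETHER_HDR_LEN_ASSUMED
constant are hypothetical, and a 14-byte Ethernet header without a VLAN tag
is assumed.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

#define ETHER_HDR_LEN_ASSUMED	14	/* plain Ethernet, no VLAN tag */

/*
 * Mirror what BPF_LDX+BPF_B+BPF_MSH with k = 14 computes: the IP header
 * length in bytes, taken from the low nibble of the first IP header byte.
 */
static size_t
ip_hdr_len(const unsigned char *frame, size_t len)
{
	assert(len > ETHER_HDR_LEN_ASSUMED);
	return (4 * (size_t)(frame[ETHER_HDR_LEN_ASSUMED] & 0xf));
}

/* Frame offset of the TCP source port, as read by the load at X + 14. */
static size_t
tcp_sport_off(const unsigned char *frame, size_t len)
{
	return (ETHER_HDR_LEN_ASSUMED + ip_hdr_len(frame, len));
}

/* Frame offset of the TCP destination port, as read at X + 16. */
static size_t
tcp_dport_off(const unsigned char *frame, size_t len)
{
	return (tcp_sport_off(frame, len) + 2);
}
```
.Ed
.Pp
With the common 20-byte IP header the ports sit at offsets 34 and 36; IP
options move both by the same amount, which is why the filter loads the
header length into the index register first.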
945.Sh SEE ALSO
946.Xr tcpdump 1 ,
947.Xr ioctl 2 ,
948.Xr kqueue 2 ,
949.Xr poll 2 ,
950.Xr select 2 ,
951.Xr byteorder 3 ,
952.Xr ng_bpf 4 ,
953.Xr bpf 9
954.Rs
955.%A McCanne, S.
.%A Jacobson, V.
957.%T "An efficient, extensible, and portable network monitor"
958.Re
959.Sh HISTORY
960The Enet packet filter was created in 1980 by Mike Accetta and
961Rick Rashid at Carnegie-Mellon University.
962Jeffrey Mogul, at
963Stanford, ported the code to
964.Bx
965and continued its development from
9661983 on.
967Since then, it has evolved into the Ultrix Packet Filter at
968.Tn DEC ,
969a
970.Tn STREAMS
971.Tn NIT
972module under
973.Tn SunOS 4.1 ,
974and
975.Tn BPF .
976.Sh AUTHORS
977.An -nosplit
978.An Steven McCanne ,
979of Lawrence Berkeley Laboratory, implemented BPF in
980Summer 1990.
981Much of the design is due to
982.An Van Jacobson .
983.Pp
984Support for zero-copy buffers was added by
985.An Robert N. M. Watson
986under contract to Seccuris Inc.
987.Sh BUGS
988The read buffer must be of a fixed size (returned by the
989.Dv BIOCGBLEN
990ioctl).
991.Pp
992A file that does not request promiscuous mode may receive promiscuously
993received packets as a side effect of another file requesting this
994mode on the same hardware interface.
995This could be fixed in the kernel with additional processing overhead.
996However, we favor the model where
997all files must assume that the interface is promiscuous, and if
998so desired, must utilize a filter to reject foreign packets.
999.Pp
1000Data link protocols with variable length headers are not currently supported.
1001.Pp
1002The
1003.Dv SEESENT ,
1004.Dv DIRECTION ,
1005and
1006.Dv FEEDBACK
1007settings have been observed to work incorrectly on some interface
1008types, including those with hardware loopback rather than software loopback,
1009and point-to-point interfaces.
1010They appear to function correctly on a
1011broad range of Ethernet-style interfaces.
1012