.\" Copyright (c) 2007 Seccuris Inc.
.\" All rights reserved.
.\"
.\" This software was developed by Robert N. M. Watson under contract to
.\" Seccuris Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\"
.Dd February 26, 2007
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd device bpf
.Sh DESCRIPTION
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol
independent fashion.
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a
specific network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
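.Pp
As an illustration only, the open and bind steps described above might be
sketched as follows.
The helper
.Fn bpf_open_bound
and its parameters are hypothetical names introduced for this example, and
error reporting is reduced to a return value of -1:
.Bd -literal
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the bpf device at "path" and bind it to
 * the interface named "ifname".
 * Returns a file descriptor on success and -1 on failure.
 */
static int
bpf_open_bound(const char *path, const char *ifname)
{
	struct ifreq ifr;
	int fd;

	fd = open(path, O_RDWR);
	if (fd == -1)
		return (-1);

	/* BIOCSETIF identifies the interface by ifr_name. */
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
.Ed
.Pp
A caller might pass
.Pa /dev/bpf0
as the device path together with an interface name such as
.Li em0
(an assumed name), and should be prepared to try another device file if
the open fails.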
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
.Pp
Associated with each open instance of a
.Nm
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
The packet filter will support any link level protocol that has fixed length
headers.
Currently, only Ethernet,
.Tn SLIP ,
and
.Tn PPP
drivers have been modified to interact with
.Nm .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
The writes are unbuffered, meaning only one packet can be processed per write.
Currently, only writes to Ethernets and
.Tn SLIP
links are supported.
.Sh BUFFER MODES
.Nm
devices deliver packet data to the application via memory buffers provided by
the application.
The buffer mode is set using the
.Dv BIOCSETBUFMODE
ioctl, and read using the
.Dv BIOCGETBUFMODE
ioctl.
.Ss Buffered read mode
By default,
.Nm
devices operate in the
.Dv BPF_BUFMODE_BUFFER
mode, in which packet data is copied explicitly from kernel to user memory
using the
.Xr read 2
system call.
The user process will declare a fixed buffer size that will be used both for
sizing internal buffers and for all
.Xr read 2
operations on the file.
This size is queried using the
.Dv BIOCGBLEN
ioctl, and is set using the
.Dv BIOCSBLEN
ioctl.
Note that an individual packet larger than the buffer size is necessarily
truncated.
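.Pp
The fixed-size read described above might be sketched as follows.
This is a minimal illustration rather than a complete reader:
.Fn bpf_read_once
is a hypothetical helper name, and the descriptor is assumed to have
already been bound to an interface.
.Bd -literal
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: perform one buffered read on a bpf descriptor.
 * On success, stores a malloc'd buffer in *bufp and returns the number
 * of bytes read; returns -1 on failure and leaves *bufp untouched.
 */
static ssize_t
bpf_read_once(int fd, char **bufp)
{
	u_int blen;
	char *buf;
	ssize_t n;

	/* Ask the kernel which buffer size reads must use. */
	if (ioctl(fd, BIOCGBLEN, &blen) == -1)
		return (-1);
	buf = malloc(blen);
	if (buf == NULL)
		return (-1);

	/* The read length is exactly the size reported above. */
	n = read(fd, buf, blen);
	if (n == -1) {
		free(buf);
		return (-1);
	}
	*bufp = buf;
	return (n);
}
.Ed
.Pp
The caller owns the returned buffer and releases it with
.Xr free 3 .
A real reader would normally allocate the buffer once and reuse it for
every read.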
.Ss Zero-copy buffer mode
.Nm
devices may also operate in the
.Dv BPF_BUFMODE_ZBUF
mode, in which packet data is written directly into two user memory buffers
by the kernel, avoiding both system call and copying overhead.
Buffers are of fixed (and equal) size, page-aligned, and an integer multiple
of the page size.
The maximum zero-copy buffer size is returned by the
.Dv BIOCGETZMAX
ioctl.
Note that an individual packet larger than the buffer size is necessarily
truncated.
.Pp
The user process registers two memory buffers using the
.Dv BIOCSETZBUF
ioctl, which accepts a
.Vt struct bpf_zbuf
pointer as an argument:
.Bd -literal
struct bpf_zbuf {
	void *bz_bufa;
	void *bz_bufb;
	size_t bz_buflen;
};
.Ed
.Pp
.Vt bz_bufa
is a pointer to the userspace address of the first buffer that will be
filled, and
.Vt bz_bufb
is a pointer to the second buffer.
.Nm
will then cycle between the two buffers as they fill and are acknowledged.
.Pp
Each buffer begins with a fixed-length header to hold synchronization and
data length information for the buffer:
.Bd -literal
struct bpf_zbuf_header {
	volatile u_int  bzh_kernel_gen;	/* Kernel generation number. */
	volatile u_int  bzh_kernel_len;	/* Length of data in the buffer. */
	volatile u_int  bzh_user_gen;	/* User generation number. */
	/* ...padding for future use... */
};
.Ed
.Pp
The header structure of each buffer, including all padding, should be zeroed
before it is configured using
.Dv BIOCSETZBUF .
Remaining space in the buffer will be used by the kernel to store packet
data, laid out in the same format as with buffered read mode.
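.Pp
One way to obtain buffers that satisfy the size, alignment, and zeroing
requirements above is sketched below.
The helper
.Fn zbuf_pair_alloc
is a hypothetical name, and the use of anonymous
.Xr mmap 2
mappings is an assumption of this example rather than a requirement of
.Nm :
.Bd -literal
#include <sys/types.h>
#include <sys/mman.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate two equal-sized, page-aligned,
 * zero-filled buffers of at least "want" bytes each.
 * Returns 0 on success and -1 on failure.
 */
static int
zbuf_pair_alloc(size_t want, void **bufa, void **bufb, size_t *buflen)
{
	size_t pagesz, len;
	void *a, *b;

	if (want == 0)
		return (-1);
	pagesz = (size_t)sysconf(_SC_PAGESIZE);

	/* Round up to an integer multiple of the page size. */
	len = (want + pagesz - 1) / pagesz * pagesz;

	/* Anonymous mappings are page-aligned and zero-filled. */
	a = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (a == MAP_FAILED)
		return (-1);
	b = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (b == MAP_FAILED) {
		munmap(a, len);
		return (-1);
	}
	*bufa = a;
	*bufb = b;
	*buflen = len;
	return (0);
}
.Ed
.Pp
The three results would then be stored in the
.Vt bz_bufa ,
.Vt bz_bufb ,
and
.Vt bz_buflen
fields of a
.Vt struct bpf_zbuf
and passed to
.Dv BIOCSETZBUF .
The requested size should not exceed the value reported by
.Dv BIOCGETZMAX .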
.Pp
The kernel and the user process follow a simple acknowledgement protocol via
the buffer header to synchronize access to the buffer: when the header
generation numbers,
.Vt bzh_kernel_gen
and
.Vt bzh_user_gen ,
hold the same value, the kernel owns the buffer, and when they differ,
userspace owns the buffer.
.Pp
While the kernel owns the buffer, the contents are unstable and may change
asynchronously; while the user process owns the buffer, its contents are
stable and will not be changed until the buffer has been acknowledged.
.Pp
Initializing the buffer headers to all 0's before registering the buffer has
the effect of assigning initial ownership of both buffers to the kernel.
The kernel signals that a buffer has been assigned to userspace by modifying
.Vt bzh_kernel_gen ,
and userspace acknowledges the buffer and returns it to the kernel by setting
the value of
.Vt bzh_user_gen
to the value of
.Vt bzh_kernel_gen .
.Pp
In order to avoid caching and memory re-ordering effects, the user process
must use atomic operations and memory barriers when checking for and
acknowledging buffers:
.Bd -literal
#include <machine/atomic.h>

/*
 * Return ownership of a buffer to the kernel for reuse.
 */
static void
buffer_acknowledge(struct bpf_zbuf_header *bzh)
{

	atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
}

/*
 * Check whether a buffer has been assigned to userspace by the kernel.
 * Return true if userspace owns the buffer, and false otherwise.
 */
static int
buffer_check(struct bpf_zbuf_header *bzh)
{

	return (bzh->bzh_user_gen !=
	    atomic_load_acq_int(&bzh->bzh_kernel_gen));
}
.Ed
.Pp
The user process may force the assignment of the next buffer, if any data
is pending, to userspace using the
.Dv BIOCROTZBUF
ioctl.
This allows the user process to retrieve data in a partially filled buffer
before the buffer is full, such as following a timeout; the process must
recheck for buffer ownership using the header generation numbers, as the
buffer will not be assigned to userspace if no data was present.
.Pp
As in the buffered read mode,
.Xr kqueue 2 ,
.Xr poll 2 ,
and
.Xr select 2
may be used to sleep awaiting the availability of a completed buffer.
They will return a readable file descriptor when ownership of the next buffer
is assigned to user space.
.Pp
In the current implementation, the kernel may assign zero, one, or both
buffers to the user process; however, an earlier implementation maintained
the invariant that at most one buffer could be assigned to the user process
at a time.
To ensure both progress and high performance, user processes should
acknowledge a completely processed buffer as quickly as possible, returning
it for reuse, and not block waiting on a second buffer while holding another
buffer.
.Sh IOCTLS
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require
these includes:
.Bd -literal
	#include <sys/types.h>
	#include <sys/time.h>
	#include <sys/ioctl.h>
	#include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.In sys/socket.h
and
.In net/if.h .
.Pp
In addition to
.Dv FIONREAD
and
.Dv SIOCGIFADDR ,
the following commands may be applied to any open
.Nm
file.
The (third) argument to
.Xr ioctl 2
should be a pointer to the type indicated.
.Bl -tag -width BIOCGETBUFMODE
.It Dv BIOCGBLEN
.Pq Li u_int
Returns the required buffer length for reads on
.Nm
files.
.It Dv BIOCSBLEN
.Pq Li u_int
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface
with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT
.Pq Li u_int
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq Li DLT_ ,
are defined in
.In net/bpf.h .
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously.
This problem can be remedied with an appropriate filter.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
.It Dv BIOCGETIF
.Pq Li "struct ifreq"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Li ifr_name
field of the
.Li ifreq
structure.
All other fields are undefined.
.It Dv BIOCSETIF
.Pq Li "struct ifreq"
Sets the hardware interface associated with the file.
This
command must be performed before any packets can be read.
The device is indicated by name using the
.Li ifr_name
field of the
.Li ifreq
structure.
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT
.It Dv BIOCGRTIMEOUT
.Pq Li "struct timeval"
Set or get the read timeout parameter.
The argument
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS
.Pq Li "struct bpf_stat"
Returns the following structure of packet statistics:
.Bd -literal
struct bpf_stat {
	u_int bs_recv;    /* number of packets received */
	u_int bs_drop;    /* number of packets dropped */
};
.Ed
.Pp
The fields are:
.Bl -hang -offset indent
.It Li bs_recv
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
and
.It Li bs_drop
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads are not keeping up with the packet traffic).
.El
.It Dv BIOCIMMEDIATE
.Pq Li u_int
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
reception.
Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
.Xr rarpd 8
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCSETF
.It Dv BIOCSETFNR
.Pq Li "struct bpf_program"
Sets the read filter program used by the kernel to discard uninteresting
packets.
An array of instructions and its length are passed in using
the following structure:
.Bd -literal
struct bpf_program {
	int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Li bf_insns
field while its length in units of
.Sq Li struct bpf_insn
is given by the
.Li bf_len
field.
See section
.Sx "FILTER MACHINE"
for an explanation of the filter language.
The only difference between
.Dv BIOCSETF
and
.Dv BIOCSETFNR
is that
.Dv BIOCSETF
performs the actions of
.Dv BIOCFLUSH
while
.Dv BIOCSETFNR
does not.
.It Dv BIOCSETWF
.Pq Li "struct bpf_program"
Sets the write filter program used by the kernel to control what type of
packets can be written to the interface.
See the
.Dv BIOCSETF
command for more
information on the
.Nm
filter program.
.It Dv BIOCVERSION
.Pq Li "struct bpf_version"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check
that the current version is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the
application minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter
may result in undefined behavior (most likely, an error returned by
.Fn ioctl
or haphazard packet matching).
.It Dv BIOCSHDRCMPLT
.It Dv BIOCGHDRCMPLT
.Pq Li u_int
Set or get the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in automatically
by the interface output routine.
Set to one if the link level source
address will be written, as provided, to the wire.
This flag is initialized to zero by default.
.It Dv BIOCSSEESENT
.It Dv BIOCGSEESENT
.Pq Li u_int
These commands are obsolete but left for compatibility.
Use
.Dv BIOCSDIRECTION
and
.Dv BIOCGDIRECTION
instead.
Set or get the flag determining whether locally generated packets on the
interface should be returned by BPF.
Set to zero to see only incoming packets on the interface.
Set to one to see packets originating locally and remotely on the interface.
This flag is initialized to one by default.
.It Dv BIOCSDIRECTION
.It Dv BIOCGDIRECTION
.Pq Li u_int
Set or get the setting determining whether incoming, outgoing, or all packets
on the interface should be returned by BPF.
Set to
.Dv BPF_D_IN
to see only incoming packets on the interface.
Set to
.Dv BPF_D_INOUT
to see packets originating locally and remotely on the interface.
Set to
.Dv BPF_D_OUT
to see only outgoing packets on the interface.
This setting is initialized to
.Dv BPF_D_INOUT
by default.
.It Dv BIOCFEEDBACK
.Pq Li u_int
Set packet feedback mode.
This allows injected packets to be fed back as input to the interface when
output via the interface is successful.
When the
.Dv BPF_D_INOUT
direction is set, an injected outgoing packet is not returned by BPF, to
avoid duplication.
This flag is initialized to zero by default.
.It Dv BIOCLOCK
Set the locked flag on the
.Nm
descriptor.
This prevents the execution of
ioctl commands which could change the underlying operating parameters of
the device.
.It Dv BIOCGETBUFMODE
.It Dv BIOCSETBUFMODE
.Pq Li u_int
Get or set the current
.Nm
buffering mode; possible values are
.Dv BPF_BUFMODE_BUFFER ,
buffered read mode, and
.Dv BPF_BUFMODE_ZBUF ,
zero-copy buffer mode.
.It Dv BIOCSETZBUF
.Pq Li "struct bpf_zbuf"
Set the current zero-copy buffer locations; buffer locations may be
set only once zero-copy buffer mode has been selected, and prior to attaching
to an interface.
Buffers must be of identical size, page-aligned, and an integer multiple of
pages in size.
The three fields
.Vt bz_bufa ,
.Vt bz_bufb ,
and
.Vt bz_buflen
must be filled out.
If buffers have already been set for this device, the ioctl will fail.
.It Dv BIOCGETZMAX
.Pq Li size_t
Get the largest individual zero-copy buffer size allowed.
As two buffers are used in zero-copy buffer mode, the limit (in practice) is
twice the returned size.
As zero-copy buffers consume kernel address space, conservative selection of
buffer size is suggested, especially when there are multiple
.Nm
descriptors in use on 32-bit systems.
.It Dv BIOCROTZBUF
Force ownership of the next buffer to be assigned to userspace, if any data
is present in the buffer.
If no data is present, the buffer will remain owned by the kernel.
This allows consumers of zero-copy buffering to implement timeouts and
retrieve partially filled buffers.
In order to handle the case where no data is present in the buffer and
therefore ownership is not assigned, the user process must check
.Vt bzh_kernel_gen
against
.Vt bzh_user_gen .
.El
.Sh BPF HEADER
The following structure is prepended to each packet returned by
.Xr read 2
or via a zero-copy buffer:
.Bd -literal
struct bpf_hdr {
	struct timeval bh_tstamp;	/* time stamp */
	u_long bh_caplen;		/* length of captured portion */
	u_long bh_datalen;		/* original length of packet */
	u_short bh_hdrlen;		/* length of bpf header (this struct
					   plus alignment padding) */
};
.Ed
.Pp
The fields, whose values are stored in host order, are:
.Pp
.Bl -tag -compact -width bh_datalen
.It Li bh_tstamp
The time at which the packet was processed by the packet filter.
.It Li bh_caplen
The length of the captured portion of the packet.
This is the minimum of
the truncation amount specified by the filter and the length of the packet.
.It Li bh_datalen
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Li bh_hdrlen
The length of the
.Nm
header, which may not be equal to
.\" XXX - not really a function call
.Fn sizeof "struct bpf_hdr" .
.El
.Pp
The
.Li bh_hdrlen
field exists to account for
padding between the header and the link level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
.Li bpf_hdr
and the network layer
header will be word aligned.
Suitable precautions
must be taken when accessing the link layer protocol fields on alignment
restricted machines.
(This is not a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts
on a word boundary.
This requires that an application
has some knowledge of how to get from packet to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate
this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
.Pp
For example, if
.Sq Li p
points to the start of a packet, this expression
will advance it to the next packet:
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
.Pp
For the alignment mechanisms to work properly, the
buffer passed to
.Xr read 2
must itself be word aligned.
The
.Xr malloc 3
function
will always return an aligned buffer.
.Sh FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
.Em return
instruction.
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal
struct bpf_insn {
	u_short	code;
	u_char 	jt;
	u_char 	jf;
	u_long k;
};
.Ed
.Pp
The
.Li k
field is used in different ways by different instructions,
and the
.Li jt
and
.Li jf
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and
operator bits are or'd into the class to give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
.Pp
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] is the packet data, and M[] is the scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet,
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS
- 1.
.Li k ,
.Li jt ,
and
.Li jf
are the corresponding fields in the
instruction definition.
.Dq len
refers to the length of the packet.
.Pp
.Bl -tag -width BPF_STXx
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pq Dv BPF_IMM ,
packet data at a fixed offset
.Pq Dv BPF_ABS ,
packet data at a variable offset
.Pq Dv BPF_IND ,
the packet length
.Pq Dv BPF_LEN ,
or a word in the scratch memory store
.Pq Dv BPF_MEM .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pq Dv BPF_W ,
halfword
.Pq Dv BPF_H ,
or byte
.Pq Dv BPF_B .
The semantics of all the recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bd -literal
BPF_LD+BPF_W+BPF_ABS	A <- P[k:4]
BPF_LD+BPF_H+BPF_ABS	A <- P[k:2]
BPF_LD+BPF_B+BPF_ABS	A <- P[k:1]
BPF_LD+BPF_W+BPF_IND	A <- P[X+k:4]
BPF_LD+BPF_H+BPF_IND	A <- P[X+k:2]
BPF_LD+BPF_B+BPF_IND	A <- P[X+k:1]
BPF_LD+BPF_W+BPF_LEN	A <- len
BPF_LD+BPF_IMM		A <- k
BPF_LD+BPF_MEM		A <- M[k]
.Ed
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that
the addressing modes are more restrictive than those of the accumulator loads,
but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bd -literal
BPF_LDX+BPF_W+BPF_IMM	X <- k
BPF_LDX+BPF_W+BPF_MEM	X <- M[k]
BPF_LDX+BPF_W+BPF_LEN	X <- len
BPF_LDX+BPF_B+BPF_MSH	X <- 4*(P[k:1]&0xf)
.Ed
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
for the destination.
.Pp
.Bd -literal
BPF_ST			M[k] <- A
.Ed
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bd -literal
BPF_STX			M[k] <- X
.Ed
.It Dv BPF_ALU
The alu instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Dv ( BPF_K
or
.Dv BPF_X ) .
.Pp
.Bd -literal
BPF_ALU+BPF_ADD+BPF_K	A <- A + k
BPF_ALU+BPF_SUB+BPF_K	A <- A - k
BPF_ALU+BPF_MUL+BPF_K	A <- A * k
BPF_ALU+BPF_DIV+BPF_K	A <- A / k
BPF_ALU+BPF_AND+BPF_K	A <- A & k
BPF_ALU+BPF_OR+BPF_K	A <- A | k
BPF_ALU+BPF_LSH+BPF_K	A <- A << k
BPF_ALU+BPF_RSH+BPF_K	A <- A >> k
BPF_ALU+BPF_ADD+BPF_X	A <- A + X
BPF_ALU+BPF_SUB+BPF_X	A <- A - X
BPF_ALU+BPF_MUL+BPF_X	A <- A * X
BPF_ALU+BPF_DIV+BPF_X	A <- A / X
BPF_ALU+BPF_AND+BPF_X	A <- A & X
BPF_ALU+BPF_OR+BPF_X	A <- A | X
BPF_ALU+BPF_LSH+BPF_X	A <- A << X
BPF_ALU+BPF_RSH+BPF_X	A <- A >> X
BPF_ALU+BPF_NEG		A <- -A
.Ed
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps
compare the accumulator against a constant
.Pq Dv BPF_K
or the index register
.Pq Dv BPF_X .
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pq Dv BPF_JA
opcode uses the 32 bit
.Li k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bd -literal
BPF_JMP+BPF_JA		pc += k
BPF_JMP+BPF_JGT+BPF_K	pc += (A > k) ? jt : jf
BPF_JMP+BPF_JGE+BPF_K	pc += (A >= k) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_K	pc += (A == k) ? jt : jf
BPF_JMP+BPF_JSET+BPF_K	pc += (A & k) ? jt : jf
BPF_JMP+BPF_JGT+BPF_X	pc += (A > X) ? jt : jf
BPF_JMP+BPF_JGE+BPF_X	pc += (A >= X) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_X	pc += (A == X) ? jt : jf
BPF_JMP+BPF_JSET+BPF_X	pc += (A & X) ? jt : jf
.Ed
.It Dv BPF_RET
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount).
A return value of zero indicates that the packet should be ignored.
The return value is either a constant
.Pq Dv BPF_K
or the accumulator
.Pq Dv BPF_A .
.Pp
.Bd -literal
BPF_RET+BPF_A		accept A bytes
BPF_RET+BPF_K		accept k bytes
.Ed
.It Dv BPF_MISC
The miscellaneous category was created for anything that does not
fit into the above classes, and for any new instructions that might need to
be added.
Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
.Pp
.Bd -literal
BPF_MISC+BPF_TAX	X <- A
BPF_MISC+BPF_TXA	A <- X
.Ed
.El
.Pp
The
.Nm
interface provides the following macros to facilitate
array initializers:
.Fn BPF_STMT opcode operand
and
.Fn BPF_JUMP opcode operand true_offset false_offset .
.Sh FILES
.Bl -tag -compact -width /dev/bpfXXX
.It Pa /dev/bpf Ns Sy n
the packet filter device
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP Daemon.
It accepts only Reverse ARP requests.
.Bd -literal
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction
checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
.Bd -literal
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Sh SEE ALSO
.Xr tcpdump 1 ,
.Xr ioctl 2 ,
.Xr kqueue 2 ,
.Xr poll 2 ,
.Xr select 2 ,
.Xr byteorder 3 ,
.Xr ng_bpf 4 ,
.Xr bpf 9
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University.
Jeffrey Mogul, at
Stanford, ported the code to
.Bx
and continued its development from
1983 on.
Since then, it has evolved into the Ultrix Packet Filter at
.Tn DEC ,
a
.Tn STREAMS
.Tn NIT
module under
.Tn SunOS 4.1 ,
and
.Tn BPF .
.Sh AUTHORS
.An -nosplit
.An Steven McCanne ,
of Lawrence Berkeley Laboratory, implemented BPF in
Summer 1990.
Much of the design is due to
.An Van Jacobson .
.Pp
Support for zero-copy buffers was added by
.An Robert N. M. Watson
under contract to Seccuris Inc.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must utilize a filter to reject foreign packets.
.Pp
Data link protocols with variable length headers are not currently supported.
.Pp
The
.Dv SEESENT ,
.Dv DIRECTION ,
and
.Dv FEEDBACK
settings have been observed to work incorrectly on some interface
types, including those with hardware loopback rather than software loopback,
and point-to-point interfaces.
They appear to function correctly on a
broad range of Ethernet-style interfaces.