.\" Copyright (c) 2007 Seccuris Inc.
.\" All rights reserved.
.\"
.\" This software was developed by Robert N. M. Watson under contract to
.\" Seccuris Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\"
.Dd February 26, 2007
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd device bpf
.Sh DESCRIPTION
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol
independent fashion.
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a
specific network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
.Pp
Associated with each open instance of a
.Nm
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
The packet filter will support any link level protocol that has fixed length
headers.
Currently, only Ethernet,
.Tn SLIP ,
and
.Tn PPP
drivers have been modified to interact with
.Nm .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
The writes are unbuffered, meaning only one packet can be processed per write.
Currently, only writes to Ethernets and
.Tn SLIP
links are supported.
.Sh BUFFER MODES
.Nm
devices deliver packet data to the application via memory buffers provided by
the application.
The buffer mode is set using the
.Dv BIOCSETBUFMODE
ioctl, and read using the
.Dv BIOCGETBUFMODE
ioctl.
.Ss Buffered read mode
By default,
.Nm
devices operate in the
.Dv BPF_BUFMODE_BUFFER
mode, in which packet data is copied explicitly from kernel to user memory
using the
.Xr read 2
system call.
The user process will declare a fixed buffer size that will be used both for
sizing internal buffers and for all
.Xr read 2
operations on the file.
This size is queried using the
.Dv BIOCGBLEN
ioctl, and is set using the
.Dv BIOCSBLEN
ioctl.
Note that an individual packet larger than the buffer size is necessarily
truncated.
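.Pp
Before either buffer mode can be used, the process needs an open descriptor.
As noted in DESCRIPTION, opening a minor device that is already in use fails with
.Er EBUSY ,
so one plausible approach is to probe successive device names until one opens.
The following is a minimal sketch of that loop under stated assumptions, not
code taken from any existing program: the helper name
.Fn bpf_open_first_free ,
its
.Fa prefix
and
.Fa max_minor
parameters, and the 64-byte path buffer are all invented for illustration.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: try prefix0, prefix1, ... and return the first
 * descriptor that opens.  Only EBUSY (minor device already in use)
 * moves on to the next name; any other error is reported as is.
 * Returns -1 with errno set on failure.
 */
static int
bpf_open_first_free(const char *prefix, int max_minor)
{
        char path[64];          /* assumed large enough for a device path */
        int fd, i, n;

        assert(prefix != NULL);
        for (i = 0; i < max_minor; i++) {
                n = snprintf(path, sizeof(path), "%s%d", prefix, i);
                if (n < 0 || (size_t)n >= sizeof(path)) {
                        errno = ENAMETOOLONG;
                        return (-1);
                }
                fd = open(path, O_RDWR);
                if (fd >= 0)
                        return (fd);
                if (errno != EBUSY)
                        return (-1);
        }
        errno = EBUSY;          /* every probed device was in use */
        return (-1);
}
```
.Ed
.Pp
A caller might use
.Li bpf_open_first_free("/dev/bpf", 16)
and then continue with
.Dv BIOCSETIF
and, in buffered read mode,
.Dv BIOCGBLEN
to size the buffer passed to
.Xr read 2 .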
.Ss Zero-copy buffer mode
.Nm
devices may also operate in the
.Dv BPF_BUFMODE_ZBUF
mode, in which packet data is written directly into two user memory buffers
by the kernel, avoiding both system call and copying overhead.
Buffers are of fixed (and equal) size, page-aligned, and an integer multiple of
the page size.
The maximum zero-copy buffer size is returned by the
.Dv BIOCGETZMAX
ioctl.
Note that an individual packet larger than the buffer size is necessarily
truncated.
.Pp
The user process registers two memory buffers using the
.Dv BIOCSETZBUF
ioctl, which accepts a
.Vt struct bpf_zbuf
pointer as an argument:
.Bd -literal
struct bpf_zbuf {
        void *bz_bufa;
        void *bz_bufb;
        size_t bz_buflen;
};
.Ed
.Pp
.Vt bz_bufa
is a pointer to the userspace address of the first buffer that will be
filled, and
.Vt bz_bufb
is a pointer to the second buffer.
.Nm
will then cycle between the two buffers as they fill and are acknowledged.
.Pp
Each buffer begins with a fixed-length header to hold synchronization and
data length information for the buffer:
.Bd -literal
struct bpf_zbuf_header {
        volatile u_int  bzh_kernel_gen; /* Kernel generation number. */
        volatile u_int  bzh_kernel_len; /* Length of data in the buffer. */
        volatile u_int  bzh_user_gen;   /* User generation number. */
        /* ...padding for future use... */
};
.Ed
.Pp
The header structure of each buffer, including all padding, should be zeroed
before it is configured using
.Dv BIOCSETZBUF .
Remaining space in the buffer will be used by the kernel to store packet
data, laid out in the same format as with buffered read mode.
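.Pp
The buffer requirements above (two buffers of equal size, page-aligned, a
whole number of pages, headers zeroed) can be sketched as follows.
This is an illustration under stated assumptions rather than a reference
implementation:
.Vt struct zbuf_args
is a local stand-in for
.Vt struct bpf_zbuf ,
the helper names are invented, the
.Fa zmax
parameter stands for the value a program would obtain from
.Dv BIOCGETZMAX ,
and C11
.Fn aligned_alloc
is only one possible way to obtain page-aligned memory.
Zeroing the whole buffer is a simple way to zero the header and its padding
without knowing the header size.
.Bd -literal
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Local stand-in for struct bpf_zbuf; real code uses <net/bpf.h>. */
struct zbuf_args {
        void    *bz_bufa;
        void    *bz_bufb;
        size_t   bz_buflen;
};

/*
 * Round want up to a whole number of pages, never exceeding zmax
 * rounded down to a page multiple.  Returns 0 if no valid size exists.
 */
static size_t
zbuf_round_len(size_t want, size_t pagesz, size_t zmax)
{
        size_t cap;

        assert(pagesz != 0);
        cap = zmax - (zmax % pagesz);
        if (want >= cap)
                return (cap);
        return ((want / pagesz + (want % pagesz != 0)) * pagesz);
}

/*
 * Allocate and zero two equal, page-aligned buffers and describe them
 * in *bz.  Returns 0 on success and -1 on failure.
 */
static int
zbuf_alloc(struct zbuf_args *bz, size_t want, size_t zmax)
{
        long ps = sysconf(_SC_PAGESIZE);
        size_t len;
        void *a, *b;

        assert(bz != NULL);
        if (ps <= 0)
                return (-1);
        len = zbuf_round_len(want, (size_t)ps, zmax);
        if (len == 0)
                return (-1);
        a = aligned_alloc((size_t)ps, len);
        b = aligned_alloc((size_t)ps, len);
        if (a == NULL || b == NULL) {
                free(a);
                free(b);
                return (-1);
        }
        memset(a, 0, len);      /* zeroes the header, padding included */
        memset(b, 0, len);
        bz->bz_bufa = a;
        bz->bz_bufb = b;
        bz->bz_buflen = len;
        return (0);
}
```
.Ed
.Pp
In a real program the three fields would be copied into a
.Vt struct bpf_zbuf
and passed to
.Dv BIOCSETZBUF
after zero-copy buffer mode has been selected and before an interface is
attached.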
.Pp
The kernel and the user process follow a simple acknowledgement protocol via
the buffer header to synchronize access to the buffer: when the header
generation numbers,
.Vt bzh_kernel_gen
and
.Vt bzh_user_gen ,
hold the same value, the kernel owns the buffer, and when they differ,
userspace owns the buffer.
.Pp
While the kernel owns the buffer, the contents are unstable and may change
asynchronously; while the user process owns the buffer, its contents are
stable and will not be changed until the buffer has been acknowledged.
.Pp
Initializing the buffer headers to all 0's before registering the buffer has
the effect of assigning initial ownership of both buffers to the kernel.
The kernel signals that a buffer has been assigned to userspace by modifying
.Vt bzh_kernel_gen ,
and userspace acknowledges the buffer and returns it to the kernel by setting
the value of
.Vt bzh_user_gen
to the value of
.Vt bzh_kernel_gen .
.Pp
In order to avoid caching and memory re-ordering effects, the user process
must use atomic operations and memory barriers when checking for and
acknowledging buffers:
.Bd -literal
#include <machine/atomic.h>

/*
 * Return ownership of a buffer to the kernel for reuse.
 */
static void
buffer_acknowledge(struct bpf_zbuf_header *bzh)
{

        atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
}

/*
 * Check whether a buffer has been assigned to userspace by the kernel.
 * Return true if userspace owns the buffer, and false otherwise.
 */
static int
buffer_check(struct bpf_zbuf_header *bzh)
{

        return (bzh->bzh_user_gen !=
            atomic_load_acq_int(&bzh->bzh_kernel_gen));
}
.Ed
.Pp
The user process may force the assignment of the next buffer, if any data
is pending, to userspace using the
.Dv BIOCROTZBUF
ioctl.
This allows the user process to retrieve data in a partially filled buffer
before the buffer is full, such as following a timeout; the process must
recheck for buffer ownership using the header generation numbers, as the
buffer will not be assigned to userspace if no data was present.
.Pp
As in the buffered read mode,
.Xr kqueue 2 ,
.Xr poll 2 ,
and
.Xr select 2
may be used to sleep awaiting the availability of a completed buffer.
They will return a readable file descriptor when ownership of the next buffer
is assigned to user space.
.Pp
In the current implementation, the kernel may assign zero, one, or both
buffers to the user process; however, an earlier implementation maintained
the invariant that at most one buffer could be assigned to the user process
at a time.
In order to both ensure progress and high performance, user processes should
acknowledge a completely processed buffer as quickly as possible, returning
it for reuse, and not block waiting on a second buffer while holding another
buffer.
.Sh IOCTLS
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require
these includes:
.Bd -literal
        #include <sys/types.h>
        #include <sys/time.h>
        #include <sys/ioctl.h>
        #include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.In sys/socket.h
and
.In net/if.h .
.Pp
In addition to
.Dv FIONREAD
and
.Dv SIOCGIFADDR ,
the following commands may be applied to any open
.Nm
file.
The (third) argument to
.Xr ioctl 2
should be a pointer to the type indicated.
.Bl -tag -width BIOCGETBUFMODE
.It Dv BIOCGBLEN
.Pq Li u_int
Returns the required buffer length for reads on
.Nm
files.
.It Dv BIOCSBLEN
.Pq Li u_int
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface
with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT
.Pq Li u_int
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq Li DLT_ ,
are defined in
.In net/bpf.h .
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously.
This problem can be remedied with an appropriate filter.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
.It Dv BIOCGETIF
.Pq Li "struct ifreq"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Li ifr_name
field of the
.Li ifreq
structure.
All other fields are undefined.
.It Dv BIOCSETIF
.Pq Li "struct ifreq"
Sets the hardware interface associated with the file.
This
command must be performed before any packets can be read.
The device is indicated by name using the
.Li ifr_name
field of the
.Li ifreq
structure.
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT
.It Dv BIOCGRTIMEOUT
.Pq Li "struct timeval"
Set or get the read timeout parameter.
The argument
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS
.Pq Li "struct bpf_stat"
Returns the following structure of packet statistics:
.Bd -literal
struct bpf_stat {
        u_int bs_recv;    /* number of packets received */
        u_int bs_drop;    /* number of packets dropped */
};
.Ed
.Pp
The fields are:
.Bl -hang -offset indent
.It Li bs_recv
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
and
.It Li bs_drop
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads are not keeping up with the packet traffic).
.El
.It Dv BIOCIMMEDIATE
.Pq Li u_int
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
reception.
Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
.Xr rarpd 8
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCSETF
.Pq Li "struct bpf_program"
Sets the read filter program used by the kernel to discard uninteresting
packets.
An array of instructions and its length is passed in using
the following structure:
.Bd -literal
struct bpf_program {
        int bf_len;
        struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Li bf_insns
field while its length in units of
.Sq Li struct bpf_insn
is given by the
.Li bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
See section
.Sx "FILTER MACHINE"
for an explanation of the filter language.
.It Dv BIOCSETWF
.Pq Li "struct bpf_program"
Sets the write filter program used by the kernel to control what type of
packets can be written to the interface.
See the
.Dv BIOCSETF
command for more
information on the
.Nm
filter program.
.It Dv BIOCVERSION
.Pq Li "struct bpf_version"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check
that the current version is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application minor
is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal
struct bpf_version {
        u_short bv_major;
        u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter
may result in undefined behavior (most likely, an error returned by
.Fn ioctl
or haphazard packet matching).
.It Dv BIOCSHDRCMPLT
.It Dv BIOCGHDRCMPLT
.Pq Li u_int
Set or get the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in automatically
by the interface output routine.
Set to one if the link level source
address will be written, as provided, to the wire.
This flag is initialized to zero by default.
.It Dv BIOCSSEESENT
.It Dv BIOCGSEESENT
.Pq Li u_int
These commands are obsolete but left for compatibility.
Use
.Dv BIOCSDIRECTION
and
.Dv BIOCGDIRECTION
instead.
Set or get the flag determining whether locally generated packets on the
interface should be returned by BPF.
Set to zero to see only incoming packets on the interface.
Set to one to see packets originating locally and remotely on the interface.
This flag is initialized to one by default.
.It Dv BIOCSDIRECTION
.It Dv BIOCGDIRECTION
.Pq Li u_int
Set or get the setting determining whether incoming, outgoing, or all packets
on the interface should be returned by BPF.
Set to
.Dv BPF_D_IN
to see only incoming packets on the interface.
Set to
.Dv BPF_D_INOUT
to see packets originating locally and remotely on the interface.
Set to
.Dv BPF_D_OUT
to see only outgoing packets on the interface.
This setting is initialized to
.Dv BPF_D_INOUT
by default.
.It Dv BIOCFEEDBACK
.Pq Li u_int
Set packet feedback mode.
This allows injected packets to be fed back as input to the interface when
output via the interface is successful.
When the
.Dv BPF_D_INOUT
direction is set, an injected outgoing packet is not returned by BPF, to avoid
duplication.
This flag is initialized to zero by default.
.It Dv BIOCLOCK
Set the locked flag on the
.Nm
descriptor.
This prevents the execution of
ioctl commands which could change the underlying operating parameters of
the device.
.It Dv BIOCGETBUFMODE
.It Dv BIOCSETBUFMODE
.Pq Li u_int
Get or set the current
.Nm
buffering mode; possible values are
.Dv BPF_BUFMODE_BUFFER ,
buffered read mode, and
.Dv BPF_BUFMODE_ZBUF ,
zero-copy buffer mode.
.It Dv BIOCSETZBUF
.Pq Li struct bpf_zbuf
Set the current zero-copy buffer locations; buffer locations may be
set only once zero-copy buffer mode has been selected, and prior to attaching
to an interface.
Buffers must be of identical size, page-aligned, and an integer multiple of
pages in size.
The three fields
.Vt bz_bufa ,
.Vt bz_bufb ,
and
.Vt bz_buflen
must be filled out.
If buffers have already been set for this device, the ioctl will fail.
.It Dv BIOCGETZMAX
.Pq Li size_t
Get the largest individual zero-copy buffer size allowed.
As two buffers are used in zero-copy buffer mode, the limit (in practice) is
twice the returned size.
As zero-copy buffers consume kernel address space, conservative selection of
buffer size is suggested, especially when there are multiple
.Nm
descriptors in use on 32-bit systems.
.It Dv BIOCROTZBUF
Force ownership of the next buffer to be assigned to userspace, if any data
is present in the buffer.
If no data is present, the buffer will remain owned by the kernel.
This allows consumers of zero-copy buffering to implement timeouts and
retrieve partially filled buffers.
In order to handle the case where no data is present in the buffer and
therefore ownership is not assigned, the user process must check
.Vt bzh_kernel_gen
against
.Vt bzh_user_gen .
.El
.Sh BPF HEADER
The following structure is prepended to each packet returned by
.Xr read 2
or via a zero-copy buffer:
.Bd -literal
struct bpf_hdr {
        struct timeval bh_tstamp;     /* time stamp */
        u_long bh_caplen;             /* length of captured portion */
        u_long bh_datalen;            /* original length of packet */
        u_short bh_hdrlen;            /* length of bpf header (this struct
                                         plus alignment padding) */
};
.Ed
.Pp
The fields, whose values are stored in host order, are:
.Pp
.Bl -tag -compact -width bh_datalen
.It Li bh_tstamp
The time at which the packet was processed by the packet filter.
.It Li bh_caplen
The length of the captured portion of the packet.
This is the minimum of
the truncation amount specified by the filter and the length of the packet.
.It Li bh_datalen
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Li bh_hdrlen
The length of the
.Nm
header, which may not be equal to
.\" XXX - not really a function call
.Fn sizeof "struct bpf_hdr" .
.El
.Pp
The
.Li bh_hdrlen
field exists to account for
padding between the header and the link level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
.Li bpf_hdr
and the network layer
header will be word aligned.
Suitable precautions
must be taken when accessing the link layer protocol fields on alignment
restricted machines.
(This is not a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts
on a word boundary.
This requires that an application
has some knowledge of how to get from packet to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate
this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
.Pp
For example, if
.Sq Li p
points to the start of a packet, this expression
will advance it to the next packet:
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
.Pp
For the alignment mechanisms to work properly, the
buffer passed to
.Xr read 2
must itself be word aligned.
The
.Xr malloc 3
function
will always return an aligned buffer.
.Sh FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
.Em return
instruction.
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
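.Pp
The pseudo-machine state just described can be pictured with a small C
structure.
This is only an illustrative sketch and makes no claim about how the kernel
represents the state: the names
.Vt struct filter_state ,
.Fn filter_state_reset ,
and
.Fn filter_branch
are invented here, and
.Dv SKETCH_MEMWORDS
is an assumed stand-in for
.Dv BPF_MEMWORDS .
The branch helper shows why forward-only branches guarantee termination: the
program counter can only grow, so a program of
.Va n
instructions executes at most
.Va n
steps.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MEMWORDS 16      /* assumption; real code uses BPF_MEMWORDS */

/* Illustrative model of the filter pseudo-machine state. */
struct filter_state {
        uint32_t a;                     /* accumulator */
        uint32_t x;                     /* index register */
        uint32_t mem[SKETCH_MEMWORDS];  /* scratch memory store */
        size_t   pc;                    /* implicit program counter */
};

/* Start every filter run from a fully zeroed state. */
static void
filter_state_reset(struct filter_state *st)
{

        assert(st != NULL);
        memset(st, 0, sizeof(*st));
}

/*
 * Take a branch whose offset is counted from the instruction that
 * follows the branch, so an offset of 0 falls through.  Offsets are
 * unsigned, which is what makes every branch forward.  Returns -1 and
 * leaves the state untouched if the target lies outside a program of
 * ninsns instructions.
 */
static int
filter_branch(struct filter_state *st, size_t offset, size_t ninsns)
{
        size_t next;

        assert(st != NULL);
        if (st->pc >= ninsns || offset >= ninsns - st->pc - 1)
                return (-1);
        next = st->pc + 1 + offset;
        st->pc = next;
        return (0);
}
```
.Ed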
.Pp
The following structure defines the instruction format:
.Bd -literal
struct bpf_insn {
        u_short code;
        u_char  jt;
        u_char  jf;
        u_long  k;
};
.Ed
.Pp
The
.Li k
field is used in different ways by different instructions,
and the
.Li jt
and
.Li jf
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and
operator bits are or'd into the class to give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
.Pp
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet,
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS
- 1.
.Li k ,
.Li jt ,
and
.Li jf
are the corresponding fields in the
instruction definition.
.Dq len
refers to the length of the packet.
.Pp
.Bl -tag -width BPF_STXx
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pq Dv BPF_IMM ,
packet data at a fixed offset
.Pq Dv BPF_ABS ,
packet data at a variable offset
.Pq Dv BPF_IND ,
the packet length
.Pq Dv BPF_LEN ,
or a word in the scratch memory store
.Pq Dv BPF_MEM .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pq Dv BPF_W ,
halfword
.Pq Dv BPF_H ,
or byte
.Pq Dv BPF_B .
The semantics of all the recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bd -literal
BPF_LD+BPF_W+BPF_ABS    A <- P[k:4]
BPF_LD+BPF_H+BPF_ABS    A <- P[k:2]
BPF_LD+BPF_B+BPF_ABS    A <- P[k:1]
BPF_LD+BPF_W+BPF_IND    A <- P[X+k:4]
BPF_LD+BPF_H+BPF_IND    A <- P[X+k:2]
BPF_LD+BPF_B+BPF_IND    A <- P[X+k:1]
BPF_LD+BPF_W+BPF_LEN    A <- len
BPF_LD+BPF_IMM          A <- k
BPF_LD+BPF_MEM          A <- M[k]
.Ed
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that
the addressing modes are more restrictive than those of the accumulator loads,
but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bd -literal
BPF_LDX+BPF_W+BPF_IMM   X <- k
BPF_LDX+BPF_W+BPF_MEM   X <- M[k]
BPF_LDX+BPF_W+BPF_LEN   X <- len
BPF_LDX+BPF_B+BPF_MSH   X <- 4*(P[k:1]&0xf)
.Ed
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
for the destination.
.Pp
.Bd -literal
BPF_ST                  M[k] <- A
.Ed
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bd -literal
BPF_STX                 M[k] <- X
.Ed
.It Dv BPF_ALU
The alu instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Dv ( BPF_K
or
.Dv BPF_X ) .
.Pp
.Bd -literal
BPF_ALU+BPF_ADD+BPF_K   A <- A + k
BPF_ALU+BPF_SUB+BPF_K   A <- A - k
BPF_ALU+BPF_MUL+BPF_K   A <- A * k
BPF_ALU+BPF_DIV+BPF_K   A <- A / k
BPF_ALU+BPF_AND+BPF_K   A <- A & k
BPF_ALU+BPF_OR+BPF_K    A <- A | k
BPF_ALU+BPF_LSH+BPF_K   A <- A << k
BPF_ALU+BPF_RSH+BPF_K   A <- A >> k
BPF_ALU+BPF_ADD+BPF_X   A <- A + X
BPF_ALU+BPF_SUB+BPF_X   A <- A - X
BPF_ALU+BPF_MUL+BPF_X   A <- A * X
BPF_ALU+BPF_DIV+BPF_X   A <- A / X
BPF_ALU+BPF_AND+BPF_X   A <- A & X
BPF_ALU+BPF_OR+BPF_X    A <- A | X
BPF_ALU+BPF_LSH+BPF_X   A <- A << X
BPF_ALU+BPF_RSH+BPF_X   A <- A >> X
BPF_ALU+BPF_NEG         A <- -A
.Ed
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps
compare the accumulator against a constant
.Pq Dv BPF_K
or the index register
.Pq Dv BPF_X .
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pq Dv BPF_JA
opcode uses the 32 bit
.Li k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bd -literal
BPF_JMP+BPF_JA          pc += k
BPF_JMP+BPF_JGT+BPF_K   pc += (A > k) ? jt : jf
BPF_JMP+BPF_JGE+BPF_K   pc += (A >= k) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_K   pc += (A == k) ? jt : jf
BPF_JMP+BPF_JSET+BPF_K  pc += (A & k) ? jt : jf
BPF_JMP+BPF_JGT+BPF_X   pc += (A > X) ? jt : jf
BPF_JMP+BPF_JGE+BPF_X   pc += (A >= X) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_X   pc += (A == X) ? jt : jf
BPF_JMP+BPF_JSET+BPF_X  pc += (A & X) ? jt : jf
.Ed
.It Dv BPF_RET
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount).
A return value of zero indicates that the packet should be ignored.
The return value is either a constant
.Pq Dv BPF_K
or the accumulator
.Pq Dv BPF_A .
.Pp
.Bd -literal
BPF_RET+BPF_A           accept A bytes
BPF_RET+BPF_K           accept k bytes
.Ed
.It Dv BPF_MISC
The miscellaneous category was created for anything that does not
fit into the above classes, and for any new instructions that might need to
be added.
Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
.Pp
.Bd -literal
BPF_MISC+BPF_TAX        X <- A
BPF_MISC+BPF_TXA        A <- X
.Ed
.El
.Pp
The
.Nm
interface provides the following macros to facilitate
array initializers:
.Fn BPF_STMT opcode operand
and
.Fn BPF_JUMP opcode operand true_offset false_offset .
.Sh FILES
.Bl -tag -compact -width /dev/bpfXXX
.It Pa /dev/bpf Ns Sy n
the packet filter device
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP Daemon.
It accepts only Reverse ARP requests.
.Bd -literal
struct bpf_insn insns[] = {
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
        BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
                 sizeof(struct ether_header)),
        BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal
struct bpf_insn insns[] = {
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
        BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
        BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction
checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
.Bd -literal
struct bpf_insn insns[] = {
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
        BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
        BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
        BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
        BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
        BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
        BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
        BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Sh SEE ALSO
.Xr tcpdump 1 ,
.Xr ioctl 2 ,
.Xr kqueue 2 ,
.Xr poll 2 ,
.Xr select 2 ,
.Xr byteorder 3 ,
.Xr ng_bpf 4 ,
.Xr bpf 9
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University.
Jeffrey Mogul, at
Stanford, ported the code to
.Bx
and continued its development from
1983 on.
Since then, it has evolved into the Ultrix Packet Filter at
.Tn DEC ,
a
.Tn STREAMS
.Tn NIT
module under
.Tn SunOS 4.1 ,
and
.Tn BPF .
.Sh AUTHORS
.An -nosplit
.An Steven McCanne ,
of Lawrence Berkeley Laboratory, implemented BPF in
Summer 1990.
Much of the design is due to
.An Van Jacobson .
.Pp
Support for zero-copy buffers was added by
.An Robert N. M. Watson
under contract to Seccuris Inc.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must utilize a filter to reject foreign packets.
.Pp
Data link protocols with variable length headers are not currently supported.
.Pp
The
.Dv SEESENT ,
.Dv DIRECTION ,
and
.Dv FEEDBACK
settings have been observed to work incorrectly on some interface
types, including those with hardware loopback rather than software loopback,
and point-to-point interfaces.
They appear to function correctly on a
broad range of Ethernet-style interfaces.