.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
programmability into the socket lookup performed by the transport layer when a
packet is to be delivered locally.

When invoked, a BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

The BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with the ``bind()`` socket call is impractical,
such as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to a wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
   case.

Such setups would require creating and ``bind()``'ing one socket for each IP
address/port pair in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

A BPF sk_lookup program can be attached to a network namespace with the
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type
and a netns FD as the attachment ``target_fd``.

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return either the ``SK_PASS`` or the ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to the regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a
socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes
a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with the ``SK_PASS`` code.

When multiple programs are attached, the end result is determined from the
return codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.

API
===

In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
receives information about the packet that triggered the socket lookup, namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user
API header, and to the `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()``, for details.

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.
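
The rules that combine the verdicts of multiple attached programs (see the
Hooks section) can also be modeled in plain userspace C. The following is only
a minimal sketch of those four rules as they are worded in this document. It is
not kernel code, and every name in it (``combine_verdicts()``,
``struct prog_result``, ``VERDICT_PASS``, ``NO_SOCKET`` and so on) is
hypothetical and chosen for illustration. Sockets are represented by plain
integer ids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel definitions. */
enum verdict { VERDICT_DROP, VERDICT_PASS };

#define NO_SOCKET (-1)

/* What one attached program did: its verdict and the socket it selected. */
struct prog_result {
	enum verdict verdict;
	int selected_sk;	/* illustrative socket id, or NO_SOCKET */
};

enum lookup_outcome {
	LOOKUP_USE_SELECTED,	/* rules 1 and 2: deliver to the selected socket */
	LOOKUP_FAIL,		/* rule 3: socket lookup fails */
	LOOKUP_CONTINUE,	/* rule 4: fall through to the regular lookup */
};

/*
 * Combine the results of n programs, given in attachment order.
 * On LOOKUP_USE_SELECTED, *sk_out holds the chosen socket id;
 * otherwise it is set to NO_SOCKET.
 */
enum lookup_outcome combine_verdicts(const struct prog_result *res, size_t n,
				     int *sk_out)
{
	int selected = NO_SOCKET;
	int any_drop = 0;
	size_t i;

	assert(sk_out != NULL);

	for (i = 0; i < n; i++) {
		if (res[i].verdict == VERDICT_DROP) {
			/* A selection only counts together with a pass verdict. */
			any_drop = 1;
			continue;
		}
		if (res[i].selected_sk != NO_SOCKET)
			selected = res[i].selected_sk;	/* last selection wins */
	}

	*sk_out = selected;
	if (selected != NO_SOCKET)
		return LOOKUP_USE_SELECTED;
	return any_drop ? LOOKUP_FAIL : LOOKUP_CONTINUE;
}
```

In this model, a program that passes with a selected socket overrides a drop
verdict from any other program, which mirrors rules 1 and 3 above. A drop
verdict only causes the lookup to fail when no passing program made a selection.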