.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
programmability into the socket lookup performed by the transport layer when a
packet is to be delivered locally.

When invoked, a BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

The BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with the ``bind()`` socket call is impractical,
such as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to the wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, e.g. an L7 proxy use
   case.

Such setups would require creating and ``bind()``'ing one socket to each IP
address/port in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

A BPF sk_lookup program can be attached to a network namespace with the
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type
and a netns FD as the attachment ``target_fd``.
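
As a minimal sketch of the attachment call described above, the function below
fills the ``link_create`` member of ``union bpf_attr`` and issues the syscall
directly. The function name ``sk_lookup_attach`` is hypothetical, and the
program is assumed to have been loaded already, so that ``prog_fd`` refers to
it.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Attach an already loaded sk_lookup program (prog_fd) to the network
 * namespace referred to by netns_fd, for example an FD obtained by
 * opening /proc/self/ns/net.
 *
 * Returns a BPF link FD on success, or -1 with errno set on failure.
 */
int sk_lookup_attach(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	/* Unused attribute fields must be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = netns_fd;
	attr.link_create.attach_type = BPF_SK_LOOKUP;

	return (int)syscall(SYS_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

The returned FD refers to a BPF link, so the program is expected to stay
attached only while that link is kept open or pinned. Applications using libbpf
would more likely call ``bpf_program__attach_netns()`` than issue the syscall
by hand.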

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return either the ``SK_PASS`` or the ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to the regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a
socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes
a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with the ``SK_PASS`` code.
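
The lookup-then-assign pattern described above could look roughly like the
following BPF C sketch, modeled on the kernel selftests. The map name
``example_map``, the program name ``steer_port`` and the port number are
hypothetical. The sketch assumes the libbpf ``bpf_helpers.h`` header and that
user space has stored the target socket at key 0 of the map.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical destination port to steer; chosen only for illustration. */
#define EXAMPLE_PORT 7

/* Socket map with a single slot; user space stores the target socket at key 0. */
struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} example_map SEC(".maps");

SEC("sk_lookup")
int steer_port(struct bpf_sk_lookup *ctx)
{
	const __u32 key = 0;
	struct bpf_sock *sk;
	long err;

	/* Not the traffic we steer: continue with the regular lookup. */
	if (ctx->local_port != EXAMPLE_PORT)
		return SK_PASS;

	sk = bpf_map_lookup_elem(&example_map, &key);
	if (!sk)
		return SK_PASS;	/* map slot is empty */

	/* Record the selection, then drop the reference taken by the map lookup. */
	err = bpf_sk_assign(ctx, sk, 0);
	bpf_sk_release(sk);

	/* The selection only takes effect when the program returns SK_PASS. */
	return err ? SK_DROP : SK_PASS;
}

char _license[] SEC("license") = "GPL";
```

Returning ``SK_DROP`` when ``bpf_sk_assign()`` fails is a policy choice made
for this sketch; a program could equally return ``SK_PASS`` and let the regular
lookup proceed.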

When multiple programs are attached, the end result is determined from the
return codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.
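
The four rules above can be modeled in plain user-space C as shown below. This
is only an illustration of the rules, not the kernel implementation: all type
and function names are hypothetical, and a socket is represented by an integer
ID, with -1 meaning that the program selected nothing.

```c
#include <stddef.h>

/* Hypothetical model types; the kernel does not expose these. */
enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

struct prog_result {
	enum verdict verdict;	/* return code of one program */
	int selected_sk;	/* socket ID it selected, or -1 if none */
};

enum lookup_outcome {
	LOOKUP_USE_SELECTED,	/* rules 1 and 2 */
	LOOKUP_FAIL,		/* rule 3 */
	LOOKUP_CONTINUE,	/* rule 4 */
};

/*
 * Combine the results of n programs, given in attach order. On
 * LOOKUP_USE_SELECTED the chosen socket ID is stored in *sk_out.
 */
enum lookup_outcome combine_results(const struct prog_result *res, size_t n,
				    int *sk_out)
{
	int selected = -1;
	int any_drop = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (res[i].verdict == VERDICT_DROP)
			any_drop = 1;
		else if (res[i].selected_sk >= 0)
			selected = res[i].selected_sk;	/* last selection wins */
	}

	if (selected >= 0) {
		*sk_out = selected;
		return LOOKUP_USE_SELECTED;
	}
	return any_drop ? LOOKUP_FAIL : LOOKUP_CONTINUE;
}
```

Note that in this model a selection made by a program that returned
``VERDICT_DROP`` is ignored, matching the statement that a selection only takes
effect together with ``SK_PASS``.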

API
===

In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
receives information about the packet that triggered the socket lookup, namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user
API header, and the `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()``, for details.
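
To illustrate how the context fields might be used, the sketch below matches
TCP lookups destined to the 192.0.2.0/24 range from the Motivation section. It
relies on the ``family``, ``protocol`` and ``local_ip4`` members of
``struct bpf_sk_lookup``, where the ``local_*`` members describe the
destination of the packet and ``local_ip4`` is in network byte order. The
function name is hypothetical, and the code is written as ordinary C so that
it can also be compiled outside of a BPF program.

```c
#include <arpa/inet.h>
#include <linux/bpf.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Return true if the lookup was triggered by an IPv4 TCP packet whose
 * destination address falls within 192.0.2.0/24.
 */
bool dst_in_example_range(const struct bpf_sk_lookup *ctx)
{
	if (ctx->family != AF_INET || ctx->protocol != IPPROTO_TCP)
		return false;

	/* Compare the upper 24 bits, keeping both sides in network byte order. */
	return (ctx->local_ip4 & htonl(0xffffff00)) == htonl(0xc0000200);
}
```

Inside a BPF program the same check would typically use ``bpf_htonl()`` from
the libbpf ``bpf_endian.h`` header instead of ``htonl()``.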

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.