.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

A flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc.).

The BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF in order to gain the benefits of the BPF verifier (namely, limits
on the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is accessible: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` points to a ``struct bpf_flow_keys``, which contains the flow
dissector's input and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to ``nhoff``
  * ``n_proto`` - L3 protocol type, parsed out of the L2 header
  * ``flags`` - optional flags

A flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff``, ``thoff`` and
``n_proto`` should also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.
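
The contract above can be sketched in plain userspace C. This is a minimal
model under stated assumptions, not the kernel API: ``struct flow_keys_model``,
``dissect_ipv4_model`` and the ``MODEL_*`` return codes are hypothetical
stand-ins for ``struct bpf_flow_keys``, a real BPF program and
``BPF_OK``/``BPF_DROP``; the packet is a plain byte buffer rather than an
``__sk_buff``, and ``n_proto`` is kept in host byte order for brevity. The
sketch shows the bounds check against ``data_end`` and the ``thoff``
adjustment for an IPv4 packet:

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical stand-ins; the values mirror BPF_OK and BPF_DROP. */
  enum { MODEL_OK = 0, MODEL_DROP = 2 };

  /* Simplified model of the struct bpf_flow_keys fields used here. */
  struct flow_keys_model {
      uint16_t nhoff;    /* input: offset of the network header */
      uint16_t thoff;    /* input/output: offset of the transport header */
      uint16_t n_proto;  /* input: L3 protocol (host byte order in this model) */
      uint8_t  ip_proto; /* output: L4 protocol number */
  };

  /* Parse an IPv4 header located at data + keys->nhoff. */
  int dissect_ipv4_model(const uint8_t *data, const uint8_t *data_end,
                         struct flow_keys_model *keys)
  {
      size_t len = (size_t)(data_end - data);
      const uint8_t *iph;
      size_t ihl;

      if (keys->n_proto != 0x0800)        /* not IPv4 */
          return MODEL_DROP;

      /* Every access has to stay within [data, data_end). */
      if (keys->nhoff > len || len - keys->nhoff < 20)
          return MODEL_DROP;

      iph = data + keys->nhoff;
      if ((iph[0] >> 4) != 4)             /* IP version field */
          return MODEL_DROP;

      ihl = (size_t)(iph[0] & 0x0f) * 4;  /* header length in bytes */
      if (ihl < 20 || len - keys->nhoff < ihl)
          return MODEL_DROP;

      keys->ip_proto = iph[9];                     /* protocol field */
      keys->thoff = (uint16_t)(keys->nhoff + ihl); /* skip the IPv4 header */
      return MODEL_OK;
  }

A real program performs the same kind of checks on ``skb->data`` and
``skb->data_end``, since the verifier rejects accesses that are not provably
in bounds.
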

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In the VLAN case, the flow dissector can be called in one of two states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double-tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case, VLAN information has been processed before the flow dissector
runs, and the BPF flow dissector is not required to handle it.


The takeaway is that a BPF flow dissector program can be called with an
optional VLAN header and should gracefully handle both cases: when a single
or double VLAN tag is present and when it is not. The same program can be
invoked in either state, so it has to be written carefully to handle both.
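
One way to handle both states is sketched below in plain userspace C. The
names ``struct vlan_keys_model``, ``skip_vlan_model`` and the ``*_MODEL``
constants are hypothetical and exist only for this illustration; ``n_proto``
is kept in host byte order, and rejecting more than two tags is an assumption
of the sketch rather than a documented requirement:

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define ETH_P_8021Q_MODEL  0x8100 /* TPID of a single VLAN tag */
  #define ETH_P_8021AD_MODEL 0x88A8 /* TPID of the outer tag when double tagged */
  #define VLAN_HLEN_MODEL    4      /* TCI (2 bytes) + ETHER_TYPE (2 bytes) */

  /* Hypothetical model of the relevant struct bpf_flow_keys inputs. */
  struct vlan_keys_model {
      uint16_t nhoff;
      uint16_t thoff;
      uint16_t n_proto;
  };

  /* Skip up to two VLAN tags. Returns 0 on success, -1 on a parsing error. */
  int skip_vlan_model(const uint8_t *data, size_t len,
                      struct vlan_keys_model *keys)
  {
      int tags;

      for (tags = 0; tags < 2; tags++) {
          const uint8_t *vlan;

          /* Post-VLAN state, or all tags consumed: nothing left to do. */
          if (keys->n_proto != ETH_P_8021Q_MODEL &&
              keys->n_proto != ETH_P_8021AD_MODEL)
              return 0;

          if (keys->nhoff > len || len - keys->nhoff < VLAN_HLEN_MODEL)
              return -1;

          vlan = data + keys->nhoff;  /* pre-VLAN state: points at TCI */
          keys->n_proto = (uint16_t)((vlan[2] << 8) | vlan[3]);
          keys->nhoff += VLAN_HLEN_MODEL;
          keys->thoff = keys->nhoff;
      }

      /* A third tag is treated as an error in this sketch. */
      if (keys->n_proto == ETH_P_8021Q_MODEL ||
          keys->n_proto == ETH_P_8021AD_MODEL)
          return -1;
      return 0;
  }

When the function returns 0, ``nhoff`` points at the first byte of
``L3_HEADER`` and ``n_proto`` holds the inner ``ETHER_TYPE``, regardless of
which state the dissector was entered in.
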


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector
  to stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.
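
The fragment rule can be sketched as a small decision function in plain C.
This is an illustration under assumptions: ``continue_after_ipv4_model`` and
the ``*_MODEL`` constants are hypothetical names, the ``F_*`` macros are local
copies whose bit values mirror the UAPI flag definitions, and ``frag_off`` is
assumed to be the IPv4 flags/fragment-offset field already converted to host
byte order:

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Local copies for this sketch; the values mirror the UAPI flags. */
  #define F_PARSE_1ST_FRAG     (1U << 0)
  #define F_STOP_AT_FLOW_LABEL (1U << 1)
  #define F_STOP_AT_ENCAP      (1U << 2)

  #define IP_MF_MODEL     0x2000 /* "more fragments" bit */
  #define IP_OFFSET_MODEL 0x1FFF /* fragment offset mask */

  /* Decide whether parsing may continue past the IPv4 header. */
  bool continue_after_ipv4_model(uint16_t frag_off, uint32_t flags)
  {
      /* Not fragmented: keep parsing. */
      if (!(frag_off & (IP_MF_MODEL | IP_OFFSET_MODEL)))
          return true;

      /* Non-first fragment: there is no L4 header to parse. */
      if (frag_off & IP_OFFSET_MODEL)
          return false;

      /* First fragment: continue only when the caller asked for it. */
      return (flags & F_PARSE_1ST_FRAG) != 0;
  }

The other two flags can be tested with the same bitwise pattern at the point
where the flow label or an encapsulated header is reached.
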


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
    does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).
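
The dispatch pattern can be modelled in userspace C as a table of function
pointers with a jump budget. This is only an analogy and not how the reference
implementation is written: ``dispatch_model`` stands in for ``bpf_tail_call``,
``jmp_table_model`` for the ``jmp_table`` map, and ``MAX_JUMPS_MODEL`` is an
arbitrary bound chosen for the sketch. The layer sequence is assumed to be
pre-decoded so that the example stays short:

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define MAX_JUMPS_MODEL 4 /* arbitrary; stands in for the tail call limit */

  enum { L_IPV4_MODEL, L_IPIP_MODEL, L_TCP_MODEL, L_MAX_MODEL };

  struct walk_model {
      const uint8_t *layers; /* layer IDs in packet order */
      size_t count;
      size_t pos;            /* layer currently being handled */
      unsigned int jumps;    /* table jumps taken so far */
  };

  int dispatch_model(struct walk_model *w);

  /* An encapsulating layer hands over to the handler of the next layer. */
  static int handle_outer_model(struct walk_model *w)
  {
      w->pos++;
      return dispatch_model(w);
  }

  /* A transport layer ends the dissection. */
  static int handle_tcp_model(struct walk_model *w)
  {
      (void)w;
      return 0;
  }

  static int (*const jmp_table_model[L_MAX_MODEL])(struct walk_model *) = {
      [L_IPV4_MODEL] = handle_outer_model,
      [L_IPIP_MODEL] = handle_outer_model,
      [L_TCP_MODEL]  = handle_tcp_model,
  };

  /* Jump through the table; fail on unknown layers or an exhausted budget. */
  int dispatch_model(struct walk_model *w)
  {
      if (w->pos >= w->count || w->layers[w->pos] >= L_MAX_MODEL)
          return -1;
      if (w->jumps++ >= MAX_JUMPS_MODEL)
          return -1;
      return jmp_table_model[w->layers[w->pos]](w);
  }

Each level of encapsulation costs one jump, so the budget bounds how deeply
nested a packet can be before dissection fails.
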


Current Limitations
===================

The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. Notable examples are single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to ``struct
bpf_flow_keys`` for the set of information that can currently be exported
from the BPF context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.