.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright Red Hat

==============================================
BPF_MAP_TYPE_SOCKMAP and BPF_MAP_TYPE_SOCKHASH
==============================================

.. note::
    - ``BPF_MAP_TYPE_SOCKMAP`` was introduced in kernel version 4.14
    - ``BPF_MAP_TYPE_SOCKHASH`` was introduced in kernel version 4.18

``BPF_MAP_TYPE_SOCKMAP`` and ``BPF_MAP_TYPE_SOCKHASH`` maps can be used to
redirect skbs between sockets or to apply policy at the socket level based on
the result of a BPF (verdict) program with the help of the BPF helpers
``bpf_sk_redirect_map()``, ``bpf_sk_redirect_hash()``,
``bpf_msg_redirect_map()`` and ``bpf_msg_redirect_hash()``.

``BPF_MAP_TYPE_SOCKMAP`` is backed by an array that uses an integer key as the
index to look up a reference to a ``struct sock``. The map values are socket
descriptors. Similarly, ``BPF_MAP_TYPE_SOCKHASH`` is a hash-backed BPF map that
holds references to sockets via their socket descriptors.

.. note::
    The value type is either ``__u32`` or ``__u64``; the latter is to support
    returning socket cookies to user space. Returning the ``struct sock *`` that
    the map holds to user space is neither safe nor useful.

These maps may have BPF programs attached to them, specifically a parser program
and a verdict program. The parser program determines how much data has been
parsed and therefore how much data needs to be queued to come to a verdict. The
verdict program is essentially the redirect program and can return a verdict
of ``__SK_DROP``, ``__SK_PASS``, or ``__SK_REDIRECT``.

When a socket is inserted into one of these maps, its socket callbacks are
replaced and a ``struct sk_psock`` is attached to it. Additionally, this
``sk_psock`` inherits the programs that are attached to the map.

A sock object may be in multiple maps, but can only inherit a single
parser or verdict program. If adding a sock object to a map would result
in having multiple parser programs, the update will return an EBUSY error.

The supported programs to attach to these maps are:

.. code-block:: c

    struct sk_psock_progs {
        struct bpf_prog *msg_parser;
        struct bpf_prog *stream_parser;
        struct bpf_prog *stream_verdict;
        struct bpf_prog *skb_verdict;
    };

.. note::
    Users are not allowed to attach ``stream_verdict`` and ``skb_verdict``
    programs to the same map.

The attach types for the map programs are:

- ``msg_parser`` program - ``BPF_SK_MSG_VERDICT``.
- ``stream_parser`` program - ``BPF_SK_SKB_STREAM_PARSER``.
- ``stream_verdict`` program - ``BPF_SK_SKB_STREAM_VERDICT``.
- ``skb_verdict`` program - ``BPF_SK_SKB_VERDICT``.

There are additional helpers available to use with the parser and verdict
programs: ``bpf_msg_apply_bytes()`` and ``bpf_msg_cork_bytes()``. With
``bpf_msg_apply_bytes()`` BPF programs can tell the infrastructure how many
bytes the given verdict should apply to. The helper ``bpf_msg_cork_bytes()``
handles a different case where a BPF program cannot reach a verdict on a msg
until it receives more bytes and the program doesn't want to forward the packet
until it is known to be good.

Finally, the helpers ``bpf_msg_pull_data()`` and ``bpf_msg_push_data()`` are
available to ``BPF_PROG_TYPE_SK_MSG`` BPF programs to pull in data and set the
start and end pointers to given values or to add metadata to the
``struct sk_msg_buff *msg``.

All these helpers will be described in more detail below.

Usage
=====
Kernel BPF
----------
bpf_msg_redirect_map()
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map, u32 key, u64 flags)

This helper is used in programs implementing policies at the socket level. If
the message ``msg`` is allowed to pass (i.e., if the verdict BPF program
returns ``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKMAP``) at index ``key``. Both ingress and egress interfaces
can be used for redirection. The ``BPF_F_INGRESS`` value in ``flags`` is used
to select the ingress path; otherwise the egress path is selected. This is the
only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.

bpf_sk_redirect_map()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sk_redirect_map(struct sk_buff *skb, struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKMAP``) at index ``key``. Both ingress and egress interfaces
can be used for redirection. The ``BPF_F_INGRESS`` value in ``flags`` is used
to select the ingress path; otherwise the egress path is selected. This is the
only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Socket entries of type ``struct sock *`` can be retrieved using the
``bpf_map_lookup_elem()`` helper.

bpf_sock_map_update()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)

Add an entry to, or update, a ``map`` referencing sockets. The ``skops`` is used
as a new value for the entry associated with ``key``. The ``flags`` argument can
be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

If the ``map`` has BPF programs (parser and verdict), those will be inherited
by the socket being added. If the socket is already attached to BPF programs,
this results in an error.

Returns 0 on success, or a negative error in case of failure.

bpf_sock_hash_update()
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)

Add an entry to, or update, a sockhash ``map`` referencing sockets. The ``skops``
is used as a new value for the entry associated with ``key``.

The ``flags`` argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

If the ``map`` has BPF programs (parser and verdict), those will be inherited
by the socket being added. If the socket is already attached to BPF programs,
this results in an error.

Returns 0 on success, or a negative error in case of failure.

bpf_msg_redirect_hash()
^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map, void *key, u64 flags)

This helper is used in programs implementing policies at the socket level. If
the message ``msg`` is allowed to pass (i.e., if the verdict BPF program returns
``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKHASH``) using hash ``key``. Both ingress and egress
interfaces can be used for redirection. The ``BPF_F_INGRESS`` value in
``flags`` is used to select the ingress path; otherwise the egress path is
selected. This is the only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.

bpf_sk_redirect_hash()
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void *key, u64 flags)

This helper is used in programs implementing policies at the skb socket level.
If the sk_buff ``skb`` is allowed to pass (i.e., if the verdict BPF program
returns ``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKHASH``) using hash ``key``. Both ingress and egress
interfaces can be used for redirection. The ``BPF_F_INGRESS`` value in
``flags`` is used to select the ingress path; otherwise the egress path is
selected. This is the only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.

bpf_msg_apply_bytes()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)

For socket policies, apply the verdict of the BPF program to the next (number
of ``bytes``) of message ``msg``. For example, this helper can be used in the
following cases:

- A single ``sendmsg()`` or ``sendfile()`` system call contains multiple
  logical messages that the BPF program is supposed to read and for which it
  should apply a verdict.
- A BPF program only cares to read the first ``bytes`` of a ``msg``. If the
  message has a large payload, then setting up and calling the BPF program
  repeatedly for all bytes, even though the verdict is already known, would
  create unnecessary overhead.

Returns 0.

bpf_msg_cork_bytes()
^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)

For socket policies, prevent the execution of the verdict BPF program for
message ``msg`` until the number of ``bytes`` have been accumulated.

This can be used when one needs a specific number of bytes before a verdict can
be assigned, even if the data spans multiple ``sendmsg()`` or ``sendfile()``
calls.

Returns 0.

bpf_msg_pull_data()
^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64 flags)

For socket policies, pull in non-linear data from user space for ``msg`` and set
pointers ``msg->data`` and ``msg->data_end`` to ``start`` and ``end`` byte
offsets into ``msg``, respectively.

If a program of type ``BPF_PROG_TYPE_SK_MSG`` is run on a ``msg`` it can only
parse data that the (``data``, ``data_end``) pointers have already consumed.
For ``sendmsg()`` hooks this is likely the first scatterlist element. But for
calls relying on MSG_SPLICE_PAGES (e.g., ``sendfile()``) this will be the
range (**0**, **0**) because the data is shared with user space and by default
the objective is to avoid allowing user space to modify data while (or after)
the BPF verdict is being decided. This helper can be used to pull in data and to
set the start and end pointers to given values. Data will be copied if
necessary (i.e., if data was not linear and if start and end pointers do not
point to the same chunk).

A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the verifier
are invalidated and must be performed again, if the helper is used in
combination with direct packet access.

All values for ``flags`` are reserved for future usage, and must be left at
zero.
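
The re-validation rule can be sketched as follows. This is an illustrative
example, not code from the kernel tree: the program name ``msg_peek_header``,
the function ``classify_header()``, the 4-byte header length and the ``0xff``
marker byte are all assumptions made for the example. BPF programs see the
message through the ``struct sk_msg_md`` context. The bounds check lives in a
plain C function so that it can also be compiled and exercised on the host; the
BPF program itself is only built when the compiler targets BPF (``__bpf__``
defined) and assumes that libbpf's ``bpf_helpers.h`` is available.

.. code-block:: c

    #include <linux/bpf.h>

    #define HDR_LEN 4   /* hypothetical header length */

    /* Decide on a verdict from the linear window [data, data_end). */
    static inline __attribute__((always_inline))
    int classify_header(const __u8 *data, const __u8 *data_end)
    {
        /* Bounds check required before any direct access. */
        if (data + HDR_LEN > data_end)
            return SK_PASS;

        /* Hypothetical policy: drop messages whose first byte is 0xff. */
        return data[0] == 0xff ? SK_DROP : SK_PASS;
    }

    #ifdef __bpf__
    #include <bpf/bpf_helpers.h>

    SEC("sk_msg")
    int msg_peek_header(struct sk_msg_md *msg)
    {
        /* Make the first HDR_LEN bytes linear; flags must be zero.
         * Passing the message on failure is a choice of this sketch.
         */
        if (bpf_msg_pull_data(msg, 0, HDR_LEN, 0))
            return SK_PASS;

        /* Read the pointers only after the helper call: any bounds
         * check done before it no longer holds.
         */
        return classify_header(msg->data, msg->data_end);
    }

    char _license[] SEC("license") = "GPL";
    #endif

Keeping the check in ``classify_header()`` means the same comparison against
``data_end`` is repeated on every call path, which is what the verifier expects
after the buffer may have changed.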

Returns 0 on success, or a negative error in case of failure.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Look up a socket entry in the sockmap or sockhash map.

Returns the socket entry associated to ``key``, or NULL if no entry was found.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)

Add or update a socket entry in a sockmap or sockhash.

The flags argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

Returns 0 on success, or a negative error in case of failure.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_map_delete_elem(struct bpf_map *map, const void *key)

Delete a socket entry from a sockmap or a sockhash.

Returns 0 on success, or a negative error in case of failure.

User space
----------
bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)

Sockmap entries can be added or updated using the ``bpf_map_update_elem()``
function. The ``key`` parameter is the index value of the sockmap array, and the
``value`` parameter is the FD value of that socket.

Under the hood, the sockmap update function uses the socket FD value to
retrieve the associated socket and its attached psock.

The flags argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_lookup_elem(int fd, const void *key, void *value)

Sockmap entries can be retrieved using the ``bpf_map_lookup_elem()`` function.

.. note::
    The entry returned is a socket cookie rather than a socket itself.
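
The note above can be illustrated from user space. The following is a minimal
sketch, not taken from the kernel tree: the function name
``sockmap_lookup_cookie()`` is hypothetical, and it assumes the sockmap was
created with an 8-byte (``__u64``) value so that the cookie fits. To stay free
of library dependencies it issues the ``bpf()`` system call directly; with
libbpf the same lookup is a single ``bpf_map_lookup_elem()`` call.

.. code-block:: c

    #include <errno.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /* Hypothetical helper: read the socket cookie stored at 'index'.
     * Returns 0 on success or a negative errno value on failure.
     */
    static int sockmap_lookup_cookie(int map_fd, __u32 index, __u64 *cookie)
    {
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.map_fd = map_fd;
        attr.key = (__u64)(unsigned long)&index;
        attr.value = (__u64)(unsigned long)cookie;

        if (syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)) < 0)
            return -errno;

        return 0;
    }

The value written to ``cookie`` identifies the socket but cannot be used as a
file descriptor. A map declared with a 4-byte value is expected to reject the
lookup, since the cookie would not fit.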

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_delete_elem(int fd, const void *key)

Sockmap entries can be deleted using the ``bpf_map_delete_elem()``
function.

Returns 0 on success, or a negative error in case of failure.

Examples
========

Kernel BPF
----------
Several examples of the use of sockmap APIs can be found in:

- `tools/testing/selftests/bpf/progs/test_sockmap_kern.h`_
- `tools/testing/selftests/bpf/progs/sockmap_parse_prog.c`_
- `tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c`_
- `tools/testing/selftests/bpf/progs/test_sockmap_listen.c`_
- `tools/testing/selftests/bpf/progs/test_sockmap_update.c`_

The following code snippet shows how to declare a sockmap.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_SOCKMAP);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } sock_map_rx SEC(".maps");

The following code snippet shows a sample parser program.

.. code-block:: c

    SEC("sk_skb/stream_parser")
    int bpf_prog_parser(struct __sk_buff *skb)
    {
        return skb->len;
    }

The following code snippet shows a simple verdict program that interacts with a
sockmap to redirect traffic to another socket based on the local port.

.. code-block:: c

    SEC("sk_skb/stream_verdict")
    int bpf_prog_verdict(struct __sk_buff *skb)
    {
        __u32 lport = skb->local_port;
        __u32 idx = 0;

        if (lport == 10000)
            return bpf_sk_redirect_map(skb, &sock_map_rx, idx, 0);

        return SK_PASS;
    }

The following code snippet shows how to declare a sockhash map.

.. code-block:: c

    struct socket_key {
        __u32 src_ip;
        __u32 dst_ip;
        __u32 src_port;
        __u32 dst_port;
    };

    struct {
        __uint(type, BPF_MAP_TYPE_SOCKHASH);
        __uint(max_entries, 1);
        __type(key, struct socket_key);
        __type(value, __u64);
    } sock_hash_rx SEC(".maps");

The following code snippet shows a simple verdict program that interacts with a
sockhash to redirect traffic to another socket based on a hash of some of the
skb parameters.

.. code-block:: c

    static inline
    void extract_socket_key(struct __sk_buff *skb, struct socket_key *key)
    {
        key->src_ip = skb->remote_ip4;
        key->dst_ip = skb->local_ip4;
        key->src_port = skb->remote_port >> 16;
        key->dst_port = (bpf_htonl(skb->local_port)) >> 16;
    }

    SEC("sk_skb/stream_verdict")
    int bpf_prog_verdict(struct __sk_buff *skb)
    {
        struct socket_key key;

        extract_socket_key(skb, &key);

        return bpf_sk_redirect_hash(skb, &sock_hash_rx, &key, 0);
    }

User space
----------
Several examples of the use of sockmap APIs can be found in:

- `tools/testing/selftests/bpf/prog_tests/sockmap_basic.c`_
- `tools/testing/selftests/bpf/test_sockmap.c`_
- `tools/testing/selftests/bpf/test_maps.c`_

The following code sample shows how to create a sockmap, attach a parser and
verdict program, as well as add a socket entry.

.. code-block:: c

    int create_sample_sockmap(int sock, int parse_prog_fd, int verdict_prog_fd)
    {
        int index = 0;
        int map, err;

        map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int), sizeof(int), 1, NULL);
        if (map < 0) {
            fprintf(stderr, "Failed to create sockmap: %s\n", strerror(errno));
            return -1;
        }

        err = bpf_prog_attach(parse_prog_fd, map, BPF_SK_SKB_STREAM_PARSER, 0);
        if (err) {
            fprintf(stderr, "Failed to attach_parser_prog_to_map: %s\n", strerror(errno));
            goto out;
        }

        err = bpf_prog_attach(verdict_prog_fd, map, BPF_SK_SKB_STREAM_VERDICT, 0);
        if (err) {
            fprintf(stderr, "Failed to attach_verdict_prog_to_map: %s\n", strerror(errno));
            goto out;
        }

        err = bpf_map_update_elem(map, &index, &sock, BPF_NOEXIST);
        if (err) {
            fprintf(stderr, "Failed to update sockmap: %s\n", strerror(errno));
            goto out;
        }

    out:
        close(map);
        return err;
    }

References
==========

- https://github.com/jrfastab/linux-kernel-xdp/commit/c89fd73cb9d2d7f3c716c3e00836f07b1aeb261f
- https://lwn.net/Articles/731133/
- http://vger.kernel.org/lpc_net2018_talks/ktls_bpf_paper.pdf
- https://lwn.net/Articles/748628/
- https://lore.kernel.org/bpf/20200218171023.844439-7-jakub@cloudflare.com/

.. _`tools/testing/selftests/bpf/progs/test_sockmap_kern.h`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
.. _`tools/testing/selftests/bpf/progs/sockmap_parse_prog.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/sockmap_parse_prog.c
.. _`tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c
.. _`tools/testing/selftests/bpf/prog_tests/sockmap_basic.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
.. _`tools/testing/selftests/bpf/test_sockmap.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_sockmap.c
.. _`tools/testing/selftests/bpf/test_maps.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_maps.c
.. _`tools/testing/selftests/bpf/progs/test_sockmap_listen.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
.. _`tools/testing/selftests/bpf/progs/test_sockmap_update.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_update.c