.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright Red Hat

==============================================
BPF_MAP_TYPE_SOCKMAP and BPF_MAP_TYPE_SOCKHASH
==============================================

.. note::
   - ``BPF_MAP_TYPE_SOCKMAP`` was introduced in kernel version 4.14
   - ``BPF_MAP_TYPE_SOCKHASH`` was introduced in kernel version 4.18

``BPF_MAP_TYPE_SOCKMAP`` and ``BPF_MAP_TYPE_SOCKHASH`` maps can be used to
redirect skbs between sockets or to apply policy at the socket level based on
the result of a BPF (verdict) program with the help of the BPF helpers
``bpf_sk_redirect_map()``, ``bpf_sk_redirect_hash()``,
``bpf_msg_redirect_map()`` and ``bpf_msg_redirect_hash()``.

``BPF_MAP_TYPE_SOCKMAP`` is backed by an array that uses an integer key as the
index to look up a reference to a ``struct sock``. The map values are socket
descriptors. Similarly, ``BPF_MAP_TYPE_SOCKHASH`` is a hash-backed BPF map that
holds references to sockets via their socket descriptors.

.. note::
    The value type is either ``__u32`` or ``__u64``; the latter is to support
    returning socket cookies to user space. Returning the ``struct sock *`` that
    the map holds to user space is neither safe nor useful.

These maps may have BPF programs attached to them, specifically a parser program
and a verdict program. The parser program determines how much data has been
parsed and therefore how much data needs to be queued to come to a verdict. The
verdict program is essentially the redirect program and can return a verdict
of ``__SK_DROP``, ``__SK_PASS``, or ``__SK_REDIRECT``.

When a socket is inserted into one of these maps, its socket callbacks are
replaced and a ``struct sk_psock`` is attached to it. Additionally, this
``sk_psock`` inherits the programs that are attached to the map.

A sock object may be in multiple maps, but can only inherit a single
parser or verdict program. If adding a sock object to a map would result
in having multiple parser programs, the update will return an ``EBUSY`` error.

The supported programs to attach to these maps are:

.. code-block:: c

	struct sk_psock_progs {
		struct bpf_prog *msg_parser;
		struct bpf_prog *stream_parser;
		struct bpf_prog *stream_verdict;
		struct bpf_prog *skb_verdict;
	};

.. note::
    Users are not allowed to attach ``stream_verdict`` and ``skb_verdict``
    programs to the same map.
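
The two attachment rules above can be illustrated with a small user-space
model. This is only a sketch, not kernel code: ``struct progs_model``,
``map_attach_verdict()`` and ``sock_inherit_parser()`` are hypothetical names,
programs are represented by plain integers (0 meaning "no program"), and
returning ``-EBUSY`` for the verdict conflict is an assumption modelled on the
parser rule described above.

.. code-block:: c

    #include <errno.h>

    /* Hypothetical model: each slot holds a program id, 0 means "no program". */
    struct progs_model {
        int msg_parser;
        int stream_parser;
        int stream_verdict;
        int skb_verdict;
    };

    enum verdict_kind { STREAM_VERDICT, SKB_VERDICT };

    /* Attach a verdict program to a map, refusing to mix the two verdict kinds. */
    int map_attach_verdict(struct progs_model *map, enum verdict_kind kind, int prog)
    {
        if (kind == STREAM_VERDICT) {
            if (map->skb_verdict)
                return -EBUSY;
            map->stream_verdict = prog;
        } else {
            if (map->stream_verdict)
                return -EBUSY;
            map->skb_verdict = prog;
        }
        return 0;
    }

    /* Let a socket inherit a map's stream parser when it is added to the map. */
    int sock_inherit_parser(struct progs_model *sock, const struct progs_model *map)
    {
        if (map->stream_parser && sock->stream_parser)
            return -EBUSY;  /* the socket would end up with two parsers */
        if (map->stream_parser)
            sock->stream_parser = map->stream_parser;
        return 0;
    }

In this model a socket that already inherited a parser from one map can still
be added to a second map, as long as that second map has no parser of its own.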

The attach types for the map programs are:

- ``msg_parser`` program - ``BPF_SK_MSG_VERDICT``.
- ``stream_parser`` program - ``BPF_SK_SKB_STREAM_PARSER``.
- ``stream_verdict`` program - ``BPF_SK_SKB_STREAM_VERDICT``.
- ``skb_verdict`` program - ``BPF_SK_SKB_VERDICT``.

There are additional helpers available to use with the parser and verdict
programs: ``bpf_msg_apply_bytes()`` and ``bpf_msg_cork_bytes()``. With
``bpf_msg_apply_bytes()`` BPF programs can tell the infrastructure how many
bytes the given verdict should apply to. The helper ``bpf_msg_cork_bytes()``
handles a different case where a BPF program cannot reach a verdict on a msg
until it receives more bytes and the program doesn't want to forward the packet
until it is known to be good.

Finally, the helpers ``bpf_msg_pull_data()`` and ``bpf_msg_push_data()`` are
available to ``BPF_PROG_TYPE_SK_MSG`` BPF programs to pull in data and set the
start and end pointers to given values or to add metadata to the
``struct sk_msg_buff *msg``.

Most of these helpers are described in more detail below.

Usage
=====
Kernel BPF
----------
bpf_msg_redirect_map()
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

	long bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map, u32 key, u64 flags)

This helper is used in programs implementing policies at the socket level. If
the message ``msg`` is allowed to pass (i.e., if the verdict BPF program
returns ``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKMAP``) at index ``key``. Both ingress and egress interfaces
can be used for redirection. The ``BPF_F_INGRESS`` value in ``flags`` is used
to select the ingress path; otherwise, the egress path is selected. This is the
only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.
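
The flag handling described above can be sketched as a plain C model. This is
not the kernel implementation: ``redirect_model()`` is a hypothetical function,
the ``MODEL_`` constants are assumed to mirror the values in ``linux/bpf.h``
(``SK_DROP`` = 0, ``SK_PASS`` = 1, ``BPF_F_INGRESS`` = bit 0), and treating an
unknown flag or an empty map slot as an error is an assumption based on the
text above.

.. code-block:: c

    #include <stdint.h>

    #define MODEL_SK_DROP 0
    #define MODEL_SK_PASS 1
    #define MODEL_BPF_F_INGRESS (1ULL << 0)

    enum model_path { MODEL_PATH_EGRESS, MODEL_PATH_INGRESS };

    /*
     * slot_in_use: non-zero if the map holds a socket at the requested key.
     * On success, *path tells which path of the target socket was selected.
     */
    int redirect_model(int slot_in_use, uint64_t flags, enum model_path *path)
    {
        if (flags & ~MODEL_BPF_F_INGRESS)
            return MODEL_SK_DROP;   /* only BPF_F_INGRESS is supported */
        if (!slot_in_use)
            return MODEL_SK_DROP;   /* nothing to redirect to */

        *path = (flags & MODEL_BPF_F_INGRESS) ? MODEL_PATH_INGRESS
                                              : MODEL_PATH_EGRESS;
        return MODEL_SK_PASS;
    }

Because the helper's return value is already a verdict, a BPF program can
return it directly.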

bpf_sk_redirect_map()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sk_redirect_map(struct sk_buff *skb, struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKMAP``) at index ``key``. Both ingress and egress interfaces
can be used for redirection. The ``BPF_F_INGRESS`` value in ``flags`` is used
to select the ingress path; otherwise, the egress path is selected. This is the
only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Socket entries of type ``struct sock *`` can be retrieved using the
``bpf_map_lookup_elem()`` helper.

bpf_sock_map_update()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)

Add an entry to, or update, a ``map`` referencing sockets. The ``skops`` is used
as a new value for the entry associated with ``key``. The ``flags`` argument can
be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

If the ``map`` has BPF programs (parser and verdict), those will be inherited
by the socket being added. If the socket is already attached to BPF programs,
this results in an error.

Returns 0 on success, or a negative error in case of failure.
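
The three update flags can be illustrated with a user-space model of a
sockmap-like array. This is a sketch under stated assumptions rather than the
kernel code: ``struct sockmap_model`` and ``sockmap_model_update()`` are
hypothetical, sockets are stood in for by non-zero cookies, and the specific
error codes (``-EEXIST``, ``-ENOENT``, ``-E2BIG``, ``-EINVAL``) are assumed to
follow the usual BPF map update conventions.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>

    #define MODEL_BPF_ANY     0
    #define MODEL_BPF_NOEXIST 1
    #define MODEL_BPF_EXIST   2

    #define MODEL_MAX_ENTRIES 4

    /* Hypothetical stand-in for a sockmap: a cookie of 0 marks an empty slot. */
    struct sockmap_model {
        uint64_t cookie[MODEL_MAX_ENTRIES];
    };

    int sockmap_model_update(struct sockmap_model *map, uint32_t key,
                             uint64_t cookie, uint64_t flags)
    {
        if (flags > MODEL_BPF_EXIST)
            return -EINVAL;     /* unknown flag value */
        if (key >= MODEL_MAX_ENTRIES)
            return -E2BIG;      /* index outside the array */
        if (flags == MODEL_BPF_NOEXIST && map->cookie[key])
            return -EEXIST;     /* slot is already taken */
        if (flags == MODEL_BPF_EXIST && !map->cookie[key])
            return -ENOENT;     /* nothing to update */

        map->cookie[key] = cookie;
        return 0;
    }

With ``BPF_NOEXIST`` a caller can claim a slot without silently replacing a
socket that another part of the program already inserted.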

bpf_sock_hash_update()
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)

Add an entry to, or update, a sockhash ``map`` referencing sockets. The ``skops``
is used as a new value for the entry associated with ``key``.

The ``flags`` argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

If the ``map`` has BPF programs (parser and verdict), those will be inherited
by the socket being added. If the socket is already attached to BPF programs,
this results in an error.

Returns 0 on success, or a negative error in case of failure.

bpf_msg_redirect_hash()
^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map, void *key, u64 flags)

This helper is used in programs implementing policies at the socket level. If
the message ``msg`` is allowed to pass (i.e., if the verdict BPF program returns
``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKHASH``) using hash ``key``. Both ingress and egress
interfaces can be used for redirection. The ``BPF_F_INGRESS`` value in
``flags`` is used to select the ingress path; otherwise, the egress path is
selected. This is the only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.

bpf_sk_redirect_hash()
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void *key, u64 flags)

This helper is used in programs implementing policies at the skb socket level.
If the sk_buff ``skb`` is allowed to pass (i.e., if the verdict BPF program
returns ``SK_PASS``), redirect it to the socket referenced by ``map`` (of type
``BPF_MAP_TYPE_SOCKHASH``) using hash ``key``. Both ingress and egress
interfaces can be used for redirection. The ``BPF_F_INGRESS`` value in
``flags`` is used to select the ingress path; otherwise, the egress path is
selected. This is the only flag supported for now.

Returns ``SK_PASS`` on success, or ``SK_DROP`` on error.

bpf_msg_apply_bytes()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)

For socket policies, apply the verdict of the BPF program to the next ``bytes``
bytes of message ``msg``. For example, this helper can be used in the
following cases:

- A single ``sendmsg()`` or ``sendfile()`` system call contains multiple
  logical messages that the BPF program is supposed to read and for which it
  should apply a verdict.
- A BPF program only cares to read the first ``bytes`` of a ``msg``. If the
  message has a large payload, then setting up and calling the BPF program
  repeatedly for all bytes, even though the verdict is already known, would
  create unnecessary overhead.

Returns 0.
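
The effect on how often the verdict program runs can be approximated with a
small model. This is a deliberately simplified sketch, not the kernel's
accounting: ``verdict_runs()`` is a hypothetical function that looks at a
single send only and assumes every run of the program requests the same number
of bytes.

.. code-block:: c

    #include <stddef.h>

    /*
     * Count how often a verdict program would run for one send of `total`
     * bytes when every run asks for its verdict to cover `apply` bytes.
     * apply == 0 models a program that never calls the helper, so one
     * verdict covers all the data it was shown.
     */
    size_t verdict_runs(size_t total, size_t apply)
    {
        size_t runs = 0;

        while (total > 0) {
            size_t covered = (apply == 0 || apply > total) ? total : apply;

            total -= covered;   /* these bytes share a single verdict */
            runs++;
        }
        return runs;
    }

In this model a 3000-byte send carrying three 1000-byte logical messages gets
three separate verdicts when the program applies each verdict to 1000 bytes.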

bpf_msg_cork_bytes()
^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)

For socket policies, prevent the execution of the verdict BPF program for
message ``msg`` until ``bytes`` bytes have been accumulated.

This can be used when one needs a specific number of bytes before a verdict can
be assigned, even if the data spans multiple ``sendmsg()`` or ``sendfile()``
calls.

Returns 0.
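
The accumulation across calls can be pictured with the following user-space
model. It is only an illustration: ``struct cork_model`` and
``cork_model_feed()`` are hypothetical, and the model ignores everything the
kernel does besides counting bytes.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical per-socket state while data is corked. */
    struct cork_model {
        size_t needed;  /* threshold requested through the helper */
        size_t queued;  /* bytes accumulated so far */
    };

    /*
     * Feed the bytes of one sendmsg()/sendfile() call into the model.
     * Returns 1 once enough data is queued for the verdict program to run,
     * 0 while the data stays corked.
     */
    int cork_model_feed(struct cork_model *c, size_t bytes)
    {
        c->queued += bytes;
        if (c->queued < c->needed)
            return 0;
        return 1;
    }

With a threshold of 100 bytes, three 40-byte sends leave the data corked twice
and release it on the third call.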

bpf_msg_pull_data()
^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64 flags)

For socket policies, pull in non-linear data from user space for ``msg`` and set
pointers ``msg->data`` and ``msg->data_end`` to ``start`` and ``end`` byte
offsets into ``msg``, respectively.

If a program of type ``BPF_PROG_TYPE_SK_MSG`` is run on a ``msg`` it can only
parse data that the (``data``, ``data_end``) pointers have already consumed.
For ``sendmsg()`` hooks this is likely the first scatterlist element. But for
calls relying on ``MSG_SPLICE_PAGES`` (e.g., ``sendfile()``) this will be the
range (**0**, **0**) because the data is shared with user space and by default
the objective is to avoid allowing user space to modify data while (or after)
a BPF verdict is being decided. This helper can be used to pull in data and to
set the start and end pointers to given values. Data will be copied if
necessary (i.e., if data was not linear and if start and end pointers do not
point to the same chunk).

A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by the verifier
are invalidated and must be performed again, if the helper is used in
combination with direct packet access.

All values for ``flags`` are reserved for future usage, and must be left at
zero.

Returns 0 on success, or a negative error in case of failure.
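
The "copied if necessary" condition can be sketched with a model of a message
as a list of chunk lengths. This is an illustration only: ``pull_needs_copy()``
is a hypothetical function, the array of lengths stands in for the scatterlist,
and rejecting an empty, reversed or out-of-range request with ``-EINVAL`` is an
assumption.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * lens[] holds the byte length of each chunk of a message.
     * Returns 1 if [start, end) crosses a chunk boundary (a copy is needed),
     * 0 if the range already sits inside one chunk, or -EINVAL if the range
     * is empty, reversed, or runs past the end of the message.
     */
    int pull_needs_copy(const uint32_t *lens, size_t n, uint32_t start, uint32_t end)
    {
        uint32_t total = 0, offset = 0;
        size_t i;

        for (i = 0; i < n; i++)
            total += lens[i];
        if (end <= start || end > total)
            return -EINVAL;

        for (i = 0; i < n; i++) {
            uint32_t chunk_end = offset + lens[i];

            if (start < chunk_end)      /* chunk i holds the first byte */
                return end > chunk_end; /* does the range spill over? */
            offset = chunk_end;
        }
        return -EINVAL;                 /* not reached for valid input */
    }

For chunks of 100, 200 and 50 bytes, the range 100..300 lies entirely in the
second chunk, while 50..150 spans the first two and would need a copy.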

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

	void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Look up a socket entry in the sockmap or sockhash map.

Returns the socket entry associated with ``key``, or NULL if no entry was found.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

	long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)

Add or update a socket entry in a sockmap or sockhash.

The ``flags`` argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

Returns 0 on success, or a negative error in case of failure.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    long bpf_map_delete_elem(struct bpf_map *map, const void *key)

Delete a socket entry from a sockmap or a sockhash.

Returns 0 on success, or a negative error in case of failure.

User space
----------
bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

	int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)

Sockmap entries can be added or updated using the ``bpf_map_update_elem()``
function. The ``key`` parameter is the index value of the sockmap array, and the
``value`` parameter is the FD value of that socket.

Under the hood, the sockmap update function uses the socket FD value to
retrieve the associated socket and its attached psock.

The ``flags`` argument can be one of the following:

- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_lookup_elem(int fd, const void *key, void *value)

Sockmap entries can be retrieved using the ``bpf_map_lookup_elem()`` function.

.. note::
	The entry returned is a socket cookie rather than a socket itself.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

    int bpf_map_delete_elem(int fd, const void *key)

Sockmap entries can be deleted using the ``bpf_map_delete_elem()``
function.

Returns 0 on success, or a negative error in case of failure.

Examples
========

Kernel BPF
----------
Several examples of the use of sockmap APIs can be found in:

- `tools/testing/selftests/bpf/progs/test_sockmap_kern.h`_
- `tools/testing/selftests/bpf/progs/sockmap_parse_prog.c`_
- `tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c`_
- `tools/testing/selftests/bpf/progs/test_sockmap_listen.c`_
- `tools/testing/selftests/bpf/progs/test_sockmap_update.c`_

The following code snippet shows how to declare a sockmap.

.. code-block:: c

	struct {
		__uint(type, BPF_MAP_TYPE_SOCKMAP);
		__uint(max_entries, 1);
		__type(key, __u32);
		__type(value, __u64);
	} sock_map_rx SEC(".maps");

The following code snippet shows a sample parser program.

.. code-block:: c

	SEC("sk_skb/stream_parser")
	int bpf_prog_parser(struct __sk_buff *skb)
	{
		/* Treat each skb as one complete message. */
		return skb->len;
	}

The following code snippet shows a simple verdict program that interacts with a
sockmap to redirect traffic to another socket based on the local port.

.. code-block:: c

	SEC("sk_skb/stream_verdict")
	int bpf_prog_verdict(struct __sk_buff *skb)
	{
		__u32 lport = skb->local_port;	/* host byte order */
		__u32 idx = 0;

		if (lport == 10000)
			return bpf_sk_redirect_map(skb, &sock_map_rx, idx, 0);

		return SK_PASS;
	}

The following code snippet shows how to declare a sockhash map.

.. code-block:: c

	struct socket_key {
		__u32 src_ip;
		__u32 dst_ip;
		__u32 src_port;
		__u32 dst_port;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_SOCKHASH);
		__uint(max_entries, 1);
		__type(key, struct socket_key);
		__type(value, __u64);
	} sock_hash_rx SEC(".maps");

The following code snippet shows a simple verdict program that interacts with a
sockhash to redirect traffic to another socket based on a hash of some of the
skb parameters.

.. code-block:: c

	static inline
	void extract_socket_key(struct __sk_buff *skb, struct socket_key *key)
	{
		/* remote_port is in network byte order, local_port in host byte order. */
		key->src_ip = skb->remote_ip4;
		key->dst_ip = skb->local_ip4;
		key->src_port = skb->remote_port >> 16;
		key->dst_port = (bpf_htonl(skb->local_port)) >> 16;
	}

	SEC("sk_skb/stream_verdict")
	int bpf_prog_verdict(struct __sk_buff *skb)
	{
		struct socket_key key;

		extract_socket_key(skb, &key);

		return bpf_sk_redirect_hash(skb, &sock_hash_rx, &key, 0);
	}

User space
----------
Several examples of the use of sockmap APIs can be found in:

- `tools/testing/selftests/bpf/prog_tests/sockmap_basic.c`_
- `tools/testing/selftests/bpf/test_sockmap.c`_
- `tools/testing/selftests/bpf/test_maps.c`_

The following code sample shows how to create a sockmap, attach a parser and
verdict program, as well as add a socket entry.

.. code-block:: c

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <bpf/bpf.h>

	int create_sample_sockmap(int sock, int parse_prog_fd, int verdict_prog_fd)
	{
		int index = 0;
		int map, err;

		map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int), sizeof(int), 1, NULL);
		if (map < 0) {
			fprintf(stderr, "Failed to create sockmap: %s\n", strerror(errno));
			return -1;
		}

		err = bpf_prog_attach(parse_prog_fd, map, BPF_SK_SKB_STREAM_PARSER, 0);
		if (err) {
			fprintf(stderr, "Failed to attach_parser_prog_to_map: %s\n", strerror(errno));
			goto out;
		}

		err = bpf_prog_attach(verdict_prog_fd, map, BPF_SK_SKB_STREAM_VERDICT, 0);
		if (err) {
			fprintf(stderr, "Failed to attach_verdict_prog_to_map: %s\n", strerror(errno));
			goto out;
		}

		/* The value is the socket FD; the key is the sockmap index. */
		err = bpf_map_update_elem(map, &index, &sock, BPF_NOEXIST);
		if (err) {
			fprintf(stderr, "Failed to update sockmap: %s\n", strerror(errno));
			goto out;
		}

	out:
		/* This sample drops its reference to the map on every path. */
		close(map);
		return err;
	}

References
==========

- https://github.com/jrfastab/linux-kernel-xdp/commit/c89fd73cb9d2d7f3c716c3e00836f07b1aeb261f
- https://lwn.net/Articles/731133/
- http://vger.kernel.org/lpc_net2018_talks/ktls_bpf_paper.pdf
- https://lwn.net/Articles/748628/
- https://lore.kernel.org/bpf/20200218171023.844439-7-jakub@cloudflare.com/

.. _`tools/testing/selftests/bpf/progs/test_sockmap_kern.h`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
.. _`tools/testing/selftests/bpf/progs/sockmap_parse_prog.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/sockmap_parse_prog.c
.. _`tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c
.. _`tools/testing/selftests/bpf/prog_tests/sockmap_basic.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
.. _`tools/testing/selftests/bpf/test_sockmap.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_sockmap.c
.. _`tools/testing/selftests/bpf/test_maps.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_maps.c
.. _`tools/testing/selftests/bpf/progs/test_sockmap_listen.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
.. _`tools/testing/selftests/bpf/progs/test_sockmap_update.c`: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/test_sockmap_update.c