=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that a BPF program can only call such a function in a
valid context. To enforce this, visibility of a kfunc can be per program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

        /* Disables missing prototype warnings */
        __diag_push();
        __diag_ignore_all("-Wmissing-prototypes",
                          "Global kfuncs as their definitions will be in BTF");

        struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
        {
                return find_get_task_by_vpid(nr);
        }

        __diag_pop();

A wrapper kfunc is often needed when the parameters of the kfunc must be
annotated. Otherwise, one may directly make the kfunc visible to the BPF program
by registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the argument of the kfunc with a __tag, where
tag may be one of the supported annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

        void bpf_memzero(void *mem, int mem__sz)
        {
        ...
        }

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without the __sz annotation, the size of the
type of the pointer is used. Without the __sz annotation, a kfunc cannot accept
a void pointer.
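
As a rough illustration of the contract that __sz expresses, the sketch below
fills in the elided body of bpf_memzero() with a plain memset(). The body and
the caller zero_scratch_buffer() are assumptions made for illustration, and
they are written as ordinary userspace C so that they can be compiled outside
the kernel; they are not taken from the kernel sources.

.. code-block:: c

	#include <string.h>

	/* Hypothetical body: zero mem__sz bytes starting at mem. The __sz
	 * suffix on the second parameter is what pairs it with mem.
	 */
	void bpf_memzero(void *mem, int mem__sz)
	{
		memset(mem, 0, mem__sz);
	}

	/* Hypothetical caller: the size passed in matches the buffer, which
	 * is the property the verifier checks for a real BPF program.
	 */
	int zero_scratch_buffer(void)
	{
		char scratch[16] = "not yet zeroed";

		bpf_memzero(scratch, sizeof(scratch));
		return scratch[0];
	}

Passing a size larger than the buffer is the kind of mistake the memory and
size pairing is meant to let the verifier reject at load time.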

2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments, where it indicates that
the verifier must check the scalar argument to be a known constant, which does
not indicate a size parameter, and the value of the constant is relevant to the
safety of the program.

An example is given below::

        void *bpf_obj_new(u32 local_type_id__k, ...)
        {
        ...
        }

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and return a sized pointer to it. Each type ID will
have a distinct size, hence it is crucial to treat each such call as distinct
when values don't match during verifier state pruning checks.

Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, the
__k suffix should be used.

.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.
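
Conceptually, such a set is a table of (BTF ID, flags) pairs that can be
searched by ID. The sketch below models that idea in plain userspace C. The
structure, the IDs, the flag bit values and the lookup function are all
invented for illustration and do not reflect the layout that the kernel macros
actually generate.

.. code-block:: c

	#include <stddef.h>

	/* Illustrative flag bits; the kernel's real values may differ. */
	#define DEMO_KF_ACQUIRE		(1u << 0)
	#define DEMO_KF_RELEASE		(1u << 1)
	#define DEMO_KF_RET_NULL	(1u << 2)

	struct demo_kfunc_entry {
		unsigned int btf_id;	/* made-up ID standing in for a BTF ID */
		unsigned int flags;
	};

	static const struct demo_kfunc_entry demo_task_set[] = {
		{ 101, DEMO_KF_ACQUIRE | DEMO_KF_RET_NULL },	/* bpf_get_task_pid */
		{ 102, DEMO_KF_RELEASE },			/* bpf_put_pid */
	};

	/* Return the flags recorded for btf_id, or NULL if it is not in the set. */
	const unsigned int *demo_kfunc_flags(unsigned int btf_id)
	{
		size_t i;

		for (i = 0; i < sizeof(demo_task_set) / sizeof(demo_task_set[0]); i++) {
			if (demo_task_set[i].btf_id == btf_id)
				return &demo_task_set[i].flags;
		}
		return NULL;
	}

In this model, a kfunc registered without flags would simply carry a flags
value of 0, which is distinct from not being in the set at all.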

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier rejects the
BPF program; loading succeeds only when no lingering references remain in any
explored state of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the
two are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of
the pointer being released are invalidated as a result of invoking a kfunc with
this flag.
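
The contract that these three flags describe can be modelled in ordinary
userspace C. In the sketch below every name is hypothetical:
demo_obj_acquire() plays the role of a KF_ACQUIRE | KF_RET_NULL kfunc,
demo_obj_release() plays the role of a KF_RELEASE kfunc, and demo_use_object()
is a caller that follows the rules the verifier would enforce on a BPF program.
The verifier checks these rules statically at load time; this model only shows
what a conforming call sequence looks like.

.. code-block:: c

	#include <stddef.h>

	struct demo_obj {
		int refcount;
		int value;
	};

	/* Acquire-style function: may fail, so the result must be NULL-checked. */
	struct demo_obj *demo_obj_acquire(struct demo_obj *obj)
	{
		if (!obj || obj->refcount == 0)
			return NULL;
		obj->refcount++;
		return obj;
	}

	/* Release-style function: drops the reference taken by the acquire. */
	void demo_obj_release(struct demo_obj *obj)
	{
		obj->refcount--;
	}

	/* Conforming caller: NULL check first, then release on every path. */
	int demo_use_object(struct demo_obj *obj)
	{
		struct demo_obj *ref = demo_obj_acquire(obj);
		int value;

		if (!ref)
			return -1;

		value = ref->value;
		demo_obj_release(ref);
		/* ref must not be used past this point. */
		return value;
	}

A caller that returned without calling demo_obj_release(), or that read
ref->value before the NULL check, would correspond to a BPF program that the
verifier refuses to load.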

2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to kptr, safely increments the refcount of the object it points to,
and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
the KF_ACQUIRE and KF_RET_NULL flags.

2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.7 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Because of this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more
can be added later.
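
The two restrictions described in 2.4.6 and 2.4.7 amount to a simple check on
the calling program. The sketch below expresses that check in plain userspace
C. The flag bit values, the structure and the function name are assumptions
chosen for illustration; the real checks live in the verifier and are not
reproduced here.

.. code-block:: c

	#include <stdbool.h>

	/* Illustrative flag bits; the kernel's real values may differ. */
	#define DEMO_KF_SLEEPABLE	(1u << 0)
	#define DEMO_KF_DESTRUCTIVE	(1u << 1)

	/* Hypothetical summary of the calling BPF program. */
	struct demo_prog {
		bool sleepable;		/* loaded with BPF_F_SLEEPABLE */
		bool has_cap_sys_boot;	/* loader holds CAP_SYS_BOOT */
	};

	/* Return true if a kfunc with these flags may be called by prog. */
	bool demo_kfunc_call_allowed(unsigned int kfunc_flags,
				     const struct demo_prog *prog)
	{
		if ((kfunc_flags & DEMO_KF_SLEEPABLE) && !prog->sleepable)
			return false;
		if ((kfunc_flags & DEMO_KF_DESTRUCTIVE) && !prog->has_cap_sys_boot)
			return false;
		return true;
	}

The two conditions are independent, so a kfunc carrying both flags would need
a program that is both sleepable and loaded with CAP_SYS_BOOT.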

2.4.8 KF_RCU flag
-----------------

The KF_RCU flag is used for kfuncs that take an RCU pointer as an argument.
When used together with KF_ACQUIRE, it indicates that the kfunc should have a
single argument which must be a trusted argument or a MEM_RCU pointer.
The argument may have a reference count of 0 and the kfunc must take this
into consideration.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
                .owner = THIS_MODULE,
                .set   = &bpf_task_set,
        };

        static int init_subsystem(void)
        {
                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
        }
        late_initcall(init_subsystem);
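
To show what per program type registration implies for visibility, the sketch
below models a registry in plain userspace C. All names, the two program types
and the numeric IDs are hypothetical, and the model is deliberately simplified
to a single set per program type; it is not how the kernel stores registered
sets.

.. code-block:: c

	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical program types; the kernel has many more. */
	enum demo_prog_type {
		DEMO_PROG_TYPE_TRACING,
		DEMO_PROG_TYPE_XDP,
		DEMO_PROG_TYPE_MAX,
	};

	struct demo_kfunc_id_set {
		const unsigned int *ids;	/* made-up IDs standing in for BTF IDs */
		size_t cnt;
	};

	/* One registered set per program type; NULL means nothing registered. */
	static const struct demo_kfunc_id_set *demo_sets[DEMO_PROG_TYPE_MAX];

	/* Register set for one program type. Returns 0 on success, or -1 on a
	 * bad type or if a set is already registered for that type.
	 */
	int demo_register_kfunc_id_set(enum demo_prog_type type,
				       const struct demo_kfunc_id_set *set)
	{
		if ((unsigned int)type >= DEMO_PROG_TYPE_MAX || demo_sets[type])
			return -1;
		demo_sets[type] = set;
		return 0;
	}

	/* A kfunc is visible only to program types it was registered for. */
	bool demo_kfunc_visible(enum demo_prog_type type, unsigned int id)
	{
		const struct demo_kfunc_id_set *set;
		size_t i;

		if ((unsigned int)type >= DEMO_PROG_TYPE_MAX || !demo_sets[type])
			return false;
		set = demo_sets[type];
		for (i = 0; i < set->cnt; i++) {
			if (set->ids[i] == id)
				return true;
		}
		return false;
	}

In this model, a set registered only for DEMO_PROG_TYPE_TRACING stays invisible
to DEMO_PROG_TYPE_XDP programs, which is the effect that registering against a
single program type is meant to have.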

3. Core kfuncs
==============

The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.

3.1 struct task_struct * kfuncs
-------------------------------

There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_acquire bpf_task_release

These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:

.. code-block:: c

	/**
	 * A trivial example tracepoint program that shows how to
	 * acquire and release a struct task_struct * pointer.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *acquired;

		acquired = bpf_task_acquire(task);

		/*
		 * In a typical program you'd do something like store
		 * the task in a map, and the map will automatically
		 * release it later. Here, we release it manually.
		 */
		bpf_task_release(acquired);
		return 0;
	}

----

A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_from_pid

Here is an example of it being used:

.. code-block:: c

	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *lookup;

		lookup = bpf_task_from_pid(task->pid);
		if (!lookup)
			/* A task should always be found, as %task is a tracepoint arg. */
			return -ENOENT;

		if (lookup->pid != task->pid) {
			/* bpf_task_from_pid() looks up the task via its
			 * globally-unique pid from the init_pid_ns. Thus,
			 * the pid of the lookup task should always be the
			 * same as the input task.
			 */
			bpf_task_release(lookup);
			return -EINVAL;
		}

		/* bpf_task_from_pid() returns an acquired reference,
		 * so it must be dropped before returning from the
		 * tracepoint handler.
		 */
		bpf_task_release(lookup);
		return 0;
	}

3.2 struct cgroup * kfuncs
--------------------------

``struct cgroup *`` objects also have acquire and release functions:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_acquire bpf_cgroup_release

These kfuncs are used in exactly the same manner as bpf_task_acquire() and
bpf_task_release() respectively, so we won't provide examples for them.

----

You may also acquire a reference to a ``struct cgroup`` kptr that's already
stored in a map using bpf_cgroup_kptr_get():

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_kptr_get

Here's an example of how it can be used:

.. code-block:: c

	/* struct containing the struct cgroup kptr which is actually stored in the map. */
	struct __cgroups_kfunc_map_value {
		struct cgroup __kptr_ref * cgroup;
	};

	/* The map containing struct __cgroups_kfunc_map_value entries. */
	struct {
		__uint(type, BPF_MAP_TYPE_HASH);
		__type(key, int);
		__type(value, struct __cgroups_kfunc_map_value);
		__uint(max_entries, 1);
	} __cgroups_kfunc_map SEC(".maps");

	/* ... */

	/**
	 * A simple example tracepoint program showing how a
	 * struct cgroup kptr that is stored in a map can
	 * be acquired using the bpf_cgroup_kptr_get() kfunc.
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgroup_kptr_get_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *kptr;
		struct __cgroups_kfunc_map_value *v;
		s32 id = cgrp->self.id;

		/* Assume a cgroup kptr was previously stored in the map. */
		v = bpf_map_lookup_elem(&__cgroups_kfunc_map, &id);
		if (!v)
			return -ENOENT;

		/* Acquire a reference to the cgroup kptr that's already stored in the map. */
		kptr = bpf_cgroup_kptr_get(&v->cgroup);
		if (!kptr)
			/* If no cgroup was present in the map, it's because
			 * we're racing with another CPU that removed it with
			 * bpf_kptr_xchg() between the bpf_map_lookup_elem()
			 * above, and our call to bpf_cgroup_kptr_get().
			 * bpf_cgroup_kptr_get() internally safely handles this
			 * race, and will return NULL if the cgroup is no longer
			 * present in the map by the time we invoke the kfunc.
			 */
			return -EBUSY;

		/* Free the reference we just took above. Note that the
		 * original struct cgroup kptr is still in the map. It will
		 * be freed either at a later time if another context deletes
		 * it from the map, or automatically by the BPF subsystem if
		 * it's still present when the map is destroyed.
		 */
		bpf_cgroup_release(kptr);

		return 0;
	}

----

Another kfunc available for interacting with ``struct cgroup *`` objects is
bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
and return it as a cgroup kptr.

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_ancestor

Eventually, BPF should be updated to allow this to happen with a normal memory
load in the program itself. This is currently not possible without more work in
the verifier. bpf_cgroup_ancestor() can be used as follows:

.. code-block:: c

	/**
	 * Simple tracepoint example that illustrates how a cgroup's
	 * ancestor can be accessed using bpf_cgroup_ancestor().
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *parent;

		/* The parent cgroup resides at the level before the current cgroup's level. */
		parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
		if (!parent)
			return -ENOENT;

		bpf_printk("Parent id is %d", parent->self.id);

		/* Release the parent cgroup that was acquired above. */
		bpf_cgroup_release(parent);
		return 0;
	}
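
To make the level arithmetic in the example above concrete, the sketch below
models a cgroup hierarchy in plain userspace C and looks up an ancestor by
level. The structure and function are hypothetical stand-ins, not the kernel's
``struct cgroup`` or the real bpf_cgroup_ancestor(), and the model does no
reference counting. It assumes that the root sits at level 0 and that each
child is one level deeper than its parent.

.. code-block:: c

	#include <stddef.h>

	/* Hypothetical stand-in for a cgroup: only the fields the model needs. */
	struct demo_cgroup {
		int level;			/* 0 for the root */
		struct demo_cgroup *parent;	/* NULL for the root */
	};

	/* Return the ancestor of cgrp at the given level, or NULL if the level
	 * is negative or deeper than cgrp itself.
	 */
	struct demo_cgroup *demo_cgroup_ancestor(struct demo_cgroup *cgrp, int level)
	{
		if (level < 0 || level > cgrp->level)
			return NULL;

		while (cgrp->level > level)
			cgrp = cgrp->parent;
		return cgrp;
	}

In this model, asking the root for level ``cgrp->level - 1`` yields NULL, which
is the case the NULL check in the tracepoint program guards against.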
423