.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
storage. It is only available with ``CONFIG_CGROUP_BPF``, and only to programs
that attach to cgroups; such programs are made available by the same Kconfig
option. The storage is identified by the cgroup the program is attached to.

The map provides a local storage at the cgroup that the BPF program is attached
to. It offers faster and simpler access than a general purpose hash table,
which requires a hash table lookup and requires users to track live cgroups on
their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
Linux 5.9, and this document describes the differences.

Usage
=====

The map uses a key of either type ``__u64 cgroup_inode_id`` or
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
            __u64 cgroup_inode_id;
            __u32 attach_type;
    };

``cgroup_inode_id`` is the inode ID of the cgroup directory.
``attach_type`` is the program's attach type.

Linux 5.9 added support for ``__u64 cgroup_inode_id`` as the key type. When
this key type is used, all attach types of the particular cgroup and map share
the same storage. Otherwise, if the key type is
``struct bpf_cgroup_storage_key``, programs of different attach types are
isolated and see different storages.

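To make the difference concrete, the following C sketch models how a lookup
key could be matched against an existing storage entry. It is a simplified
userspace illustration, not kernel code: ``struct storage_key`` only mirrors
the layout of ``struct bpf_cgroup_storage_key``, and ``storage_key_matches``
is a hypothetical helper. With an attach type shared map (``__u64`` key), only
the cgroup inode ID is compared; otherwise both fields must match::

    #include <stdbool.h>
    #include <stdint.h>

    /* Mirrors struct bpf_cgroup_storage_key (illustration only). */
    struct storage_key {
            uint64_t cgroup_inode_id;
            uint32_t attach_type;
    };

    /*
     * Hypothetical model: does an existing storage entry match a key?
     * attach_type_shared is true when the map uses a __u64 key type.
     */
    static bool storage_key_matches(bool attach_type_shared,
                                    const struct storage_key *entry,
                                    const struct storage_key *key)
    {
            if (attach_type_shared)
                    /* Only the cgroup inode ID takes part. */
                    return entry->cgroup_inode_id == key->cgroup_inode_id;

            /* Isolated storage: cgroup and attach type must both match. */
            return entry->cgroup_inode_id == key->cgroup_inode_id &&
                   entry->attach_type == key->attach_type;
    }

For example, attachments ``{42, ingress}`` and ``{42, egress}`` match the same
entry in an attach type shared map, but need separate entries otherwise.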
To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
can be accessed by multiple programs across different CPUs, and users should
take care of synchronization themselves. The BPF infrastructure provides
``struct bpf_spin_lock`` to synchronize the storage. See
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.

Examples
========

Usage with key type as ``struct bpf_cgroup_storage_key``::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    /* cgroup_skb/egress is just an example attach point. */
    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            /* Storage of this cgroup and attach type pair. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Atomic add: the storage may be used on several CPUs at once. */
            __sync_fetch_and_add(ptr, 1);

            /* For cgroup_skb programs, returning 1 allows the packet. */
            return 1;
    }

Userspace accessing the map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp,
                    .attach_type = type,
            };
            __u32 value = 0;

            /* Error checking omitted; value stays 0 if the lookup fails. */
            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
            return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as the key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, __u64);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    /* cgroup_skb/egress is just an example attach point. */
    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            /* Storage shared by all attach types of this cgroup. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            __sync_fetch_and_add(ptr, 1);

            /* For cgroup_skb programs, returning 1 allows the packet. */
            return 1;
    }

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp)
    {
            __u32 value = 0;

            /* The cgroup inode ID alone is the key; no attach type needed. */
            /* Error checking omitted; value stays 0 if the lookup fails. */
            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
            return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. The
per-CPU variant has a separate memory region for each CPU in each storage,
while the non-per-CPU variant has a single memory region for each storage.

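From userspace, a lookup in a ``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` map
copies out one value per possible CPU, each padded to a multiple of 8 bytes, so
the value buffer must hold ``libbpf_num_possible_cpus()`` such slots. The
sketch below assumes the buffer was already filled by ``bpf_map_lookup_elem()``
and that each per-CPU value is a ``__u32`` counter; ``percpu_value_stride`` and
``sum_percpu_u32`` are hypothetical helpers that total the counter across
CPUs::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Each per-CPU value is assumed to occupy a slot rounded up to 8 bytes. */
    static size_t percpu_value_stride(size_t value_size)
    {
            return (value_size + 7) & ~(size_t)7;
    }

    /* Sum a __u32 counter over nr_cpus per-CPU slots in buf. */
    static uint64_t sum_percpu_u32(const void *buf, int nr_cpus)
    {
            const unsigned char *p = buf;
            size_t stride = percpu_value_stride(sizeof(uint32_t));
            uint64_t total = 0;

            for (int cpu = 0; cpu < nr_cpus; cpu++) {
                    uint32_t v;

                    /* memcpy avoids alignment and aliasing issues. */
                    memcpy(&v, p + (size_t)cpu * stride, sizeof(v));
                    total += v;
            }
            return total;
    }

With a 4-byte value each slot is 8 bytes, so on a machine with 4 possible CPUs
the lookup buffer would need 32 bytes.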
Prior to Linux 5.9, the lifetime of a storage was precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there could be at most one program loaded
that used the map. A program could be attached to multiple cgroups or have
multiple attach types, and each attach created a fresh zeroed storage. The
storage was freed upon detach.

There was a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time. As a result,
each map could only be used by one BPF program, and each BPF program could only
use one storage map of each type. Because a map could only be used by one BPF
program, sharing this cgroup's storage with other BPF programs was impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map does not
already contain an entry for the cgroup and attach type pair; otherwise the old
storage is reused for the new attachment. If the map is attach type shared
(its key type is ``__u64``), the attach type is simply ignored during
comparison. Storage is freed only when either the map or the cgroup it is
attached to is freed. Detaching does not directly free the storage, but it may
cause the reference count of the map to drop to zero, which indirectly frees
all storage in the map.

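The create-or-reuse rule can be illustrated with a small userspace model. This
C sketch is not the kernel implementation: ``struct model_map`` with its fixed
table of ``MODEL_MAX_ENTRIES`` slots and ``model_attach`` are hypothetical, and
freeing storage is not modeled. Attaching again with the same cgroup (and the
same attach type, unless the map is attach type shared) returns the existing
storage instead of a new zeroed one::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MODEL_MAX_ENTRIES 16

    struct model_entry {
            uint64_t cgroup_inode_id;
            uint32_t attach_type;
            uint32_t value;         /* the storage itself */
            bool used;
    };

    struct model_map {
            bool attach_type_shared;        /* true for a __u64 key type */
            struct model_entry entries[MODEL_MAX_ENTRIES];
    };

    /* Return the storage for (cgroup, attach type), creating it if absent. */
    static uint32_t *model_attach(struct model_map *map,
                                  uint64_t cgroup_inode_id,
                                  uint32_t attach_type)
    {
            struct model_entry *free_slot = NULL;

            for (size_t i = 0; i < MODEL_MAX_ENTRIES; i++) {
                    struct model_entry *e = &map->entries[i];

                    if (!e->used) {
                            if (!free_slot)
                                    free_slot = e;
                            continue;
                    }
                    /* Attach type is ignored for attach type shared maps. */
                    if (e->cgroup_inode_id == cgroup_inode_id &&
                        (map->attach_type_shared ||
                         e->attach_type == attach_type))
                            return &e->value;       /* reuse old storage */
            }
            if (!free_slot)
                    return NULL;                    /* model table is full */

            free_slot->used = true;
            free_slot->cgroup_inode_id = cgroup_inode_id;
            free_slot->attach_type = attach_type;
            free_slot->value = 0;                   /* fresh zeroed storage */
            return &free_slot->value;
    }

Because both attachments receive the same pointer, an update made through one
is visible through the other, which is what makes sharing between programs
possible.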
The map is no longer associated with any BPF program, which makes sharing
possible. However, a BPF program can still only associate with one map of each
type (per-CPU and non-per-CPU): a BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the cgroup and attach type pair of an
attachment, in a ``struct bpf_cgroup_storage_key``, as the key to the BPF map
APIs to read or update the storage for that attachment. For Linux 5.9 attach
type shared storages, only the first field in the struct, the cgroup inode ID,
is used during comparison, so userspace may just specify a ``__u64`` directly.

The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child cgroup, the storage still belongs to the
parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.