.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
attach to cgroups; the programs are made available by the same Kconfig. The
storage is identified by the cgroup the program is attached to.

The map provides a local storage at the cgroup that the BPF program is attached
to. It provides faster and simpler access than the general purpose hash
table, which performs a hash table lookup and requires the user to track live
cgroups on their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
Linux 5.9 and this document describes the differences.

Usage
=====

The map uses a key of type either ``__u64 cgroup_inode_id`` or
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
            __u64 cgroup_inode_id;
            __u32 attach_type;
    };

``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the program's attach type.
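
As a minimal userspace sketch of how a value for ``cgroup_inode_id`` might be
obtained, the following reads the inode number of a directory with ``stat(2)``.
The helper name ``cgroup_inode_id_of`` is hypothetical, and the sketch assumes
that the inode number reported by ``stat(2)`` for the cgroup directory is the
id the map expects, as the description above suggests.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper: stores the inode number of the directory at
 * path in *id and returns 0, or returns -1 if stat() fails. */
static int cgroup_inode_id_of(const char *path, uint64_t *id)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	*id = (uint64_t)st.st_ino;
	return 0;
}
```

A caller would pass the path of the cgroup directory of interest, for example
a directory under the cgroup mount point, and use the resulting id as the map
key or as the first field of ``struct bpf_cgroup_storage_key``.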

Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, all attach types of the particular cgroup and
map share the same storage. Otherwise, if the type is
``struct bpf_cgroup_storage_key``, programs of different attach types
are isolated and see different storages.

To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
can be accessed by multiple programs across different CPUs, and users should
take care of synchronization themselves. The BPF infrastructure provides
``struct bpf_spin_lock`` to synchronize the storage. See
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
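
The locking approach mentioned above can be sketched as follows. This is a
minimal sketch under stated assumptions, not a definitive implementation: the
names ``struct counter``, ``locked_storage`` and ``count_packets`` are
hypothetical, ``cgroup_skb/ingress`` is an assumed attach point, and the
object is assumed to be built for the BPF target with BTF so that the verifier
can locate the lock inside the value.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical value layout: the lock guards the two counters. */
struct counter {
	struct bpf_spin_lock lock;
	__u64 packets;
	__u64 bytes;
};

struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
	__type(key, struct bpf_cgroup_storage_key);
	__type(value, struct counter);
} locked_storage SEC(".maps");

SEC("cgroup_skb/ingress")
int count_packets(struct __sk_buff *skb)
{
	struct counter *c = bpf_get_local_storage(&locked_storage, 0);
	__u32 len = skb->len; /* read the context before taking the lock */

	/* Update both fields under the lock so that other programs
	 * taking the same lock never observe a partial update. */
	bpf_spin_lock(&c->lock);
	c->packets++;
	c->bytes += len;
	bpf_spin_unlock(&c->lock);

	return 1; /* allow the packet */
}

char _license[] SEC("license") = "GPL";
```

For a single counter, an atomic add as used in the examples in this document
is usually enough; a lock becomes useful when several fields have to change
together.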

Examples
========

Usage with key type as ``struct bpf_cgroup_storage_key``::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    int program(struct __sk_buff *skb)
    {
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
            __sync_fetch_and_add(ptr, 1);

            return 0;
    }

Userspace accessing map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp,
                    .attach_type = type,
            };
            __u32 value;
            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
            // error checking omitted
            return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, __u64);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    int program(struct __sk_buff *skb)
    {
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
            __sync_fetch_and_add(ptr, 1);

            return 0;
    }

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            __u32 value;
            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
            // error checking omitted
            return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
per-CPU variant has a separate memory region per CPU for each storage. The
non-per-CPU variant has a single memory region for each storage, shared by
all CPUs.

Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
that uses the map. A program may be attached to multiple cgroups or have
multiple attach types, and each attach creates a fresh zeroed storage. The
storage is freed upon detach.

There is a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time.
As a result, each map can only be used by one BPF program and each BPF program
can only use one storage map of each type. Because a map can only be used by
one BPF program, sharing this cgroup's storage with other BPF programs was
impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map
does not already contain an entry for the cgroup and attach type pair;
otherwise the old storage is reused for the new attachment. If the map is
attach type shared, then attach type is simply ignored during comparison.
Storage is freed only when either the map or the cgroup attached to is being
freed. Detaching does not directly free the storage, but it may cause the
reference count of the map to reach zero, indirectly freeing all storage in
the map.

The map is not associated with any BPF program, thus making sharing possible.
However, a BPF program can still only associate with one map of each type
(per-CPU and non-per-CPU). A BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the attach parameters of cgroup and
attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
APIs to read or update the storage for a given attachment.
For attach type shared storages (Linux 5.9 and later), only the first field of
the struct, the cgroup inode id, is used during comparison, so userspace may
just specify a ``__u64`` directly.

The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child cgroup, the storage still belongs to the
parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.
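
The key matching rules described in this section can be illustrated with a
small userspace model. This is only a sketch of the behavior stated above, not
the kernel's implementation; ``struct storage_key`` and
``storage_key_matches`` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of struct bpf_cgroup_storage_key. */
struct storage_key {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
};

/* Returns true when two attachments would resolve to the same storage.
 * With an attach type shared map (__u64 key), only the cgroup inode id
 * is compared; otherwise the attach type must match as well. */
static bool storage_key_matches(bool attach_type_shared,
				const struct storage_key *a,
				const struct storage_key *b)
{
	if (a->cgroup_inode_id != b->cgroup_inode_id)
		return false;
	if (attach_type_shared)
		return true;
	return a->attach_type == b->attach_type;
}
```

Under this model, two programs attached to the same cgroup with different
attach types see one storage when the map is attach type shared and two
separate storages otherwise.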