=============
BPF Iterators
=============


----------
Motivation
----------

There are a few existing ways to dump kernel data into user space. The most
popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all netlink
sockets in the system. However, their output format tends to be fixed, and if
users want more information about these sockets, they have to patch the kernel,
which often takes time to publish upstream and release. The same is true for popular
tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_ where any
additional information needs a kernel patch.

To solve this problem, the `drgn
<https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to
dig out the kernel data with no kernel change. However, the main drawback for
drgn is performance, as it cannot do pointer tracing inside the kernel. In
addition, drgn cannot validate a pointer value and may read invalid data if the
pointer becomes invalid inside the kernel.

The BPF iterator solves the above problem by providing flexibility on what data
(e.g., tasks, bpf_maps, etc.)
to collect by calling BPF programs for each kernel
data object.

----------------------
How BPF Iterators Work
----------------------

A BPF iterator is a type of BPF program that allows users to iterate over
specific types of kernel objects. Unlike traditional BPF tracing programs that
allow users to define callbacks that are invoked at particular points of
execution in the kernel, BPF iterators allow users to define callbacks that
should be executed for every entry in a variety of kernel data structures.

For example, users can define a BPF iterator that iterates over every task on
the system and dumps the total amount of CPU runtime currently used by each of
them. Another BPF task iterator may instead dump the cgroup information for each
task. Such flexibility is the core value of BPF iterators.

A BPF program is always loaded into the kernel at the behest of a user space
process. A user space process loads a BPF program by opening and initializing
the program skeleton as required and then invoking a syscall to have the BPF
program verified and loaded by the kernel.

In traditional tracing programs, a program is activated by having user space
obtain a ``bpf_link`` to the program with ``bpf_program__attach()``.
Once activated, the program callback will be invoked whenever the tracepoint is
triggered in the main kernel. For BPF iterator programs, a ``bpf_link`` to the
program is obtained using ``bpf_link_create()``, and the program callback is
invoked by issuing system calls from user space.

Next, let us see how you can use the iterators to iterate on kernel objects and
read data.

------------------------
How to Use BPF iterators
------------------------

BPF selftests are a great resource to illustrate how to use the iterators. In
this section, we’ll walk through a BPF selftest which shows how to load and use
a BPF iterator program. To begin, we’ll look at `bpf_iter.c
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_,
which illustrates how to load and trigger BPF iterators on the user space side.
Later, we’ll look at a BPF program that runs in kernel space.

Loading a BPF iterator in the kernel from user space typically involves the
following steps:

* The BPF program is loaded into the kernel through ``libbpf``. Once the kernel
  has verified and loaded the program, it returns a file descriptor (fd) to user
  space.
* Obtain a ``link_fd`` to the BPF program by calling ``bpf_link_create()``
  with the BPF program file descriptor received from the kernel.
* Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling
  ``bpf_iter_create()`` with the ``link_fd`` obtained in the previous step.
* Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is
  available.
* Close the iterator fd using ``close(bpf_iter_fd)``.
* To reread the data, get a new ``bpf_iter_fd`` and do the read again.

The following are a few examples of selftest BPF iterator programs:

* `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_
* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_
* `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_

Let us look at ``bpf_iter_task_file.c``, which runs in kernel space.

Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h
<https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_.
Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>``
represents a BPF iterator.
The suffix ``<iter_name>`` represents the type of
iterator.

::

  struct bpf_iter__task_file {
      union {
          struct bpf_iter_meta *meta;
      };
      union {
          struct task_struct *task;
      };
      u32 fd;
      union {
          struct file *file;
      };
  };

In the above code, the field ``meta`` contains the metadata, which is the same for
all BPF iterator programs. The rest of the fields are specific to different
iterators. For example, for task_file iterators, the kernel layer provides the
``task``, ``fd`` and ``file`` field values. The ``task`` and ``file`` are `reference
counted
<https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_,
so they won't go away while the BPF program runs.

Here is a snippet from the ``bpf_iter_task_file.c`` file:

::

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
      struct seq_file *seq = ctx->meta->seq;
      struct task_struct *task = ctx->task;
      struct file *file = ctx->file;
      __u32 fd = ctx->fd;

      if (task == NULL || file == NULL)
          return 0;

      if (ctx->meta->seq_num == 0) {
          count = 0;
          BPF_SEQ_PRINTF(seq, "    tgid      gid       fd      file\n");
      }

      if (tgid == task->tgid && task->tgid != task->pid)
          count++;

      if (last_tgid != task->tgid) {
          last_tgid = task->tgid;
          unique_tgid_count++;
      }

      BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                     (long)file->f_op);
      return 0;
  }

In the above example, the section name ``SEC("iter/task_file")`` indicates that
the program is a BPF iterator program to iterate all files from all tasks. The
context of the program is the ``bpf_iter__task_file`` struct. The variables
``count``, ``tgid``, ``last_tgid`` and ``unique_tgid_count`` are globals
declared outside this snippet.

The user space program invokes the BPF iterator program running in the kernel
by issuing a ``read()`` syscall. Once invoked, the BPF
program can export data to user space using a variety of BPF helper functions.
Use ``bpf_seq_printf()`` (or the ``BPF_SEQ_PRINTF`` helper macro) when you need
formatted output, and ``bpf_seq_write()`` when you need raw binary data. For
binary-encoded data, the user space applications
can process the data from ``bpf_seq_write()`` as needed. For the formatted data,
you can use ``cat <path>`` to print the results similar to ``cat
/proc/net/netlink`` after pinning the BPF iterator to the bpffs mount. Later,
use ``rm -f <path>`` to remove the pinned iterator.
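
The binary path can be sketched from the user space side as follows. This is a
minimal sketch, not code from the selftests: it assumes that the BPF program
emits fixed-size records with ``bpf_seq_write()``, and both the
``struct task_file_rec`` layout and the ``read_records()`` helper are
hypothetical names introduced only for illustration. The helper relies on
nothing but ``read()``, so it should work on any file descriptor, including one
returned by ``bpf_iter_create()``.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical record layout (an assumption of this sketch). The BPF side
 * would have to emit exactly this struct with
 * bpf_seq_write(seq, &rec, sizeof(rec)).
 */
struct task_file_rec {
    uint32_t tgid;
    uint32_t pid;
    uint32_t fd;
    uint32_t pad;   /* explicit padding keeps the layout unambiguous */
    uint64_t f_op;
};

/*
 * Read whole records from fd until read() reports that no data is left.
 * Returns the number of records stored in out (at most max), or -1 on a
 * read error or if the stream ends in the middle of a record.
 */
static long read_records(int fd, struct task_file_rec *out, size_t max)
{
    unsigned char buf[sizeof(struct task_file_rec)];
    size_t have = 0, n = 0;
    ssize_t len;

    assert(out != NULL);

    while (n < max) {
        len = read(fd, buf + have, sizeof(buf) - have);
        if (len < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, retry */
            return -1;
        }
        if (len == 0)
            break;          /* no more data: the iteration is finished */
        have += (size_t)len;
        if (have == sizeof(buf)) {
            /* A full record has arrived; copy it out and start over. */
            memcpy(&out[n++], buf, sizeof(buf));
            have = 0;
        }
    }
    return have == 0 ? (long)n : -1;
}
```

Because the helper depends only on ``read()``, it can be exercised with a pipe
before pointing it at a real iterator file descriptor.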

For example, to pin an iterator, you can use the following command, which
creates a BPF iterator from the ``bpf_iter_ipv6_route.o`` object file and pins
it to the ``/sys/fs/bpf/my_route`` path:

::

  $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route

And then print out the results using the following command:

::

  $ cat /sys/fs/bpf/my_route


-------------------------------------------------------
Implement Kernel Support for BPF Iterator Program Types
-------------------------------------------------------

To implement a BPF iterator in the kernel, the developer must make a one-time
change to the following key data structure defined in the `bpf.h
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_
file.

::

  struct bpf_iter_reg {
      const char *target;
      bpf_iter_attach_target_t attach_target;
      bpf_iter_detach_target_t detach_target;
      bpf_iter_show_fdinfo_t show_fdinfo;
      bpf_iter_fill_link_info_t fill_link_info;
      bpf_iter_get_func_proto_t get_func_proto;
      u32 ctx_arg_info_size;
      u32 feature;
      struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
      const struct bpf_iter_seq_info *seq_info;
  };

After filling the data structure fields, call ``bpf_iter_reg_target()`` to
register the iterator to the main BPF iterator subsystem.

The following is the breakdown for each field in struct ``bpf_iter_reg``.

.. list-table::
   :widths: 25 50
   :header-rows: 1

   * - Fields
     - Description
   * - target
     - Specifies the name of the BPF iterator. For example: ``bpf_map``,
       ``bpf_map_elem``. The name should be different from other ``bpf_iter``
       target names in the kernel.
   * - attach_target and detach_target
     - Allows for target specific ``link_create`` action since some targets
       may need special processing. Called during the user space link_create
       stage.
   * - show_fdinfo and fill_link_info
     - Called to fill target specific information when a user tries to get link
       info associated with the iterator.
   * - get_func_proto
     - Permits a BPF iterator to access BPF helpers specific to the iterator.
   * - ctx_arg_info_size and ctx_arg_info
     - Specifies the verifier states for BPF program arguments associated with
       the BPF iterator.
   * - feature
     - Specifies certain action requests in the kernel BPF iterator
       infrastructure. Currently, only ``BPF_ITER_RESCHED`` is supported. This
       means that the kernel function ``cond_resched()`` is called to keep other
       kernel subsystems (e.g., RCU) from misbehaving.
   * - seq_info
     - Specifies the set of seq operations for the BPF iterator and helpers to
       initialize/free the private data for the corresponding ``seq_file``.

`Click here
<https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_
to see an implementation of the ``task_vma`` BPF iterator in the kernel.

---------------------------------
Parameterizing BPF Task Iterators
---------------------------------

By default, BPF iterators walk through all the objects of the specified types
(processes, cgroups, maps, etc.) across the entire system to read relevant
kernel data. But often, there are cases where we only care about a much smaller
subset of iterable kernel objects, such as only iterating tasks within a
specific process. Therefore, BPF iterator programs support filtering out objects
from iteration by allowing user space to configure the iterator program when it
is attached.

-------------------------
BPF Task Iterator Program
-------------------------

The following code is a BPF iterator program to print files and task information
through the ``seq_file`` of the iterator. It is a standard BPF iterator program
that visits every file of an iterator. We will use this BPF program in our
example later.

::

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>    /* provides the BPF_SEQ_PRINTF macro */

  char _license[] SEC("license") = "GPL";

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
      struct seq_file *seq = ctx->meta->seq;
      struct task_struct *task = ctx->task;
      struct file *file = ctx->file;
      __u32 fd = ctx->fd;

      if (task == NULL || file == NULL)
          return 0;

      /* Print the header once, before the first object. */
      if (ctx->meta->seq_num == 0) {
          BPF_SEQ_PRINTF(seq, "    tgid      pid       fd      file\n");
      }

      BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                     (long)file->f_op);
      return 0;
  }

----------------------------------------
Creating a File Iterator with Parameters
----------------------------------------

Now, let us look at how to create an iterator that includes only files of a
process.

First, fill the ``bpf_iter_attach_opts`` struct as shown below:

::

  LIBBPF_OPTS(bpf_iter_attach_opts, opts);
  union bpf_iter_link_info linfo;
  memset(&linfo, 0, sizeof(linfo));
  linfo.task.pid = getpid();
  opts.link_info = &linfo;
  opts.link_info_len = sizeof(linfo);

``linfo.task.pid``, if it is non-zero, directs the kernel to create an iterator
that only includes opened files for the process with the specified ``pid``. In
this example, we will only be iterating files for our process. If
``linfo.task.pid`` is zero, the iterator will visit every opened file of every
process. Similarly, ``linfo.task.tid`` directs the kernel to create an iterator
that visits opened files of a specific thread, not a process. In this example,
using ``linfo.task.tid`` gives a different result from using ``linfo.task.pid``
only if the thread has a separate file descriptor table. In most circumstances,
all process threads share a single file descriptor table.

Now, in the user space program, pass a pointer to the struct to
``bpf_program__attach_iter()``.

::

  link = bpf_program__attach_iter(prog, &opts);
  iter_fd = bpf_iter_create(bpf_link__fd(link));

If both *tid* and *pid* are zero, an iterator created from this struct
``bpf_iter_attach_opts`` will include every opened file of every task in the
system (more precisely, in the current *pid* namespace). It is the same as
passing NULL as the second argument to ``bpf_program__attach_iter()``.

The whole program looks like the following code:

::

  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <bpf/bpf.h>
  #include <bpf/libbpf.h>
  #include "bpf_iter_task_ex.skel.h"

  static int do_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
  {
      struct bpf_link *link;
      char buf[16] = {};
      int iter_fd = -1, len;
      int ret = 0;

      link = bpf_program__attach_iter(prog, opts);
      if (!link) {
          fprintf(stderr, "bpf_program__attach_iter() fails\n");
          return -1;
      }
      iter_fd = bpf_iter_create(bpf_link__fd(link));
      if (iter_fd < 0) {
          fprintf(stderr, "bpf_iter_create() fails\n");
          ret = -1;
          goto free_link;
      }
      /* Print the output as it arrives; read() returns 0 when the iteration ends */
      while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) {
          buf[len] = 0;
          printf("%s", buf);
      }
      printf("\n");
  free_link:
      if (iter_fd >= 0)
          close(iter_fd);
      bpf_link__destroy(link);
      return ret;
  }

  static void test_task_file(void)
  {
      LIBBPF_OPTS(bpf_iter_attach_opts, opts);
      struct bpf_iter_task_ex *skel;
      union bpf_iter_link_info linfo;

      skel = bpf_iter_task_ex__open_and_load();
      if (skel == NULL)
          return;
      memset(&linfo, 0, sizeof(linfo));
      linfo.task.pid = getpid();
      opts.link_info = &linfo;
      opts.link_info_len = sizeof(linfo);
      printf("PID %d\n", getpid());
      do_read_opts(skel->progs.dump_task_file, &opts);
      bpf_iter_task_ex__destroy(skel);
  }

  int main(int argc, const char * const * argv)
  {
      test_task_file();
      return 0;
  }

The following lines are the output of the program.

::

  PID 1859

      tgid      pid       fd      file
      1859     1859        0 ffffffff82270aa0
      1859     1859        1 ffffffff82270aa0
      1859     1859        2 ffffffff82270aa0
      1859     1859        3 ffffffff82272980
      1859     1859        4 ffffffff8225e120
      1859     1859        5 ffffffff82255120
      1859     1859        6 ffffffff82254f00
      1859     1859        7 ffffffff82254d80
      1859     1859        8 ffffffff8225abe0

------------------
Without Parameters
------------------

Let us look at how a BPF iterator without parameters skips files of other
processes in the system. In this case, the BPF program has to check the pid or
the tid of tasks, or it will receive every opened file in the system (in the
current *pid* namespace, actually). So, we usually add a global variable in the
BPF program to pass a *pid* to the BPF program.

The BPF program would look like the following block.

  ::

    ......
    int target_pid = 0;

    SEC("iter/task_file")
    int dump_task_file(struct bpf_iter__task_file *ctx)
    {
        ......
        if (task->tgid != target_pid) /* Check task->pid instead to check thread IDs */
            return 0;
        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                       (long)file->f_op);
        return 0;
    }

The user space program would look like the following block:

  ::

    ......
    static void test_task_file(void)
    {
        ......
        skel = bpf_iter_task_ex__open_and_load();
        if (skel == NULL)
            return;
        skel->bss->target_pid = getpid(); /* process ID. For thread id, use gettid() */
        memset(&linfo, 0, sizeof(linfo));
        linfo.task.pid = getpid();
        opts.link_info = &linfo;
        opts.link_info_len = sizeof(linfo);
        ......
    }

``target_pid`` is a global variable in the BPF program. The user space program
should initialize the variable with a process ID to skip opened files of other
processes in the BPF program.
When you parametrize a BPF iterator, the iterator
calls the BPF program fewer times, which can save significant resources.

---------------------------
Parametrizing VMA Iterators
---------------------------

By default, a BPF VMA iterator includes every VMA in every process. However,
you can still specify a process or a thread to include only its VMAs. Unlike
files, a thread cannot have a separate address space (since Linux 2.6.0-test6).
Here, using *tid* makes no difference from using *pid*.

----------------------------
Parametrizing Task Iterators
----------------------------

A BPF task iterator with *pid* includes all tasks (threads) of a process. The
BPF program receives these tasks one after another. You can specify a BPF task
iterator with the *tid* parameter to include only the tasks that match the given
*tid*.
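
The relationship between *pid* and *tid* that these parameters rely on can be
checked from user space without any BPF code. The following is a minimal
sketch, not taken from the selftests: ``current_tid()`` and ``count_tasks()``
are hypothetical helpers introduced for illustration, and the sketch assumes
that ``/proc`` is mounted. Each numeric entry under ``/proc/<pid>/task`` is one
thread ID in the process, that is, one of the tasks that a task iterator
parameterized with that *pid* is described above as visiting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread ID of the caller; it equals getpid() only in the main thread. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/*
 * Count the tasks (threads) of process pid by listing /proc/<pid>/task.
 * Returns the count, or -1 if the directory cannot be opened.
 */
static int count_tasks(pid_t pid)
{
    char path[64];
    struct dirent *de;
    DIR *dir;
    int n = 0;

    assert(pid > 0);

    snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
    dir = opendir(path);
    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        /* Every numeric entry is one tid in the thread group. */
        if (isdigit((unsigned char)de->d_name[0]))
            n++;
    }
    closedir(dir);
    return n;
}
```

In a single-threaded program, ``count_tasks(getpid())`` is expected to be 1 and
``current_tid()`` matches ``getpid()``; in a multi-threaded program, each
thread reports its own *tid* while sharing the same *pid*.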