=============
BPF Iterators
=============


----------
Motivation
----------

There are a few existing ways to dump kernel data into user space. The most
popular one is the ``/proc`` filesystem. For example, ``cat /proc/net/tcp6``
dumps all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all
netlink sockets in the system. However, their output format tends to be fixed,
and if users want more information about these sockets, they have to patch the
kernel, which often takes time to get upstream and released. The same is true
for popular tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_,
where any additional information needs a kernel patch.

To solve this problem, the `drgn
<https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to
dig out kernel data with no kernel change. However, the main drawback of drgn
is performance, as it cannot do pointer tracing inside the kernel. In addition,
drgn cannot validate a pointer value and may read invalid data if the pointer
becomes invalid inside the kernel.

The BPF iterator solves these problems by providing flexibility in what data
(e.g., tasks, bpf_maps, etc.) to collect, calling a BPF program for each kernel
data object.

----------------------
How BPF Iterators Work
----------------------

A BPF iterator is a type of BPF program that allows users to iterate over
specific types of kernel objects. Unlike traditional BPF tracing programs, which
allow users to define callbacks that are invoked at particular points of
execution in the kernel, BPF iterators allow users to define callbacks that are
executed for every entry in a variety of kernel data structures.

For example, users can define a BPF iterator that iterates over every task on
the system and dumps the total amount of CPU runtime currently used by each of
them. Another BPF task iterator may instead dump the cgroup information for each
task. Such flexibility is the core value of BPF iterators.

A BPF program is always loaded into the kernel at the behest of a user space
process. A user space process loads a BPF program by opening and initializing
the program skeleton as required and then invoking a syscall to have the BPF
program verified and loaded by the kernel.

In traditional tracing programs, a program is activated by having user space
obtain a ``bpf_link`` to the program with ``bpf_program__attach()``. Once
activated, the program callback is invoked whenever the tracepoint is triggered
in the kernel. For BPF iterator programs, a ``bpf_link`` to the program is
obtained using ``bpf_link_create()``, and the program callback is invoked when
user space issues ``read()`` system calls on an iterator file descriptor.

Next, let us see how you can use the iterators to iterate over kernel objects
and read data.

------------------------
How to Use BPF iterators
------------------------

BPF selftests are a great resource to illustrate how to use the iterators. In
this section, we’ll walk through a BPF selftest which shows how to load and use
a BPF iterator program. To begin, we’ll look at `bpf_iter.c
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_,
which illustrates how to load and trigger BPF iterators on the user space side.
Later, we’ll look at a BPF program that runs in kernel space.

Loading a BPF iterator in the kernel from user space typically involves the
following steps:

* The BPF program is loaded into the kernel through ``libbpf``. Once the kernel
  has verified and loaded the program, it returns a file descriptor (fd) to user
  space.
* Obtain a ``link_fd`` to the BPF program by calling ``bpf_link_create()`` with
  the BPF program file descriptor received from the kernel.
* Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling
  ``bpf_iter_create()`` with the ``link_fd`` obtained in the previous step.
* Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is
  available.
* Close the iterator fd using ``close(bpf_iter_fd)``.
* To reread the data, create a new ``bpf_iter_fd`` and read from it again.

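The read loop in the steps above can be sketched as a small POSIX helper. This
is a minimal sketch, not selftest code: ``drain_iter_fd()`` is a hypothetical
name, and ``fd`` is assumed to be the ``bpf_iter_fd`` returned by
``bpf_iter_create()``. It keeps calling ``read()`` until it returns 0, which
marks the end of the iteration, and retries reads interrupted by a signal.

::

  #include <errno.h>
  #include <stddef.h>
  #include <sys/types.h>
  #include <unistd.h>

  /*
   * Hypothetical helper: read an iterator fd until read() returns 0.
   * Copies at most cap - 1 bytes into out and NUL-terminates it (if cap > 0).
   * Returns the total number of bytes read, including bytes that did not
   * fit into out, or -1 if read() fails.
   */
  long drain_iter_fd(int fd, char *out, size_t cap)
  {
      char buf[256];
      size_t used = 0;
      long total = 0;

      for (;;) {
          ssize_t len = read(fd, buf, sizeof(buf));

          if (len == 0)
              break;              /* end of iteration */
          if (len < 0) {
              if (errno == EINTR)
                  continue;       /* interrupted by a signal: retry */
              return -1;
          }
          total += len;
          for (ssize_t i = 0; i < len && used + 1 < cap; i++)
              out[used++] = buf[i];
      }
      if (cap > 0)
          out[used] = '\0';
      return total;
  }

The caller still owns the fd and closes it with ``close()``. Once the helper
has returned, that fd is exhausted; to reread the data, create a new
``bpf_iter_fd`` from the same link and drain it again.
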
The following are a few examples of selftest BPF iterator programs:

* `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_
* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_
* `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_

Let us look at ``bpf_iter_task_file.c``, which runs in kernel space.

Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h
<https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_.
Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>``
represents a BPF iterator. The suffix ``<iter_name>`` represents the type of
iterator.

::

    struct bpf_iter__task_file {
            union {
                struct bpf_iter_meta *meta;
            };
            union {
                struct task_struct *task;
            };
            u32 fd;
            union {
                struct file *file;
            };
    };

In the above code, the field ``meta`` contains the metadata, which is the same
for all BPF iterator programs. The rest of the fields are specific to different
iterators. For example, for task_file iterators, the kernel layer provides the
``task``, ``fd`` and ``file`` field values. The ``task`` and ``file`` objects
are `reference counted
<https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_,
so they won't go away while the BPF program runs.

Here is a snippet from the ``bpf_iter_task_file.c`` file:

::

  /* Globals shared with the selftest's user space side: user space sets
   * tgid and reads count and unique_tgid_count after the iteration. */
  int count = 0;
  int tgid = 0;
  int last_tgid = 0;
  int unique_tgid_count = 0;

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
    struct seq_file *seq = ctx->meta->seq;
    struct task_struct *task = ctx->task;
    struct file *file = ctx->file;
    __u32 fd = ctx->fd;

    /* task and file may be NULL, so they must be checked before use */
    if (task == NULL || file == NULL)
      return 0;

    /* print the header once, for the first object of the sequence */
    if (ctx->meta->seq_num == 0) {
      count = 0;
      BPF_SEQ_PRINTF(seq, "    tgid      pid       fd      file\n");
    }

    /* count files seen from non-leader threads of the target process */
    if (tgid == task->tgid && task->tgid != task->pid)
      count++;

    if (last_tgid != task->tgid) {
      last_tgid = task->tgid;
      unique_tgid_count++;
    }

    BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
            (long)file->f_op);
    return 0;
  }

In the above example, the section name ``SEC("iter/task_file")`` indicates that
the program is a BPF iterator program that iterates over all files of all
tasks. The context of the program is the ``bpf_iter__task_file`` struct.

The user space program invokes the BPF iterator program running in the kernel
by issuing a ``read()`` syscall. Once invoked, the BPF program can export data
to user space using a variety of BPF helper functions. You can use either
``bpf_seq_printf()`` (and the ``BPF_SEQ_PRINTF`` helper macro) or
``bpf_seq_write()``, depending on whether you need formatted output or just
binary data, respectively. For binary-encoded data, user space applications can
process the data from ``bpf_seq_write()`` as needed. For formatted data, you can
use ``cat <path>`` to print the results, similar to ``cat /proc/net/netlink``,
after pinning the BPF iterator to the bpffs mount. Later, use ``rm -f <path>``
to remove the pinned iterator.
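
For the ``bpf_seq_write()`` case, user space must agree with the BPF program on
the record layout. The sketch below assumes a hypothetical BPF program that
calls ``bpf_seq_write(seq, &rec, sizeof(rec))`` once per object with the
``struct task_file_rec`` shown here; neither that struct nor ``decode_recs()``
exists in the kernel or the selftests. Because a single ``read()`` may end in
the middle of a record, the decoder reports the unconsumed tail so the caller
can keep it for the next read.

::

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical record written by the BPF program with bpf_seq_write(). */
  struct task_file_rec {
      uint32_t tgid;
      uint32_t pid;
      uint32_t fd;
      uint32_t pad;      /* explicit padding keeps f_op 8-byte aligned */
      uint64_t f_op;
  };

  /*
   * Copy whole records from buf into recs (at most max of them) and return
   * how many were copied. *leftover receives the number of trailing bytes
   * that were not consumed.
   */
  size_t decode_recs(const void *buf, size_t len, struct task_file_rec *recs,
                     size_t max, size_t *leftover)
  {
      const unsigned char *p = buf;
      size_t off = 0, n = 0;

      while (n < max && len - off >= sizeof(struct task_file_rec)) {
          /* memcpy avoids unaligned access into the read() buffer */
          memcpy(&recs[n], p + off, sizeof(struct task_file_rec));
          off += sizeof(struct task_file_rec);
          n++;
      }
      if (leftover)
          *leftover = len - off;
      return n;
  }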

For example, you can use the following command to create a BPF iterator from the
``bpf_iter_ipv6_route.o`` object file and pin it to the ``/sys/fs/bpf/my_route``
path:

::

  $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route

And then print out the results using the following command:

::

  $ cat /sys/fs/bpf/my_route


-------------------------------------------------------
Implement Kernel Support for BPF Iterator Program Types
-------------------------------------------------------

To implement a BPF iterator in the kernel, the developer fills in, once per
iterator target, an instance of the following key data structure defined in the
`bpf.h
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_
file.

::

  struct bpf_iter_reg {
          const char *target;
          bpf_iter_attach_target_t attach_target;
          bpf_iter_detach_target_t detach_target;
          bpf_iter_show_fdinfo_t show_fdinfo;
          bpf_iter_fill_link_info_t fill_link_info;
          bpf_iter_get_func_proto_t get_func_proto;
          u32 ctx_arg_info_size;
          u32 feature;
          struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
          const struct bpf_iter_seq_info *seq_info;
  };

After filling in the data structure fields, call ``bpf_iter_reg_target()`` to
register the iterator with the main BPF iterator subsystem.

The following is a breakdown of each field in struct ``bpf_iter_reg``.

.. list-table::
   :widths: 25 50
   :header-rows: 1

   * - Field
     - Description
   * - target
     - Specifies the name of the BPF iterator, for example ``bpf_map`` or
       ``bpf_map_elem``. The name must be unique among the ``bpf_iter`` target
       names in the kernel.
   * - attach_target and detach_target
     - Allow target-specific ``link_create`` actions, since some targets may
       need special processing. ``attach_target`` is called during the user
       space ``link_create`` stage, and ``detach_target`` when the link is
       released.
   * - show_fdinfo and fill_link_info
     - Called to fill in target-specific information when the user queries the
       link info associated with the iterator.
   * - get_func_proto
     - Permits a BPF iterator to access BPF helpers specific to the iterator.
   * - ctx_arg_info_size and ctx_arg_info
     - Specify the verifier states for the BPF program arguments associated
       with the BPF iterator.
   * - feature
     - Specifies certain action requests in the kernel BPF iterator
       infrastructure. Currently, only ``BPF_ITER_RESCHED`` is supported. It
       means that the kernel function ``cond_resched()`` is called to prevent
       other kernel subsystems (e.g., RCU) from misbehaving.
   * - seq_info
     - Specifies the set of seq operations for the BPF iterator and helpers to
       initialize/free the private data for the corresponding ``seq_file``.

`This patch
<https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_
shows an implementation of the ``task_vma`` BPF iterator in the kernel.

---------------------------------
Parameterizing BPF Task Iterators
---------------------------------

By default, BPF iterators walk through all the objects of the specified types
(processes, cgroups, maps, etc.) across the entire system to read relevant
kernel data. But often, we only care about a much smaller subset of iterable
kernel objects, such as only the tasks within a specific process. Therefore,
BPF iterator programs support filtering out objects from iteration by allowing
user space to configure the iterator program when it is attached.

--------------------------
BPF Task Iterator Program
--------------------------

The following code is a BPF iterator program that prints file and task
information through the ``seq_file`` of the iterator. It is a standard BPF
iterator program that visits every file of every task. We will use this BPF
program in our example later.

::

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
        struct seq_file *seq = ctx->meta->seq;
        struct task_struct *task = ctx->task;
        struct file *file = ctx->file;
        __u32 fd = ctx->fd;

        /* task and file may be NULL, so they must be checked before use */
        if (task == NULL || file == NULL)
                return 0;
        /* print the header once, for the first object of the sequence */
        if (ctx->meta->seq_num == 0) {
                BPF_SEQ_PRINTF(seq, "    tgid      pid       fd      file\n");
        }
        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                        (long)file->f_op);
        return 0;
  }

----------------------------------------
Creating a File Iterator with Parameters
----------------------------------------

Now, let us look at how to create an iterator that includes only the files of a
process.

First, fill the ``bpf_iter_attach_opts`` struct as shown below:

::

  LIBBPF_OPTS(bpf_iter_attach_opts, opts);
  union bpf_iter_link_info linfo;
  memset(&linfo, 0, sizeof(linfo));
  linfo.task.pid = getpid();
  opts.link_info = &linfo;
  opts.link_info_len = sizeof(linfo);

``linfo.task.pid``, if it is non-zero, directs the kernel to create an iterator
that only includes opened files for the process with the specified ``pid``. In
this example, we will only be iterating files for our own process. If
``linfo.task.pid`` is zero, the iterator will visit every opened file of every
process. Similarly, ``linfo.task.tid`` directs the kernel to create an iterator
that visits opened files of a specific thread, not a process. For a file
iterator, ``linfo.task.tid`` gives a different result from ``linfo.task.pid``
only if the thread has a separate file descriptor table. In most circumstances,
all threads of a process share a single file descriptor table.
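
Choosing between the two parameters can be wrapped in a small helper. This is a
sketch, assuming uapi headers recent enough to define the ``task`` member of
``union bpf_iter_link_info`` (Linux 6.1 or later); ``fill_task_linfo()`` is a
hypothetical name. It sets only one of the fields, since the kernel is expected
to reject a request that sets more than one of ``tid``, ``pid`` and ``pid_fd``.

::

  #define _GNU_SOURCE
  #include <linux/bpf.h>       /* union bpf_iter_link_info */
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /*
   * Hypothetical helper: restrict a task, task_file or task_vma iterator to
   * the calling process (per_thread == 0) or to the calling thread only.
   */
  void fill_task_linfo(union bpf_iter_link_info *linfo, int per_thread)
  {
      memset(linfo, 0, sizeof(*linfo));   /* all zero means "no filter" */
      if (per_thread)
          linfo->task.tid = (__u32)syscall(SYS_gettid);
      else
          linfo->task.pid = (__u32)getpid();
  }

The result is passed to ``bpf_program__attach_iter()`` through
``opts.link_info`` and ``opts.link_info_len``, in the same way as the
``linfo`` in the snippet above.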

Now, in the user space program, pass a pointer to the struct to
``bpf_program__attach_iter()``.

::

  link = bpf_program__attach_iter(prog, &opts);
  iter_fd = bpf_iter_create(bpf_link__fd(link));

If both *tid* and *pid* are zero, an iterator created from this struct
``bpf_iter_attach_opts`` will include every opened file of every task in the
system (more precisely, in the current *pid* namespace). It is the same as
passing NULL as the second argument to ``bpf_program__attach_iter()``.

The whole program looks like the following code:

::

  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <bpf/bpf.h>
  #include <bpf/libbpf.h>
  #include "bpf_iter_task_ex.skel.h"

  static int do_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
  {
        struct bpf_link *link;
        char buf[16] = {};
        int iter_fd = -1, len;
        int ret = 0;

        /* attach the iterator program; opts carries the pid/tid filter */
        link = bpf_program__attach_iter(prog, opts);
        if (!link) {
                fprintf(stderr, "bpf_program__attach_iter() fails\n");
                return -1;
        }
        /* each new iterator fd starts a fresh iteration */
        iter_fd = bpf_iter_create(bpf_link__fd(link));
        if (iter_fd < 0) {
                fprintf(stderr, "bpf_iter_create() fails\n");
                ret = -1;
                goto free_link;
        }
        /* contents are not checked; print them and ensure read() ends without error */
        while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) {
                buf[len] = 0;
                printf("%s", buf);
        }
        if (len < 0) {
                fprintf(stderr, "read() fails\n");
                ret = -1;
        }
        printf("\n");
  free_link:
        if (iter_fd >= 0)
                close(iter_fd);
        bpf_link__destroy(link);
        return ret;
  }

  static void test_task_file(void)
  {
        LIBBPF_OPTS(bpf_iter_attach_opts, opts);
        struct bpf_iter_task_ex *skel;
        union bpf_iter_link_info linfo;

        skel = bpf_iter_task_ex__open_and_load();
        if (skel == NULL)
                return;
        /* restrict the iterator to the files of this process */
        memset(&linfo, 0, sizeof(linfo));
        linfo.task.pid = getpid();
        opts.link_info = &linfo;
        opts.link_info_len = sizeof(linfo);
        printf("PID %d\n", getpid());
        do_read_opts(skel->progs.dump_task_file, &opts);
        bpf_iter_task_ex__destroy(skel);
  }

  int main(void)
  {
        test_task_file();
        return 0;
  }

The following lines are the output of the program:

::

  PID 1859

     tgid      pid       fd      file
     1859     1859        0 ffffffff82270aa0
     1859     1859        1 ffffffff82270aa0
     1859     1859        2 ffffffff82270aa0
     1859     1859        3 ffffffff82272980
     1859     1859        4 ffffffff8225e120
     1859     1859        5 ffffffff82255120
     1859     1859        6 ffffffff82254f00
     1859     1859        7 ffffffff82254d80
     1859     1859        8 ffffffff8225abe0

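Because this output is plain text, a user space consumer can post-process it,
for example to group open files by their ``f_op`` value. The sketch below
parses one data row back into its fields; ``parse_task_file_row()`` and
``struct task_file_row`` are hypothetical names, and the sketch assumes a
64-bit ``unsigned long`` so that kernel addresses fit.

::

  #include <stdio.h>

  struct task_file_row {
      int tgid;
      int pid;
      int fd;
      unsigned long f_op;
  };

  /*
   * Parse one row printed with "%8d %8d %8d %lx\n". Returns 1 on success and
   * 0 for anything else, such as the "tgid pid fd file" header line.
   */
  int parse_task_file_row(const char *line, struct task_file_row *row)
  {
      return sscanf(line, "%d %d %d %lx", &row->tgid, &row->pid, &row->fd,
                    &row->f_op) == 4;
  }
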
------------------
Without Parameters
------------------

Let us look at how a BPF iterator without parameters skips files of other
processes in the system. In this case, the BPF program has to check the pid or
the tid of tasks, or it will receive every opened file in the system (in the
current *pid* namespace, actually). So, we usually add a global variable in the
BPF program to pass a *pid* to the BPF program.

The BPF program would look like the following block:

::

  ......
  int target_pid = 0;

  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
        ......
        if (task->tgid != target_pid) /* Check task->pid instead to check thread IDs */
                return 0;
        BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                        (long)file->f_op);
        return 0;
  }

The user space program would look like the following block:

::

  ......
  static void test_task_file(void)
  {
        ......
        skel = bpf_iter_task_ex__open_and_load();
        if (skel == NULL)
                return;
        skel->bss->target_pid = getpid(); /* process ID.  For thread id, use gettid() */
        /* no link_info: the kernel visits every file and the BPF program filters */
        do_read_opts(skel->progs.dump_task_file, NULL);
        ......
  }

``target_pid`` is a global variable in the BPF program. The user space program
should initialize the variable with a process ID so that the BPF program skips
opened files of other processes. When you parametrize a BPF iterator instead,
the kernel filters the objects before calling the BPF program, so the iterator
calls the BPF program fewer times, which can save significant resources.

---------------------------
Parametrizing VMA Iterators
---------------------------

By default, a BPF VMA iterator includes every VMA in every process. However,
you can still specify a process or a thread to include only its VMAs. Unlike
files, a thread cannot have a separate address space (since Linux 2.6.0-test6).
For VMA iterators, therefore, using *tid* makes no difference from using *pid*.

----------------------------
Parametrizing Task Iterators
----------------------------

A BPF task iterator with *pid* includes all tasks (threads) of a process. The
BPF program receives these tasks one after another. You can specify a BPF task
iterator with a *tid* parameter to include only the task that matches the given
*tid*.