1.. SPDX-License-Identifier: GPL-2.0 2 3========================== 4Fprobe-based Event Tracing 5========================== 6 7.. Author: Masami Hiramatsu <mhiramat@kernel.org> 8 9Overview 10-------- 11 12Fprobe event is similar to the kprobe event, but limited to probe on 13the function entry and exit only. It is good enough for many use cases 14which only traces some specific functions. 15 16This document also covers tracepoint probe events (tprobe) since this 17is also works only on the tracepoint entry. User can trace a part of 18tracepoint argument, or the tracepoint without trace-event, which is 19not exposed on tracefs. 20 21As same as other dynamic events, fprobe events and tracepoint probe 22events are defined via `dynamic_events` interface file on tracefs. 23 24Synopsis of fprobe-events 25------------------------- 26:: 27 28 f[:[GRP1/][EVENT1]] SYM [FETCHARGS] : Probe on function entry 29 f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS] : Probe on function exit 30 t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS] : Probe on tracepoint 31 32 GRP1 : Group name for fprobe. If omitted, use "fprobes" for it. 33 GRP2 : Group name for tprobe. If omitted, use "tracepoints" for it. 34 EVENT1 : Event name for fprobe. If omitted, the event name is 35 "SYM__entry" or "SYM__exit". 36 EVENT2 : Event name for tprobe. If omitted, the event name is 37 the same as "TRACEPOINT", but if the "TRACEPOINT" starts 38 with a digit character, "_TRACEPOINT" is used. 39 MAXACTIVE : Maximum number of instances of the specified function that 40 can be probed simultaneously, or 0 for the default value 41 as defined in Documentation/trace/fprobe.rst 42 43 FETCHARGS : Arguments. Each probe can have up to 128 args. 44 ARG : Fetch "ARG" function argument using BTF (only for function 45 entry or tracepoint.) (\*1) 46 @ADDR : Fetch memory at ADDR (ADDR should be in kernel) 47 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) 48 $stackN : Fetch Nth entry of stack (N >= 0) 49 $stack : Fetch stack address. 50 $argN : Fetch the Nth function argument. (N >= 1) (\*2) 51 $retval : Fetch return value.(\*3) 52 $comm : Fetch current task comm. 53 +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5) 54 \IMM : Store an immediate value to the argument. 55 NAME=FETCHARG : Set NAME as the argument name of FETCHARG. 56 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types 57 (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types 58 (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr" 59 and bitfield are supported. 60 61 (\*1) This is available only when BTF is enabled. 62 (\*2) only for the probe on function entry (offs == 0). Note, this argument access 63 is best effort, because depending on the argument type, it may be passed on 64 the stack. But this only support the arguments via registers. 65 (\*3) only for return probe. Note that this is also best effort. Depending on the 66 return value type, it might be passed via a pair of registers. But this only 67 accesses one register. 68 (\*4) this is useful for fetching a field of data structures. 69 (\*5) "u" means user-space dereference. 70 71For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`. 72 73Function arguments at exit 74-------------------------- 75Function arguments can be accessed at exit probe using $arg<N> fetcharg. This 76is useful to record the function parameter and return value at once, and 77trace the difference of structure fields (for debugging a function whether it 78correctly updates the given data structure or not) 79See the :ref:`sample<fprobetrace_exit_args_sample>` below for how it works. 80 81BTF arguments 82------------- 83BTF (BPF Type Format) argument allows user to trace function and tracepoint 84parameters by its name instead of ``$argN``. This feature is available if the 85kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF. 86If user only specify the BTF argument, the event's argument name is also 87automatically set by the given name. :: 88 89 # echo 'f:myprobe vfs_read count pos' >> dynamic_events 90 # cat dynamic_events 91 f:fprobes/myprobe vfs_read count=count pos=pos 92 93It also chooses the fetch type from BTF information. For example, in the above 94example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus, 95both are converted to 64bit unsigned long, but only ``pos`` has "%Lx" 96print-format as below :: 97 98 # cat events/fprobes/myprobe/format 99 name: myprobe 100 ID: 1313 101 format: 102 field:unsigned short common_type; offset:0; size:2; signed:0; 103 field:unsigned char common_flags; offset:2; size:1; signed:0; 104 field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 105 field:int common_pid; offset:4; size:4; signed:1; 106 107 field:unsigned long __probe_ip; offset:8; size:8; signed:0; 108 field:u64 count; offset:16; size:8; signed:0; 109 field:u64 pos; offset:24; size:8; signed:0; 110 111 print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos 112 113If user unsures the name of arguments, ``$arg*`` will be helpful. The ``$arg*`` 114is expanded to all function arguments of the function or the tracepoint. :: 115 116 # echo 'f:myprobe vfs_read $arg*' >> dynamic_events 117 # cat dynamic_events 118 f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos 119 120BTF also affects the ``$retval``. If user doesn't set any type, the retval 121type is automatically picked from the BTF. If the function returns ``void``, 122``$retval`` is rejected. 123 124You can access the data fields of a data structure using allow operator ``->`` 125(for pointer type) and dot operator ``.`` (for data structure type.):: 126 127# echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events 128 129The field access operators, ``->`` and ``.`` can be combined for accessing deeper 130members and other structure members pointed by the member. e.g. ``foo->bar.baz->qux`` 131If there is non-name union member, you can directly access it as the C code does. 132For example:: 133 134 struct { 135 union { 136 int a; 137 int b; 138 }; 139 } *foo; 140 141To access ``a`` and ``b``, use ``foo->a`` and ``foo->b`` in this case. 142 143This data field access is available for the return value via ``$retval``, 144e.g. ``$retval->name``. 145 146For these BTF arguments and fields, ``:string`` and ``:ustring`` change the 147behavior. If these are used for BTF argument or field, it checks whether 148the BTF type of the argument or the data field is ``char *`` or ``char []``, 149or not. If not, it rejects applying the string types. Also, with the BTF 150support, you don't need a memory dereference operator (``+0(PTR)``) for 151accessing the string pointed by a ``PTR``. It automatically adds the memory 152dereference operator according to the BTF type. e.g. :: 153 154# echo 't sched_switch prev->comm:string' >> dynamic_events 155# echo 'f getname_flags%return $retval->name:string' >> dynamic_events 156 157The ``prev->comm`` is an embedded char array in the data structure, and 158``$retval->name`` is a char pointer in the data structure. But in both 159cases, you can use ``:string`` type to get the string. 160 161 162Usage examples 163-------------- 164Here is an example to add fprobe events on ``vfs_read()`` function entry 165and exit, with BTF arguments. 166:: 167 168 # echo 'f vfs_read $arg*' >> dynamic_events 169 # echo 'f vfs_read%return $retval' >> dynamic_events 170 # cat dynamic_events 171 f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos 172 f:fprobes/vfs_read__exit vfs_read%return arg1=$retval 173 # echo 1 > events/fprobes/enable 174 # head -n 20 trace | tail 175 # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 176 # | | | ||||| | | 177 sh-70 [000] ...1. 335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 178 sh-70 [000] ..... 335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 179 sh-70 [000] ...1. 335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 180 sh-70 [000] ..... 335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 181 sh-70 [000] ...1. 335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08 182 sh-70 [000] ..... 335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 183 sh-70 [000] ...1. 336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 184 sh-70 [000] ..... 336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 185 186You can see all function arguments and return values are recorded as signed int. 187 188Also, here is an example of tracepoint events on ``sched_switch`` tracepoint. 189To compare the result, this also enables the ``sched_switch`` traceevent too. 190:: 191 192 # echo 't sched_switch $arg*' >> dynamic_events 193 # echo 1 > events/sched/sched_switch/enable 194 # echo 1 > events/tracepoints/sched_switch/enable 195 # echo > trace 196 # head -n 20 trace | tail 197 # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 198 # | | | ||||| | | 199 sh-70 [000] d..2. 3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 200 sh-70 [000] d..3. 3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1 201 <idle>-0 [000] d..2. 3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 202 <idle>-0 [000] d..3. 3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 203 rcu_preempt-16 [000] d..2. 3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120 204 rcu_preempt-16 [000] d..3. 3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026 205 <idle>-0 [000] d..2. 3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 206 <idle>-0 [000] d..3. 3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 207 208As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on 209the other hand, the ``sched_switch`` tracepoint probe event shows *raw* 210parameters. This means you can access any field values in the task 211structure pointed by the ``prev`` and ``next`` arguments. 212 213For example, usually ``task_struct::start_time`` is not traced, but with this 214traceprobe event, you can trace that field as below. 215:: 216 217 # echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events 218 # head -n 20 trace | tail 219 # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 220 # | | | ||||| | | 221 sh-70 [000] d..3. 5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 222 rcu_preempt-16 [000] d..3. 5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526 223 sh-70 [000] d..3. 5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 224 <idle>-0 [000] d..3. 5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 225 rcu_preempt-16 [000] d..3. 5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 226 <idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 227 kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 228 <idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 229 230.. _fprobetrace_exit_args_sample: 231 232The return probe allows us to access the results of some functions, which returns 233the error code and its results are passed via function parameter, such as an 234structure-initialization function. 235 236For example, vfs_open() will link the file structure to the inode and update 237mode. You can trace that changes with return probe. 238:: 239 240 # echo 'f vfs_open mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events 241 # echo 'f vfs_open%%return mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events 242 # echo 1 > events/fprobes/enable 243 # cat trace 244 sh-131 [006] ...1. 1945.714346: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x2 inode=0x0 245 sh-131 [006] ...1. 1945.714358: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4d801e inode=0xffff888008470168 246 cat-143 [007] ...1. 1945.717949: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0 247 cat-143 [007] ...1. 1945.717956: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4a801d inode=0xffff888005f78d28 248 cat-143 [007] ...1. 1945.720616: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0 249 cat-143 [007] ...1. 1945.728263: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0xa800d inode=0xffff888004ada8d8 250 251You can see the `file::f_mode` and `file::f_inode` are updated in `vfs_open()`. 252