xref: /linux/Documentation/trace/eprobetrace.rst (revision d6f38c12396397e48092ad9e8a4d7be4de51b942)
1*623526baSSteven Rostedt.. SPDX-License-Identifier: GPL-2.0
2*623526baSSteven Rostedt
3*623526baSSteven Rostedt==================================
4*623526baSSteven RostedtEprobe - Event-based Probe Tracing
5*623526baSSteven Rostedt==================================
6*623526baSSteven Rostedt
7*623526baSSteven Rostedt:Author: Steven Rostedt <rostedt@goodmis.org>
8*623526baSSteven Rostedt
9*623526baSSteven Rostedt- Written for v6.17
10*623526baSSteven Rostedt
11*623526baSSteven RostedtOverview
12*623526baSSteven Rostedt========
13*623526baSSteven Rostedt
14*623526baSSteven RostedtEprobes are dynamic events that are placed on existing events to either
15*623526baSSteven Rostedtdereference a field that is a pointer, or simply to limit what fields are
16*623526baSSteven Rostedtrecorded in the trace event.
17*623526baSSteven Rostedt
18*623526baSSteven RostedtEprobes depend on kprobe events so to enable this feature, build your kernel
19*623526baSSteven Rostedtwith CONFIG_EPROBE_EVENTS=y.
20*623526baSSteven Rostedt
21*623526baSSteven RostedtEprobes are created via the /sys/kernel/tracing/dynamic_events file.
22*623526baSSteven Rostedt
23*623526baSSteven RostedtSynopsis of eprobe_events
24*623526baSSteven Rostedt-------------------------
25*623526baSSteven Rostedt::
26*623526baSSteven Rostedt
27*623526baSSteven Rostedt  e[:[EGRP/][EEVENT]] GRP.EVENT [FETCHARGS]	: Set a probe
28*623526baSSteven Rostedt  -:[EGRP/][EEVENT]				: Clear a probe
29*623526baSSteven Rostedt
30*623526baSSteven Rostedt EGRP		: Group name of the new event. If omitted, use "eprobes" for it.
31*623526baSSteven Rostedt EEVENT		: Event name. If omitted, the event name is generated and will
32*623526baSSteven Rostedt		  be the same event name as the event it attached to.
33*623526baSSteven Rostedt GRP		: Group name of the event to attach to.
34*623526baSSteven Rostedt EVENT		: Event name of the event to attach to.
35*623526baSSteven Rostedt
36*623526baSSteven Rostedt FETCHARGS	: Arguments. Each probe can have up to 128 args.
37*623526baSSteven Rostedt  $FIELD	: Fetch the value of the event field called FIELD.
38*623526baSSteven Rostedt  @ADDR		: Fetch memory at ADDR (ADDR should be in kernel)
39*623526baSSteven Rostedt  @SYM[+|-offs]	: Fetch memory at SYM +|- offs (SYM should be a data symbol)
40*623526baSSteven Rostedt  $comm		: Fetch current task comm.
41*623526baSSteven Rostedt  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
42*623526baSSteven Rostedt  \IMM		: Store an immediate value to the argument.
43*623526baSSteven Rostedt  NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
44*623526baSSteven Rostedt  FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
45*623526baSSteven Rostedt		  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
46*623526baSSteven Rostedt		  (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
47*623526baSSteven Rostedt                  "string", "ustring", "symbol", "symstr" and "bitfield" are
48*623526baSSteven Rostedt                  supported.
49*623526baSSteven Rostedt
50*623526baSSteven RostedtTypes
51*623526baSSteven Rostedt-----
52*623526baSSteven RostedtThe FETCHARGS above is very similar to the kprobe events as described in
53*623526baSSteven RostedtDocumentation/trace/kprobetrace.rst.
54*623526baSSteven Rostedt
55*623526baSSteven RostedtThe difference between eprobes and kprobes FETCHARGS is that eprobes has a
56*623526baSSteven Rostedt$FIELD command that returns the content of the event field of the event
57*623526baSSteven Rostedtthat is attached. Eprobes do not have access to registers, stacks and function
58*623526baSSteven Rostedtarguments that kprobes has.
59*623526baSSteven Rostedt
60*623526baSSteven RostedtIf a field argument is a pointer, it may be dereferenced just like a memory
61*623526baSSteven Rostedtaddress using the FETCHARGS syntax.
62*623526baSSteven Rostedt
63*623526baSSteven Rostedt
64*623526baSSteven RostedtAttaching to dynamic events
65*623526baSSteven Rostedt---------------------------
66*623526baSSteven Rostedt
67*623526baSSteven RostedtEprobes may attach to dynamic events as well as to normal events. It may
68*623526baSSteven Rostedtattach to a kprobe event, a synthetic event or a fprobe event. This is useful
69*623526baSSteven Rostedtif the type of a field needs to be changed. See Example 2 below.
70*623526baSSteven Rostedt
71*623526baSSteven RostedtUsage examples
72*623526baSSteven Rostedt==============
73*623526baSSteven Rostedt
74*623526baSSteven RostedtExample 1
75*623526baSSteven Rostedt---------
76*623526baSSteven Rostedt
77*623526baSSteven RostedtThe basic usage of eprobes is to limit the data that is being recorded into
78*623526baSSteven Rostedtthe tracing buffer. For example, a common event to trace is the sched_switch
79*623526baSSteven Rostedttrace event. That has a format of::
80*623526baSSteven Rostedt
81*623526baSSteven Rostedt	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
82*623526baSSteven Rostedt	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
83*623526baSSteven Rostedt	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
84*623526baSSteven Rostedt	field:int common_pid;	offset:4;	size:4;	signed:1;
85*623526baSSteven Rostedt
86*623526baSSteven Rostedt	field:char prev_comm[16];	offset:8;	size:16;	signed:0;
87*623526baSSteven Rostedt	field:pid_t prev_pid;	offset:24;	size:4;	signed:1;
88*623526baSSteven Rostedt	field:int prev_prio;	offset:28;	size:4;	signed:1;
89*623526baSSteven Rostedt	field:long prev_state;	offset:32;	size:8;	signed:1;
90*623526baSSteven Rostedt	field:char next_comm[16];	offset:40;	size:16;	signed:0;
91*623526baSSteven Rostedt	field:pid_t next_pid;	offset:56;	size:4;	signed:1;
92*623526baSSteven Rostedt	field:int next_prio;	offset:60;	size:4;	signed:1;
93*623526baSSteven Rostedt
94*623526baSSteven RostedtThe first four fields are common to all events and can not be limited. But the
95*623526baSSteven Rostedtrest of the event has 60 bytes of information. It records the names of the
96*623526baSSteven Rostedtprevious and next tasks being scheduled out and in, as well as their pids and
97*623526baSSteven Rostedtpriorities. It also records the state of the previous task. If only the pids
98*623526baSSteven Rostedtof the tasks are of interest, why waste the ring buffer with all the other
99*623526baSSteven Rostedtfields?
100*623526baSSteven Rostedt
101*623526baSSteven RostedtAn eprobe can limit what gets recorded. Note, it does not help in performance,
102*623526baSSteven Rostedtas all the fields are recorded in a temporary buffer to process the eprobe.
103*623526baSSteven Rostedt::
104*623526baSSteven Rostedt
105*623526baSSteven Rostedt # echo 'e:sched/switch sched.sched_switch prev=$prev_pid:u32 next=$next_pid:u32' >> /sys/kernel/tracing/dynamic_events
106*623526baSSteven Rostedt # echo 1 > /sys/kernel/tracing/events/sched/switch/enable
107*623526baSSteven Rostedt # cat /sys/kernel/tracing/trace
108*623526baSSteven Rostedt
109*623526baSSteven Rostedt # tracer: nop
110*623526baSSteven Rostedt #
111*623526baSSteven Rostedt # entries-in-buffer/entries-written: 2721/2721   #P:8
112*623526baSSteven Rostedt #
113*623526baSSteven Rostedt #                                _-----=> irqs-off/BH-disabled
114*623526baSSteven Rostedt #                               / _----=> need-resched
115*623526baSSteven Rostedt #                              | / _---=> hardirq/softirq
116*623526baSSteven Rostedt #                              || / _--=> preempt-depth
117*623526baSSteven Rostedt #                              ||| / _-=> migrate-disable
118*623526baSSteven Rostedt #                              |||| /     delay
119*623526baSSteven Rostedt #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
120*623526baSSteven Rostedt #              | |         |   |||||     |         |
121*623526baSSteven Rostedt     sshd-session-1082    [004] d..4.  5041.239906: switch: (sched.sched_switch) prev=1082 next=0
122*623526baSSteven Rostedt             bash-1085    [001] d..4.  5041.240198: switch: (sched.sched_switch) prev=1085 next=141
123*623526baSSteven Rostedt    kworker/u34:5-141     [001] d..4.  5041.240259: switch: (sched.sched_switch) prev=141 next=1085
124*623526baSSteven Rostedt           <idle>-0       [004] d..4.  5041.240354: switch: (sched.sched_switch) prev=0 next=1082
125*623526baSSteven Rostedt             bash-1085    [001] d..4.  5041.240385: switch: (sched.sched_switch) prev=1085 next=141
126*623526baSSteven Rostedt    kworker/u34:5-141     [001] d..4.  5041.240410: switch: (sched.sched_switch) prev=141 next=1085
127*623526baSSteven Rostedt             bash-1085    [001] d..4.  5041.240478: switch: (sched.sched_switch) prev=1085 next=0
128*623526baSSteven Rostedt     sshd-session-1082    [004] d..4.  5041.240526: switch: (sched.sched_switch) prev=1082 next=0
129*623526baSSteven Rostedt           <idle>-0       [001] d..4.  5041.247524: switch: (sched.sched_switch) prev=0 next=90
130*623526baSSteven Rostedt           <idle>-0       [002] d..4.  5041.247545: switch: (sched.sched_switch) prev=0 next=16
131*623526baSSteven Rostedt      kworker/1:1-90      [001] d..4.  5041.247580: switch: (sched.sched_switch) prev=90 next=0
132*623526baSSteven Rostedt        rcu_sched-16      [002] d..4.  5041.247591: switch: (sched.sched_switch) prev=16 next=0
133*623526baSSteven Rostedt           <idle>-0       [002] d..4.  5041.257536: switch: (sched.sched_switch) prev=0 next=16
134*623526baSSteven Rostedt        rcu_sched-16      [002] d..4.  5041.257573: switch: (sched.sched_switch) prev=16 next=0
135*623526baSSteven Rostedt
136*623526baSSteven RostedtNote, without adding the "u32" after the prev_pid and next_pid, the values
137*623526baSSteven Rostedtwould default showing in hexadecimal.
138*623526baSSteven Rostedt
139*623526baSSteven RostedtExample 2
140*623526baSSteven Rostedt---------
141*623526baSSteven Rostedt
142*623526baSSteven RostedtIf a specific system call is to be recorded but the syscalls events are not
143*623526baSSteven Rostedtenabled, the raw_syscalls can still be used (syscalls are system call
144*623526baSSteven Rostedtevents are not normal events, but are created from the raw_syscalls events
145*623526baSSteven Rostedtwithin the kernel). In order to trace the openat system call, one can create
146*623526baSSteven Rostedtan event probe on top of the raw_syscalls event:
147*623526baSSteven Rostedt::
148*623526baSSteven Rostedt
149*623526baSSteven Rostedt # cd /sys/kernel/tracing
150*623526baSSteven Rostedt # cat events/raw_syscalls/sys_enter/format
151*623526baSSteven Rostedt name: sys_enter
152*623526baSSteven Rostedt ID: 395
153*623526baSSteven Rostedt format:
154*623526baSSteven Rostedt	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
155*623526baSSteven Rostedt	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
156*623526baSSteven Rostedt	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
157*623526baSSteven Rostedt	field:int common_pid;	offset:4;	size:4;	signed:1;
158*623526baSSteven Rostedt
159*623526baSSteven Rostedt	field:long id;	offset:8;	size:8;	signed:1;
160*623526baSSteven Rostedt	field:unsigned long args[6];	offset:16;	size:48;	signed:0;
161*623526baSSteven Rostedt
162*623526baSSteven Rostedt print fmt: "NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)", REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3], REC->args[4], REC->args[5]
163*623526baSSteven Rostedt
164*623526baSSteven RostedtFrom the source code, the sys_openat() has:
165*623526baSSteven Rostedt::
166*623526baSSteven Rostedt
167*623526baSSteven Rostedt int sys_openat(int dirfd, const char *path, int flags, mode_t mode)
168*623526baSSteven Rostedt {
169*623526baSSteven Rostedt	return my_syscall4(__NR_openat, dirfd, path, flags, mode);
170*623526baSSteven Rostedt }
171*623526baSSteven Rostedt
172*623526baSSteven RostedtThe path is the second parameter, and that is what is wanted.
173*623526baSSteven Rostedt::
174*623526baSSteven Rostedt
175*623526baSSteven Rostedt # echo 'e:openat raw_syscalls.sys_enter nr=$id filename=+8($args):ustring' >> dynamic_events
176*623526baSSteven Rostedt
177*623526baSSteven RostedtThis is being run on x86_64 where the word size is 8 bytes and the openat
178*623526baSSteven Rostedtsystem call __NR_openat is set at 257.
179*623526baSSteven Rostedt::
180*623526baSSteven Rostedt
181*623526baSSteven Rostedt # echo 'nr == 257' > events/eprobes/openat/filter
182*623526baSSteven Rostedt
183*623526baSSteven RostedtNow enable the event and look at the trace.
184*623526baSSteven Rostedt::
185*623526baSSteven Rostedt
186*623526baSSteven Rostedt # echo 1 > events/eprobes/openat/enable
187*623526baSSteven Rostedt # cat trace
188*623526baSSteven Rostedt
189*623526baSSteven Rostedt # tracer: nop
190*623526baSSteven Rostedt #
191*623526baSSteven Rostedt # entries-in-buffer/entries-written: 4/4   #P:8
192*623526baSSteven Rostedt #
193*623526baSSteven Rostedt #                                _-----=> irqs-off/BH-disabled
194*623526baSSteven Rostedt #                               / _----=> need-resched
195*623526baSSteven Rostedt #                              | / _---=> hardirq/softirq
196*623526baSSteven Rostedt #                              || / _--=> preempt-depth
197*623526baSSteven Rostedt #                              ||| / _-=> migrate-disable
198*623526baSSteven Rostedt #                              |||| /     delay
199*623526baSSteven Rostedt #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
200*623526baSSteven Rostedt #              | |         |   |||||     |         |
201*623526baSSteven Rostedt              cat-1298    [003] ...2.  2060.875970: openat: (raw_syscalls.sys_enter) nr=0x101 filename=(fault)
202*623526baSSteven Rostedt              cat-1298    [003] ...2.  2060.876197: openat: (raw_syscalls.sys_enter) nr=0x101 filename=(fault)
203*623526baSSteven Rostedt              cat-1298    [003] ...2.  2060.879126: openat: (raw_syscalls.sys_enter) nr=0x101 filename=(fault)
204*623526baSSteven Rostedt              cat-1298    [003] ...2.  2060.879639: openat: (raw_syscalls.sys_enter) nr=0x101 filename=(fault)
205*623526baSSteven Rostedt
206*623526baSSteven RostedtThe filename shows "(fault)". This is likely because the filename has not been
207*623526baSSteven Rostedtpulled into memory yet and currently trace events cannot fault in memory that
208*623526baSSteven Rostedtis not present. When an eprobe tries to read memory that has not been faulted
209*623526baSSteven Rostedtin yet, it will show the "(fault)" text.
210*623526baSSteven Rostedt
211*623526baSSteven RostedtTo get around this, as the kernel will likely pull in this filename and make
212*623526baSSteven Rostedtit present, attaching it to a synthetic event that can pass the address of the
213*623526baSSteven Rostedtfilename from the entry of the event to the end of the event, this can be used
214*623526baSSteven Rostedtto show the filename when the system call returns.
215*623526baSSteven Rostedt
216*623526baSSteven RostedtRemove the old eprobe::
217*623526baSSteven Rostedt
218*623526baSSteven Rostedt # echo 1 > events/eprobes/openat/enable
219*623526baSSteven Rostedt # echo '-:openat' >> dynamic_events
220*623526baSSteven Rostedt
221*623526baSSteven RostedtThis time make an eprobe where the address of the filename is saved::
222*623526baSSteven Rostedt
223*623526baSSteven Rostedt # echo 'e:openat_start raw_syscalls.sys_enter nr=$id filename=+8($args):x64' >> dynamic_events
224*623526baSSteven Rostedt
225*623526baSSteven RostedtCreate a synthetic event that passes the address of the filename to the
226*623526baSSteven Rostedtend of the event::
227*623526baSSteven Rostedt
228*623526baSSteven Rostedt # echo 's:filename u64 file' >> dynamic_events
229*623526baSSteven Rostedt # echo 'hist:keys=common_pid:f=filename if nr == 257' > events/eprobes/openat_start/trigger
230*623526baSSteven Rostedt # echo 'hist:keys=common_pid:file=$f:onmatch(eprobes.openat_start).trace(filename,$file) if id == 257' > events/raw_syscalls/sys_exit/trigger
231*623526baSSteven Rostedt
232*623526baSSteven RostedtNow that the address of the filename has been passed to the end of the
233*623526baSSteven Rostedtsystem call, create another eprobe to attach to the exit event to show the
234*623526baSSteven Rostedtstring::
235*623526baSSteven Rostedt
236*623526baSSteven Rostedt # echo 'e:openat synthetic.filename filename=+0($file):ustring' >> dynamic_events
237*623526baSSteven Rostedt # echo 1 > events/eprobes/openat/enable
238*623526baSSteven Rostedt # cat trace
239*623526baSSteven Rostedt
240*623526baSSteven Rostedt # tracer: nop
241*623526baSSteven Rostedt #
242*623526baSSteven Rostedt # entries-in-buffer/entries-written: 4/4   #P:8
243*623526baSSteven Rostedt #
244*623526baSSteven Rostedt #                                _-----=> irqs-off/BH-disabled
245*623526baSSteven Rostedt #                               / _----=> need-resched
246*623526baSSteven Rostedt #                              | / _---=> hardirq/softirq
247*623526baSSteven Rostedt #                              || / _--=> preempt-depth
248*623526baSSteven Rostedt #                              ||| / _-=> migrate-disable
249*623526baSSteven Rostedt #                              |||| /     delay
250*623526baSSteven Rostedt #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
251*623526baSSteven Rostedt #              | |         |   |||||     |         |
252*623526baSSteven Rostedt              cat-1331    [001] ...5.  2944.787977: openat: (synthetic.filename) filename="/etc/ld.so.cache"
253*623526baSSteven Rostedt              cat-1331    [001] ...5.  2944.788480: openat: (synthetic.filename) filename="/lib/x86_64-linux-gnu/libc.so.6"
254*623526baSSteven Rostedt              cat-1331    [001] ...5.  2944.793426: openat: (synthetic.filename) filename="/usr/lib/locale/locale-archive"
255*623526baSSteven Rostedt              cat-1331    [001] ...5.  2944.831362: openat: (synthetic.filename) filename="trace"
256*623526baSSteven Rostedt
257*623526baSSteven RostedtExample 3
258*623526baSSteven Rostedt---------
259*623526baSSteven Rostedt
260*623526baSSteven RostedtIf syscall trace events are available, the above would not need the first
261*623526baSSteven Rostedteprobe, but it would still need the last one::
262*623526baSSteven Rostedt
263*623526baSSteven Rostedt # echo 's:filename u64 file' >> dynamic_events
264*623526baSSteven Rostedt # echo 'hist:keys=common_pid:f=filename' > events/syscalls/sys_enter_openat/trigger
265*623526baSSteven Rostedt # echo 'hist:keys=common_pid:file=$f:onmatch(syscalls.sys_enter_openat).trace(filename,$file)' > events/syscalls/sys_exit_openat/trigger
266*623526baSSteven Rostedt # echo 'e:openat synthetic.filename filename=+0($file):ustring' >> dynamic_events
267*623526baSSteven Rostedt # echo 1 > events/eprobes/openat/enable
268*623526baSSteven Rostedt
269*623526baSSteven RostedtAnd this would produce the same result as Example 2.
270