==========================
Kprobe-based Event Tracing
==========================

:Author: Masami Hiramatsu

Overview
--------
These events are similar to tracepoint-based events. Instead of tracepoints,
this is based on kprobes (kprobe and kretprobe). So it can probe wherever
kprobes can probe (this means, all functions except those with
__kprobes/nokprobe_inline annotation and those marked NOKPROBE_SYMBOL).
Unlike the tracepoint-based event, this can be added and removed
dynamically, on the fly.

To enable this feature, build your kernel with CONFIG_KPROBE_EVENTS=y.

Similar to the event tracer, this doesn't need to be activated via
current_tracer. Instead of that, add probe points via
/sys/kernel/tracing/kprobe_events, and enable it via
/sys/kernel/tracing/events/kprobes/<EVENT>/enable.

You can also use /sys/kernel/tracing/dynamic_events instead of
kprobe_events. That interface will provide unified access to other
dynamic events too.

Synopsis of kprobe_events
-------------------------
::

  p[:[GRP/][EVENT]] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]  : Set a probe
  r[MAXACTIVE][:[GRP/][EVENT]] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
  p[:[GRP/][EVENT]] [MOD:]SYM[+0]%return [FETCHARGS]      : Set a return probe
  -:[GRP/][EVENT]                                         : Clear a probe

 GRP            : Group name. If omitted, use "kprobes" for it.
 EVENT          : Event name. If omitted, the event name is generated
                  based on SYM+offs or MEMADDR.
 MOD            : Module name which has given SYM.
 SYM[+offs]     : Symbol+offset where the probe is inserted.
 SYM%return     : Return address of the symbol
 MEMADDR        : Address where the probe is inserted.
 MAXACTIVE      : Maximum number of instances of the specified function that
                  can be probed simultaneously, or 0 for the default value
                  as defined in Documentation/trace/kprobes.rst section 1.3.1.

 FETCHARGS      : Arguments. Each probe can have up to 128 args.
  %REG          : Fetch register REG
  @ADDR         : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  $stackN       : Fetch Nth entry of stack (N >= 0)
  $stack        : Fetch stack address.
  $argN         : Fetch the Nth function argument. (N >= 1) (\*1)
  $retval       : Fetch return value.(\*2)
  $comm         : Fetch current task comm.
  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
  \IMM          : Store an immediate value to the argument.
  NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
  FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
                  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                  (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
                  "string", "ustring", "symbol", "symstr" and bitfield are
                  supported.

  (\*1) only for the probe on function entry (offs == 0). Note, this argument access
        is best effort, because depending on the argument type, it may be passed on
        the stack. But this only supports the arguments via registers.
  (\*2) only for return probe. Note that this is also best effort. Depending on the
        return value type, it might be passed via a pair of registers. But this only
        accesses one register.
  (\*3) this is useful for fetching a field of data structures.
  (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.

Function arguments at kretprobe
-------------------------------
Function arguments can be accessed at kretprobe using $arg<N> fetcharg. This
is useful to record the function parameter and return value at once, and to
trace the difference of structure fields (for debugging whether a function
correctly updates the given data structure or not).
See the :ref:`sample<fprobetrace_exit_args_sample>` in fprobe event for how
it works.

.. _kprobetrace_types:

Types
-----
Several types are supported for fetchargs. Kprobe tracer will access memory
by given type. Prefixes 's' and 'u' mean those types are signed and unsigned
respectively. 'x' prefix implies it is unsigned. Traced arguments are shown
in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32'
or 'x64' is used depending on the architecture (e.g. x86-32 uses x32, and
x86-64 uses x64).

These value types can be an array. To record array data, you can add '[N]'
(where N is a fixed number, less than 64) to the base type.
E.g. 'x16[4]' means an array of x16 (2-byte hex) with 4 elements.
Note that the array can be applied only to memory type fetchargs; you cannot
apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
wrong, but '+8($stack):x8[8]' is OK.)

Char type can be used to show the character value of traced arguments.

String type is a special type, which fetches a "null-terminated" string from
kernel space. This means it will fail and store NULL if the string container
has been paged out. "ustring" type is an alternative of string for user-space.
See :ref:`user_mem_access` for more info.

The string array type is a bit different from other types. For other base
types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is same
as +0(%di):x32.) But string[1] is not equal to string. The string type itself
represents "char array", but string array type represents "char * array".
So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.

Bitfield is another special type, which takes 3 parameters: bit-width,
bit-offset, and container-size (usually 32). The syntax is::

 b<bit-width>@<bit-offset>/<container-size>

Symbol type('symbol') is an alias of u32 or u64 type (depends on BITS_PER_LONG)
which shows given pointer in "symbol+offset" style.
On the other hand, symbol-string type ('symstr') converts the given address to
"symbol+offset/symbolsize" style and stores it as a null-terminated string.
With 'symstr' type, you can filter the event with wildcard pattern of the
symbols, and you don't need to resolve the symbol name yourself.
For $comm, the default type is "string"; any other type is invalid.

VFS layer common type(%pd/%pD) is a special type, which fetches dentry's or
file's name from struct dentry's address or struct file's address.

.. _user_mem_access:

User Memory Access
------------------
Kprobe events support user-space memory access. For that purpose, you can use
either user-space dereference syntax or 'ustring' type.

The user-space dereference syntax allows you to access a field of a data
structure in user-space. This is done by adding the "u" prefix to the
dereference syntax. For example, +u4(%si) means it will read memory from the
address in the register %si offset by 4, and the memory is expected to be in
user-space. You can use this for strings too, e.g. +u0(%si):string will read
a string from the address in the register %si that is expected to be in
user-space. 'ustring' is a shortcut way of performing the same task. That is,
+0(%si):ustring is equivalent to +u0(%si):string.

Note that kprobe-event provides the user-memory access syntax but it doesn't
use it transparently. This means if you use normal dereference or string type
for user memory, it might fail, and may always fail on some architectures. The
user has to carefully check if the target data is in kernel or user space.
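
The two equivalent spellings described above can be sketched as a pair of
shell helpers that only build the fetcharg text. The helper names, the event
name ``myopen`` and the use of ``%si`` (the second argument register on
x86-64) are assumptions made for illustration; check the ABI of your
architecture before relying on a particular register.

```shell
#!/bin/sh
# Hypothetical helpers: print NAME=FETCHARG text for a user-space string.

# Long form: user-space dereference ("u" prefix) plus the string type.
user_string_arg() {
    # $1 = argument name, $2 = FETCHARG holding the user pointer
    printf '%s=+u0(%s):string\n' "$1" "$2"
}

# Short form: normal dereference plus the ustring type.
user_string_arg_short() {
    printf '%s=+0(%s):ustring\n' "$1" "$2"
}

# Print a complete definition line; nothing is written to tracefs here.
printf 'p:myopen do_sys_open %s\n' "$(user_string_arg_short filename %si)"
```

To register the probe, the printed line would be appended to
/sys/kernel/tracing/kprobe_events as root.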

Per-Probe Event Filtering
-------------------------
The per-probe event filtering feature allows you to set a different filter on
each probe and to choose which arguments will be shown in the trace buffer. If
an event name is specified right after 'p:' or 'r:' in kprobe_events, it adds
an event under tracing/events/kprobes/<EVENT>. In that directory you can see
'id', 'enable', 'format', 'filter' and 'trigger'.

enable:
  You can enable/disable the probe by writing 1 or 0 to it.

format:
  This shows the format of this probe event.

filter:
  You can write filtering rules of this event.

id:
  This shows the id of this probe event.

trigger:
  This allows you to install trigger commands which are executed when the
  event is hit (for details, see Documentation/trace/events.rst, section 6).

Event Profiling
---------------
You can check the total number of probe hits and probe miss-hits via
/sys/kernel/tracing/kprobe_profile.
The first column is event name, the second is the number of probe hits,
the third is the number of probe miss-hits.
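
As a small sketch of how the three columns might be consumed, the helper
below lists only the events that recorded at least one miss-hit. The function
name and the output wording are assumptions for illustration; the column
order is the one described above.

```shell
#!/bin/sh
# Hypothetical helper: list events with at least one miss-hit.
# Columns follow the description above: event name, hits, miss-hits.
profile_misses() {
    # $1 = profile file; defaults to the tracefs path named above
    awk '$3 > 0 { printf "%s hits=%s missed=%s\n", $1, $2, $3 }' \
        "${1:-/sys/kernel/tracing/kprobe_profile}"
}
```

Calling ``profile_misses`` with no argument reads the live file and normally
requires root; passing a saved copy of the file works as well.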

Kernel Boot Parameter
---------------------
You can add and enable new kprobe events when booting up the kernel with the
"kprobe_event=" parameter. The parameter accepts semicolon-delimited
kprobe events, whose format is similar to that of kprobe_events.
The difference is that the probe definition parameters are comma-delimited
instead of space-delimited. For example, adding a myprobe event on do_sys_open
like below::

  p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)

should be written as below for the kernel boot parameter (just replace spaces
with commas)::

  p:myprobe,do_sys_open,dfd=%ax,filename=%dx,flags=%cx,mode=+4($stack)
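
The conversion above is mechanical, so it can be sketched as a shell helper.
The function name ``to_boot_param`` is an assumption for illustration; it
takes one kprobe_events definition per argument, replaces spaces with commas
inside each definition, and joins the definitions with semicolons.

```shell
#!/bin/sh
# Hypothetical helper: turn kprobe_events definitions (one per argument)
# into a single kprobe_event= boot parameter.
to_boot_param() {
    out=""
    for def in "$@"; do
        # Inside one definition, spaces become commas.
        conv=$(printf '%s' "$def" | tr ' ' ',')
        # Separate definitions with semicolons.
        out="${out:+$out;}$conv"
    done
    printf 'kprobe_event=%s\n' "$out"
}

to_boot_param 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)'
```

Depending on the boot loader, characters such as ``;``, ``$`` and parentheses
may need quoting or escaping in its configuration file.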


Usage examples
--------------
To add a probe as a new event, write a new definition to kprobe_events
as below::

  echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/tracing/kprobe_events

This sets a kprobe on the top of do_sys_open() function with recording
1st to 4th arguments as "myprobe" event. Note, which register/stack entry is
assigned to each function argument depends on arch-specific ABI. If you are
unsure of the ABI, please try to use the probe subcommand of perf-tools (you
can find it under tools/perf/).
As this example shows, users can choose more familiar names for each argument.
::

  echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/tracing/kprobe_events

This sets a kretprobe on the return point of do_sys_open() function with
recording return value as "myretprobe" event.
You can see the format of these events via
/sys/kernel/tracing/events/kprobes/<EVENT>/format.
::

  cat /sys/kernel/tracing/events/kprobes/myprobe/format
  name: myprobe
  ID: 780
  format:
          field:unsigned short common_type;       offset:0;       size:2; signed:0;
          field:unsigned char common_flags;       offset:2;       size:1; signed:0;
          field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
          field:int common_pid;   offset:4;       size:4; signed:1;

          field:unsigned long __probe_ip; offset:12;      size:4; signed:0;
          field:int __probe_nargs;        offset:16;      size:4; signed:1;
          field:unsigned long dfd;        offset:20;      size:4; signed:0;
          field:unsigned long filename;   offset:24;      size:4; signed:0;
          field:unsigned long flags;      offset:28;      size:4; signed:0;
          field:unsigned long mode;       offset:32;      size:4; signed:0;


  print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
  REC->dfd, REC->filename, REC->flags, REC->mode

You can see that the event has 4 arguments as in the expressions you specified.
::

  echo > /sys/kernel/tracing/kprobe_events

This clears all probe points.

Or,
::

  echo -:myprobe >> /sys/kernel/tracing/kprobe_events

This clears probe points selectively.

Right after definition, each event is disabled by default. To trace these
events, you need to enable them.
::

  echo 1 > /sys/kernel/tracing/events/kprobes/myprobe/enable
  echo 1 > /sys/kernel/tracing/events/kprobes/myretprobe/enable

Use the following commands to start and stop tracing around an interval.
::

    # echo 1 > /sys/kernel/tracing/tracing_on
    Open something...
    # echo 0 > /sys/kernel/tracing/tracing_on

And you can see the traced information via /sys/kernel/tracing/trace.
::

  cat /sys/kernel/tracing/trace
  # tracer: nop
  #
  #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
  #              | |       |          |         |
             <...>-1447  [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
             <...>-1447  [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
             <...>-1447  [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
             <...>-1447  [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
             <...>-1447  [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
             <...>-1447  [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3


Each line shows when the kernel hits an event, and <- SYMBOL means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).
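
As a rough sketch of reading the "<- SYMBOL" notation programmatically, the
helper below prints the return address, the returning function and the
recorded value for each return-probe line. The function name is an assumption
for illustration, and the script assumes the line layout of the sample output
above, with the return value as the last field on the line.

```shell
#!/bin/sh
# Hypothetical helper: print "caller callee retval" for return-probe lines.
# It keys on the "(CALLER <- CALLEE)" field shown in the sample output.
retprobe_summary() {
    # $1 = trace file; defaults to the tracefs path used above
    awk '{
        for (i = 2; i < NF; i++) {
            if ($i == "<-") {
                caller = $(i - 1); sub(/^\(/, "", caller)
                callee = $(i + 1); sub(/\)$/, "", callee)
                # Assumes the last field is NAME=VALUE for the return value.
                retval = $NF;      sub(/^[^=]*=/, "", retval)
                print caller, callee, retval
            }
        }
    }' "${1:-/sys/kernel/tracing/trace}"
}
```

Entry-probe lines contain no "<-" field and are skipped, so the output holds
one line per kretprobe hit.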