==========================
Kprobe-based Event Tracing
==========================

:Author: Masami Hiramatsu

Overview
--------
These events are similar to tracepoint-based events. Instead of tracepoints,
they are based on kprobes (kprobe and kretprobe), so they can probe wherever
kprobes can probe (that is, all functions except those with the
__kprobes/nokprobe_inline annotation and those marked NOKPROBE_SYMBOL).
Unlike tracepoint-based events, they can be added and removed
dynamically, on the fly.

To enable this feature, build your kernel with CONFIG_KPROBE_EVENTS=y.

Similar to the event tracer, this doesn't need to be activated via
current_tracer. Instead, add probe points via
/sys/kernel/tracing/kprobe_events, and enable them via
/sys/kernel/tracing/events/kprobes/<EVENT>/enable.

You can also use /sys/kernel/tracing/dynamic_events instead of
kprobe_events. That interface provides unified access to other
dynamic events too.
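
The add-then-enable workflow described above can be wrapped in two small shell
helpers. This is a minimal sketch under several assumptions. tracefs is assumed
to be mounted at /sys/kernel/tracing. The event is assumed to live in the
default "kprobes" group. The function names kprobe_add and kprobe_enable are
hypothetical, and TRACEFS is an assumed override variable that makes it
possible to try the helpers against a scratch directory.

```shell
# Hypothetical helpers (not part of any kernel tool) for the workflow above.
# Assumes tracefs at /sys/kernel/tracing unless TRACEFS is already set.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Append one probe definition; '>>' keeps the probes that are already defined.
kprobe_add() {
    printf '%s\n' "$1" >> "$TRACEFS/kprobe_events"
}

# Write 1 (default) or 0 to the enable file of an event.
# Assumes the event is in the default "kprobes" group (no GRP/ given).
kprobe_enable() {
    printf '%s\n' "${2:-1}" > "$TRACEFS/events/kprobes/$1/enable"
}
```

On a live system, kprobe_add 'p:myprobe do_sys_open' followed by
kprobe_enable myprobe would need root privileges. The event directory only
appears after the definition has been accepted.
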

Synopsis of kprobe_events
-------------------------
::

  p[:[GRP/][EVENT]] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
  r[MAXACTIVE][:[GRP/][EVENT]] [MOD:]SYM[+0] [FETCHARGS]   : Set a return probe
  p[:[GRP/][EVENT]] [MOD:]SYM[+0]%return [FETCHARGS]       : Set a return probe
  -:[GRP/][EVENT]                                          : Clear a probe

 GRP            : Group name. If omitted, "kprobes" is used.
 EVENT          : Event name. If omitted, the event name is generated
                  based on SYM+offs or MEMADDR.
 MOD            : Module name which has the given SYM.
 SYM[+offs]     : Symbol+offset where the probe is inserted.
 SYM%return     : Return address of the symbol.
 MEMADDR        : Address where the probe is inserted.
 MAXACTIVE      : Maximum number of instances of the specified function that
                  can be probed simultaneously, or 0 for the default value
                  as defined in Documentation/trace/kprobes.rst section 1.3.1.

 FETCHARGS      : Arguments. Each probe can have up to 128 args.
  %REG          : Fetch register REG.
  @ADDR         : Fetch memory at ADDR (ADDR should be in kernel).
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol).
  $stackN       : Fetch the Nth entry of the stack (N >= 0).
  $stack        : Fetch the stack address.
  $argN         : Fetch the Nth function argument (N >= 1). (*1)
  $retval       : Fetch the return value. (*2)
  $comm         : Fetch the current task's comm.
  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address. (*3)(*4)
  \IMM          : Store an immediate value to the argument.
  NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
  FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
                  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                  (x8/x16/x32/x64), VFS layer common types (%pd/%pD), "char",
                  "string", "ustring", "symbol", "symstr" and bitfield are
                  supported.

  (*1) only for the probe on function entry (offs == 0). Note that this
       argument access is best effort: depending on the argument type, it
       may be passed on the stack, but only arguments passed in registers
       are supported.
  (*2) only for return probes. Note that this is also best effort: depending
       on the return value type, it might be passed via a pair of registers,
       but only one register is accessed.
  (*3) this is useful for fetching a field of data structures.
  (*4) "u" means user-space dereference. See the "User Memory Access"
       section below.

Function arguments at kretprobe
-------------------------------
Function arguments can be accessed at kretprobe using the $arg<N> fetcharg.
This is useful to record the function parameters and the return value at once,
and to trace the difference in structure fields (for example, to debug whether
a function correctly updates the given data structure or not).
See the :ref:`sample<fprobetrace_exit_args_sample>` in the fprobe event
documentation for how it works.

.. _kprobetrace_types:

Types
-----
Several types are supported for fetchargs. The kprobe tracer accesses memory
according to the given type. The prefixes 's' and 'u' mean that those types
are signed and unsigned respectively. The 'x' prefix implies unsigned.
Traced arguments are shown in decimal ('s' and 'u') or hexadecimal ('x').
Without type casting, 'x32' or 'x64' is used depending on the architecture
(e.g. x86-32 uses x32, and x86-64 uses x64).

These value types can be an array. To record array data, you can add '[N]'
(where N is a fixed number, less than 64) to the base type.
E.g. 'x16[4]' means an array of x16 (2-byte hex) with 4 elements.
Note that the array can be applied only to memory type fetchargs; you cannot
apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
wrong, but '+8($stack):x8[8]' is OK.)

Char type can be used to show the character value of traced arguments.
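
The array rules above can be checked before a definition is written. These are
the memory-type fetchargs only rule and the fixed element count. The following
is a minimal sketch. array_fetcharg is a hypothetical helper name. It only
recognizes the +OFFS(FETCHARG)/-OFFS(FETCHARG) dereference forms and the @ADDR
and @SYM forms from the synopsis. It deliberately leaves the upper bound on N
to the kernel's own parser.

```shell
# Hypothetical helper: build FETCHARG:TYPE[N] for an array fetch.
# Only the rules quoted in the text are checked; the kernel's parser
# stays authoritative (including the upper limit on N).
array_fetcharg() {
    arg=$1 base=$2 n=$3
    case $arg in
        [+-]*'('*')' | @*) ;;   # memory dereference, @ADDR or @SYM
        *)
            echo "array types need a memory fetcharg: $arg" >&2
            return 1 ;;
    esac
    case $n in
        '' | *[!0-9]* | 0)
            echo "array length must be a positive integer: $n" >&2
            return 1 ;;
    esac
    printf '%s:%s[%s]\n' "$arg" "$base" "$n"
}
```

For instance, array_fetcharg '+8($stack)' x8 8 prints +8($stack):x8[8].
array_fetcharg '$stack1' x8 8 is rejected, which mirrors the two examples in
the text.
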

String type is a special type, which fetches a "null-terminated" string from
kernel space. This means it will fail and store NULL if the string container
has been paged out. The "ustring" type is an alternative to string for
user space. See :ref:`user_mem_access` for more info.

The string array type is a bit different from other types. For other base
types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is the same
as +0(%di):x32.) But string[1] is not equal to string. The string type itself
represents a "char array", but the string array type represents a "char * array".
So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.

Bitfield is another special type, which takes 3 parameters: bit-width,
bit-offset, and container-size (usually 32). The syntax is::

    b<bit-width>@<bit-offset>/<container-size>

Symbol type ('symbol') is an alias of the u32 or u64 type (depending on
BITS_PER_LONG) which shows the given pointer in "symbol+offset" style.
On the other hand, the symbol-string type ('symstr') converts the given
address to "symbol+offset/symbolsize" style and stores it as a null-terminated
string. With the 'symstr' type, you can filter the event with a wildcard
pattern of the symbols, and you don't need to resolve the symbol name yourself.
For $comm, the default type is "string"; any other type is invalid.
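
You can reproduce the decoding with shell arithmetic to see what a bitfield
fetcharg reports. This is a sketch under stated assumptions. bit-offset is
assumed to be counted from the least significant bit of the container. The
container value is passed in as a number (decimal or 0x-prefixed hex). The
field is assumed to be narrower than the shell's integer width.
bitfield_value is a hypothetical name.

```shell
# Hypothetical helper: decode a b<bit-width>@<bit-offset>/<container-size>
# field from a container value (decimal or 0x-prefixed hex).
# Assumes bit-offset counts from the least significant bit and that the
# field is narrower than the shell's integer width.
bitfield_value() {
    spec=$1 container=$2
    w=${spec#b}; w=${w%%@*}        # bit-width
    rest=${spec#*@}
    off=${rest%/*}                 # bit-offset
    size=${rest#*/}                # container-size
    if [ $(( $off + $w )) -gt "$size" ]; then
        echo "field does not fit in the container: $spec" >&2
        return 1
    fi
    # Shift the field down to bit 0, then mask away the higher bits.
    echo $(( ($container >> $off) & ((1 << $w) - 1) ))
}
```

For example, bitfield_value b4@4/32 0xF0 prints 15, because bits 4 to 7 of
0xF0 are all set.
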

The VFS layer common types (%pd/%pD) are special types, which fetch the name of
a dentry (%pd) or a file (%pD) from the address of a struct dentry or a struct
file, respectively.

.. _user_mem_access:

User Memory Access
------------------
Kprobe events support user-space memory access. For that purpose, you can use
either the user-space dereference syntax or the 'ustring' type.

The user-space dereference syntax allows you to access a field of a data
structure in user space. This is done by adding the "u" prefix to the
dereference syntax. For example, +u4(%si) means it will read memory from the
address in the register %si offset by 4, and the memory is expected to be in
user space. You can use this for strings too, e.g. +u0(%si):string will read
a string from the address in the register %si that is expected to be in user
space. 'ustring' is a shortcut way of performing the same task. That is,
+0(%si):ustring is equivalent to +u0(%si):string.

Note that kprobe events provide the user-memory access syntax, but they do not
apply it automatically. This means that if you use a normal dereference or the
string type for user memory, it might fail, and may always fail on some
architectures. The user has to carefully check whether the target data is in
kernel or user space.
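
The text above describes +0(%si):ustring as equivalent to +u0(%si):string, so
one spelling can be rewritten into the other mechanically. The sketch below
does that for a single fetcharg. ustring_to_uderef is a hypothetical name. The
rewrite only covers the +OFFS(FETCHARG) and -OFFS(FETCHARG) dereference forms
and passes everything else through unchanged.

```shell
# Hypothetical helper: rewrite "+OFFS(ARG):ustring" as "+uOFFS(ARG):string".
# Anything else (already "u"-prefixed, or not a dereference) is printed as-is.
ustring_to_uderef() {
    arg=$1
    case $arg in
        [+-]u*) printf '%s\n' "$arg" ;;
        [+-]*'):ustring')
            rest=${arg#?}                  # drop the leading sign
            sign=${arg%"$rest"}            # the sign itself: + or -
            printf '%su%s:string\n' "$sign" "${rest%:ustring}" ;;
        *) printf '%s\n' "$arg" ;;
    esac
}
```

Either spelling could appear in a definition such as
'p:myopen do_sys_open filename=+u0(%si):string'. Choosing %si assumes the
x86-64 ABI, where do_sys_open()'s second argument, the user-space filename
pointer, arrives in that register.
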

Per-Probe Event Filtering
-------------------------
The per-probe event filtering feature allows you to set a different filter on
each probe and lets you choose which arguments will be shown in the trace
buffer. If an event name is specified right after 'p:' or 'r:' in
kprobe_events, an event is added under tracing/events/kprobes/<EVENT>, and in
that directory you can see 'id', 'enable', 'format', 'filter' and 'trigger'.

enable:
  You can enable/disable the probe by writing 1 or 0 to it.

format:
  This shows the format of this probe event.

filter:
  You can write filtering rules for this event.

id:
  This shows the id of this probe event.

trigger:
  This allows you to install trigger commands which are executed when the event
  is hit (for details, see Documentation/trace/events.rst, section 6).

Event Profiling
---------------
You can check the total number of probe hits and probe miss-hits via
/sys/kernel/tracing/kprobe_profile.
The first column is the event name, the second is the number of probe hits,
and the third is the number of probe miss-hits.
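
kprobe_profile is a plain three-column table as described above (event name,
hits, miss-hits), so awk is enough to pick out the probes that missed. This is
a sketch. kprobe_missed is a hypothetical name. When no path is given, the
function reads the assumed default tracefs location.

```shell
# Hypothetical helper: print "<event> <miss-hits>" for every probe that missed.
# Columns follow the description above: name, hits, miss-hits.
kprobe_missed() {
    awk '$3 > 0 { print $1, $3 }' "${1:-/sys/kernel/tracing/kprobe_profile}"
}
```

Running it with no argument reads the live file. Passing a saved copy lets you
compare snapshots taken before and after a workload.
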

Kernel Boot Parameter
---------------------
You can add and enable new kprobe events when booting up the kernel with the
"kprobe_event=" parameter. The parameter accepts semicolon-delimited kprobe
events, whose format is similar to that of kprobe_events.
The difference is that the probe definition parameters are comma-delimited
instead of space-delimited. For example, adding the myprobe event on
do_sys_open like below::

  p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)

should be written as below for the kernel boot parameter (just replace spaces
with commas)::

  p:myprobe,do_sys_open,dfd=%ax,filename=%dx,flags=%cx,mode=+4($stack)

Usage examples
--------------
To add a probe as a new event, write a new definition to kprobe_events
as below::

  echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/tracing/kprobe_events

This sets a kprobe at the top of the do_sys_open() function, recording its
1st to 4th arguments as the "myprobe" event. Note that which register/stack
entry is assigned to each function argument depends on the arch-specific ABI.
If you are unsure of the ABI, please try the probe subcommand of perf-tools
(you can find it under tools/perf/).
As this example shows, users can choose more familiar names for each argument.
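
The definition above can be converted mechanically into the "kprobe_event="
boot parameter form described in the Kernel Boot Parameter section. This is a
minimal sketch. It assumes each definition is well formed, with its fields
separated by spaces. to_boot_param is a hypothetical name. Several definitions
are joined with the semicolons that the parameter expects.

```shell
# Hypothetical helper: turn kprobe_events definitions into a kprobe_event= value.
# Fields within a definition become comma-separated (runs of spaces collapse
# to one comma); separate definitions are joined with ';'.
to_boot_param() {
    out=
    for def in "$@"; do
        conv=$(printf '%s\n' "$def" | tr -s ' ' ',')
        if [ -n "$out" ]; then
            out="$out;$conv"
        else
            out=$conv
        fi
    done
    printf 'kprobe_event=%s\n' "$out"
}
```

Applied to the myprobe definition, the helper produces the same comma-delimited
string shown in the Kernel Boot Parameter section, with the parameter name
prefixed.
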
::

  echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/tracing/kprobe_events

This sets a kretprobe on the return point of the do_sys_open() function,
recording the return value as the "myretprobe" event.
You can see the format of these events via
/sys/kernel/tracing/events/kprobes/<EVENT>/format.
::

  cat /sys/kernel/tracing/events/kprobes/myprobe/format
  name: myprobe
  ID: 780
  format:
          field:unsigned short common_type;       offset:0;       size:2; signed:0;
          field:unsigned char common_flags;       offset:2;       size:1; signed:0;
          field:unsigned char common_preempt_count;       offset:3; size:1;signed:0;
          field:int common_pid;   offset:4;       size:4; signed:1;

          field:unsigned long __probe_ip; offset:12;      size:4; signed:0;
          field:int __probe_nargs;        offset:16;      size:4; signed:1;
          field:unsigned long dfd;        offset:20;      size:4; signed:0;
          field:unsigned long filename;   offset:24;      size:4; signed:0;
          field:unsigned long flags;      offset:28;      size:4; signed:0;
          field:unsigned long mode;       offset:32;      size:4; signed:0;


  print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
  REC->dfd, REC->filename, REC->flags, REC->mode

You can see that the event has 4 arguments, matching the expressions you
specified.
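
A format file like the one above can be parsed to list only the probe's own
arguments. This is a sketch. It assumes the "field:TYPE NAME;" layout shown
above. It treats bookkeeping fields such as common_type, common_pid and
__probe_ip as non-arguments. probe_arg_names is a hypothetical name.

```shell
# Hypothetical helper: list the user-defined argument names of a probe event
# from its format file (stdin), skipping the common and __probe bookkeeping
# fields.
probe_arg_names() {
    awk -F';' '/^[ \t]*field:/ {
        decl = $1
        sub(/^[ \t]*field:/, "", decl)     # e.g. "unsigned long dfd"
        n = split(decl, words, " ")
        name = words[n]
        sub(/\[.*/, "", name)              # drop any array suffix
        if (name !~ /^(common_|__probe_)/)
            print name
    }'
}
```

Feeding it the myprobe format file shown above would print dfd, filename, flags
and mode, one per line.
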
::

  echo > /sys/kernel/tracing/kprobe_events

This clears all probe points.

Or,
::

  echo -:myprobe >> /sys/kernel/tracing/kprobe_events

This clears only the specified probe point.

Right after definition, each event is disabled by default. To trace these
events, you need to enable them.
::

  echo 1 > /sys/kernel/tracing/events/kprobes/myprobe/enable
  echo 1 > /sys/kernel/tracing/events/kprobes/myretprobe/enable

Use the following commands to trace only during an interval.
::

    # echo 1 > /sys/kernel/tracing/tracing_on
    Open something...
    # echo 0 > /sys/kernel/tracing/tracing_on

You can then see the traced information via /sys/kernel/tracing/trace.
::

  cat /sys/kernel/tracing/trace
  # tracer: nop
  #
  #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
  #              | |       |          |         |
             <...>-1447  [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
             <...>-1447  [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
             <...>-1447  [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
             <...>-1447  [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
             <...>-1447  [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
             <...>-1447  [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3


Each line shows when the kernel hits an event, and <- SYMBOL means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).
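
The trace output above can be post-processed to pair each return-probe hit with
its return value. This is a sketch. It assumes the default trace line layout
shown above: the event name followed by a colon, then the "(caller <- function)"
part, then NAME=value fields. kretprobe_values is a hypothetical name.

```shell
# Hypothetical helper: for each hit of return-probe event $1 in trace-format
# input (stdin), print "<return site> <value of $retval>".
kretprobe_values() {
    awk -v ev="$1:" '
        /^[ \t]*#/ { next }                     # skip header lines
        {
            for (i = 1; i < NF; i++) {
                if ($i != ev) continue
                site = $(i + 1)                 # e.g. "(sys_open+0x1b/0x1d"
                sub(/^\(/, "", site)
                for (j = i + 2; j <= NF; j++)
                    if ($j ~ /^\$retval=/) {
                        val = $j
                        sub(/^\$retval=/, "", val)
                        print site, val
                    }
                break
            }
        }'
}
```

In the sample above, fffffffffffffffe is -2 when read as a 64-bit signed value,
that is, -ENOENT. So the first open attempt failed, while the later ones
returned file descriptor 3.
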