173d98127SChangbin Du============= 273d98127SChangbin DuEvent Tracing 373d98127SChangbin Du============= 473d98127SChangbin Du 573d98127SChangbin Du:Author: Theodore Ts'o 673d98127SChangbin Du:Updated: Li Zefan and Tom Zanussi 773d98127SChangbin Du 873d98127SChangbin Du1. Introduction 973d98127SChangbin Du=============== 1073d98127SChangbin Du 11ec15872dSMauro Carvalho ChehabTracepoints (see Documentation/trace/tracepoints.rst) can be used 1273d98127SChangbin Duwithout creating custom kernel modules to register probe functions 1373d98127SChangbin Duusing the event tracing infrastructure. 1473d98127SChangbin Du 1573d98127SChangbin DuNot all tracepoints can be traced using the event tracing system; 1673d98127SChangbin Duthe kernel developer must provide code snippets which define how the 1773d98127SChangbin Dutracing information is saved into the tracing buffer, and how the 1873d98127SChangbin Dutracing information should be printed. 1973d98127SChangbin Du 2073d98127SChangbin Du2. Using Event Tracing 2173d98127SChangbin Du====================== 2273d98127SChangbin Du 2373d98127SChangbin Du2.1 Via the 'set_event' interface 2473d98127SChangbin Du--------------------------------- 2573d98127SChangbin Du 2673d98127SChangbin DuThe events which are available for tracing can be found in the file 272abfcd29SRoss Zwisler/sys/kernel/tracing/available_events. 2873d98127SChangbin Du 2973d98127SChangbin DuTo enable a particular event, such as 'sched_wakeup', simply echo it 302abfcd29SRoss Zwislerto /sys/kernel/tracing/set_event. For example:: 3173d98127SChangbin Du 322abfcd29SRoss Zwisler # echo sched_wakeup >> /sys/kernel/tracing/set_event 3373d98127SChangbin Du 3473d98127SChangbin Du.. Note:: '>>' is necessary, otherwise it will firstly disable all the events. 3573d98127SChangbin Du 3673d98127SChangbin DuTo disable an event, echo the event name to the set_event file prefixed 3773d98127SChangbin Duwith an exclamation point:: 3873d98127SChangbin Du 392abfcd29SRoss Zwisler # echo '!sched_wakeup' >> /sys/kernel/tracing/set_event 4073d98127SChangbin Du 4173d98127SChangbin DuTo disable all events, echo an empty line to the set_event file:: 4273d98127SChangbin Du 432abfcd29SRoss Zwisler # echo > /sys/kernel/tracing/set_event 4473d98127SChangbin Du 456234c7bdSJonathan CorbetTo enable all events, echo ``*:*`` or ``*:`` to the set_event file:: 4673d98127SChangbin Du 472abfcd29SRoss Zwisler # echo *:* > /sys/kernel/tracing/set_event 4873d98127SChangbin Du 4973d98127SChangbin DuThe events are organized into subsystems, such as ext4, irq, sched, 5073d98127SChangbin Duetc., and a full event name looks like this: <subsystem>:<event>. The 5173d98127SChangbin Dusubsystem name is optional, but it is displayed in the available_events 5273d98127SChangbin Dufile. All of the events in a subsystem can be specified via the syntax 536234c7bdSJonathan Corbet``<subsystem>:*``; for example, to enable all irq events, you can use the 5473d98127SChangbin Ducommand:: 5573d98127SChangbin Du 562abfcd29SRoss Zwisler # echo 'irq:*' > /sys/kernel/tracing/set_event 5773d98127SChangbin Du 5873d98127SChangbin Du2.2 Via the 'enable' toggle 5973d98127SChangbin Du--------------------------- 6073d98127SChangbin Du 612abfcd29SRoss ZwislerThe events available are also listed in /sys/kernel/tracing/events/ hierarchy 6273d98127SChangbin Duof directories. 6373d98127SChangbin Du 6473d98127SChangbin DuTo enable event 'sched_wakeup':: 6573d98127SChangbin Du 662abfcd29SRoss Zwisler # echo 1 > /sys/kernel/tracing/events/sched/sched_wakeup/enable 6773d98127SChangbin Du 6873d98127SChangbin DuTo disable it:: 6973d98127SChangbin Du 702abfcd29SRoss Zwisler # echo 0 > /sys/kernel/tracing/events/sched/sched_wakeup/enable 7173d98127SChangbin Du 7273d98127SChangbin DuTo enable all events in sched subsystem:: 7373d98127SChangbin Du 742abfcd29SRoss Zwisler # echo 1 > /sys/kernel/tracing/events/sched/enable 7573d98127SChangbin Du 7673d98127SChangbin DuTo enable all events:: 7773d98127SChangbin Du 782abfcd29SRoss Zwisler # echo 1 > /sys/kernel/tracing/events/enable 7973d98127SChangbin Du 8073d98127SChangbin DuWhen reading one of these enable files, there are four results: 8173d98127SChangbin Du 8273d98127SChangbin Du - 0 - all events this file affects are disabled 8373d98127SChangbin Du - 1 - all events this file affects are enabled 8473d98127SChangbin Du - X - there is a mixture of events enabled and disabled 8573d98127SChangbin Du - ? - this file does not affect any event 8673d98127SChangbin Du 8773d98127SChangbin Du2.3 Boot option 8873d98127SChangbin Du--------------- 8973d98127SChangbin Du 9073d98127SChangbin DuIn order to facilitate early boot debugging, use boot option:: 9173d98127SChangbin Du 9273d98127SChangbin Du trace_event=[event-list] 9373d98127SChangbin Du 9473d98127SChangbin Duevent-list is a comma separated list of events. See section 2.1 for event 9573d98127SChangbin Duformat. 9673d98127SChangbin Du 9773d98127SChangbin Du3. Defining an event-enabled tracepoint 9873d98127SChangbin Du======================================= 9973d98127SChangbin Du 10073d98127SChangbin DuSee The example provided in samples/trace_events 10173d98127SChangbin Du 10273d98127SChangbin Du4. Event formats 10373d98127SChangbin Du================ 10473d98127SChangbin Du 10573d98127SChangbin DuEach trace event has a 'format' file associated with it that contains 10673d98127SChangbin Dua description of each field in a logged event. This information can 10773d98127SChangbin Dube used to parse the binary trace stream, and is also the place to 10873d98127SChangbin Dufind the field names that can be used in event filters (see section 5). 10973d98127SChangbin Du 11073d98127SChangbin DuIt also displays the format string that will be used to print the 11173d98127SChangbin Duevent in text mode, along with the event name and ID used for 11273d98127SChangbin Duprofiling. 11373d98127SChangbin Du 1146234c7bdSJonathan CorbetEvery event has a set of ``common`` fields associated with it; these are 1156234c7bdSJonathan Corbetthe fields prefixed with ``common_``. The other fields vary between 11673d98127SChangbin Duevents and correspond to the fields defined in the TRACE_EVENT 11773d98127SChangbin Dudefinition for that event. 11873d98127SChangbin Du 11973d98127SChangbin DuEach field in the format has the form:: 12073d98127SChangbin Du 12173d98127SChangbin Du field:field-type field-name; offset:N; size:N; 12273d98127SChangbin Du 12373d98127SChangbin Duwhere offset is the offset of the field in the trace record and size 12473d98127SChangbin Duis the size of the data item, in bytes. 12573d98127SChangbin Du 12673d98127SChangbin DuFor example, here's the information displayed for the 'sched_wakeup' 12773d98127SChangbin Duevent:: 12873d98127SChangbin Du 1292abfcd29SRoss Zwisler # cat /sys/kernel/tracing/events/sched/sched_wakeup/format 13073d98127SChangbin Du 13173d98127SChangbin Du name: sched_wakeup 13273d98127SChangbin Du ID: 60 13373d98127SChangbin Du format: 13473d98127SChangbin Du field:unsigned short common_type; offset:0; size:2; 13573d98127SChangbin Du field:unsigned char common_flags; offset:2; size:1; 13673d98127SChangbin Du field:unsigned char common_preempt_count; offset:3; size:1; 13773d98127SChangbin Du field:int common_pid; offset:4; size:4; 13873d98127SChangbin Du field:int common_tgid; offset:8; size:4; 13973d98127SChangbin Du 14073d98127SChangbin Du field:char comm[TASK_COMM_LEN]; offset:12; size:16; 14173d98127SChangbin Du field:pid_t pid; offset:28; size:4; 14273d98127SChangbin Du field:int prio; offset:32; size:4; 14373d98127SChangbin Du field:int success; offset:36; size:4; 14473d98127SChangbin Du field:int cpu; offset:40; size:4; 14573d98127SChangbin Du 14673d98127SChangbin Du print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, 14773d98127SChangbin Du REC->prio, REC->success, REC->cpu 14873d98127SChangbin Du 14973d98127SChangbin DuThis event contains 10 fields, the first 5 common and the remaining 5 15073d98127SChangbin Duevent-specific. All the fields for this event are numeric, except for 15173d98127SChangbin Du'comm' which is a string, a distinction important for event filtering. 15273d98127SChangbin Du 15373d98127SChangbin Du5. Event filtering 15473d98127SChangbin Du================== 15573d98127SChangbin Du 15673d98127SChangbin DuTrace events can be filtered in the kernel by associating boolean 15773d98127SChangbin Du'filter expressions' with them. As soon as an event is logged into 15873d98127SChangbin Duthe trace buffer, its fields are checked against the filter expression 15973d98127SChangbin Duassociated with that event type. An event with field values that 16073d98127SChangbin Du'match' the filter will appear in the trace output, and an event whose 16173d98127SChangbin Duvalues don't match will be discarded. An event with no filter 16273d98127SChangbin Duassociated with it matches everything, and is the default when no 16373d98127SChangbin Dufilter has been set for an event. 16473d98127SChangbin Du 16573d98127SChangbin Du5.1 Expression syntax 16673d98127SChangbin Du--------------------- 16773d98127SChangbin Du 16873d98127SChangbin DuA filter expression consists of one or more 'predicates' that can be 16973d98127SChangbin Ducombined using the logical operators '&&' and '||'. A predicate is 17073d98127SChangbin Dusimply a clause that compares the value of a field contained within a 17173d98127SChangbin Dulogged event with a constant value and returns either 0 or 1 depending 17273d98127SChangbin Duon whether the field value matched (1) or didn't match (0):: 17373d98127SChangbin Du 17473d98127SChangbin Du field-name relational-operator value 17573d98127SChangbin Du 17673d98127SChangbin DuParentheses can be used to provide arbitrary logical groupings and 17773d98127SChangbin Dudouble-quotes can be used to prevent the shell from interpreting 17873d98127SChangbin Duoperators as shell metacharacters. 17973d98127SChangbin Du 18073d98127SChangbin DuThe field-names available for use in filters can be found in the 18173d98127SChangbin Du'format' files for trace events (see section 4). 18273d98127SChangbin Du 18373d98127SChangbin DuThe relational-operators depend on the type of the field being tested: 18473d98127SChangbin Du 18573d98127SChangbin DuThe operators available for numeric fields are: 18673d98127SChangbin Du 18773d98127SChangbin Du==, !=, <, <=, >, >=, & 18873d98127SChangbin Du 18973d98127SChangbin DuAnd for string fields they are: 19073d98127SChangbin Du 19173d98127SChangbin Du==, !=, ~ 19273d98127SChangbin Du 1936234c7bdSJonathan CorbetThe glob (~) accepts a wild card character (\*,?) and character classes 19473d98127SChangbin Du([). For example:: 19573d98127SChangbin Du 19673d98127SChangbin Du prev_comm ~ "*sh" 19773d98127SChangbin Du prev_comm ~ "sh*" 19873d98127SChangbin Du prev_comm ~ "*sh*" 19973d98127SChangbin Du prev_comm ~ "ba*sh" 20073d98127SChangbin Du 201f37c3bbcSSteven RostedtIf the field is a pointer that points into user space (for example 202f37c3bbcSSteven Rostedt"filename" from sys_enter_openat), then you have to append ".ustring" to the 203f37c3bbcSSteven Rostedtfield name:: 204f37c3bbcSSteven Rostedt 205f37c3bbcSSteven Rostedt filename.ustring ~ "password" 206f37c3bbcSSteven Rostedt 207f37c3bbcSSteven RostedtAs the kernel will have to know how to retrieve the memory that the pointer 208f37c3bbcSSteven Rostedtis at from user space. 209f37c3bbcSSteven Rostedt 210e6745a4dSSteven Rostedt (Google)You can convert any long type to a function address and search by function name:: 211e6745a4dSSteven Rostedt (Google) 212e6745a4dSSteven Rostedt (Google) call_site.function == security_prepare_creds 213e6745a4dSSteven Rostedt (Google) 214e6745a4dSSteven Rostedt (Google)The above will filter when the field "call_site" falls on the address within 215e6745a4dSSteven Rostedt (Google)"security_prepare_creds". That is, it will compare the value of "call_site" and 216e6745a4dSSteven Rostedt (Google)the filter will return true if it is greater than or equal to the start of 217e6745a4dSSteven Rostedt (Google)the function "security_prepare_creds" and less than the end of that function. 218e6745a4dSSteven Rostedt (Google) 219e6745a4dSSteven Rostedt (Google)The ".function" postfix can only be attached to values of size long, and can only 220e6745a4dSSteven Rostedt (Google)be compared with "==" or "!=". 221e6745a4dSSteven Rostedt (Google) 222*fa828efbSValentin SchneiderCpumask fields or scalar fields that encode a CPU number can be filtered using 223*fa828efbSValentin Schneidera user-provided cpumask in cpulist format. The format is as follows:: 224*fa828efbSValentin Schneider 225*fa828efbSValentin Schneider CPUS{$cpulist} 226*fa828efbSValentin Schneider 227*fa828efbSValentin SchneiderOperators available to cpumask filtering are: 228*fa828efbSValentin Schneider 229*fa828efbSValentin Schneider& (intersection), ==, != 230*fa828efbSValentin Schneider 231*fa828efbSValentin SchneiderFor example, this will filter events that have their .target_cpu field present 232*fa828efbSValentin Schneiderin the given cpumask:: 233*fa828efbSValentin Schneider 234*fa828efbSValentin Schneider target_cpu & CPUS{17-42} 235*fa828efbSValentin Schneider 23673d98127SChangbin Du5.2 Setting filters 23773d98127SChangbin Du------------------- 23873d98127SChangbin Du 23973d98127SChangbin DuA filter for an individual event is set by writing a filter expression 24073d98127SChangbin Duto the 'filter' file for the given event. 24173d98127SChangbin Du 24273d98127SChangbin DuFor example:: 24373d98127SChangbin Du 2442abfcd29SRoss Zwisler # cd /sys/kernel/tracing/events/sched/sched_wakeup 24573d98127SChangbin Du # echo "common_preempt_count > 4" > filter 24673d98127SChangbin Du 24773d98127SChangbin DuA slightly more involved example:: 24873d98127SChangbin Du 2492abfcd29SRoss Zwisler # cd /sys/kernel/tracing/events/signal/signal_generate 25073d98127SChangbin Du # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter 25173d98127SChangbin Du 25273d98127SChangbin DuIf there is an error in the expression, you'll get an 'Invalid 25373d98127SChangbin Duargument' error when setting it, and the erroneous string along with 25473d98127SChangbin Duan error message can be seen by looking at the filter e.g.:: 25573d98127SChangbin Du 2562abfcd29SRoss Zwisler # cd /sys/kernel/tracing/events/signal/signal_generate 25773d98127SChangbin Du # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter 25873d98127SChangbin Du -bash: echo: write error: Invalid argument 25973d98127SChangbin Du # cat filter 26073d98127SChangbin Du ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash 26173d98127SChangbin Du ^ 26273d98127SChangbin Du parse_error: Field not found 26373d98127SChangbin Du 26473d98127SChangbin DuCurrently the caret ('^') for an error always appears at the beginning of 26573d98127SChangbin Duthe filter string; the error message should still be useful though 26673d98127SChangbin Dueven without more accurate position info. 26773d98127SChangbin Du 26877360f9bSSteven Rostedt5.2.1 Filter limitations 26977360f9bSSteven Rostedt------------------------ 27077360f9bSSteven Rostedt 27177360f9bSSteven RostedtIf a filter is placed on a string pointer ``(char *)`` that does not point 27277360f9bSSteven Rostedtto a string on the ring buffer, but instead points to kernel or user space 27377360f9bSSteven Rostedtmemory, then, for safety reasons, at most 1024 bytes of the content is 27477360f9bSSteven Rostedtcopied onto a temporary buffer to do the compare. If the copy of the memory 27577360f9bSSteven Rostedtfaults (the pointer points to memory that should not be accessed), then the 27677360f9bSSteven Rostedtstring compare will be treated as not matching. 27777360f9bSSteven Rostedt 27873d98127SChangbin Du5.3 Clearing filters 27973d98127SChangbin Du-------------------- 28073d98127SChangbin Du 28173d98127SChangbin DuTo clear the filter for an event, write a '0' to the event's filter 28273d98127SChangbin Dufile. 28373d98127SChangbin Du 28473d98127SChangbin DuTo clear the filters for all events in a subsystem, write a '0' to the 28573d98127SChangbin Dusubsystem's filter file. 28673d98127SChangbin Du 28771240f94SDonglin Peng5.4 Subsystem filters 28873d98127SChangbin Du--------------------- 28973d98127SChangbin Du 29073d98127SChangbin DuFor convenience, filters for every event in a subsystem can be set or 29173d98127SChangbin Ducleared as a group by writing a filter expression into the filter file 29273d98127SChangbin Duat the root of the subsystem. Note however, that if a filter for any 29373d98127SChangbin Duevent within the subsystem lacks a field specified in the subsystem 29473d98127SChangbin Dufilter, or if the filter can't be applied for any other reason, the 29573d98127SChangbin Dufilter for that event will retain its previous setting. This can 29673d98127SChangbin Duresult in an unintended mixture of filters which could lead to 29773d98127SChangbin Duconfusing (to the user who might think different filters are in 29873d98127SChangbin Dueffect) trace output. Only filters that reference just the common 29973d98127SChangbin Dufields can be guaranteed to propagate successfully to all events. 30073d98127SChangbin Du 30173d98127SChangbin DuHere are a few subsystem filter examples that also illustrate the 30273d98127SChangbin Duabove points: 30373d98127SChangbin Du 30473d98127SChangbin DuClear the filters on all events in the sched subsystem:: 30573d98127SChangbin Du 3062abfcd29SRoss Zwisler # cd /sys/kernel/tracing/events/sched 30773d98127SChangbin Du # echo 0 > filter 30873d98127SChangbin Du # cat sched_switch/filter 30973d98127SChangbin Du none 31073d98127SChangbin Du # cat sched_wakeup/filter 31173d98127SChangbin Du none 31273d98127SChangbin Du 31373d98127SChangbin DuSet a filter using only common fields for all events in the sched 31473d98127SChangbin Dusubsystem (all events end up with the same filter):: 31573d98127SChangbin Du 3162abfcd29SRoss Zwisler # cd /sys/kernel/tracing/events/sched 31773d98127SChangbin Du # echo common_pid == 0 > filter 31873d98127SChangbin Du # cat sched_switch/filter 31973d98127SChangbin Du common_pid == 0 32073d98127SChangbin Du # cat sched_wakeup/filter 32173d98127SChangbin Du common_pid == 0 32273d98127SChangbin Du 32373d98127SChangbin DuAttempt to set a filter using a non-common field for all events in the 32473d98127SChangbin Dusched subsystem (all events but those that have a prev_pid field retain 32573d98127SChangbin Dutheir old filters):: 32673d98127SChangbin Du 3272abfcd29SRoss Zwisler # cd /sys/kernel/tracing/events/sched 32873d98127SChangbin Du # echo prev_pid == 0 > filter 32973d98127SChangbin Du # cat sched_switch/filter 33073d98127SChangbin Du prev_pid == 0 33173d98127SChangbin Du # cat sched_wakeup/filter 33273d98127SChangbin Du common_pid == 0 33373d98127SChangbin Du 33471240f94SDonglin Peng5.5 PID filtering 33573d98127SChangbin Du----------------- 33673d98127SChangbin Du 33773d98127SChangbin DuThe set_event_pid file in the same directory as the top events directory 33873d98127SChangbin Duexists, will filter all events from tracing any task that does not have the 33973d98127SChangbin DuPID listed in the set_event_pid file. 34073d98127SChangbin Du:: 34173d98127SChangbin Du 3422abfcd29SRoss Zwisler # cd /sys/kernel/tracing 34373d98127SChangbin Du # echo $$ > set_event_pid 34473d98127SChangbin Du # echo 1 > events/enable 34573d98127SChangbin Du 34673d98127SChangbin DuWill only trace events for the current task. 34773d98127SChangbin Du 34873d98127SChangbin DuTo add more PIDs without losing the PIDs already included, use '>>'. 34973d98127SChangbin Du:: 35073d98127SChangbin Du 35173d98127SChangbin Du # echo 123 244 1 >> set_event_pid 35273d98127SChangbin Du 35373d98127SChangbin Du 35473d98127SChangbin Du6. Event triggers 35573d98127SChangbin Du================= 35673d98127SChangbin Du 35773d98127SChangbin DuTrace events can be made to conditionally invoke trigger 'commands' 35873d98127SChangbin Duwhich can take various forms and are described in detail below; 35973d98127SChangbin Duexamples would be enabling or disabling other trace events or invoking 36073d98127SChangbin Dua stack trace whenever the trace event is hit. Whenever a trace event 36173d98127SChangbin Duwith attached triggers is invoked, the set of trigger commands 36273d98127SChangbin Duassociated with that event is invoked. Any given trigger can 36373d98127SChangbin Duadditionally have an event filter of the same form as described in 36473d98127SChangbin Dusection 5 (Event filtering) associated with it - the command will only 36573d98127SChangbin Dube invoked if the event being invoked passes the associated filter. 36673d98127SChangbin DuIf no filter is associated with the trigger, it always passes. 36773d98127SChangbin Du 36873d98127SChangbin DuTriggers are added to and removed from a particular event by writing 36973d98127SChangbin Dutrigger expressions to the 'trigger' file for the given event. 37073d98127SChangbin Du 37173d98127SChangbin DuA given event can have any number of triggers associated with it, 37273d98127SChangbin Dusubject to any restrictions that individual commands may have in that 37373d98127SChangbin Duregard. 37473d98127SChangbin Du 37573d98127SChangbin DuEvent triggers are implemented on top of "soft" mode, which means that 37673d98127SChangbin Duwhenever a trace event has one or more triggers associated with it, 37773d98127SChangbin Duthe event is activated even if it isn't actually enabled, but is 37873d98127SChangbin Dudisabled in a "soft" mode. That is, the tracepoint will be called, 37973d98127SChangbin Dubut just will not be traced, unless of course it's actually enabled. 38073d98127SChangbin DuThis scheme allows triggers to be invoked even for events that aren't 38173d98127SChangbin Duenabled, and also allows the current event filter implementation to be 38273d98127SChangbin Duused for conditionally invoking triggers. 38373d98127SChangbin Du 38473d98127SChangbin DuThe syntax for event triggers is roughly based on the syntax for 38573d98127SChangbin Duset_ftrace_filter 'ftrace filter commands' (see the 'Filter commands' 386d3439f9dSSteven Rostedt (VMware)section of Documentation/trace/ftrace.rst), but there are major 38773d98127SChangbin Dudifferences and the implementation isn't currently tied to it in any 38873d98127SChangbin Duway, so beware about making generalizations between the two. 38973d98127SChangbin Du 3908206de7dSMauro Carvalho Chehab.. Note:: 3918206de7dSMauro Carvalho Chehab Writing into trace_marker (See Documentation/trace/ftrace.rst) 392d3439f9dSSteven Rostedt (VMware) can also enable triggers that are written into 393d3439f9dSSteven Rostedt (VMware) /sys/kernel/tracing/events/ftrace/print/trigger 394d3439f9dSSteven Rostedt (VMware) 39573d98127SChangbin Du6.1 Expression syntax 39673d98127SChangbin Du--------------------- 39773d98127SChangbin Du 39873d98127SChangbin DuTriggers are added by echoing the command to the 'trigger' file:: 39973d98127SChangbin Du 40073d98127SChangbin Du # echo 'command[:count] [if filter]' > trigger 40173d98127SChangbin Du 40273d98127SChangbin DuTriggers are removed by echoing the same command but starting with '!' 40373d98127SChangbin Duto the 'trigger' file:: 40473d98127SChangbin Du 40573d98127SChangbin Du # echo '!command[:count] [if filter]' > trigger 40673d98127SChangbin Du 40773d98127SChangbin DuThe [if filter] part isn't used in matching commands when removing, so 40873d98127SChangbin Duleaving that off in a '!' command will accomplish the same thing as 40973d98127SChangbin Duhaving it in. 41073d98127SChangbin Du 41173d98127SChangbin DuThe filter syntax is the same as that described in the 'Event 41273d98127SChangbin Dufiltering' section above. 41373d98127SChangbin Du 41473d98127SChangbin DuFor ease of use, writing to the trigger file using '>' currently just 41573d98127SChangbin Duadds or removes a single trigger and there's no explicit '>>' support 41673d98127SChangbin Du('>' actually behaves like '>>') or truncation support to remove all 41773d98127SChangbin Dutriggers (you have to use '!' for each one added.) 41873d98127SChangbin Du 41973d98127SChangbin Du6.2 Supported trigger commands 42073d98127SChangbin Du------------------------------ 42173d98127SChangbin Du 42273d98127SChangbin DuThe following commands are supported: 42373d98127SChangbin Du 42473d98127SChangbin Du- enable_event/disable_event 42573d98127SChangbin Du 42673d98127SChangbin Du These commands can enable or disable another trace event whenever 42773d98127SChangbin Du the triggering event is hit. When these commands are registered, 42873d98127SChangbin Du the other trace event is activated, but disabled in a "soft" mode. 42973d98127SChangbin Du That is, the tracepoint will be called, but just will not be traced. 43073d98127SChangbin Du The event tracepoint stays in this mode as long as there's a trigger 43173d98127SChangbin Du in effect that can trigger it. 43273d98127SChangbin Du 43373d98127SChangbin Du For example, the following trigger causes kmalloc events to be 43473d98127SChangbin Du traced when a read system call is entered, and the :1 at the end 43573d98127SChangbin Du specifies that this enablement happens only once:: 43673d98127SChangbin Du 43773d98127SChangbin Du # echo 'enable_event:kmem:kmalloc:1' > \ 4382abfcd29SRoss Zwisler /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger 43973d98127SChangbin Du 44073d98127SChangbin Du The following trigger causes kmalloc events to stop being traced 44173d98127SChangbin Du when a read system call exits. This disablement happens on every 44273d98127SChangbin Du read system call exit:: 44373d98127SChangbin Du 44473d98127SChangbin Du # echo 'disable_event:kmem:kmalloc' > \ 4452abfcd29SRoss Zwisler /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger 44673d98127SChangbin Du 44773d98127SChangbin Du The format is:: 44873d98127SChangbin Du 44973d98127SChangbin Du enable_event:<system>:<event>[:count] 45073d98127SChangbin Du disable_event:<system>:<event>[:count] 45173d98127SChangbin Du 45273d98127SChangbin Du To remove the above commands:: 45373d98127SChangbin Du 45473d98127SChangbin Du # echo '!enable_event:kmem:kmalloc:1' > \ 4552abfcd29SRoss Zwisler /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger 45673d98127SChangbin Du 45773d98127SChangbin Du # echo '!disable_event:kmem:kmalloc' > \ 4582abfcd29SRoss Zwisler /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger 45973d98127SChangbin Du 46073d98127SChangbin Du Note that there can be any number of enable/disable_event triggers 46173d98127SChangbin Du per triggering event, but there can only be one trigger per 46273d98127SChangbin Du triggered event. e.g. sys_enter_read can have triggers enabling both 46373d98127SChangbin Du kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc 46473d98127SChangbin Du versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if 46573d98127SChangbin Du bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they 46673d98127SChangbin Du could be combined into a single filter on kmem:kmalloc though). 46773d98127SChangbin Du 46873d98127SChangbin Du- stacktrace 46973d98127SChangbin Du 47073d98127SChangbin Du This command dumps a stacktrace in the trace buffer whenever the 47173d98127SChangbin Du triggering event occurs. 47273d98127SChangbin Du 47373d98127SChangbin Du For example, the following trigger dumps a stacktrace every time the 47473d98127SChangbin Du kmalloc tracepoint is hit:: 47573d98127SChangbin Du 47673d98127SChangbin Du # echo 'stacktrace' > \ 4772abfcd29SRoss Zwisler /sys/kernel/tracing/events/kmem/kmalloc/trigger 47873d98127SChangbin Du 47973d98127SChangbin Du The following trigger dumps a stacktrace the first 5 times a kmalloc 48073d98127SChangbin Du request happens with a size >= 64K:: 48173d98127SChangbin Du 48273d98127SChangbin Du # echo 'stacktrace:5 if bytes_req >= 65536' > \ 4832abfcd29SRoss Zwisler /sys/kernel/tracing/events/kmem/kmalloc/trigger 48473d98127SChangbin Du 48573d98127SChangbin Du The format is:: 48673d98127SChangbin Du 48773d98127SChangbin Du stacktrace[:count] 48873d98127SChangbin Du 48973d98127SChangbin Du To remove the above commands:: 49073d98127SChangbin Du 49173d98127SChangbin Du # echo '!stacktrace' > \ 4922abfcd29SRoss Zwisler /sys/kernel/tracing/events/kmem/kmalloc/trigger 49373d98127SChangbin Du 49473d98127SChangbin Du # echo '!stacktrace:5 if bytes_req >= 65536' > \ 4952abfcd29SRoss Zwisler /sys/kernel/tracing/events/kmem/kmalloc/trigger 49673d98127SChangbin Du 49773d98127SChangbin Du The latter can also be removed more simply by the following (without 49873d98127SChangbin Du the filter):: 49973d98127SChangbin Du 50073d98127SChangbin Du # echo '!stacktrace:5' > \ 5012abfcd29SRoss Zwisler /sys/kernel/tracing/events/kmem/kmalloc/trigger 50273d98127SChangbin Du 50373d98127SChangbin Du Note that there can be only one stacktrace trigger per triggering 50473d98127SChangbin Du event. 50573d98127SChangbin Du 50673d98127SChangbin Du- snapshot 50773d98127SChangbin Du 50873d98127SChangbin Du This command causes a snapshot to be triggered whenever the 50973d98127SChangbin Du triggering event occurs. 51073d98127SChangbin Du 51173d98127SChangbin Du The following command creates a snapshot every time a block request 51273d98127SChangbin Du queue is unplugged with a depth > 1. If you were tracing a set of 51373d98127SChangbin Du events or functions at the time, the snapshot trace buffer would 51473d98127SChangbin Du capture those events when the trigger event occurred:: 51573d98127SChangbin Du 51673d98127SChangbin Du # echo 'snapshot if nr_rq > 1' > \ 5172abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 51873d98127SChangbin Du 51973d98127SChangbin Du To only snapshot once:: 52073d98127SChangbin Du 52173d98127SChangbin Du # echo 'snapshot:1 if nr_rq > 1' > \ 5222abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 52373d98127SChangbin Du 52473d98127SChangbin Du To remove the above commands:: 52573d98127SChangbin Du 52673d98127SChangbin Du # echo '!snapshot if nr_rq > 1' > \ 5272abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 52873d98127SChangbin Du 52973d98127SChangbin Du # echo '!snapshot:1 if nr_rq > 1' > \ 5302abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 53173d98127SChangbin Du 53273d98127SChangbin Du Note that there can be only one snapshot trigger per triggering 53373d98127SChangbin Du event. 53473d98127SChangbin Du 53573d98127SChangbin Du- traceon/traceoff 53673d98127SChangbin Du 53773d98127SChangbin Du These commands turn tracing on and off when the specified events are 53873d98127SChangbin Du hit. The parameter determines how many times the tracing system is 53973d98127SChangbin Du turned on and off. If unspecified, there is no limit. 54073d98127SChangbin Du 54173d98127SChangbin Du The following command turns tracing off the first time a block 54273d98127SChangbin Du request queue is unplugged with a depth > 1. If you were tracing a 54373d98127SChangbin Du set of events or functions at the time, you could then examine the 54473d98127SChangbin Du trace buffer to see the sequence of events that led up to the 54573d98127SChangbin Du trigger event:: 54673d98127SChangbin Du 54773d98127SChangbin Du # echo 'traceoff:1 if nr_rq > 1' > \ 5482abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 54973d98127SChangbin Du 55073d98127SChangbin Du To always disable tracing when nr_rq > 1:: 55173d98127SChangbin Du 55273d98127SChangbin Du # echo 'traceoff if nr_rq > 1' > \ 5532abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 55473d98127SChangbin Du 55573d98127SChangbin Du To remove the above commands:: 55673d98127SChangbin Du 55773d98127SChangbin Du # echo '!traceoff:1 if nr_rq > 1' > \ 5582abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 55973d98127SChangbin Du 56073d98127SChangbin Du # echo '!traceoff if nr_rq > 1' > \ 5612abfcd29SRoss Zwisler /sys/kernel/tracing/events/block/block_unplug/trigger 56273d98127SChangbin Du 56373d98127SChangbin Du Note that there can be only one traceon or traceoff trigger per 56473d98127SChangbin Du triggering event. 56573d98127SChangbin Du 56673d98127SChangbin Du- hist 56773d98127SChangbin Du 56873d98127SChangbin Du This command aggregates event hits into a hash table keyed on one or 56973d98127SChangbin Du more trace event format fields (or stacktrace) and a set of running 57073d98127SChangbin Du totals derived from one or more trace event format fields and/or 57173d98127SChangbin Du event counts (hitcount). 57273d98127SChangbin Du 573ea272257SMauro Carvalho Chehab See Documentation/trace/histogram.rst for details and examples. 57434ed6357STom Zanussi 575b8170fadSTom Zanussi7. In-kernel trace event API 576b8170fadSTom Zanussi============================ 57734ed6357STom Zanussi 57834ed6357STom ZanussiIn most cases, the command-line interface to trace events is more than 57934ed6357STom Zanussisufficient. Sometimes, however, applications might find the need for 58034ed6357STom Zanussimore complex relationships than can be expressed through a simple 58134ed6357STom Zanussiseries of linked command-line expressions, or putting together sets of 58234ed6357STom Zanussicommands may be simply too cumbersome. An example might be an 58334ed6357STom Zanussiapplication that needs to 'listen' to the trace stream in order to 58434ed6357STom Zanussimaintain an in-kernel state machine detecting, for instance, when an 58534ed6357STom Zanussiillegal kernel state occurs in the scheduler. 58634ed6357STom Zanussi 58734ed6357STom ZanussiThe trace event subsystem provides an in-kernel API allowing modules 58834ed6357STom Zanussior other kernel code to generate user-defined 'synthetic' events at 58934ed6357STom Zanussiwill, which can be used to either augment the existing trace stream 59034ed6357STom Zanussiand/or signal that a particular important state has occurred. 59134ed6357STom Zanussi 59234ed6357STom ZanussiA similar in-kernel API is also available for creating kprobe and 59334ed6357STom Zanussikretprobe events. 59434ed6357STom Zanussi 59534ed6357STom ZanussiBoth the synthetic event and k/ret/probe event APIs are built on top 59634ed6357STom Zanussiof a lower-level "dynevent_cmd" event command API, which is also 59734ed6357STom Zanussiavailable for more specialized applications, or as the basis of other 59834ed6357STom Zanussihigher-level trace event APIs. 59934ed6357STom Zanussi 60034ed6357STom ZanussiThe API provided for these purposes is describe below and allows the 60134ed6357STom Zanussifollowing: 60234ed6357STom Zanussi 60334ed6357STom Zanussi - dynamically creating synthetic event definitions 60434ed6357STom Zanussi - dynamically creating kprobe and kretprobe event definitions 60534ed6357STom Zanussi - tracing synthetic events from in-kernel code 60634ed6357STom Zanussi - the low-level "dynevent_cmd" API 60734ed6357STom Zanussi 608b8170fadSTom Zanussi7.1 Dyamically creating synthetic event definitions 609b8170fadSTom Zanussi--------------------------------------------------- 61034ed6357STom Zanussi 61134ed6357STom ZanussiThere are a couple ways to create a new synthetic event from a kernel 61234ed6357STom Zanussimodule or other kernel code. 61334ed6357STom Zanussi 61434ed6357STom ZanussiThe first creates the event in one step, using synth_event_create(). 61534ed6357STom ZanussiIn this method, the name of the event to create and an array defining 61634ed6357STom Zanussithe fields is supplied to synth_event_create(). If successful, a 61734ed6357STom Zanussisynthetic event with that name and fields will exist following that 6188206de7dSMauro Carvalho Chehabcall. For example, to create a new "schedtest" synthetic event:: 61934ed6357STom Zanussi 62034ed6357STom Zanussi ret = synth_event_create("schedtest", sched_fields, 62134ed6357STom Zanussi ARRAY_SIZE(sched_fields), THIS_MODULE); 62234ed6357STom Zanussi 62334ed6357STom ZanussiThe sched_fields param in this example points to an array of struct 62434ed6357STom Zanussisynth_field_desc, each of which describes an event field by type and 6258206de7dSMauro Carvalho Chehabname:: 62634ed6357STom Zanussi 62734ed6357STom Zanussi static struct synth_field_desc sched_fields[] = { 62834ed6357STom Zanussi { .type = "pid_t", .name = "next_pid_field" }, 62934ed6357STom Zanussi { .type = "char[16]", .name = "next_comm_field" }, 63034ed6357STom Zanussi { .type = "u64", .name = "ts_ns" }, 63134ed6357STom Zanussi { .type = "u64", .name = "ts_ms" }, 63234ed6357STom Zanussi { .type = "unsigned int", .name = "cpu" }, 63334ed6357STom Zanussi { .type = "char[64]", .name = "my_string_field" }, 63434ed6357STom Zanussi { .type = "int", .name = "my_int_field" }, 63534ed6357STom Zanussi }; 63634ed6357STom Zanussi 637bd82631dSTom ZanussiSee synth_field_size() for available types. 638bd82631dSTom Zanussi 639bd82631dSTom ZanussiIf field_name contains [n], the field is considered to be a static array. 640bd82631dSTom Zanussi 641bd82631dSTom ZanussiIf field_names contains[] (no subscript), the field is considered to 642bd82631dSTom Zanussibe a dynamic array, which will only take as much space in the event as 643bd82631dSTom Zanussiis required to hold the array. 644bd82631dSTom Zanussi 645bd82631dSTom ZanussiBecause space for an event is reserved before assigning field values 646bd82631dSTom Zanussito the event, using dynamic arrays implies that the piecewise 647bd82631dSTom Zanussiin-kernel API described below can't be used with dynamic arrays. The 648bd82631dSTom Zanussiother non-piecewise in-kernel APIs can, however, be used with dynamic 649bd82631dSTom Zanussiarrays. 65034ed6357STom Zanussi 65134ed6357STom ZanussiIf the event is created from within a module, a pointer to the module 65234ed6357STom Zanussimust be passed to synth_event_create(). This will ensure that the 65334ed6357STom Zanussitrace buffer won't contain unreadable events when the module is 65434ed6357STom Zanussiremoved. 65534ed6357STom Zanussi 65634ed6357STom ZanussiAt this point, the event object is ready to be used for generating new 65734ed6357STom Zanussievents. 65834ed6357STom Zanussi 65934ed6357STom ZanussiIn the second method, the event is created in several steps. This 66034ed6357STom Zanussiallows events to be created dynamically and without the need to create 66134ed6357STom Zanussiand populate an array of fields beforehand. 66234ed6357STom Zanussi 66334ed6357STom ZanussiTo use this method, an empty or partially empty synthetic event should 66434ed6357STom Zanussifirst be created using synth_event_gen_cmd_start() or 66534ed6357STom Zanussisynth_event_gen_cmd_array_start(). For synth_event_gen_cmd_start(), 66634ed6357STom Zanussithe name of the event along with one or more pairs of args each pair 66734ed6357STom Zanussirepresenting a 'type field_name;' field specification should be 66834ed6357STom Zanussisupplied. For synth_event_gen_cmd_array_start(), the name of the 66934ed6357STom Zanussievent along with an array of struct synth_field_desc should be 67034ed6357STom Zanussisupplied. Before calling synth_event_gen_cmd_start() or 67134ed6357STom Zanussisynth_event_gen_cmd_array_start(), the user should create and 67234ed6357STom Zanussiinitialize a dynevent_cmd object using synth_event_cmd_init(). 67334ed6357STom Zanussi 67434ed6357STom ZanussiFor example, to create a new "schedtest" synthetic event with two 6758206de7dSMauro Carvalho Chehabfields:: 67634ed6357STom Zanussi 67734ed6357STom Zanussi struct dynevent_cmd cmd; 67834ed6357STom Zanussi char *buf; 67934ed6357STom Zanussi 68034ed6357STom Zanussi /* Create a buffer to hold the generated command */ 68134ed6357STom Zanussi buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); 68234ed6357STom Zanussi 68334ed6357STom Zanussi /* Before generating the command, initialize the cmd object */ 68434ed6357STom Zanussi synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN); 68534ed6357STom Zanussi 68634ed6357STom Zanussi ret = synth_event_gen_cmd_start(&cmd, "schedtest", THIS_MODULE, 68734ed6357STom Zanussi "pid_t", "next_pid_field", 68834ed6357STom Zanussi "u64", "ts_ns"); 68934ed6357STom Zanussi 69034ed6357STom ZanussiAlternatively, using an array of struct synth_field_desc fields 6918206de7dSMauro Carvalho Chehabcontaining the same information:: 69234ed6357STom Zanussi 69334ed6357STom Zanussi ret = synth_event_gen_cmd_array_start(&cmd, "schedtest", THIS_MODULE, 69434ed6357STom Zanussi fields, n_fields); 69534ed6357STom Zanussi 69634ed6357STom ZanussiOnce the synthetic event object has been created, it can then be 69734ed6357STom Zanussipopulated with more fields. Fields are added one by one using 69834ed6357STom Zanussisynth_event_add_field(), supplying the dynevent_cmd object, a field 69934ed6357STom Zanussitype, and a field name. For example, to add a new int field named 7008206de7dSMauro Carvalho Chehab"intfield", the following call should be made:: 70134ed6357STom Zanussi 70234ed6357STom Zanussi ret = synth_event_add_field(&cmd, "int", "intfield"); 70334ed6357STom Zanussi 70434ed6357STom ZanussiSee synth_field_size() for available types. If field_name contains [n] 70534ed6357STom Zanussithe field is considered to be an array. 70634ed6357STom Zanussi 70734ed6357STom ZanussiA group of fields can also be added all at once using an array of 70834ed6357STom Zanussisynth_field_desc with add_synth_fields(). For example, this would add 7098206de7dSMauro Carvalho Chehabjust the first four sched_fields:: 71034ed6357STom Zanussi 71134ed6357STom Zanussi ret = synth_event_add_fields(&cmd, sched_fields, 4); 71234ed6357STom Zanussi 71334ed6357STom ZanussiIf you already have a string of the form 'type field_name', 71434ed6357STom Zanussisynth_event_add_field_str() can be used to add it as-is; it will 71534ed6357STom Zanussialso automatically append a ';' to the string. 71634ed6357STom Zanussi 71734ed6357STom ZanussiOnce all the fields have been added, the event should be finalized and 7188206de7dSMauro Carvalho Chehabregistered by calling the synth_event_gen_cmd_end() function:: 71934ed6357STom Zanussi 72034ed6357STom Zanussi ret = synth_event_gen_cmd_end(&cmd); 72134ed6357STom Zanussi 72234ed6357STom ZanussiAt this point, the event object is ready to be used for tracing new 72334ed6357STom Zanussievents. 72434ed6357STom Zanussi 725b8170fadSTom Zanussi7.2 Tracing synthetic events from in-kernel code 726b8170fadSTom Zanussi------------------------------------------------ 72734ed6357STom Zanussi 72834ed6357STom ZanussiTo trace a synthetic event, there are several options. The first 72934ed6357STom Zanussioption is to trace the event in one call, using synth_event_trace() 73034ed6357STom Zanussiwith a variable number of values, or synth_event_trace_array() with an 73134ed6357STom Zanussiarray of values to be set. A second option can be used to avoid the 73234ed6357STom Zanussineed for a pre-formed array of values or list of arguments, via 73334ed6357STom Zanussisynth_event_trace_start() and synth_event_trace_end() along with 73434ed6357STom Zanussisynth_event_add_next_val() or synth_event_add_val() to add the values 73534ed6357STom Zanussipiecewise. 73634ed6357STom Zanussi 737b8170fadSTom Zanussi7.2.1 Tracing a synthetic event all at once 738b8170fadSTom Zanussi------------------------------------------- 73934ed6357STom Zanussi 74034ed6357STom ZanussiTo trace a synthetic event all at once, the synth_event_trace() or 74134ed6357STom Zanussisynth_event_trace_array() functions can be used. 74234ed6357STom Zanussi 74334ed6357STom ZanussiThe synth_event_trace() function is passed the trace_event_file 74434ed6357STom Zanussirepresenting the synthetic event (which can be retrieved using 74534ed6357STom Zanussitrace_get_event_file() using the synthetic event name, "synthetic" as 74634ed6357STom Zanussithe system name, and the trace instance name (NULL if using the global 74734ed6357STom Zanussitrace array)), along with an variable number of u64 args, one for each 74834ed6357STom Zanussisynthetic event field, and the number of values being passed. 74934ed6357STom Zanussi 75034ed6357STom ZanussiSo, to trace an event corresponding to the synthetic event definition 7518206de7dSMauro Carvalho Chehababove, code like the following could be used:: 75234ed6357STom Zanussi 75334ed6357STom Zanussi ret = synth_event_trace(create_synth_test, 7, /* number of values */ 75434ed6357STom Zanussi 444, /* next_pid_field */ 75534ed6357STom Zanussi (u64)"clackers", /* next_comm_field */ 75634ed6357STom Zanussi 1000000, /* ts_ns */ 75734ed6357STom Zanussi 1000, /* ts_ms */ 75834ed6357STom Zanussi smp_processor_id(),/* cpu */ 75934ed6357STom Zanussi (u64)"Thneed", /* my_string_field */ 76034ed6357STom Zanussi 999); /* my_int_field */ 76134ed6357STom Zanussi 76234ed6357STom ZanussiAll vals should be cast to u64, and string vals are just pointers to 76334ed6357STom Zanussistrings, cast to u64. Strings will be copied into space reserved in 76434ed6357STom Zanussithe event for the string, using these pointers. 76534ed6357STom Zanussi 76634ed6357STom ZanussiAlternatively, the synth_event_trace_array() function can be used to 76734ed6357STom Zanussiaccomplish the same thing. It is passed the trace_event_file 76834ed6357STom Zanussirepresenting the synthetic event (which can be retrieved using 76934ed6357STom Zanussitrace_get_event_file() using the synthetic event name, "synthetic" as 77034ed6357STom Zanussithe system name, and the trace instance name (NULL if using the global 77134ed6357STom Zanussitrace array)), along with an array of u64, one for each synthetic 77234ed6357STom Zanussievent field. 77334ed6357STom Zanussi 77434ed6357STom ZanussiTo trace an event corresponding to the synthetic event definition 7758206de7dSMauro Carvalho Chehababove, code like the following could be used:: 77634ed6357STom Zanussi 77734ed6357STom Zanussi u64 vals[7]; 77834ed6357STom Zanussi 77934ed6357STom Zanussi vals[0] = 777; /* next_pid_field */ 78034ed6357STom Zanussi vals[1] = (u64)"tiddlywinks"; /* next_comm_field */ 78134ed6357STom Zanussi vals[2] = 1000000; /* ts_ns */ 78234ed6357STom Zanussi vals[3] = 1000; /* ts_ms */ 78334ed6357STom Zanussi vals[4] = smp_processor_id(); /* cpu */ 78434ed6357STom Zanussi vals[5] = (u64)"thneed"; /* my_string_field */ 78534ed6357STom Zanussi vals[6] = 398; /* my_int_field */ 78634ed6357STom Zanussi 78734ed6357STom ZanussiThe 'vals' array is just an array of u64, the number of which must 78834ed6357STom Zanussimatch the number of field in the synthetic event, and which must be in 78934ed6357STom Zanussithe same order as the synthetic event fields. 79034ed6357STom Zanussi 79134ed6357STom ZanussiAll vals should be cast to u64, and string vals are just pointers to 79234ed6357STom Zanussistrings, cast to u64. Strings will be copied into space reserved in 79334ed6357STom Zanussithe event for the string, using these pointers. 79434ed6357STom Zanussi 79534ed6357STom ZanussiIn order to trace a synthetic event, a pointer to the trace event file 79634ed6357STom Zanussiis needed. The trace_get_event_file() function can be used to get 79734ed6357STom Zanussiit - it will find the file in the given trace instance (in this case 79834ed6357STom ZanussiNULL since the top trace array is being used) while at the same time 7998206de7dSMauro Carvalho Chehabpreventing the instance containing it from going away:: 80034ed6357STom Zanussi 80134ed6357STom Zanussi schedtest_event_file = trace_get_event_file(NULL, "synthetic", 80234ed6357STom Zanussi "schedtest"); 80334ed6357STom Zanussi 80434ed6357STom ZanussiBefore tracing the event, it should be enabled in some way, otherwise 80534ed6357STom Zanussithe synthetic event won't actually show up in the trace buffer. 80634ed6357STom Zanussi 80734ed6357STom ZanussiTo enable a synthetic event from the kernel, trace_array_set_clr_event() 80834ed6357STom Zanussican be used (which is not specific to synthetic events, so does need 80934ed6357STom Zanussithe "synthetic" system name to be specified explicitly). 81034ed6357STom Zanussi 8118206de7dSMauro Carvalho ChehabTo enable the event, pass 'true' to it:: 81234ed6357STom Zanussi 81334ed6357STom Zanussi trace_array_set_clr_event(schedtest_event_file->tr, 81434ed6357STom Zanussi "synthetic", "schedtest", true); 81534ed6357STom Zanussi 8168206de7dSMauro Carvalho ChehabTo disable it pass false:: 81734ed6357STom Zanussi 81834ed6357STom Zanussi trace_array_set_clr_event(schedtest_event_file->tr, 81934ed6357STom Zanussi "synthetic", "schedtest", false); 82034ed6357STom Zanussi 82134ed6357STom ZanussiFinally, synth_event_trace_array() can be used to actually trace the 8228206de7dSMauro Carvalho Chehabevent, which should be visible in the trace buffer afterwards:: 82334ed6357STom Zanussi 82434ed6357STom Zanussi ret = synth_event_trace_array(schedtest_event_file, vals, 82534ed6357STom Zanussi ARRAY_SIZE(vals)); 82634ed6357STom Zanussi 82734ed6357STom ZanussiTo remove the synthetic event, the event should be disabled, and the 8288206de7dSMauro Carvalho Chehabtrace instance should be 'put' back using trace_put_event_file():: 82934ed6357STom Zanussi 83034ed6357STom Zanussi trace_array_set_clr_event(schedtest_event_file->tr, 83134ed6357STom Zanussi "synthetic", "schedtest", false); 83234ed6357STom Zanussi trace_put_event_file(schedtest_event_file); 83334ed6357STom Zanussi 83434ed6357STom ZanussiIf those have been successful, synth_event_delete() can be called to 8358206de7dSMauro Carvalho Chehabremove the event:: 83634ed6357STom Zanussi 83734ed6357STom Zanussi ret = synth_event_delete("schedtest"); 83834ed6357STom Zanussi 839b8170fadSTom Zanussi7.2.2 Tracing a synthetic event piecewise 840b8170fadSTom Zanussi----------------------------------------- 84134ed6357STom Zanussi 84234ed6357STom ZanussiTo trace a synthetic using the piecewise method described above, the 84334ed6357STom Zanussisynth_event_trace_start() function is used to 'open' the synthetic 8448206de7dSMauro Carvalho Chehabevent trace:: 84534ed6357STom Zanussi 846301de546SArtem Bityutskiy struct synth_event_trace_state trace_state; 84734ed6357STom Zanussi 84834ed6357STom Zanussi ret = synth_event_trace_start(schedtest_event_file, &trace_state); 84934ed6357STom Zanussi 85034ed6357STom ZanussiIt's passed the trace_event_file representing the synthetic event 85134ed6357STom Zanussiusing the same methods as described above, along with a pointer to a 852301de546SArtem Bityutskiystruct synth_event_trace_state object, which will be zeroed before use and 85334ed6357STom Zanussiused to maintain state between this and following calls. 85434ed6357STom Zanussi 85534ed6357STom ZanussiOnce the event has been opened, which means space for it has been 85634ed6357STom Zanussireserved in the trace buffer, the individual fields can be set. There 85734ed6357STom Zanussiare two ways to do that, either one after another for each field in 85834ed6357STom Zanussithe event, which requires no lookups, or by name, which does. The 85934ed6357STom Zanussitradeoff is flexibility in doing the assignments vs the cost of a 86034ed6357STom Zanussilookup per field. 86134ed6357STom Zanussi 86234ed6357STom ZanussiTo assign the values one after the other without lookups, 86334ed6357STom Zanussisynth_event_add_next_val() should be used. Each call is passed the 864301de546SArtem Bityutskiysame synth_event_trace_state object used in the synth_event_trace_start(), 86534ed6357STom Zanussialong with the value to set the next field in the event. After each 86634ed6357STom Zanussifield is set, the 'cursor' points to the next field, which will be set 86734ed6357STom Zanussiby the subsequent call, continuing until all the fields have been set 86834ed6357STom Zanussiin order. The same sequence of calls as in the above examples using 8698206de7dSMauro Carvalho Chehabthis method would be (without error-handling code):: 87034ed6357STom Zanussi 87134ed6357STom Zanussi /* next_pid_field */ 87234ed6357STom Zanussi ret = synth_event_add_next_val(777, &trace_state); 87334ed6357STom Zanussi 87434ed6357STom Zanussi /* next_comm_field */ 87534ed6357STom Zanussi ret = synth_event_add_next_val((u64)"slinky", &trace_state); 87634ed6357STom Zanussi 87734ed6357STom Zanussi /* ts_ns */ 87834ed6357STom Zanussi ret = synth_event_add_next_val(1000000, &trace_state); 87934ed6357STom Zanussi 88034ed6357STom Zanussi /* ts_ms */ 88134ed6357STom Zanussi ret = synth_event_add_next_val(1000, &trace_state); 88234ed6357STom Zanussi 88334ed6357STom Zanussi /* cpu */ 88434ed6357STom Zanussi ret = synth_event_add_next_val(smp_processor_id(), &trace_state); 88534ed6357STom Zanussi 88634ed6357STom Zanussi /* my_string_field */ 88734ed6357STom Zanussi ret = synth_event_add_next_val((u64)"thneed_2.01", &trace_state); 88834ed6357STom Zanussi 88934ed6357STom Zanussi /* my_int_field */ 89034ed6357STom Zanussi ret = synth_event_add_next_val(395, &trace_state); 89134ed6357STom Zanussi 89234ed6357STom ZanussiTo assign the values in any order, synth_event_add_val() should be 893301de546SArtem Bityutskiyused. Each call is passed the same synth_event_trace_state object used in 89434ed6357STom Zanussithe synth_event_trace_start(), along with the field name of the field 89534ed6357STom Zanussito set and the value to set it to. The same sequence of calls as in 89634ed6357STom Zanussithe above examples using this method would be (without error-handling 8978206de7dSMauro Carvalho Chehabcode):: 89834ed6357STom Zanussi 89934ed6357STom Zanussi ret = synth_event_add_val("next_pid_field", 777, &trace_state); 90034ed6357STom Zanussi ret = synth_event_add_val("next_comm_field", (u64)"silly putty", 90134ed6357STom Zanussi &trace_state); 90234ed6357STom Zanussi ret = synth_event_add_val("ts_ns", 1000000, &trace_state); 90334ed6357STom Zanussi ret = synth_event_add_val("ts_ms", 1000, &trace_state); 90434ed6357STom Zanussi ret = synth_event_add_val("cpu", smp_processor_id(), &trace_state); 90534ed6357STom Zanussi ret = synth_event_add_val("my_string_field", (u64)"thneed_9", 90634ed6357STom Zanussi &trace_state); 90734ed6357STom Zanussi ret = synth_event_add_val("my_int_field", 3999, &trace_state); 90834ed6357STom Zanussi 90934ed6357STom ZanussiNote that synth_event_add_next_val() and synth_event_add_val() are 91034ed6357STom Zanussiincompatible if used within the same trace of an event - either one 91134ed6357STom Zanussican be used but not both at the same time. 91234ed6357STom Zanussi 91334ed6357STom ZanussiFinally, the event won't be actually traced until it's 'closed', 91434ed6357STom Zanussiwhich is done using synth_event_trace_end(), which takes only the 915301de546SArtem Bityutskiystruct synth_event_trace_state object used in the previous calls:: 91634ed6357STom Zanussi 91734ed6357STom Zanussi ret = synth_event_trace_end(&trace_state); 91834ed6357STom Zanussi 91934ed6357STom ZanussiNote that synth_event_trace_end() must be called at the end regardless 92034ed6357STom Zanussiof whether any of the add calls failed (say due to a bad field name 92134ed6357STom Zanussibeing passed in). 92234ed6357STom Zanussi 923b8170fadSTom Zanussi7.3 Dyamically creating kprobe and kretprobe event definitions 924b8170fadSTom Zanussi-------------------------------------------------------------- 92534ed6357STom Zanussi 92634ed6357STom ZanussiTo create a kprobe or kretprobe trace event from kernel code, the 92734ed6357STom Zanussikprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start() 92834ed6357STom Zanussifunctions can be used. 92934ed6357STom Zanussi 93034ed6357STom ZanussiTo create a kprobe event, an empty or partially empty kprobe event 93134ed6357STom Zanussishould first be created using kprobe_event_gen_cmd_start(). The name 932d56b699dSBjorn Helgaasof the event and the probe location should be specified along with one 93334ed6357STom Zanussior args each representing a probe field should be supplied to this 93434ed6357STom Zanussifunction. Before calling kprobe_event_gen_cmd_start(), the user 93534ed6357STom Zanussishould create and initialize a dynevent_cmd object using 93634ed6357STom Zanussikprobe_event_cmd_init(). 93734ed6357STom Zanussi 9388206de7dSMauro Carvalho ChehabFor example, to create a new "schedtest" kprobe event with two fields:: 93934ed6357STom Zanussi 94034ed6357STom Zanussi struct dynevent_cmd cmd; 94134ed6357STom Zanussi char *buf; 94234ed6357STom Zanussi 94334ed6357STom Zanussi /* Create a buffer to hold the generated command */ 94434ed6357STom Zanussi buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); 94534ed6357STom Zanussi 94634ed6357STom Zanussi /* Before generating the command, initialize the cmd object */ 94734ed6357STom Zanussi kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN); 94834ed6357STom Zanussi 94934ed6357STom Zanussi /* 95034ed6357STom Zanussi * Define the gen_kprobe_test event with the first 2 kprobe 95134ed6357STom Zanussi * fields. 95234ed6357STom Zanussi */ 95334ed6357STom Zanussi ret = kprobe_event_gen_cmd_start(&cmd, "gen_kprobe_test", "do_sys_open", 95434ed6357STom Zanussi "dfd=%ax", "filename=%dx"); 95534ed6357STom Zanussi 95634ed6357STom ZanussiOnce the kprobe event object has been created, it can then be 95734ed6357STom Zanussipopulated with more fields. Fields can be added using 95834ed6357STom Zanussikprobe_event_add_fields(), supplying the dynevent_cmd object along 95934ed6357STom Zanussiwith a variable arg list of probe fields. For example, to add a 9608206de7dSMauro Carvalho Chehabcouple additional fields, the following call could be made:: 96134ed6357STom Zanussi 96234ed6357STom Zanussi ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)"); 96334ed6357STom Zanussi 96434ed6357STom ZanussiOnce all the fields have been added, the event should be finalized and 96534ed6357STom Zanussiregistered by calling the kprobe_event_gen_cmd_end() or 96634ed6357STom Zanussikretprobe_event_gen_cmd_end() functions, depending on whether a kprobe 9678206de7dSMauro Carvalho Chehabor kretprobe command was started:: 96834ed6357STom Zanussi 96934ed6357STom Zanussi ret = kprobe_event_gen_cmd_end(&cmd); 97034ed6357STom Zanussi 9718206de7dSMauro Carvalho Chehabor:: 97234ed6357STom Zanussi 97334ed6357STom Zanussi ret = kretprobe_event_gen_cmd_end(&cmd); 97434ed6357STom Zanussi 97534ed6357STom ZanussiAt this point, the event object is ready to be used for tracing new 97634ed6357STom Zanussievents. 97734ed6357STom Zanussi 97834ed6357STom ZanussiSimilarly, a kretprobe event can be created using 97934ed6357STom Zanussikretprobe_event_gen_cmd_start() with a probe name and location and 9808206de7dSMauro Carvalho Chehabadditional params such as $retval:: 98134ed6357STom Zanussi 98234ed6357STom Zanussi ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test", 98334ed6357STom Zanussi "do_sys_open", "$retval"); 98434ed6357STom Zanussi 98534ed6357STom ZanussiSimilar to the synthetic event case, code like the following can be 9868206de7dSMauro Carvalho Chehabused to enable the newly created kprobe event:: 98734ed6357STom Zanussi 98834ed6357STom Zanussi gen_kprobe_test = trace_get_event_file(NULL, "kprobes", "gen_kprobe_test"); 98934ed6357STom Zanussi 99034ed6357STom Zanussi ret = trace_array_set_clr_event(gen_kprobe_test->tr, 99134ed6357STom Zanussi "kprobes", "gen_kprobe_test", true); 99234ed6357STom Zanussi 99334ed6357STom ZanussiFinally, also similar to synthetic events, the following code can be 9948206de7dSMauro Carvalho Chehabused to give the kprobe event file back and delete the event:: 99534ed6357STom Zanussi 99634ed6357STom Zanussi trace_put_event_file(gen_kprobe_test); 99734ed6357STom Zanussi 99834ed6357STom Zanussi ret = kprobe_event_delete("gen_kprobe_test"); 99934ed6357STom Zanussi 1000b8170fadSTom Zanussi7.4 The "dynevent_cmd" low-level API 1001b8170fadSTom Zanussi------------------------------------ 100234ed6357STom Zanussi 100334ed6357STom ZanussiBoth the in-kernel synthetic event and kprobe interfaces are built on 100434ed6357STom Zanussitop of a lower-level "dynevent_cmd" interface. This interface is 100534ed6357STom Zanussimeant to provide the basis for higher-level interfaces such as the 100634ed6357STom Zanussisynthetic and kprobe interfaces, which can be used as examples. 100734ed6357STom Zanussi 100834ed6357STom ZanussiThe basic idea is simple and amounts to providing a general-purpose 100934ed6357STom Zanussilayer that can be used to generate trace event commands. The 101034ed6357STom Zanussigenerated command strings can then be passed to the command-parsing 101134ed6357STom Zanussiand event creation code that already exists in the trace event 1012d56b699dSBjorn Helgaassubsystem for creating the corresponding trace events. 101334ed6357STom Zanussi 101434ed6357STom ZanussiIn a nutshell, the way it works is that the higher-level interface 101534ed6357STom Zanussicode creates a struct dynevent_cmd object, then uses a couple 101634ed6357STom Zanussifunctions, dynevent_arg_add() and dynevent_arg_pair_add() to build up 101734ed6357STom Zanussia command string, which finally causes the command to be executed 101834ed6357STom Zanussiusing the dynevent_create() function. The details of the interface 101934ed6357STom Zanussiare described below. 102034ed6357STom Zanussi 102134ed6357STom ZanussiThe first step in building a new command string is to create and 102234ed6357STom Zanussiinitialize an instance of a dynevent_cmd. Here, for instance, we 10238206de7dSMauro Carvalho Chehabcreate a dynevent_cmd on the stack and initialize it:: 102434ed6357STom Zanussi 102534ed6357STom Zanussi struct dynevent_cmd cmd; 102634ed6357STom Zanussi char *buf; 102734ed6357STom Zanussi int ret; 102834ed6357STom Zanussi 102934ed6357STom Zanussi buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); 103034ed6357STom Zanussi 103134ed6357STom Zanussi dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_FOO, 103234ed6357STom Zanussi foo_event_run_command); 103334ed6357STom Zanussi 103434ed6357STom ZanussiThe dynevent_cmd initialization needs to be given a user-specified 103534ed6357STom Zanussibuffer and the length of the buffer (MAX_DYNEVENT_CMD_LEN can be used 103634ed6357STom Zanussifor this purpose - at 2k it's generally too big to be comfortably put 103734ed6357STom Zanussion the stack, so is dynamically allocated), a dynevent type id, which 103834ed6357STom Zanussiis meant to be used to check that further API calls are for the 103934ed6357STom Zanussicorrect command type, and a pointer to an event-specific run_command() 104034ed6357STom Zanussicallback that will be called to actually execute the event-specific 104134ed6357STom Zanussicommand function. 104234ed6357STom Zanussi 104334ed6357STom ZanussiOnce that's done, the command string can by built up by successive 104434ed6357STom Zanussicalls to argument-adding functions. 104534ed6357STom Zanussi 104634ed6357STom ZanussiTo add a single argument, define and initialize a struct dynevent_arg 104734ed6357STom Zanussior struct dynevent_arg_pair object. Here's an example of the simplest 104834ed6357STom Zanussipossible arg addition, which is simply to append the given string as 10498206de7dSMauro Carvalho Chehaba whitespace-separated argument to the command:: 105034ed6357STom Zanussi 105134ed6357STom Zanussi struct dynevent_arg arg; 105234ed6357STom Zanussi 105334ed6357STom Zanussi dynevent_arg_init(&arg, NULL, 0); 105434ed6357STom Zanussi 105534ed6357STom Zanussi arg.str = name; 105634ed6357STom Zanussi 105734ed6357STom Zanussi ret = dynevent_arg_add(cmd, &arg); 105834ed6357STom Zanussi 105934ed6357STom ZanussiThe arg object is first initialized using dynevent_arg_init() and in 106034ed6357STom Zanussithis case the parameters are NULL or 0, which means there's no 106134ed6357STom Zanussioptional sanity-checking function or separator appended to the end of 106234ed6357STom Zanussithe arg. 106334ed6357STom Zanussi 106434ed6357STom ZanussiHere's another more complicated example using an 'arg pair', which is 106534ed6357STom Zanussiused to create an argument that consists of a couple components added 106634ed6357STom Zanussitogether as a unit, for example, a 'type field_name;' arg or a simple 10678206de7dSMauro Carvalho Chehabexpression arg e.g. 'flags=%cx':: 106834ed6357STom Zanussi 106934ed6357STom Zanussi struct dynevent_arg_pair arg_pair; 107034ed6357STom Zanussi 107134ed6357STom Zanussi dynevent_arg_pair_init(&arg_pair, dynevent_foo_check_arg_fn, 0, ';'); 107234ed6357STom Zanussi 107334ed6357STom Zanussi arg_pair.lhs = type; 107434ed6357STom Zanussi arg_pair.rhs = name; 107534ed6357STom Zanussi 107634ed6357STom Zanussi ret = dynevent_arg_pair_add(cmd, &arg_pair); 107734ed6357STom Zanussi 107834ed6357STom ZanussiAgain, the arg_pair is first initialized, in this case with a callback 107934ed6357STom Zanussifunction used to check the sanity of the args (for example, that 108034ed6357STom Zanussineither part of the pair is NULL), along with a character to be used 108134ed6357STom Zanussito add an operator between the pair (here none) and a separator to be 108234ed6357STom Zanussiappended onto the end of the arg pair (here ';'). 108334ed6357STom Zanussi 108434ed6357STom ZanussiThere's also a dynevent_str_add() function that can be used to simply 1085d56b699dSBjorn Helgaasadd a string as-is, with no spaces, delimiters, or arg check. 108634ed6357STom Zanussi 108734ed6357STom ZanussiAny number of dynevent_*_add() calls can be made to build up the string 108834ed6357STom Zanussi(until its length surpasses cmd->maxlen). When all the arguments have 108934ed6357STom Zanussibeen added and the command string is complete, the only thing left to 109034ed6357STom Zanussido is run the command, which happens by simply calling 10918206de7dSMauro Carvalho Chehabdynevent_create():: 109234ed6357STom Zanussi 109334ed6357STom Zanussi ret = dynevent_create(&cmd); 109434ed6357STom Zanussi 109534ed6357STom ZanussiAt that point, if the return value is 0, the dynamic event has been 109634ed6357STom Zanussicreated and is ready to use. 109734ed6357STom Zanussi 109834ed6357STom ZanussiSee the dynevent_cmd function definitions themselves for the details 109934ed6357STom Zanussiof the API. 1100