xref: /linux/Documentation/trace/events.rst (revision a1c613ae4c322ddd58d5a8539dbfba2a0380a8c0)
173d98127SChangbin Du=============
273d98127SChangbin DuEvent Tracing
373d98127SChangbin Du=============
473d98127SChangbin Du
573d98127SChangbin Du:Author: Theodore Ts'o
673d98127SChangbin Du:Updated: Li Zefan and Tom Zanussi
773d98127SChangbin Du
873d98127SChangbin Du1. Introduction
973d98127SChangbin Du===============
1073d98127SChangbin Du
11ec15872dSMauro Carvalho ChehabTracepoints (see Documentation/trace/tracepoints.rst) can be used
1273d98127SChangbin Duwithout creating custom kernel modules to register probe functions
1373d98127SChangbin Duusing the event tracing infrastructure.
1473d98127SChangbin Du
1573d98127SChangbin DuNot all tracepoints can be traced using the event tracing system;
1673d98127SChangbin Duthe kernel developer must provide code snippets which define how the
1773d98127SChangbin Dutracing information is saved into the tracing buffer, and how the
1873d98127SChangbin Dutracing information should be printed.
1973d98127SChangbin Du
2073d98127SChangbin Du2. Using Event Tracing
2173d98127SChangbin Du======================
2273d98127SChangbin Du
2373d98127SChangbin Du2.1 Via the 'set_event' interface
2473d98127SChangbin Du---------------------------------
2573d98127SChangbin Du
2673d98127SChangbin DuThe events which are available for tracing can be found in the file
272abfcd29SRoss Zwisler/sys/kernel/tracing/available_events.
2873d98127SChangbin Du
2973d98127SChangbin DuTo enable a particular event, such as 'sched_wakeup', simply echo it
302abfcd29SRoss Zwislerto /sys/kernel/tracing/set_event. For example::
3173d98127SChangbin Du
322abfcd29SRoss Zwisler	# echo sched_wakeup >> /sys/kernel/tracing/set_event
3373d98127SChangbin Du
3473d98127SChangbin Du.. Note:: '>>' is necessary, otherwise it will firstly disable all the events.
3573d98127SChangbin Du
3673d98127SChangbin DuTo disable an event, echo the event name to the set_event file prefixed
3773d98127SChangbin Duwith an exclamation point::
3873d98127SChangbin Du
392abfcd29SRoss Zwisler	# echo '!sched_wakeup' >> /sys/kernel/tracing/set_event
4073d98127SChangbin Du
4173d98127SChangbin DuTo disable all events, echo an empty line to the set_event file::
4273d98127SChangbin Du
432abfcd29SRoss Zwisler	# echo > /sys/kernel/tracing/set_event
4473d98127SChangbin Du
456234c7bdSJonathan CorbetTo enable all events, echo ``*:*`` or ``*:`` to the set_event file::
4673d98127SChangbin Du
472abfcd29SRoss Zwisler	# echo *:* > /sys/kernel/tracing/set_event
4873d98127SChangbin Du
4973d98127SChangbin DuThe events are organized into subsystems, such as ext4, irq, sched,
5073d98127SChangbin Duetc., and a full event name looks like this: <subsystem>:<event>.  The
5173d98127SChangbin Dusubsystem name is optional, but it is displayed in the available_events
5273d98127SChangbin Dufile.  All of the events in a subsystem can be specified via the syntax
536234c7bdSJonathan Corbet``<subsystem>:*``; for example, to enable all irq events, you can use the
5473d98127SChangbin Ducommand::
5573d98127SChangbin Du
562abfcd29SRoss Zwisler	# echo 'irq:*' > /sys/kernel/tracing/set_event
5773d98127SChangbin Du
5873d98127SChangbin Du2.2 Via the 'enable' toggle
5973d98127SChangbin Du---------------------------
6073d98127SChangbin Du
612abfcd29SRoss ZwislerThe events available are also listed in /sys/kernel/tracing/events/ hierarchy
6273d98127SChangbin Duof directories.
6373d98127SChangbin Du
6473d98127SChangbin DuTo enable event 'sched_wakeup'::
6573d98127SChangbin Du
662abfcd29SRoss Zwisler	# echo 1 > /sys/kernel/tracing/events/sched/sched_wakeup/enable
6773d98127SChangbin Du
6873d98127SChangbin DuTo disable it::
6973d98127SChangbin Du
702abfcd29SRoss Zwisler	# echo 0 > /sys/kernel/tracing/events/sched/sched_wakeup/enable
7173d98127SChangbin Du
7273d98127SChangbin DuTo enable all events in sched subsystem::
7373d98127SChangbin Du
742abfcd29SRoss Zwisler	# echo 1 > /sys/kernel/tracing/events/sched/enable
7573d98127SChangbin Du
7673d98127SChangbin DuTo enable all events::
7773d98127SChangbin Du
782abfcd29SRoss Zwisler	# echo 1 > /sys/kernel/tracing/events/enable
7973d98127SChangbin Du
8073d98127SChangbin DuWhen reading one of these enable files, there are four results:
8173d98127SChangbin Du
8273d98127SChangbin Du - 0 - all events this file affects are disabled
8373d98127SChangbin Du - 1 - all events this file affects are enabled
8473d98127SChangbin Du - X - there is a mixture of events enabled and disabled
8573d98127SChangbin Du - ? - this file does not affect any event
8673d98127SChangbin Du
8773d98127SChangbin Du2.3 Boot option
8873d98127SChangbin Du---------------
8973d98127SChangbin Du
9073d98127SChangbin DuIn order to facilitate early boot debugging, use boot option::
9173d98127SChangbin Du
9273d98127SChangbin Du	trace_event=[event-list]
9373d98127SChangbin Du
9473d98127SChangbin Duevent-list is a comma separated list of events. See section 2.1 for event
9573d98127SChangbin Duformat.
9673d98127SChangbin Du
9773d98127SChangbin Du3. Defining an event-enabled tracepoint
9873d98127SChangbin Du=======================================
9973d98127SChangbin Du
10073d98127SChangbin DuSee The example provided in samples/trace_events
10173d98127SChangbin Du
10273d98127SChangbin Du4. Event formats
10373d98127SChangbin Du================
10473d98127SChangbin Du
10573d98127SChangbin DuEach trace event has a 'format' file associated with it that contains
10673d98127SChangbin Dua description of each field in a logged event.  This information can
10773d98127SChangbin Dube used to parse the binary trace stream, and is also the place to
10873d98127SChangbin Dufind the field names that can be used in event filters (see section 5).
10973d98127SChangbin Du
11073d98127SChangbin DuIt also displays the format string that will be used to print the
11173d98127SChangbin Duevent in text mode, along with the event name and ID used for
11273d98127SChangbin Duprofiling.
11373d98127SChangbin Du
1146234c7bdSJonathan CorbetEvery event has a set of ``common`` fields associated with it; these are
1156234c7bdSJonathan Corbetthe fields prefixed with ``common_``.  The other fields vary between
11673d98127SChangbin Duevents and correspond to the fields defined in the TRACE_EVENT
11773d98127SChangbin Dudefinition for that event.
11873d98127SChangbin Du
11973d98127SChangbin DuEach field in the format has the form::
12073d98127SChangbin Du
12173d98127SChangbin Du     field:field-type field-name; offset:N; size:N;
12273d98127SChangbin Du
12373d98127SChangbin Duwhere offset is the offset of the field in the trace record and size
12473d98127SChangbin Duis the size of the data item, in bytes.
12573d98127SChangbin Du
12673d98127SChangbin DuFor example, here's the information displayed for the 'sched_wakeup'
12773d98127SChangbin Duevent::
12873d98127SChangbin Du
1292abfcd29SRoss Zwisler	# cat /sys/kernel/tracing/events/sched/sched_wakeup/format
13073d98127SChangbin Du
13173d98127SChangbin Du	name: sched_wakeup
13273d98127SChangbin Du	ID: 60
13373d98127SChangbin Du	format:
13473d98127SChangbin Du		field:unsigned short common_type;	offset:0;	size:2;
13573d98127SChangbin Du		field:unsigned char common_flags;	offset:2;	size:1;
13673d98127SChangbin Du		field:unsigned char common_preempt_count;	offset:3;	size:1;
13773d98127SChangbin Du		field:int common_pid;	offset:4;	size:4;
13873d98127SChangbin Du		field:int common_tgid;	offset:8;	size:4;
13973d98127SChangbin Du
14073d98127SChangbin Du		field:char comm[TASK_COMM_LEN];	offset:12;	size:16;
14173d98127SChangbin Du		field:pid_t pid;	offset:28;	size:4;
14273d98127SChangbin Du		field:int prio;	offset:32;	size:4;
14373d98127SChangbin Du		field:int success;	offset:36;	size:4;
14473d98127SChangbin Du		field:int cpu;	offset:40;	size:4;
14573d98127SChangbin Du
14673d98127SChangbin Du	print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
14773d98127SChangbin Du		   REC->prio, REC->success, REC->cpu
14873d98127SChangbin Du
14973d98127SChangbin DuThis event contains 10 fields, the first 5 common and the remaining 5
15073d98127SChangbin Duevent-specific.  All the fields for this event are numeric, except for
15173d98127SChangbin Du'comm' which is a string, a distinction important for event filtering.
15273d98127SChangbin Du
15373d98127SChangbin Du5. Event filtering
15473d98127SChangbin Du==================
15573d98127SChangbin Du
15673d98127SChangbin DuTrace events can be filtered in the kernel by associating boolean
15773d98127SChangbin Du'filter expressions' with them.  As soon as an event is logged into
15873d98127SChangbin Duthe trace buffer, its fields are checked against the filter expression
15973d98127SChangbin Duassociated with that event type.  An event with field values that
16073d98127SChangbin Du'match' the filter will appear in the trace output, and an event whose
16173d98127SChangbin Duvalues don't match will be discarded.  An event with no filter
16273d98127SChangbin Duassociated with it matches everything, and is the default when no
16373d98127SChangbin Dufilter has been set for an event.
16473d98127SChangbin Du
16573d98127SChangbin Du5.1 Expression syntax
16673d98127SChangbin Du---------------------
16773d98127SChangbin Du
16873d98127SChangbin DuA filter expression consists of one or more 'predicates' that can be
16973d98127SChangbin Ducombined using the logical operators '&&' and '||'.  A predicate is
17073d98127SChangbin Dusimply a clause that compares the value of a field contained within a
17173d98127SChangbin Dulogged event with a constant value and returns either 0 or 1 depending
17273d98127SChangbin Duon whether the field value matched (1) or didn't match (0)::
17373d98127SChangbin Du
17473d98127SChangbin Du	  field-name relational-operator value
17573d98127SChangbin Du
17673d98127SChangbin DuParentheses can be used to provide arbitrary logical groupings and
17773d98127SChangbin Dudouble-quotes can be used to prevent the shell from interpreting
17873d98127SChangbin Duoperators as shell metacharacters.
17973d98127SChangbin Du
18073d98127SChangbin DuThe field-names available for use in filters can be found in the
18173d98127SChangbin Du'format' files for trace events (see section 4).
18273d98127SChangbin Du
18373d98127SChangbin DuThe relational-operators depend on the type of the field being tested:
18473d98127SChangbin Du
18573d98127SChangbin DuThe operators available for numeric fields are:
18673d98127SChangbin Du
18773d98127SChangbin Du==, !=, <, <=, >, >=, &
18873d98127SChangbin Du
18973d98127SChangbin DuAnd for string fields they are:
19073d98127SChangbin Du
19173d98127SChangbin Du==, !=, ~
19273d98127SChangbin Du
1936234c7bdSJonathan CorbetThe glob (~) accepts a wild card character (\*,?) and character classes
19473d98127SChangbin Du([). For example::
19573d98127SChangbin Du
19673d98127SChangbin Du  prev_comm ~ "*sh"
19773d98127SChangbin Du  prev_comm ~ "sh*"
19873d98127SChangbin Du  prev_comm ~ "*sh*"
19973d98127SChangbin Du  prev_comm ~ "ba*sh"
20073d98127SChangbin Du
201f37c3bbcSSteven RostedtIf the field is a pointer that points into user space (for example
202f37c3bbcSSteven Rostedt"filename" from sys_enter_openat), then you have to append ".ustring" to the
203f37c3bbcSSteven Rostedtfield name::
204f37c3bbcSSteven Rostedt
205f37c3bbcSSteven Rostedt  filename.ustring ~ "password"
206f37c3bbcSSteven Rostedt
207f37c3bbcSSteven RostedtAs the kernel will have to know how to retrieve the memory that the pointer
208f37c3bbcSSteven Rostedtis at from user space.
209f37c3bbcSSteven Rostedt
210e6745a4dSSteven Rostedt (Google)You can convert any long type to a function address and search by function name::
211e6745a4dSSteven Rostedt (Google)
212e6745a4dSSteven Rostedt (Google)  call_site.function == security_prepare_creds
213e6745a4dSSteven Rostedt (Google)
214e6745a4dSSteven Rostedt (Google)The above will filter when the field "call_site" falls on the address within
215e6745a4dSSteven Rostedt (Google)"security_prepare_creds". That is, it will compare the value of "call_site" and
216e6745a4dSSteven Rostedt (Google)the filter will return true if it is greater than or equal to the start of
217e6745a4dSSteven Rostedt (Google)the function "security_prepare_creds" and less than the end of that function.
218e6745a4dSSteven Rostedt (Google)
219e6745a4dSSteven Rostedt (Google)The ".function" postfix can only be attached to values of size long, and can only
220e6745a4dSSteven Rostedt (Google)be compared with "==" or "!=".
221e6745a4dSSteven Rostedt (Google)
222*fa828efbSValentin SchneiderCpumask fields or scalar fields that encode a CPU number can be filtered using
223*fa828efbSValentin Schneidera user-provided cpumask in cpulist format. The format is as follows::
224*fa828efbSValentin Schneider
225*fa828efbSValentin Schneider  CPUS{$cpulist}
226*fa828efbSValentin Schneider
227*fa828efbSValentin SchneiderOperators available to cpumask filtering are:
228*fa828efbSValentin Schneider
229*fa828efbSValentin Schneider& (intersection), ==, !=
230*fa828efbSValentin Schneider
231*fa828efbSValentin SchneiderFor example, this will filter events that have their .target_cpu field present
232*fa828efbSValentin Schneiderin the given cpumask::
233*fa828efbSValentin Schneider
234*fa828efbSValentin Schneider  target_cpu & CPUS{17-42}
235*fa828efbSValentin Schneider
23673d98127SChangbin Du5.2 Setting filters
23773d98127SChangbin Du-------------------
23873d98127SChangbin Du
23973d98127SChangbin DuA filter for an individual event is set by writing a filter expression
24073d98127SChangbin Duto the 'filter' file for the given event.
24173d98127SChangbin Du
24273d98127SChangbin DuFor example::
24373d98127SChangbin Du
2442abfcd29SRoss Zwisler	# cd /sys/kernel/tracing/events/sched/sched_wakeup
24573d98127SChangbin Du	# echo "common_preempt_count > 4" > filter
24673d98127SChangbin Du
24773d98127SChangbin DuA slightly more involved example::
24873d98127SChangbin Du
2492abfcd29SRoss Zwisler	# cd /sys/kernel/tracing/events/signal/signal_generate
25073d98127SChangbin Du	# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
25173d98127SChangbin Du
25273d98127SChangbin DuIf there is an error in the expression, you'll get an 'Invalid
25373d98127SChangbin Duargument' error when setting it, and the erroneous string along with
25473d98127SChangbin Duan error message can be seen by looking at the filter e.g.::
25573d98127SChangbin Du
2562abfcd29SRoss Zwisler	# cd /sys/kernel/tracing/events/signal/signal_generate
25773d98127SChangbin Du	# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
25873d98127SChangbin Du	-bash: echo: write error: Invalid argument
25973d98127SChangbin Du	# cat filter
26073d98127SChangbin Du	((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
26173d98127SChangbin Du	^
26273d98127SChangbin Du	parse_error: Field not found
26373d98127SChangbin Du
26473d98127SChangbin DuCurrently the caret ('^') for an error always appears at the beginning of
26573d98127SChangbin Duthe filter string; the error message should still be useful though
26673d98127SChangbin Dueven without more accurate position info.
26773d98127SChangbin Du
26877360f9bSSteven Rostedt5.2.1 Filter limitations
26977360f9bSSteven Rostedt------------------------
27077360f9bSSteven Rostedt
27177360f9bSSteven RostedtIf a filter is placed on a string pointer ``(char *)`` that does not point
27277360f9bSSteven Rostedtto a string on the ring buffer, but instead points to kernel or user space
27377360f9bSSteven Rostedtmemory, then, for safety reasons, at most 1024 bytes of the content is
27477360f9bSSteven Rostedtcopied onto a temporary buffer to do the compare. If the copy of the memory
27577360f9bSSteven Rostedtfaults (the pointer points to memory that should not be accessed), then the
27677360f9bSSteven Rostedtstring compare will be treated as not matching.
27777360f9bSSteven Rostedt
27873d98127SChangbin Du5.3 Clearing filters
27973d98127SChangbin Du--------------------
28073d98127SChangbin Du
28173d98127SChangbin DuTo clear the filter for an event, write a '0' to the event's filter
28273d98127SChangbin Dufile.
28373d98127SChangbin Du
28473d98127SChangbin DuTo clear the filters for all events in a subsystem, write a '0' to the
28573d98127SChangbin Dusubsystem's filter file.
28673d98127SChangbin Du
28771240f94SDonglin Peng5.4 Subsystem filters
28873d98127SChangbin Du---------------------
28973d98127SChangbin Du
29073d98127SChangbin DuFor convenience, filters for every event in a subsystem can be set or
29173d98127SChangbin Ducleared as a group by writing a filter expression into the filter file
29273d98127SChangbin Duat the root of the subsystem.  Note however, that if a filter for any
29373d98127SChangbin Duevent within the subsystem lacks a field specified in the subsystem
29473d98127SChangbin Dufilter, or if the filter can't be applied for any other reason, the
29573d98127SChangbin Dufilter for that event will retain its previous setting.  This can
29673d98127SChangbin Duresult in an unintended mixture of filters which could lead to
29773d98127SChangbin Duconfusing (to the user who might think different filters are in
29873d98127SChangbin Dueffect) trace output.  Only filters that reference just the common
29973d98127SChangbin Dufields can be guaranteed to propagate successfully to all events.
30073d98127SChangbin Du
30173d98127SChangbin DuHere are a few subsystem filter examples that also illustrate the
30273d98127SChangbin Duabove points:
30373d98127SChangbin Du
30473d98127SChangbin DuClear the filters on all events in the sched subsystem::
30573d98127SChangbin Du
3062abfcd29SRoss Zwisler	# cd /sys/kernel/tracing/events/sched
30773d98127SChangbin Du	# echo 0 > filter
30873d98127SChangbin Du	# cat sched_switch/filter
30973d98127SChangbin Du	none
31073d98127SChangbin Du	# cat sched_wakeup/filter
31173d98127SChangbin Du	none
31273d98127SChangbin Du
31373d98127SChangbin DuSet a filter using only common fields for all events in the sched
31473d98127SChangbin Dusubsystem (all events end up with the same filter)::
31573d98127SChangbin Du
3162abfcd29SRoss Zwisler	# cd /sys/kernel/tracing/events/sched
31773d98127SChangbin Du	# echo common_pid == 0 > filter
31873d98127SChangbin Du	# cat sched_switch/filter
31973d98127SChangbin Du	common_pid == 0
32073d98127SChangbin Du	# cat sched_wakeup/filter
32173d98127SChangbin Du	common_pid == 0
32273d98127SChangbin Du
32373d98127SChangbin DuAttempt to set a filter using a non-common field for all events in the
32473d98127SChangbin Dusched subsystem (all events but those that have a prev_pid field retain
32573d98127SChangbin Dutheir old filters)::
32673d98127SChangbin Du
3272abfcd29SRoss Zwisler	# cd /sys/kernel/tracing/events/sched
32873d98127SChangbin Du	# echo prev_pid == 0 > filter
32973d98127SChangbin Du	# cat sched_switch/filter
33073d98127SChangbin Du	prev_pid == 0
33173d98127SChangbin Du	# cat sched_wakeup/filter
33273d98127SChangbin Du	common_pid == 0
33373d98127SChangbin Du
33471240f94SDonglin Peng5.5 PID filtering
33573d98127SChangbin Du-----------------
33673d98127SChangbin Du
33773d98127SChangbin DuThe set_event_pid file in the same directory as the top events directory
33873d98127SChangbin Duexists, will filter all events from tracing any task that does not have the
33973d98127SChangbin DuPID listed in the set_event_pid file.
34073d98127SChangbin Du::
34173d98127SChangbin Du
3422abfcd29SRoss Zwisler	# cd /sys/kernel/tracing
34373d98127SChangbin Du	# echo $$ > set_event_pid
34473d98127SChangbin Du	# echo 1 > events/enable
34573d98127SChangbin Du
34673d98127SChangbin DuWill only trace events for the current task.
34773d98127SChangbin Du
34873d98127SChangbin DuTo add more PIDs without losing the PIDs already included, use '>>'.
34973d98127SChangbin Du::
35073d98127SChangbin Du
35173d98127SChangbin Du	# echo 123 244 1 >> set_event_pid
35273d98127SChangbin Du
35373d98127SChangbin Du
35473d98127SChangbin Du6. Event triggers
35573d98127SChangbin Du=================
35673d98127SChangbin Du
35773d98127SChangbin DuTrace events can be made to conditionally invoke trigger 'commands'
35873d98127SChangbin Duwhich can take various forms and are described in detail below;
35973d98127SChangbin Duexamples would be enabling or disabling other trace events or invoking
36073d98127SChangbin Dua stack trace whenever the trace event is hit.  Whenever a trace event
36173d98127SChangbin Duwith attached triggers is invoked, the set of trigger commands
36273d98127SChangbin Duassociated with that event is invoked.  Any given trigger can
36373d98127SChangbin Duadditionally have an event filter of the same form as described in
36473d98127SChangbin Dusection 5 (Event filtering) associated with it - the command will only
36573d98127SChangbin Dube invoked if the event being invoked passes the associated filter.
36673d98127SChangbin DuIf no filter is associated with the trigger, it always passes.
36773d98127SChangbin Du
36873d98127SChangbin DuTriggers are added to and removed from a particular event by writing
36973d98127SChangbin Dutrigger expressions to the 'trigger' file for the given event.
37073d98127SChangbin Du
37173d98127SChangbin DuA given event can have any number of triggers associated with it,
37273d98127SChangbin Dusubject to any restrictions that individual commands may have in that
37373d98127SChangbin Duregard.
37473d98127SChangbin Du
37573d98127SChangbin DuEvent triggers are implemented on top of "soft" mode, which means that
37673d98127SChangbin Duwhenever a trace event has one or more triggers associated with it,
37773d98127SChangbin Duthe event is activated even if it isn't actually enabled, but is
37873d98127SChangbin Dudisabled in a "soft" mode.  That is, the tracepoint will be called,
37973d98127SChangbin Dubut just will not be traced, unless of course it's actually enabled.
38073d98127SChangbin DuThis scheme allows triggers to be invoked even for events that aren't
38173d98127SChangbin Duenabled, and also allows the current event filter implementation to be
38273d98127SChangbin Duused for conditionally invoking triggers.
38373d98127SChangbin Du
38473d98127SChangbin DuThe syntax for event triggers is roughly based on the syntax for
38573d98127SChangbin Duset_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
386d3439f9dSSteven Rostedt (VMware)section of Documentation/trace/ftrace.rst), but there are major
38773d98127SChangbin Dudifferences and the implementation isn't currently tied to it in any
38873d98127SChangbin Duway, so beware about making generalizations between the two.
38973d98127SChangbin Du
3908206de7dSMauro Carvalho Chehab.. Note::
3918206de7dSMauro Carvalho Chehab     Writing into trace_marker (See Documentation/trace/ftrace.rst)
392d3439f9dSSteven Rostedt (VMware)     can also enable triggers that are written into
393d3439f9dSSteven Rostedt (VMware)     /sys/kernel/tracing/events/ftrace/print/trigger
394d3439f9dSSteven Rostedt (VMware)
39573d98127SChangbin Du6.1 Expression syntax
39673d98127SChangbin Du---------------------
39773d98127SChangbin Du
39873d98127SChangbin DuTriggers are added by echoing the command to the 'trigger' file::
39973d98127SChangbin Du
40073d98127SChangbin Du  # echo 'command[:count] [if filter]' > trigger
40173d98127SChangbin Du
40273d98127SChangbin DuTriggers are removed by echoing the same command but starting with '!'
40373d98127SChangbin Duto the 'trigger' file::
40473d98127SChangbin Du
40573d98127SChangbin Du  # echo '!command[:count] [if filter]' > trigger
40673d98127SChangbin Du
40773d98127SChangbin DuThe [if filter] part isn't used in matching commands when removing, so
40873d98127SChangbin Duleaving that off in a '!' command will accomplish the same thing as
40973d98127SChangbin Duhaving it in.
41073d98127SChangbin Du
41173d98127SChangbin DuThe filter syntax is the same as that described in the 'Event
41273d98127SChangbin Dufiltering' section above.
41373d98127SChangbin Du
41473d98127SChangbin DuFor ease of use, writing to the trigger file using '>' currently just
41573d98127SChangbin Duadds or removes a single trigger and there's no explicit '>>' support
41673d98127SChangbin Du('>' actually behaves like '>>') or truncation support to remove all
41773d98127SChangbin Dutriggers (you have to use '!' for each one added.)
41873d98127SChangbin Du
41973d98127SChangbin Du6.2 Supported trigger commands
42073d98127SChangbin Du------------------------------
42173d98127SChangbin Du
42273d98127SChangbin DuThe following commands are supported:
42373d98127SChangbin Du
42473d98127SChangbin Du- enable_event/disable_event
42573d98127SChangbin Du
42673d98127SChangbin Du  These commands can enable or disable another trace event whenever
42773d98127SChangbin Du  the triggering event is hit.  When these commands are registered,
42873d98127SChangbin Du  the other trace event is activated, but disabled in a "soft" mode.
42973d98127SChangbin Du  That is, the tracepoint will be called, but just will not be traced.
43073d98127SChangbin Du  The event tracepoint stays in this mode as long as there's a trigger
43173d98127SChangbin Du  in effect that can trigger it.
43273d98127SChangbin Du
43373d98127SChangbin Du  For example, the following trigger causes kmalloc events to be
43473d98127SChangbin Du  traced when a read system call is entered, and the :1 at the end
43573d98127SChangbin Du  specifies that this enablement happens only once::
43673d98127SChangbin Du
43773d98127SChangbin Du	  # echo 'enable_event:kmem:kmalloc:1' > \
4382abfcd29SRoss Zwisler	      /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger
43973d98127SChangbin Du
44073d98127SChangbin Du  The following trigger causes kmalloc events to stop being traced
44173d98127SChangbin Du  when a read system call exits.  This disablement happens on every
44273d98127SChangbin Du  read system call exit::
44373d98127SChangbin Du
44473d98127SChangbin Du	  # echo 'disable_event:kmem:kmalloc' > \
4452abfcd29SRoss Zwisler	      /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger
44673d98127SChangbin Du
44773d98127SChangbin Du  The format is::
44873d98127SChangbin Du
44973d98127SChangbin Du      enable_event:<system>:<event>[:count]
45073d98127SChangbin Du      disable_event:<system>:<event>[:count]
45173d98127SChangbin Du
45273d98127SChangbin Du  To remove the above commands::
45373d98127SChangbin Du
45473d98127SChangbin Du	  # echo '!enable_event:kmem:kmalloc:1' > \
4552abfcd29SRoss Zwisler	      /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger
45673d98127SChangbin Du
45773d98127SChangbin Du	  # echo '!disable_event:kmem:kmalloc' > \
4582abfcd29SRoss Zwisler	      /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger
45973d98127SChangbin Du
46073d98127SChangbin Du  Note that there can be any number of enable/disable_event triggers
46173d98127SChangbin Du  per triggering event, but there can only be one trigger per
46273d98127SChangbin Du  triggered event. e.g. sys_enter_read can have triggers enabling both
46373d98127SChangbin Du  kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
46473d98127SChangbin Du  versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
46573d98127SChangbin Du  bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
46673d98127SChangbin Du  could be combined into a single filter on kmem:kmalloc though).
46773d98127SChangbin Du
46873d98127SChangbin Du- stacktrace
46973d98127SChangbin Du
47073d98127SChangbin Du  This command dumps a stacktrace in the trace buffer whenever the
47173d98127SChangbin Du  triggering event occurs.
47273d98127SChangbin Du
47373d98127SChangbin Du  For example, the following trigger dumps a stacktrace every time the
47473d98127SChangbin Du  kmalloc tracepoint is hit::
47573d98127SChangbin Du
47673d98127SChangbin Du	  # echo 'stacktrace' > \
4772abfcd29SRoss Zwisler		/sys/kernel/tracing/events/kmem/kmalloc/trigger
47873d98127SChangbin Du
47973d98127SChangbin Du  The following trigger dumps a stacktrace the first 5 times a kmalloc
48073d98127SChangbin Du  request happens with a size >= 64K::
48173d98127SChangbin Du
48273d98127SChangbin Du	  # echo 'stacktrace:5 if bytes_req >= 65536' > \
4832abfcd29SRoss Zwisler		/sys/kernel/tracing/events/kmem/kmalloc/trigger
48473d98127SChangbin Du
48573d98127SChangbin Du  The format is::
48673d98127SChangbin Du
48773d98127SChangbin Du      stacktrace[:count]
48873d98127SChangbin Du
48973d98127SChangbin Du  To remove the above commands::
49073d98127SChangbin Du
49173d98127SChangbin Du	  # echo '!stacktrace' > \
4922abfcd29SRoss Zwisler		/sys/kernel/tracing/events/kmem/kmalloc/trigger
49373d98127SChangbin Du
49473d98127SChangbin Du	  # echo '!stacktrace:5 if bytes_req >= 65536' > \
4952abfcd29SRoss Zwisler		/sys/kernel/tracing/events/kmem/kmalloc/trigger
49673d98127SChangbin Du
49773d98127SChangbin Du  The latter can also be removed more simply by the following (without
49873d98127SChangbin Du  the filter)::
49973d98127SChangbin Du
50073d98127SChangbin Du	  # echo '!stacktrace:5' > \
5012abfcd29SRoss Zwisler		/sys/kernel/tracing/events/kmem/kmalloc/trigger
50273d98127SChangbin Du
50373d98127SChangbin Du  Note that there can be only one stacktrace trigger per triggering
50473d98127SChangbin Du  event.
50573d98127SChangbin Du
50673d98127SChangbin Du- snapshot
50773d98127SChangbin Du
50873d98127SChangbin Du  This command causes a snapshot to be triggered whenever the
50973d98127SChangbin Du  triggering event occurs.
51073d98127SChangbin Du
51173d98127SChangbin Du  The following command creates a snapshot every time a block request
51273d98127SChangbin Du  queue is unplugged with a depth > 1.  If you were tracing a set of
51373d98127SChangbin Du  events or functions at the time, the snapshot trace buffer would
51473d98127SChangbin Du  capture those events when the trigger event occurred::
51573d98127SChangbin Du
51673d98127SChangbin Du	  # echo 'snapshot if nr_rq > 1' > \
5172abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
51873d98127SChangbin Du
51973d98127SChangbin Du  To only snapshot once::
52073d98127SChangbin Du
52173d98127SChangbin Du	  # echo 'snapshot:1 if nr_rq > 1' > \
5222abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
52373d98127SChangbin Du
52473d98127SChangbin Du  To remove the above commands::
52573d98127SChangbin Du
52673d98127SChangbin Du	  # echo '!snapshot if nr_rq > 1' > \
5272abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
52873d98127SChangbin Du
52973d98127SChangbin Du	  # echo '!snapshot:1 if nr_rq > 1' > \
5302abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
53173d98127SChangbin Du
53273d98127SChangbin Du  Note that there can be only one snapshot trigger per triggering
53373d98127SChangbin Du  event.
53473d98127SChangbin Du
53573d98127SChangbin Du- traceon/traceoff
53673d98127SChangbin Du
53773d98127SChangbin Du  These commands turn tracing on and off when the specified events are
53873d98127SChangbin Du  hit. The parameter determines how many times the tracing system is
53973d98127SChangbin Du  turned on and off. If unspecified, there is no limit.
54073d98127SChangbin Du
54173d98127SChangbin Du  The following command turns tracing off the first time a block
54273d98127SChangbin Du  request queue is unplugged with a depth > 1.  If you were tracing a
54373d98127SChangbin Du  set of events or functions at the time, you could then examine the
54473d98127SChangbin Du  trace buffer to see the sequence of events that led up to the
54573d98127SChangbin Du  trigger event::
54673d98127SChangbin Du
54773d98127SChangbin Du	  # echo 'traceoff:1 if nr_rq > 1' > \
5482abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
54973d98127SChangbin Du
55073d98127SChangbin Du  To always disable tracing when nr_rq  > 1::
55173d98127SChangbin Du
55273d98127SChangbin Du	  # echo 'traceoff if nr_rq > 1' > \
5532abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
55473d98127SChangbin Du
55573d98127SChangbin Du  To remove the above commands::
55673d98127SChangbin Du
55773d98127SChangbin Du	  # echo '!traceoff:1 if nr_rq > 1' > \
5582abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
55973d98127SChangbin Du
56073d98127SChangbin Du	  # echo '!traceoff if nr_rq > 1' > \
5612abfcd29SRoss Zwisler		/sys/kernel/tracing/events/block/block_unplug/trigger
56273d98127SChangbin Du
56373d98127SChangbin Du  Note that there can be only one traceon or traceoff trigger per
56473d98127SChangbin Du  triggering event.
56573d98127SChangbin Du
56673d98127SChangbin Du- hist
56773d98127SChangbin Du
56873d98127SChangbin Du  This command aggregates event hits into a hash table keyed on one or
56973d98127SChangbin Du  more trace event format fields (or stacktrace) and a set of running
57073d98127SChangbin Du  totals derived from one or more trace event format fields and/or
57173d98127SChangbin Du  event counts (hitcount).
57273d98127SChangbin Du
573ea272257SMauro Carvalho Chehab  See Documentation/trace/histogram.rst for details and examples.
57434ed6357STom Zanussi
575b8170fadSTom Zanussi7. In-kernel trace event API
576b8170fadSTom Zanussi============================
57734ed6357STom Zanussi
57834ed6357STom ZanussiIn most cases, the command-line interface to trace events is more than
57934ed6357STom Zanussisufficient.  Sometimes, however, applications might find the need for
58034ed6357STom Zanussimore complex relationships than can be expressed through a simple
58134ed6357STom Zanussiseries of linked command-line expressions, or putting together sets of
58234ed6357STom Zanussicommands may be simply too cumbersome.  An example might be an
58334ed6357STom Zanussiapplication that needs to 'listen' to the trace stream in order to
58434ed6357STom Zanussimaintain an in-kernel state machine detecting, for instance, when an
58534ed6357STom Zanussiillegal kernel state occurs in the scheduler.
58634ed6357STom Zanussi
58734ed6357STom ZanussiThe trace event subsystem provides an in-kernel API allowing modules
58834ed6357STom Zanussior other kernel code to generate user-defined 'synthetic' events at
58934ed6357STom Zanussiwill, which can be used to either augment the existing trace stream
59034ed6357STom Zanussiand/or signal that a particular important state has occurred.
59134ed6357STom Zanussi
59234ed6357STom ZanussiA similar in-kernel API is also available for creating kprobe and
59334ed6357STom Zanussikretprobe events.
59434ed6357STom Zanussi
59534ed6357STom ZanussiBoth the synthetic event and k/ret/probe event APIs are built on top
59634ed6357STom Zanussiof a lower-level "dynevent_cmd" event command API, which is also
59734ed6357STom Zanussiavailable for more specialized applications, or as the basis of other
59834ed6357STom Zanussihigher-level trace event APIs.
59934ed6357STom Zanussi
60034ed6357STom ZanussiThe API provided for these purposes is describe below and allows the
60134ed6357STom Zanussifollowing:
60234ed6357STom Zanussi
60334ed6357STom Zanussi  - dynamically creating synthetic event definitions
60434ed6357STom Zanussi  - dynamically creating kprobe and kretprobe event definitions
60534ed6357STom Zanussi  - tracing synthetic events from in-kernel code
60634ed6357STom Zanussi  - the low-level "dynevent_cmd" API
60734ed6357STom Zanussi
608b8170fadSTom Zanussi7.1 Dyamically creating synthetic event definitions
609b8170fadSTom Zanussi---------------------------------------------------
61034ed6357STom Zanussi
61134ed6357STom ZanussiThere are a couple ways to create a new synthetic event from a kernel
61234ed6357STom Zanussimodule or other kernel code.
61334ed6357STom Zanussi
61434ed6357STom ZanussiThe first creates the event in one step, using synth_event_create().
61534ed6357STom ZanussiIn this method, the name of the event to create and an array defining
61634ed6357STom Zanussithe fields is supplied to synth_event_create().  If successful, a
61734ed6357STom Zanussisynthetic event with that name and fields will exist following that
6188206de7dSMauro Carvalho Chehabcall.  For example, to create a new "schedtest" synthetic event::
61934ed6357STom Zanussi
62034ed6357STom Zanussi  ret = synth_event_create("schedtest", sched_fields,
62134ed6357STom Zanussi                           ARRAY_SIZE(sched_fields), THIS_MODULE);
62234ed6357STom Zanussi
62334ed6357STom ZanussiThe sched_fields param in this example points to an array of struct
62434ed6357STom Zanussisynth_field_desc, each of which describes an event field by type and
6258206de7dSMauro Carvalho Chehabname::
62634ed6357STom Zanussi
62734ed6357STom Zanussi  static struct synth_field_desc sched_fields[] = {
62834ed6357STom Zanussi        { .type = "pid_t",              .name = "next_pid_field" },
62934ed6357STom Zanussi        { .type = "char[16]",           .name = "next_comm_field" },
63034ed6357STom Zanussi        { .type = "u64",                .name = "ts_ns" },
63134ed6357STom Zanussi        { .type = "u64",                .name = "ts_ms" },
63234ed6357STom Zanussi        { .type = "unsigned int",       .name = "cpu" },
63334ed6357STom Zanussi        { .type = "char[64]",           .name = "my_string_field" },
63434ed6357STom Zanussi        { .type = "int",                .name = "my_int_field" },
63534ed6357STom Zanussi  };
63634ed6357STom Zanussi
637bd82631dSTom ZanussiSee synth_field_size() for available types.
638bd82631dSTom Zanussi
639bd82631dSTom ZanussiIf field_name contains [n], the field is considered to be a static array.
640bd82631dSTom Zanussi
641bd82631dSTom ZanussiIf field_names contains[] (no subscript), the field is considered to
642bd82631dSTom Zanussibe a dynamic array, which will only take as much space in the event as
643bd82631dSTom Zanussiis required to hold the array.
644bd82631dSTom Zanussi
645bd82631dSTom ZanussiBecause space for an event is reserved before assigning field values
646bd82631dSTom Zanussito the event, using dynamic arrays implies that the piecewise
647bd82631dSTom Zanussiin-kernel API described below can't be used with dynamic arrays.  The
648bd82631dSTom Zanussiother non-piecewise in-kernel APIs can, however, be used with dynamic
649bd82631dSTom Zanussiarrays.
65034ed6357STom Zanussi
65134ed6357STom ZanussiIf the event is created from within a module, a pointer to the module
65234ed6357STom Zanussimust be passed to synth_event_create().  This will ensure that the
65334ed6357STom Zanussitrace buffer won't contain unreadable events when the module is
65434ed6357STom Zanussiremoved.
65534ed6357STom Zanussi
65634ed6357STom ZanussiAt this point, the event object is ready to be used for generating new
65734ed6357STom Zanussievents.
65834ed6357STom Zanussi
65934ed6357STom ZanussiIn the second method, the event is created in several steps.  This
66034ed6357STom Zanussiallows events to be created dynamically and without the need to create
66134ed6357STom Zanussiand populate an array of fields beforehand.
66234ed6357STom Zanussi
66334ed6357STom ZanussiTo use this method, an empty or partially empty synthetic event should
66434ed6357STom Zanussifirst be created using synth_event_gen_cmd_start() or
66534ed6357STom Zanussisynth_event_gen_cmd_array_start().  For synth_event_gen_cmd_start(),
66634ed6357STom Zanussithe name of the event along with one or more pairs of args each pair
66734ed6357STom Zanussirepresenting a 'type field_name;' field specification should be
66834ed6357STom Zanussisupplied.  For synth_event_gen_cmd_array_start(), the name of the
66934ed6357STom Zanussievent along with an array of struct synth_field_desc should be
67034ed6357STom Zanussisupplied. Before calling synth_event_gen_cmd_start() or
67134ed6357STom Zanussisynth_event_gen_cmd_array_start(), the user should create and
67234ed6357STom Zanussiinitialize a dynevent_cmd object using synth_event_cmd_init().
67334ed6357STom Zanussi
67434ed6357STom ZanussiFor example, to create a new "schedtest" synthetic event with two
6758206de7dSMauro Carvalho Chehabfields::
67634ed6357STom Zanussi
67734ed6357STom Zanussi  struct dynevent_cmd cmd;
67834ed6357STom Zanussi  char *buf;
67934ed6357STom Zanussi
68034ed6357STom Zanussi  /* Create a buffer to hold the generated command */
68134ed6357STom Zanussi  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
68234ed6357STom Zanussi
68334ed6357STom Zanussi  /* Before generating the command, initialize the cmd object */
68434ed6357STom Zanussi  synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
68534ed6357STom Zanussi
68634ed6357STom Zanussi  ret = synth_event_gen_cmd_start(&cmd, "schedtest", THIS_MODULE,
68734ed6357STom Zanussi                                  "pid_t", "next_pid_field",
68834ed6357STom Zanussi                                  "u64", "ts_ns");
68934ed6357STom Zanussi
69034ed6357STom ZanussiAlternatively, using an array of struct synth_field_desc fields
6918206de7dSMauro Carvalho Chehabcontaining the same information::
69234ed6357STom Zanussi
69334ed6357STom Zanussi  ret = synth_event_gen_cmd_array_start(&cmd, "schedtest", THIS_MODULE,
69434ed6357STom Zanussi                                        fields, n_fields);
69534ed6357STom Zanussi
69634ed6357STom ZanussiOnce the synthetic event object has been created, it can then be
69734ed6357STom Zanussipopulated with more fields.  Fields are added one by one using
69834ed6357STom Zanussisynth_event_add_field(), supplying the dynevent_cmd object, a field
69934ed6357STom Zanussitype, and a field name.  For example, to add a new int field named
7008206de7dSMauro Carvalho Chehab"intfield", the following call should be made::
70134ed6357STom Zanussi
70234ed6357STom Zanussi  ret = synth_event_add_field(&cmd, "int", "intfield");
70334ed6357STom Zanussi
70434ed6357STom ZanussiSee synth_field_size() for available types. If field_name contains [n]
70534ed6357STom Zanussithe field is considered to be an array.
70634ed6357STom Zanussi
70734ed6357STom ZanussiA group of fields can also be added all at once using an array of
70834ed6357STom Zanussisynth_field_desc with add_synth_fields().  For example, this would add
7098206de7dSMauro Carvalho Chehabjust the first four sched_fields::
71034ed6357STom Zanussi
71134ed6357STom Zanussi  ret = synth_event_add_fields(&cmd, sched_fields, 4);
71234ed6357STom Zanussi
71334ed6357STom ZanussiIf you already have a string of the form 'type field_name',
71434ed6357STom Zanussisynth_event_add_field_str() can be used to add it as-is; it will
71534ed6357STom Zanussialso automatically append a ';' to the string.
71634ed6357STom Zanussi
71734ed6357STom ZanussiOnce all the fields have been added, the event should be finalized and
7188206de7dSMauro Carvalho Chehabregistered by calling the synth_event_gen_cmd_end() function::
71934ed6357STom Zanussi
72034ed6357STom Zanussi  ret = synth_event_gen_cmd_end(&cmd);
72134ed6357STom Zanussi
72234ed6357STom ZanussiAt this point, the event object is ready to be used for tracing new
72334ed6357STom Zanussievents.
72434ed6357STom Zanussi
725b8170fadSTom Zanussi7.2 Tracing synthetic events from in-kernel code
726b8170fadSTom Zanussi------------------------------------------------
72734ed6357STom Zanussi
72834ed6357STom ZanussiTo trace a synthetic event, there are several options.  The first
72934ed6357STom Zanussioption is to trace the event in one call, using synth_event_trace()
73034ed6357STom Zanussiwith a variable number of values, or synth_event_trace_array() with an
73134ed6357STom Zanussiarray of values to be set.  A second option can be used to avoid the
73234ed6357STom Zanussineed for a pre-formed array of values or list of arguments, via
73334ed6357STom Zanussisynth_event_trace_start() and synth_event_trace_end() along with
73434ed6357STom Zanussisynth_event_add_next_val() or synth_event_add_val() to add the values
73534ed6357STom Zanussipiecewise.
73634ed6357STom Zanussi
737b8170fadSTom Zanussi7.2.1 Tracing a synthetic event all at once
738b8170fadSTom Zanussi-------------------------------------------
73934ed6357STom Zanussi
74034ed6357STom ZanussiTo trace a synthetic event all at once, the synth_event_trace() or
74134ed6357STom Zanussisynth_event_trace_array() functions can be used.
74234ed6357STom Zanussi
74334ed6357STom ZanussiThe synth_event_trace() function is passed the trace_event_file
74434ed6357STom Zanussirepresenting the synthetic event (which can be retrieved using
74534ed6357STom Zanussitrace_get_event_file() using the synthetic event name, "synthetic" as
74634ed6357STom Zanussithe system name, and the trace instance name (NULL if using the global
74734ed6357STom Zanussitrace array)), along with an variable number of u64 args, one for each
74834ed6357STom Zanussisynthetic event field, and the number of values being passed.
74934ed6357STom Zanussi
75034ed6357STom ZanussiSo, to trace an event corresponding to the synthetic event definition
7518206de7dSMauro Carvalho Chehababove, code like the following could be used::
75234ed6357STom Zanussi
75334ed6357STom Zanussi  ret = synth_event_trace(create_synth_test, 7, /* number of values */
75434ed6357STom Zanussi                          444,             /* next_pid_field */
75534ed6357STom Zanussi                          (u64)"clackers", /* next_comm_field */
75634ed6357STom Zanussi                          1000000,         /* ts_ns */
75734ed6357STom Zanussi                          1000,            /* ts_ms */
75834ed6357STom Zanussi                          smp_processor_id(),/* cpu */
75934ed6357STom Zanussi                          (u64)"Thneed",   /* my_string_field */
76034ed6357STom Zanussi                          999);            /* my_int_field */
76134ed6357STom Zanussi
76234ed6357STom ZanussiAll vals should be cast to u64, and string vals are just pointers to
76334ed6357STom Zanussistrings, cast to u64.  Strings will be copied into space reserved in
76434ed6357STom Zanussithe event for the string, using these pointers.
76534ed6357STom Zanussi
76634ed6357STom ZanussiAlternatively, the synth_event_trace_array() function can be used to
76734ed6357STom Zanussiaccomplish the same thing.  It is passed the trace_event_file
76834ed6357STom Zanussirepresenting the synthetic event (which can be retrieved using
76934ed6357STom Zanussitrace_get_event_file() using the synthetic event name, "synthetic" as
77034ed6357STom Zanussithe system name, and the trace instance name (NULL if using the global
77134ed6357STom Zanussitrace array)), along with an array of u64, one for each synthetic
77234ed6357STom Zanussievent field.
77334ed6357STom Zanussi
77434ed6357STom ZanussiTo trace an event corresponding to the synthetic event definition
7758206de7dSMauro Carvalho Chehababove, code like the following could be used::
77634ed6357STom Zanussi
77734ed6357STom Zanussi  u64 vals[7];
77834ed6357STom Zanussi
77934ed6357STom Zanussi  vals[0] = 777;                  /* next_pid_field */
78034ed6357STom Zanussi  vals[1] = (u64)"tiddlywinks";   /* next_comm_field */
78134ed6357STom Zanussi  vals[2] = 1000000;              /* ts_ns */
78234ed6357STom Zanussi  vals[3] = 1000;                 /* ts_ms */
78334ed6357STom Zanussi  vals[4] = smp_processor_id();   /* cpu */
78434ed6357STom Zanussi  vals[5] = (u64)"thneed";        /* my_string_field */
78534ed6357STom Zanussi  vals[6] = 398;                  /* my_int_field */
78634ed6357STom Zanussi
78734ed6357STom ZanussiThe 'vals' array is just an array of u64, the number of which must
78834ed6357STom Zanussimatch the number of field in the synthetic event, and which must be in
78934ed6357STom Zanussithe same order as the synthetic event fields.
79034ed6357STom Zanussi
79134ed6357STom ZanussiAll vals should be cast to u64, and string vals are just pointers to
79234ed6357STom Zanussistrings, cast to u64.  Strings will be copied into space reserved in
79334ed6357STom Zanussithe event for the string, using these pointers.
79434ed6357STom Zanussi
79534ed6357STom ZanussiIn order to trace a synthetic event, a pointer to the trace event file
79634ed6357STom Zanussiis needed.  The trace_get_event_file() function can be used to get
79734ed6357STom Zanussiit - it will find the file in the given trace instance (in this case
79834ed6357STom ZanussiNULL since the top trace array is being used) while at the same time
7998206de7dSMauro Carvalho Chehabpreventing the instance containing it from going away::
80034ed6357STom Zanussi
80134ed6357STom Zanussi       schedtest_event_file = trace_get_event_file(NULL, "synthetic",
80234ed6357STom Zanussi                                                   "schedtest");
80334ed6357STom Zanussi
80434ed6357STom ZanussiBefore tracing the event, it should be enabled in some way, otherwise
80534ed6357STom Zanussithe synthetic event won't actually show up in the trace buffer.
80634ed6357STom Zanussi
80734ed6357STom ZanussiTo enable a synthetic event from the kernel, trace_array_set_clr_event()
80834ed6357STom Zanussican be used (which is not specific to synthetic events, so does need
80934ed6357STom Zanussithe "synthetic" system name to be specified explicitly).
81034ed6357STom Zanussi
8118206de7dSMauro Carvalho ChehabTo enable the event, pass 'true' to it::
81234ed6357STom Zanussi
81334ed6357STom Zanussi       trace_array_set_clr_event(schedtest_event_file->tr,
81434ed6357STom Zanussi                                 "synthetic", "schedtest", true);
81534ed6357STom Zanussi
8168206de7dSMauro Carvalho ChehabTo disable it pass false::
81734ed6357STom Zanussi
81834ed6357STom Zanussi       trace_array_set_clr_event(schedtest_event_file->tr,
81934ed6357STom Zanussi                                 "synthetic", "schedtest", false);
82034ed6357STom Zanussi
82134ed6357STom ZanussiFinally, synth_event_trace_array() can be used to actually trace the
8228206de7dSMauro Carvalho Chehabevent, which should be visible in the trace buffer afterwards::
82334ed6357STom Zanussi
82434ed6357STom Zanussi       ret = synth_event_trace_array(schedtest_event_file, vals,
82534ed6357STom Zanussi                                     ARRAY_SIZE(vals));
82634ed6357STom Zanussi
82734ed6357STom ZanussiTo remove the synthetic event, the event should be disabled, and the
8288206de7dSMauro Carvalho Chehabtrace instance should be 'put' back using trace_put_event_file()::
82934ed6357STom Zanussi
83034ed6357STom Zanussi       trace_array_set_clr_event(schedtest_event_file->tr,
83134ed6357STom Zanussi                                 "synthetic", "schedtest", false);
83234ed6357STom Zanussi       trace_put_event_file(schedtest_event_file);
83334ed6357STom Zanussi
83434ed6357STom ZanussiIf those have been successful, synth_event_delete() can be called to
8358206de7dSMauro Carvalho Chehabremove the event::
83634ed6357STom Zanussi
83734ed6357STom Zanussi       ret = synth_event_delete("schedtest");
83834ed6357STom Zanussi
839b8170fadSTom Zanussi7.2.2 Tracing a synthetic event piecewise
840b8170fadSTom Zanussi-----------------------------------------
84134ed6357STom Zanussi
84234ed6357STom ZanussiTo trace a synthetic using the piecewise method described above, the
84334ed6357STom Zanussisynth_event_trace_start() function is used to 'open' the synthetic
8448206de7dSMauro Carvalho Chehabevent trace::
84534ed6357STom Zanussi
846301de546SArtem Bityutskiy       struct synth_event_trace_state trace_state;
84734ed6357STom Zanussi
84834ed6357STom Zanussi       ret = synth_event_trace_start(schedtest_event_file, &trace_state);
84934ed6357STom Zanussi
85034ed6357STom ZanussiIt's passed the trace_event_file representing the synthetic event
85134ed6357STom Zanussiusing the same methods as described above, along with a pointer to a
852301de546SArtem Bityutskiystruct synth_event_trace_state object, which will be zeroed before use and
85334ed6357STom Zanussiused to maintain state between this and following calls.
85434ed6357STom Zanussi
85534ed6357STom ZanussiOnce the event has been opened, which means space for it has been
85634ed6357STom Zanussireserved in the trace buffer, the individual fields can be set.  There
85734ed6357STom Zanussiare two ways to do that, either one after another for each field in
85834ed6357STom Zanussithe event, which requires no lookups, or by name, which does.  The
85934ed6357STom Zanussitradeoff is flexibility in doing the assignments vs the cost of a
86034ed6357STom Zanussilookup per field.
86134ed6357STom Zanussi
86234ed6357STom ZanussiTo assign the values one after the other without lookups,
86334ed6357STom Zanussisynth_event_add_next_val() should be used.  Each call is passed the
864301de546SArtem Bityutskiysame synth_event_trace_state object used in the synth_event_trace_start(),
86534ed6357STom Zanussialong with the value to set the next field in the event.  After each
86634ed6357STom Zanussifield is set, the 'cursor' points to the next field, which will be set
86734ed6357STom Zanussiby the subsequent call, continuing until all the fields have been set
86834ed6357STom Zanussiin order.  The same sequence of calls as in the above examples using
8698206de7dSMauro Carvalho Chehabthis method would be (without error-handling code)::
87034ed6357STom Zanussi
87134ed6357STom Zanussi       /* next_pid_field */
87234ed6357STom Zanussi       ret = synth_event_add_next_val(777, &trace_state);
87334ed6357STom Zanussi
87434ed6357STom Zanussi       /* next_comm_field */
87534ed6357STom Zanussi       ret = synth_event_add_next_val((u64)"slinky", &trace_state);
87634ed6357STom Zanussi
87734ed6357STom Zanussi       /* ts_ns */
87834ed6357STom Zanussi       ret = synth_event_add_next_val(1000000, &trace_state);
87934ed6357STom Zanussi
88034ed6357STom Zanussi       /* ts_ms */
88134ed6357STom Zanussi       ret = synth_event_add_next_val(1000, &trace_state);
88234ed6357STom Zanussi
88334ed6357STom Zanussi       /* cpu */
88434ed6357STom Zanussi       ret = synth_event_add_next_val(smp_processor_id(), &trace_state);
88534ed6357STom Zanussi
88634ed6357STom Zanussi       /* my_string_field */
88734ed6357STom Zanussi       ret = synth_event_add_next_val((u64)"thneed_2.01", &trace_state);
88834ed6357STom Zanussi
88934ed6357STom Zanussi       /* my_int_field */
89034ed6357STom Zanussi       ret = synth_event_add_next_val(395, &trace_state);
89134ed6357STom Zanussi
89234ed6357STom ZanussiTo assign the values in any order, synth_event_add_val() should be
893301de546SArtem Bityutskiyused.  Each call is passed the same synth_event_trace_state object used in
89434ed6357STom Zanussithe synth_event_trace_start(), along with the field name of the field
89534ed6357STom Zanussito set and the value to set it to.  The same sequence of calls as in
89634ed6357STom Zanussithe above examples using this method would be (without error-handling
8978206de7dSMauro Carvalho Chehabcode)::
89834ed6357STom Zanussi
89934ed6357STom Zanussi       ret = synth_event_add_val("next_pid_field", 777, &trace_state);
90034ed6357STom Zanussi       ret = synth_event_add_val("next_comm_field", (u64)"silly putty",
90134ed6357STom Zanussi                                 &trace_state);
90234ed6357STom Zanussi       ret = synth_event_add_val("ts_ns", 1000000, &trace_state);
90334ed6357STom Zanussi       ret = synth_event_add_val("ts_ms", 1000, &trace_state);
90434ed6357STom Zanussi       ret = synth_event_add_val("cpu", smp_processor_id(), &trace_state);
90534ed6357STom Zanussi       ret = synth_event_add_val("my_string_field", (u64)"thneed_9",
90634ed6357STom Zanussi                                 &trace_state);
90734ed6357STom Zanussi       ret = synth_event_add_val("my_int_field", 3999, &trace_state);
90834ed6357STom Zanussi
90934ed6357STom ZanussiNote that synth_event_add_next_val() and synth_event_add_val() are
91034ed6357STom Zanussiincompatible if used within the same trace of an event - either one
91134ed6357STom Zanussican be used but not both at the same time.
91234ed6357STom Zanussi
91334ed6357STom ZanussiFinally, the event won't be actually traced until it's 'closed',
91434ed6357STom Zanussiwhich is done using synth_event_trace_end(), which takes only the
915301de546SArtem Bityutskiystruct synth_event_trace_state object used in the previous calls::
91634ed6357STom Zanussi
91734ed6357STom Zanussi       ret = synth_event_trace_end(&trace_state);
91834ed6357STom Zanussi
91934ed6357STom ZanussiNote that synth_event_trace_end() must be called at the end regardless
92034ed6357STom Zanussiof whether any of the add calls failed (say due to a bad field name
92134ed6357STom Zanussibeing passed in).
92234ed6357STom Zanussi
923b8170fadSTom Zanussi7.3 Dyamically creating kprobe and kretprobe event definitions
924b8170fadSTom Zanussi--------------------------------------------------------------
92534ed6357STom Zanussi
92634ed6357STom ZanussiTo create a kprobe or kretprobe trace event from kernel code, the
92734ed6357STom Zanussikprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start()
92834ed6357STom Zanussifunctions can be used.
92934ed6357STom Zanussi
93034ed6357STom ZanussiTo create a kprobe event, an empty or partially empty kprobe event
93134ed6357STom Zanussishould first be created using kprobe_event_gen_cmd_start().  The name
932d56b699dSBjorn Helgaasof the event and the probe location should be specified along with one
93334ed6357STom Zanussior args each representing a probe field should be supplied to this
93434ed6357STom Zanussifunction.  Before calling kprobe_event_gen_cmd_start(), the user
93534ed6357STom Zanussishould create and initialize a dynevent_cmd object using
93634ed6357STom Zanussikprobe_event_cmd_init().
93734ed6357STom Zanussi
9388206de7dSMauro Carvalho ChehabFor example, to create a new "schedtest" kprobe event with two fields::
93934ed6357STom Zanussi
94034ed6357STom Zanussi  struct dynevent_cmd cmd;
94134ed6357STom Zanussi  char *buf;
94234ed6357STom Zanussi
94334ed6357STom Zanussi  /* Create a buffer to hold the generated command */
94434ed6357STom Zanussi  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
94534ed6357STom Zanussi
94634ed6357STom Zanussi  /* Before generating the command, initialize the cmd object */
94734ed6357STom Zanussi  kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
94834ed6357STom Zanussi
94934ed6357STom Zanussi  /*
95034ed6357STom Zanussi   * Define the gen_kprobe_test event with the first 2 kprobe
95134ed6357STom Zanussi   * fields.
95234ed6357STom Zanussi   */
95334ed6357STom Zanussi  ret = kprobe_event_gen_cmd_start(&cmd, "gen_kprobe_test", "do_sys_open",
95434ed6357STom Zanussi                                   "dfd=%ax", "filename=%dx");
95534ed6357STom Zanussi
95634ed6357STom ZanussiOnce the kprobe event object has been created, it can then be
95734ed6357STom Zanussipopulated with more fields.  Fields can be added using
95834ed6357STom Zanussikprobe_event_add_fields(), supplying the dynevent_cmd object along
95934ed6357STom Zanussiwith a variable arg list of probe fields.  For example, to add a
9608206de7dSMauro Carvalho Chehabcouple additional fields, the following call could be made::
96134ed6357STom Zanussi
96234ed6357STom Zanussi  ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)");
96334ed6357STom Zanussi
96434ed6357STom ZanussiOnce all the fields have been added, the event should be finalized and
96534ed6357STom Zanussiregistered by calling the kprobe_event_gen_cmd_end() or
96634ed6357STom Zanussikretprobe_event_gen_cmd_end() functions, depending on whether a kprobe
9678206de7dSMauro Carvalho Chehabor kretprobe command was started::
96834ed6357STom Zanussi
96934ed6357STom Zanussi  ret = kprobe_event_gen_cmd_end(&cmd);
97034ed6357STom Zanussi
9718206de7dSMauro Carvalho Chehabor::
97234ed6357STom Zanussi
97334ed6357STom Zanussi  ret = kretprobe_event_gen_cmd_end(&cmd);
97434ed6357STom Zanussi
97534ed6357STom ZanussiAt this point, the event object is ready to be used for tracing new
97634ed6357STom Zanussievents.
97734ed6357STom Zanussi
97834ed6357STom ZanussiSimilarly, a kretprobe event can be created using
97934ed6357STom Zanussikretprobe_event_gen_cmd_start() with a probe name and location and
9808206de7dSMauro Carvalho Chehabadditional params such as $retval::
98134ed6357STom Zanussi
98234ed6357STom Zanussi  ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test",
98334ed6357STom Zanussi                                      "do_sys_open", "$retval");
98434ed6357STom Zanussi
98534ed6357STom ZanussiSimilar to the synthetic event case, code like the following can be
9868206de7dSMauro Carvalho Chehabused to enable the newly created kprobe event::
98734ed6357STom Zanussi
98834ed6357STom Zanussi  gen_kprobe_test = trace_get_event_file(NULL, "kprobes", "gen_kprobe_test");
98934ed6357STom Zanussi
99034ed6357STom Zanussi  ret = trace_array_set_clr_event(gen_kprobe_test->tr,
99134ed6357STom Zanussi                                  "kprobes", "gen_kprobe_test", true);
99234ed6357STom Zanussi
99334ed6357STom ZanussiFinally, also similar to synthetic events, the following code can be
9948206de7dSMauro Carvalho Chehabused to give the kprobe event file back and delete the event::
99534ed6357STom Zanussi
99634ed6357STom Zanussi  trace_put_event_file(gen_kprobe_test);
99734ed6357STom Zanussi
99834ed6357STom Zanussi  ret = kprobe_event_delete("gen_kprobe_test");
99934ed6357STom Zanussi
1000b8170fadSTom Zanussi7.4 The "dynevent_cmd" low-level API
1001b8170fadSTom Zanussi------------------------------------
100234ed6357STom Zanussi
100334ed6357STom ZanussiBoth the in-kernel synthetic event and kprobe interfaces are built on
100434ed6357STom Zanussitop of a lower-level "dynevent_cmd" interface.  This interface is
100534ed6357STom Zanussimeant to provide the basis for higher-level interfaces such as the
100634ed6357STom Zanussisynthetic and kprobe interfaces, which can be used as examples.
100734ed6357STom Zanussi
100834ed6357STom ZanussiThe basic idea is simple and amounts to providing a general-purpose
100934ed6357STom Zanussilayer that can be used to generate trace event commands.  The
101034ed6357STom Zanussigenerated command strings can then be passed to the command-parsing
101134ed6357STom Zanussiand event creation code that already exists in the trace event
1012d56b699dSBjorn Helgaassubsystem for creating the corresponding trace events.
101334ed6357STom Zanussi
101434ed6357STom ZanussiIn a nutshell, the way it works is that the higher-level interface
101534ed6357STom Zanussicode creates a struct dynevent_cmd object, then uses a couple
101634ed6357STom Zanussifunctions, dynevent_arg_add() and dynevent_arg_pair_add() to build up
101734ed6357STom Zanussia command string, which finally causes the command to be executed
101834ed6357STom Zanussiusing the dynevent_create() function.  The details of the interface
101934ed6357STom Zanussiare described below.
102034ed6357STom Zanussi
102134ed6357STom ZanussiThe first step in building a new command string is to create and
102234ed6357STom Zanussiinitialize an instance of a dynevent_cmd.  Here, for instance, we
10238206de7dSMauro Carvalho Chehabcreate a dynevent_cmd on the stack and initialize it::
102434ed6357STom Zanussi
102534ed6357STom Zanussi  struct dynevent_cmd cmd;
102634ed6357STom Zanussi  char *buf;
102734ed6357STom Zanussi  int ret;
102834ed6357STom Zanussi
102934ed6357STom Zanussi  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
103034ed6357STom Zanussi
103134ed6357STom Zanussi  dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_FOO,
103234ed6357STom Zanussi                    foo_event_run_command);
103334ed6357STom Zanussi
103434ed6357STom ZanussiThe dynevent_cmd initialization needs to be given a user-specified
103534ed6357STom Zanussibuffer and the length of the buffer (MAX_DYNEVENT_CMD_LEN can be used
103634ed6357STom Zanussifor this purpose - at 2k it's generally too big to be comfortably put
103734ed6357STom Zanussion the stack, so is dynamically allocated), a dynevent type id, which
103834ed6357STom Zanussiis meant to be used to check that further API calls are for the
103934ed6357STom Zanussicorrect command type, and a pointer to an event-specific run_command()
104034ed6357STom Zanussicallback that will be called to actually execute the event-specific
104134ed6357STom Zanussicommand function.
104234ed6357STom Zanussi
104334ed6357STom ZanussiOnce that's done, the command string can by built up by successive
104434ed6357STom Zanussicalls to argument-adding functions.
104534ed6357STom Zanussi
104634ed6357STom ZanussiTo add a single argument, define and initialize a struct dynevent_arg
104734ed6357STom Zanussior struct dynevent_arg_pair object.  Here's an example of the simplest
104834ed6357STom Zanussipossible arg addition, which is simply to append the given string as
10498206de7dSMauro Carvalho Chehaba whitespace-separated argument to the command::
105034ed6357STom Zanussi
105134ed6357STom Zanussi  struct dynevent_arg arg;
105234ed6357STom Zanussi
105334ed6357STom Zanussi  dynevent_arg_init(&arg, NULL, 0);
105434ed6357STom Zanussi
105534ed6357STom Zanussi  arg.str = name;
105634ed6357STom Zanussi
105734ed6357STom Zanussi  ret = dynevent_arg_add(cmd, &arg);
105834ed6357STom Zanussi
105934ed6357STom ZanussiThe arg object is first initialized using dynevent_arg_init() and in
106034ed6357STom Zanussithis case the parameters are NULL or 0, which means there's no
106134ed6357STom Zanussioptional sanity-checking function or separator appended to the end of
106234ed6357STom Zanussithe arg.
106334ed6357STom Zanussi
106434ed6357STom ZanussiHere's another more complicated example using an 'arg pair', which is
106534ed6357STom Zanussiused to create an argument that consists of a couple components added
106634ed6357STom Zanussitogether as a unit, for example, a 'type field_name;' arg or a simple
10678206de7dSMauro Carvalho Chehabexpression arg e.g. 'flags=%cx'::
106834ed6357STom Zanussi
106934ed6357STom Zanussi  struct dynevent_arg_pair arg_pair;
107034ed6357STom Zanussi
107134ed6357STom Zanussi  dynevent_arg_pair_init(&arg_pair, dynevent_foo_check_arg_fn, 0, ';');
107234ed6357STom Zanussi
107334ed6357STom Zanussi  arg_pair.lhs = type;
107434ed6357STom Zanussi  arg_pair.rhs = name;
107534ed6357STom Zanussi
107634ed6357STom Zanussi  ret = dynevent_arg_pair_add(cmd, &arg_pair);
107734ed6357STom Zanussi
107834ed6357STom ZanussiAgain, the arg_pair is first initialized, in this case with a callback
107934ed6357STom Zanussifunction used to check the sanity of the args (for example, that
108034ed6357STom Zanussineither part of the pair is NULL), along with a character to be used
108134ed6357STom Zanussito add an operator between the pair (here none) and a separator to be
108234ed6357STom Zanussiappended onto the end of the arg pair (here ';').
108334ed6357STom Zanussi
108434ed6357STom ZanussiThere's also a dynevent_str_add() function that can be used to simply
1085d56b699dSBjorn Helgaasadd a string as-is, with no spaces, delimiters, or arg check.
108634ed6357STom Zanussi
108734ed6357STom ZanussiAny number of dynevent_*_add() calls can be made to build up the string
108834ed6357STom Zanussi(until its length surpasses cmd->maxlen).  When all the arguments have
108934ed6357STom Zanussibeen added and the command string is complete, the only thing left to
109034ed6357STom Zanussido is run the command, which happens by simply calling
10918206de7dSMauro Carvalho Chehabdynevent_create()::
109234ed6357STom Zanussi
109334ed6357STom Zanussi  ret = dynevent_create(&cmd);
109434ed6357STom Zanussi
109534ed6357STom ZanussiAt that point, if the return value is 0, the dynamic event has been
109634ed6357STom Zanussicreated and is ready to use.
109734ed6357STom Zanussi
109834ed6357STom ZanussiSee the dynevent_cmd function definitions themselves for the details
109934ed6357STom Zanussiof the API.
1100