1'\" te 2.\" CDDL HEADER START 3.\" 4.\" The contents of this file are subject to the terms of the 5.\" Common Development and Distribution License (the "License"). 6.\" You may not use this file except in compliance with the License. 7.\" 8.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 9.\" or http://www.opensolaris.org/os/licensing. 10.\" See the License for the specific language governing permissions 11.\" and limitations under the License. 12.\" 13.\" When distributing Covered Code, include this CDDL HEADER in each 14.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. 15.\" If applicable, add the following below this CDDL HEADER, with the 16.\" fields enclosed by brackets "[]" replaced with your own identifying 17.\" information: Portions Copyright [yyyy] [name of copyright owner] 18.\" 19.\" CDDL HEADER END 20.\" Copyright (c) 2008, Sun Microsystems, Inc. All Rights Reserved. 21.\" 22.\" $FreeBSD$ 23.\" 24.Dd February 25, 2020 25.Dt LOCKSTAT 1 26.Os 27.Sh NAME 28.Nm lockstat 29.Nd report kernel lock and profiling statistics 30.Sh SYNOPSIS 31.Nm 32.Op Fl ACEHIV 33.Op Fl e Ar event-list 34.Op Fl i Ar rate 35.Op Fl b | t | h | s Ar depth 36.Op Fl n Ar num-records 37.Op Fl l Ar lock Oo Ns , Ns Ar size Oc 38.Op Fl d Ar duration 39.Op Fl f Ar function Oo Ns , Ns Ar size Oc 40.Op Fl T 41.Op Fl kgwWRpP 42.Op Fl D Ar count 43.Op Fl o filename 44.Op Fl x Ar opt Oo Ns = Ns Ar val Oc 45.Ar command 46.Op Oo Ar args Oc 47.Sh DESCRIPTION 48The 49.Nm 50utility gathers and displays kernel locking and profiling statistics. 51.Nm 52allows you to specify which events to watch (for example, spin on adaptive 53mutex, block on read access to rwlock due to waiting writers, and so forth), how 54much data to gather for each event, and how to display the data. 55By default, 56.Nm 57monitors all lock contention events, gathers frequency and timing data about 58those events, and displays the data in decreasing frequency order, so that the 59most common events appear first. 60.Pp 61.Nm 62gathers data until the specified command completes. 63For example, to gather statistics for a fixed-time interval, use 64.Xr sleep 1 65as the command, as follows: 66.Pp 67.Dl # lockstat sleep 5 68.Pp 69When the 70.Fl I 71option is specified, 72.Nm lockstat 73establishes a per-processor high-level periodic interrupt source to gather 74profiling data. 75The interrupt handler simply generates a 76.Nm 77event whose caller is the interrupted PC (program counter). 78The profiling event is just like any other 79.Nm lockstat 80event, so all of the normal 81.Nm lockstat 82options are applicable. 83.Pp 84.Nm 85relies on DTrace to modify the running kernel's text to intercept events of 86interest. 87This imposes a small but measurable overhead on all system activity, so access 88to 89.Nm 90is restricted to super-user by default. 91.Sh OPTIONS 92The following options are supported: 93.Bl -tag -width indent 94.It Fl V 95Print the D program used to gather the requested data. 96.El 97.Ss Event Selection 98If no event selection options are specified, the default is 99.Fl C . 100.Bl -tag -width indent 101.It Fl A 102Watch all lock events. 103.Fl A 104is equivalent to 105.Fl CH . 106.It Fl C 107Watch contention events. 108.It Fl E 109Watch error events. 110.It Fl e Ar event-list 111Only watch the specified events. 112.Ar event-list 113is a comma-separated list of events or ranges of events such as 1,4-7,35. 114Run 115.Nm 116with no arguments to get a brief description of all events. 117.It Fl H 118Watch hold events. 119.It Fl I 120Watch profiling interrupt events. 121.It Fl i Ar rate 122Interrupt rate (per second) for 123.Fl I . 124The default is 97 Hz, so that profiling doesn't run in lockstep with the clock 125interrupt (which runs at 100 Hz). 126.El 127.Ss Data Gathering 128.Bl -tag -width indent 129.It Fl x Ar arg Oo Ns = Ns Ar val Oc 130Enable or modify a 131.Xr dtrace 1 132runtime option or D compiler option. 133Boolean options are enabled by specifying their name. 134Options with values are set by separating the option name and value with an 135equals sign. 136.El 137.Ss "Data Gathering (Mutually Exclusive)" 138.Bl -tag -width indent 139.It Fl b 140Basic statistics: lock, caller, number of events. 141.It Fl h 142Histogram: timing plus time-distribution histograms. 143.It Fl s Ar depth 144Stack trace: histogram plus stack traces up to 145.Ar depth 146frames deep. 147.It Fl t 148Timing: Basic plus timing for all events (default). 149.El 150.Ss "Data Filtering" 151.Bl -tag -width indent 152.It Fl d Ar duration 153Only watch events longer than 154.Ar duration . 155.It Fl f Ar func Ns Oo Ns , Ns Ar size Oc Ns 156Only watch events generated by 157.Ar func , 158which can be specified as a symbolic name or hex address. 159.Ar size 160defaults to the ELF symbol size if available, or 1 if not. 161.It Fl l Ar lock Ns Oo Ns , Ns Ar size Oc Ns 162Only watch 163.Ar lock , 164which can be specified as a symbolic name or hex address. 165.Ar size 166defaults to the ELF symbol size or 1 if the symbol size is not available. 167.It Fl n Ar num-records 168Maximum number of data records. 169.It Fl T 170Trace (rather than sample) events. 171This is off by default. 172.El 173.Ss Data Reporting 174.Bl -tag -width indent 175.It Fl D Ar count 176Only display the top 177.Ar count 178events of each type. 179.It Fl g 180Show total events generated by function. 181For example, if 182.Fn foo 183calls 184.Fn bar 185in a loop, the work done by 186.Fn bar 187counts as work generated by 188.Fn foo 189(along with any work done by 190.Fn foo 191itself). 192The 193.Fl g 194option works by counting the total number of stack frames in which each function 195appears. 196This implies two things: (1) the data reported by 197.Fl g 198can be misleading if the stack traces are not deep enough, and (2) functions 199that are called recursively might show greater than 100% activity. 200In light of issue (1), the default data gathering mode when using 201.Fl g 202is 203.Fl s 50 . 204.It Fl k 205Coalesce PCs within functions. 206.It Fl o Ar filename 207Direct output to 208.Ar filename . 209.It Fl P 210Sort data by (\fIcount * time\fR) product. 211.It Fl p 212Parsable output format. 213.It Fl R 214Display rates (events per second) rather than counts. 215.It Fl W 216Whichever: distinguish events only by caller, not by lock. 217.It Fl w 218Wherever: distinguish events only by lock, not by caller. 219.El 220.Sh DISPLAY FORMATS 221The following headers appear over various columns of data. 222.Bl -tag -width indent 223.It Count or ops/s 224Number of times this event occurred, or the rate (times per second) if 225.Fl R 226was specified. 227.It indv 228Percentage of all events represented by this individual event. 229.It genr 230Percentage of all events generated by this function. 231.It cuml 232Cumulative percentage; a running total of the individuals. 233.It rcnt 234Average reference count. 235This will always be 1 for exclusive locks (mutexes, 236spin locks, rwlocks held as writer) but can be greater than 1 for shared locks 237(rwlocks held as reader). 238.It nsec 239Average duration of the events in nanoseconds, as appropriate for the event. 240For the profiling event, duration means interrupt latency. 241.It Lock 242Address of the lock; displayed symbolically if possible. 243.It CPU+Pri_Class 244CPU plus the priority class of the interrupted thread. 245For example, if CPU 4 is interrupted while running a timeshare thread, this 246will be reported as 247.Ql cpu[4]+TShar . 248.It Caller 249Address of the caller; displayed symbolically if possible. 250.El 251.Sh EXAMPLES 252.Bl -tag -width 0n 253.It Example 1 Measuring Kernel Lock Contention 254.Pp 255.Li # lockstat sleep 5 256.Bd -literal 257Adaptive mutex spin: 41411 events in 5.011 seconds (8263 events/sec) 258 259Count indv cuml rcnt nsec Lock Caller 260------------------------------------------------------------------------------- 26113750 33% 33% 0.00 72 vm_page_queue_free_mtx vm_page_free_toq+0x12e 26213648 33% 66% 0.00 66 vm_page_queue_free_mtx vm_page_alloc+0x138 263 4023 10% 76% 0.00 51 vm_dom+0x80 vm_page_dequeue+0x68 264 2672 6% 82% 0.00 186 vm_dom+0x80 vm_page_enqueue+0x63 265 618 1% 84% 0.00 31 0xfffff8000cd83a88 qsyncvp+0x37 266 506 1% 85% 0.00 164 0xfffff8000cb3f098 vputx+0x5a 267 477 1% 86% 0.00 69 0xfffff8000c7eb180 uma_dbg_getslab+0x5b 268 288 1% 87% 0.00 77 0xfffff8000cd8b000 vn_finished_write+0x29 269 263 1% 88% 0.00 103 0xfffff8000cbad448 vinactive+0xdc 270 259 1% 88% 0.00 53 0xfffff8000cd8b000 vfs_ref+0x24 271 237 1% 89% 0.00 20 0xfffff8000cbad448 vfs_hash_get+0xcc 272 233 1% 89% 0.00 22 0xfffff8000bfd9480 uma_dbg_getslab+0x5b 273 223 1% 90% 0.00 20 0xfffff8000cb3f098 cache_lookup+0x561 274 193 0% 90% 0.00 16 0xfffff8000cb40ba8 vref+0x27 275 175 0% 91% 0.00 34 0xfffff8000cbad448 vputx+0x5a 276 169 0% 91% 0.00 51 0xfffff8000cd8b000 vfs_unbusy+0x27 277 164 0% 92% 0.00 31 0xfffff8000cb40ba8 vputx+0x5a 278[...] 279 280Adaptive mutex block: 10 events in 5.011 seconds (2 events/sec) 281 282Count indv cuml rcnt nsec Lock Caller 283------------------------------------------------------------------------------- 284 3 30% 30% 0.00 17592 vm_page_queue_free_mtx vm_page_alloc+0x138 285 2 20% 50% 0.00 20528 vm_dom+0x80 vm_page_enqueue+0x63 286 2 20% 70% 0.00 55502 0xfffff8000cb40ba8 vputx+0x5a 287 1 10% 80% 0.00 12007 vm_page_queue_free_mtx vm_page_free_toq+0x12e 288 1 10% 90% 0.00 9125 0xfffff8000cbad448 vfs_hash_get+0xcc 289 1 10% 100% 0.00 7864 0xfffff8000cd83a88 qsyncvp+0x37 290------------------------------------------------------------------------------- 291[...] 292.Ed 293.It Example 2 Measuring Hold Times 294.Pp 295.Li # lockstat -H -D 10 sleep 1 296.Bd -literal 297Adaptive mutex hold: 109589 events in 1.039 seconds (105526 events/sec) 298 299Count indv cuml rcnt nsec Lock Caller 300------------------------------------------------------------------------------- 301 8998 8% 8% 0.00 617 0xfffff8000c7eb180 uma_dbg_getslab+0xd4 302 5901 5% 14% 0.00 917 vm_page_queue_free_mtx vm_object_terminate+0x16a 303 5040 5% 18% 0.00 902 vm_dom+0x80 vm_page_free_toq+0x88 304 4884 4% 23% 0.00 1056 vm_page_queue_free_mtx vm_page_alloc+0x44e 305 4664 4% 27% 0.00 759 vm_dom+0x80 vm_fault_hold+0x1a13 306 4011 4% 31% 0.00 888 vm_dom vm_page_advise+0x11b 307 4010 4% 34% 0.00 957 vm_dom+0x80 _vm_page_deactivate+0x5c 308 3743 3% 38% 0.00 582 0xfffff8000cf04838 pmap_is_prefaultable+0x158 309 2254 2% 40% 0.00 952 vm_dom vm_page_free_toq+0x88 310 1639 1% 41% 0.00 591 0xfffff800d60065b8 trap_pfault+0x1f7 311------------------------------------------------------------------------------- 312[...] 313 314R/W writer hold: 64314 events in 1.039 seconds (61929 events/sec) 315 316Count indv cuml rcnt nsec Lock Caller 317------------------------------------------------------------------------------- 318 7421 12% 12% 0.00 2994 pvh_global_lock pmap_page_is_mapped+0xb6 319 4668 7% 19% 0.00 3313 pvh_global_lock pmap_enter+0x9ae 320 1639 3% 21% 0.00 733 0xfffff80168d10200 vm_object_deallocate+0x683 321 1639 3% 24% 0.00 3061 0xfffff80168d10200 unlock_and_deallocate+0x2b 322 1639 3% 26% 0.00 2966 0xfffff80168d10200 vm_fault_hold+0x16ee 323 1567 2% 29% 0.00 733 0xfffff80168d10200 vm_fault_hold+0x19bc 324 821 1% 30% 0.00 786 0xfffff801eb0cc000 vm_object_madvise+0x32d 325 649 1% 31% 0.00 4918 0xfffff80191105300 vm_fault_hold+0x16ee 326 648 1% 32% 0.00 8112 0xfffff80191105300 unlock_and_deallocate+0x2b 327 647 1% 33% 0.00 1261 0xfffff80191105300 vm_object_deallocate+0x683 328------------------------------------------------------------------------------- 329.Ed 330.It Example 3 Measuring Hold Times for Stack Traces Containing a Specific Function 331.Pp 332.Li # lockstat -H -f tcp_input -s 50 -D 10 sleep 1 333.Bd -literal 334Adaptive mutex hold: 68 events in 1.026 seconds (66 events/sec) 335 336------------------------------------------------------------------------------- 337Count indv cuml rcnt nsec Lock Caller 338 32 47% 47% 0.00 1631 0xfffff800686f50d8 tcp_do_segment+0x284b 339 340 nsec ------ Time Distribution ------ count Stack 341 1024 |@@@@@@@@@@ 11 tcp_input+0xf54 342 2048 |@@@@@@@@@@@@@ 14 ip_input+0xc8 343 4096 |@@@@@ 6 swi_net+0x192 344 8192 | 1 intr_event_execute_handlers+0x93 345 ithread_loop+0xa6 346 fork_exit+0x84 347 0xffffffff808cf9ee 348------------------------------------------------------------------------------- 349Count indv cuml rcnt nsec Lock Caller 350 29 43% 90% 0.00 4851 0xfffff800686f50d8 sowakeup+0xf8 351 352 nsec ------ Time Distribution ------ count Stack 353 4096 |@@@@@@@@@@@@@@@ 15 tcp_do_segment+0x2423 354 8192 |@@@@@@@@@@@@ 12 tcp_input+0xf54 355 16384 |@@ 2 ip_input+0xc8 356 swi_net+0x192 357 intr_event_execute_handlers+0x93 358 ithread_loop+0xa6 359 fork_exit+0x84 360 0xffffffff808cf9ee 361------------------------------------------------------------------------------- 362[...] 363.Ed 364.El 365.Sh SEE ALSO 366.Xr dtrace 1 , 367.Xr ksyms 4 , 368.Xr locking 9 369.Sh HISTORY 370The 371.Nm 372utility first appeared in 373.Fx 7.1 . 374.Sh NOTES 375Tail-call elimination can affect call sites. 376For example, if 377.Fn foo Ns +0x50 378calls 379.Fn bar 380and the last thing 381.Fn bar 382does is call 383.Fn mtx_unlock , 384the compiler can arrange for 385.Fn bar 386to branch to 387.Fn mtx_unlock 388with a return address of 389.Fn foo Ns +0x58. 390Thus, the 391.Fn mtx_unlock 392in 393.Fn bar 394will appear as though it occurred at 395.Fn foo Ns +0x58. 396.Pp 397The PC in the stack frame in which an interrupt occurs can be bogus because, 398between function calls, the compiler is free to use the return address register 399for local storage. 400.Pp 401When using the 402.Fl I 403and 404.Fl s 405options together, the interrupted PC will usually not appear anywhere in the 406stack since the interrupt handler is entered asynchronously, not by a function 407call from that PC. 408