1'\" te 2.\" CDDL HEADER START 3.\" 4.\" The contents of this file are subject to the terms of the 5.\" Common Development and Distribution License (the "License"). 6.\" You may not use this file except in compliance with the License. 7.\" 8.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 9.\" or http://www.opensolaris.org/os/licensing. 10.\" See the License for the specific language governing permissions 11.\" and limitations under the License. 12.\" 13.\" When distributing Covered Code, include this CDDL HEADER in each 14.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. 15.\" If applicable, add the following below this CDDL HEADER, with the 16.\" fields enclosed by brackets "[]" replaced with your own identifying 17.\" information: Portions Copyright [yyyy] [name of copyright owner] 18.\" 19.\" CDDL HEADER END 20.\" Copyright (c) 2008, Sun Microsystems, Inc. All Rights Reserved. 21.\" 22.Dd February 25, 2020 23.Dt LOCKSTAT 1 24.Os 25.Sh NAME 26.Nm lockstat 27.Nd report kernel lock and profiling statistics 28.Sh SYNOPSIS 29.Nm 30.Op Fl ACEHIV 31.Op Fl e Ar event-list 32.Op Fl i Ar rate 33.Op Fl b | t | h | s Ar depth 34.Op Fl n Ar num-records 35.Op Fl l Ar lock Oo Ns , Ns Ar size Oc 36.Op Fl d Ar duration 37.Op Fl f Ar function Oo Ns , Ns Ar size Oc 38.Op Fl T 39.Op Fl kgwWRpP 40.Op Fl D Ar count 41.Op Fl o filename 42.Op Fl x Ar opt Oo Ns = Ns Ar val Oc 43.Ar command 44.Op Oo Ar args Oc 45.Sh DESCRIPTION 46The 47.Nm 48utility gathers and displays kernel locking and profiling statistics. 49.Nm 50allows you to specify which events to watch (for example, spin on adaptive 51mutex, block on read access to rwlock due to waiting writers, and so forth), how 52much data to gather for each event, and how to display the data. 53By default, 54.Nm 55monitors all lock contention events, gathers frequency and timing data about 56those events, and displays the data in decreasing frequency order, so that the 57most common events appear first. 58.Pp 59.Nm 60gathers data until the specified command completes. 61For example, to gather statistics for a fixed-time interval, use 62.Xr sleep 1 63as the command, as follows: 64.Pp 65.Dl # lockstat sleep 5 66.Pp 67When the 68.Fl I 69option is specified, 70.Nm lockstat 71establishes a per-processor high-level periodic interrupt source to gather 72profiling data. 73The interrupt handler simply generates a 74.Nm 75event whose caller is the interrupted PC (program counter). 76The profiling event is just like any other 77.Nm lockstat 78event, so all of the normal 79.Nm lockstat 80options are applicable. 81.Pp 82.Nm 83relies on DTrace to modify the running kernel's text to intercept events of 84interest. 85This imposes a small but measurable overhead on all system activity, so access 86to 87.Nm 88is restricted to super-user by default. 89.Sh OPTIONS 90The following options are supported: 91.Bl -tag -width indent 92.It Fl V 93Print the D program used to gather the requested data. 94.El 95.Ss Event Selection 96If no event selection options are specified, the default is 97.Fl C . 98.Bl -tag -width indent 99.It Fl A 100Watch all lock events. 101.Fl A 102is equivalent to 103.Fl CH . 104.It Fl C 105Watch contention events. 106.It Fl E 107Watch error events. 108.It Fl e Ar event-list 109Only watch the specified events. 110.Ar event-list 111is a comma-separated list of events or ranges of events such as 1,4-7,35. 112Run 113.Nm 114with no arguments to get a brief description of all events. 115.It Fl H 116Watch hold events. 117.It Fl I 118Watch profiling interrupt events. 119.It Fl i Ar rate 120Interrupt rate (per second) for 121.Fl I . 122The default is 97 Hz, so that profiling doesn't run in lockstep with the clock 123interrupt (which runs at 100 Hz). 124.El 125.Ss Data Gathering 126.Bl -tag -width indent 127.It Fl x Ar arg Oo Ns = Ns Ar val Oc 128Enable or modify a 129.Xr dtrace 1 130runtime option or D compiler option. 131Boolean options are enabled by specifying their name. 132Options with values are set by separating the option name and value with an 133equals sign. 134.El 135.Ss "Data Gathering (Mutually Exclusive)" 136.Bl -tag -width indent 137.It Fl b 138Basic statistics: lock, caller, number of events. 139.It Fl h 140Histogram: timing plus time-distribution histograms. 141.It Fl s Ar depth 142Stack trace: histogram plus stack traces up to 143.Ar depth 144frames deep. 145.It Fl t 146Timing: Basic plus timing for all events (default). 147.El 148.Ss "Data Filtering" 149.Bl -tag -width indent 150.It Fl d Ar duration 151Only watch events longer than 152.Ar duration . 153.It Fl f Ar func Ns Oo Ns , Ns Ar size Oc Ns 154Only watch events generated by 155.Ar func , 156which can be specified as a symbolic name or hex address. 157.Ar size 158defaults to the ELF symbol size if available, or 1 if not. 159.It Fl l Ar lock Ns Oo Ns , Ns Ar size Oc Ns 160Only watch 161.Ar lock , 162which can be specified as a symbolic name or hex address. 163.Ar size 164defaults to the ELF symbol size or 1 if the symbol size is not available. 165.It Fl n Ar num-records 166Maximum number of data records. 167.It Fl T 168Trace (rather than sample) events. 169This is off by default. 170.El 171.Ss Data Reporting 172.Bl -tag -width indent 173.It Fl D Ar count 174Only display the top 175.Ar count 176events of each type. 177.It Fl g 178Show total events generated by function. 179For example, if 180.Fn foo 181calls 182.Fn bar 183in a loop, the work done by 184.Fn bar 185counts as work generated by 186.Fn foo 187(along with any work done by 188.Fn foo 189itself). 190The 191.Fl g 192option works by counting the total number of stack frames in which each function 193appears. 194This implies two things: (1) the data reported by 195.Fl g 196can be misleading if the stack traces are not deep enough, and (2) functions 197that are called recursively might show greater than 100% activity. 198In light of issue (1), the default data gathering mode when using 199.Fl g 200is 201.Fl s 50 . 202.It Fl k 203Coalesce PCs within functions. 204.It Fl o Ar filename 205Direct output to 206.Ar filename . 207.It Fl P 208Sort data by (\fIcount * time\fR) product. 209.It Fl p 210Parsable output format. 211.It Fl R 212Display rates (events per second) rather than counts. 213.It Fl W 214Whichever: distinguish events only by caller, not by lock. 215.It Fl w 216Wherever: distinguish events only by lock, not by caller. 217.El 218.Sh DISPLAY FORMATS 219The following headers appear over various columns of data. 220.Bl -tag -width indent 221.It Count or ops/s 222Number of times this event occurred, or the rate (times per second) if 223.Fl R 224was specified. 225.It indv 226Percentage of all events represented by this individual event. 227.It genr 228Percentage of all events generated by this function. 229.It cuml 230Cumulative percentage; a running total of the individuals. 231.It rcnt 232Average reference count. 233This will always be 1 for exclusive locks (mutexes, 234spin locks, rwlocks held as writer) but can be greater than 1 for shared locks 235(rwlocks held as reader). 236.It nsec 237Average duration of the events in nanoseconds, as appropriate for the event. 238For the profiling event, duration means interrupt latency. 239.It Lock 240Address of the lock; displayed symbolically if possible. 241.It CPU+Pri_Class 242CPU plus the priority class of the interrupted thread. 243For example, if CPU 4 is interrupted while running a timeshare thread, this 244will be reported as 245.Ql cpu[4]+TShar . 246.It Caller 247Address of the caller; displayed symbolically if possible. 248.El 249.Sh EXAMPLES 250.Bl -tag -width 0n 251.It Example 1 Measuring Kernel Lock Contention 252.Pp 253.Li # lockstat sleep 5 254.Bd -literal 255Adaptive mutex spin: 41411 events in 5.011 seconds (8263 events/sec) 256 257Count indv cuml rcnt nsec Lock Caller 258------------------------------------------------------------------------------- 25913750 33% 33% 0.00 72 vm_page_queue_free_mtx vm_page_free_toq+0x12e 26013648 33% 66% 0.00 66 vm_page_queue_free_mtx vm_page_alloc+0x138 261 4023 10% 76% 0.00 51 vm_dom+0x80 vm_page_dequeue+0x68 262 2672 6% 82% 0.00 186 vm_dom+0x80 vm_page_enqueue+0x63 263 618 1% 84% 0.00 31 0xfffff8000cd83a88 qsyncvp+0x37 264 506 1% 85% 0.00 164 0xfffff8000cb3f098 vputx+0x5a 265 477 1% 86% 0.00 69 0xfffff8000c7eb180 uma_dbg_getslab+0x5b 266 288 1% 87% 0.00 77 0xfffff8000cd8b000 vn_finished_write+0x29 267 263 1% 88% 0.00 103 0xfffff8000cbad448 vinactive+0xdc 268 259 1% 88% 0.00 53 0xfffff8000cd8b000 vfs_ref+0x24 269 237 1% 89% 0.00 20 0xfffff8000cbad448 vfs_hash_get+0xcc 270 233 1% 89% 0.00 22 0xfffff8000bfd9480 uma_dbg_getslab+0x5b 271 223 1% 90% 0.00 20 0xfffff8000cb3f098 cache_lookup+0x561 272 193 0% 90% 0.00 16 0xfffff8000cb40ba8 vref+0x27 273 175 0% 91% 0.00 34 0xfffff8000cbad448 vputx+0x5a 274 169 0% 91% 0.00 51 0xfffff8000cd8b000 vfs_unbusy+0x27 275 164 0% 92% 0.00 31 0xfffff8000cb40ba8 vputx+0x5a 276[...] 277 278Adaptive mutex block: 10 events in 5.011 seconds (2 events/sec) 279 280Count indv cuml rcnt nsec Lock Caller 281------------------------------------------------------------------------------- 282 3 30% 30% 0.00 17592 vm_page_queue_free_mtx vm_page_alloc+0x138 283 2 20% 50% 0.00 20528 vm_dom+0x80 vm_page_enqueue+0x63 284 2 20% 70% 0.00 55502 0xfffff8000cb40ba8 vputx+0x5a 285 1 10% 80% 0.00 12007 vm_page_queue_free_mtx vm_page_free_toq+0x12e 286 1 10% 90% 0.00 9125 0xfffff8000cbad448 vfs_hash_get+0xcc 287 1 10% 100% 0.00 7864 0xfffff8000cd83a88 qsyncvp+0x37 288------------------------------------------------------------------------------- 289[...] 290.Ed 291.It Example 2 Measuring Hold Times 292.Pp 293.Li # lockstat -H -D 10 sleep 1 294.Bd -literal 295Adaptive mutex hold: 109589 events in 1.039 seconds (105526 events/sec) 296 297Count indv cuml rcnt nsec Lock Caller 298------------------------------------------------------------------------------- 299 8998 8% 8% 0.00 617 0xfffff8000c7eb180 uma_dbg_getslab+0xd4 300 5901 5% 14% 0.00 917 vm_page_queue_free_mtx vm_object_terminate+0x16a 301 5040 5% 18% 0.00 902 vm_dom+0x80 vm_page_free_toq+0x88 302 4884 4% 23% 0.00 1056 vm_page_queue_free_mtx vm_page_alloc+0x44e 303 4664 4% 27% 0.00 759 vm_dom+0x80 vm_fault_hold+0x1a13 304 4011 4% 31% 0.00 888 vm_dom vm_page_advise+0x11b 305 4010 4% 34% 0.00 957 vm_dom+0x80 _vm_page_deactivate+0x5c 306 3743 3% 38% 0.00 582 0xfffff8000cf04838 pmap_is_prefaultable+0x158 307 2254 2% 40% 0.00 952 vm_dom vm_page_free_toq+0x88 308 1639 1% 41% 0.00 591 0xfffff800d60065b8 trap_pfault+0x1f7 309------------------------------------------------------------------------------- 310[...] 311 312R/W writer hold: 64314 events in 1.039 seconds (61929 events/sec) 313 314Count indv cuml rcnt nsec Lock Caller 315------------------------------------------------------------------------------- 316 7421 12% 12% 0.00 2994 pvh_global_lock pmap_page_is_mapped+0xb6 317 4668 7% 19% 0.00 3313 pvh_global_lock pmap_enter+0x9ae 318 1639 3% 21% 0.00 733 0xfffff80168d10200 vm_object_deallocate+0x683 319 1639 3% 24% 0.00 3061 0xfffff80168d10200 unlock_and_deallocate+0x2b 320 1639 3% 26% 0.00 2966 0xfffff80168d10200 vm_fault_hold+0x16ee 321 1567 2% 29% 0.00 733 0xfffff80168d10200 vm_fault_hold+0x19bc 322 821 1% 30% 0.00 786 0xfffff801eb0cc000 vm_object_madvise+0x32d 323 649 1% 31% 0.00 4918 0xfffff80191105300 vm_fault_hold+0x16ee 324 648 1% 32% 0.00 8112 0xfffff80191105300 unlock_and_deallocate+0x2b 325 647 1% 33% 0.00 1261 0xfffff80191105300 vm_object_deallocate+0x683 326------------------------------------------------------------------------------- 327.Ed 328.It Example 3 Measuring Hold Times for Stack Traces Containing a Specific Function 329.Pp 330.Li # lockstat -H -f tcp_input -s 50 -D 10 sleep 1 331.Bd -literal 332Adaptive mutex hold: 68 events in 1.026 seconds (66 events/sec) 333 334------------------------------------------------------------------------------- 335Count indv cuml rcnt nsec Lock Caller 336 32 47% 47% 0.00 1631 0xfffff800686f50d8 tcp_do_segment+0x284b 337 338 nsec ------ Time Distribution ------ count Stack 339 1024 |@@@@@@@@@@ 11 tcp_input+0xf54 340 2048 |@@@@@@@@@@@@@ 14 ip_input+0xc8 341 4096 |@@@@@ 6 swi_net+0x192 342 8192 | 1 intr_event_execute_handlers+0x93 343 ithread_loop+0xa6 344 fork_exit+0x84 345 0xffffffff808cf9ee 346------------------------------------------------------------------------------- 347Count indv cuml rcnt nsec Lock Caller 348 29 43% 90% 0.00 4851 0xfffff800686f50d8 sowakeup+0xf8 349 350 nsec ------ Time Distribution ------ count Stack 351 4096 |@@@@@@@@@@@@@@@ 15 tcp_do_segment+0x2423 352 8192 |@@@@@@@@@@@@ 12 tcp_input+0xf54 353 16384 |@@ 2 ip_input+0xc8 354 swi_net+0x192 355 intr_event_execute_handlers+0x93 356 ithread_loop+0xa6 357 fork_exit+0x84 358 0xffffffff808cf9ee 359------------------------------------------------------------------------------- 360[...] 361.Ed 362.El 363.Sh SEE ALSO 364.Xr dtrace 1 , 365.Xr ksyms 4 , 366.Xr locking 9 367.Sh HISTORY 368The 369.Nm 370utility first appeared in 371.Fx 7.1 . 372.Sh NOTES 373Tail-call elimination can affect call sites. 374For example, if 375.Fn foo Ns +0x50 376calls 377.Fn bar 378and the last thing 379.Fn bar 380does is call 381.Fn mtx_unlock , 382the compiler can arrange for 383.Fn bar 384to branch to 385.Fn mtx_unlock 386with a return address of 387.Fn foo Ns +0x58. 388Thus, the 389.Fn mtx_unlock 390in 391.Fn bar 392will appear as though it occurred at 393.Fn foo Ns +0x58. 394.Pp 395The PC in the stack frame in which an interrupt occurs can be bogus because, 396between function calls, the compiler is free to use the return address register 397for local storage. 398.Pp 399When using the 400.Fl I 401and 402.Fl s 403options together, the interrupted PC will usually not appear anywhere in the 404stack since the interrupt handler is entered asynchronously, not by a function 405call from that PC. 406