perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'syscall'::
	System call performance (throughput).

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'internals'::
	Benchmark internal perf functionality.

'uprobe'::
	Benchmark overhead of uprobe + BPF.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use multiple threads instead of multiple processes.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

-G::
--cgroups=::
Names of cgroups for sender and receiver, separated by a comma.
This is useful to check cgroup context switching overhead.
Note that perf neither creates nor deletes the cgroups, so users should
make sure that the cgroups exist and are accessible before use.
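
Because perf does not create the cgroups itself, they have to exist before the
benchmark starts. A minimal sketch of that preparation follows, assuming a
cgroup v2 hierarchy mounted at /sys/fs/cgroup and sufficient privileges to
create directories there. The `CGROUP_ROOT` variable and the `ensure_cgroups`
helper are hypothetical names used only for illustration; other cgroup layouts
will need a different path.

```shell
# Hypothetical helper: create each named cgroup directory if it is missing.
# CGROUP_ROOT is an assumption (cgroup v2 mounted at /sys/fs/cgroup).
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

ensure_cgroups() {
    for name in "$@"; do
        if [ ! -d "$CGROUP_ROOT/$name" ]; then
            # Stop at the first failure, e.g. missing privileges.
            mkdir "$CGROUP_ROOT/$name" || return 1
        fi
    done
}
```

With the helper defined, running `ensure_cgroups AAA BBB` before
`perf bench sched pipe -G AAA,BBB` would prepare the two cgroup names used in
the *pipe* example.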


Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec

% perf bench sched pipe -G AAA,BBB
(executing 1000000 pipe operations between cgroups)
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 6.886 [sec]

       6.886208 usecs/op
         145217 ops/sec

---------------------

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating performance of core system call throughput (both usecs/op and ops/sec metrics).
This uses a single thread simply doing getppid(2), which is a simple syscall where the result is not
cached by glibc.


SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-k::
--chunk::
Specify the chunk-size for each invocation (default: 0, or full-extent).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-k::
--chunk::
Specify the chunk-size for each invocation (default: 0, or full-extent).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*mmap*::
Suite for evaluating memory subsystem performance for mmap()'d memory.

Options of *mmap*
^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to map (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-r::
--randomize::
Specify seed to randomize page access offset (default: 0, or not randomized).

-f::
--function::
Specify function to use (default: all).
Available functions are 'demand' and 'populate', with the first
demand faulting pages in the region and the second using an eager
mapping.

-l::
--nr_loops::
Repeat mmap() invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
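
The size and loop options of the *memcpy* suite lend themselves to simple
sweeps over buffer sizes. The sketch below runs the suite once per size, using
only the `--size` and `--nr_loops` options documented in this section. The
`PERF` variable and the `sweep_memcpy` function are hypothetical names
introduced for illustration, and the loop count of 100 is an arbitrary choice.

```shell
# Hypothetical wrapper: run the memcpy suite once for each size given.
# PERF is an assumption that allows selecting a specific perf binary.
PERF="${PERF:-perf}"

sweep_memcpy() {
    for size in "$@"; do
        # --size and --nr_loops are the options documented for memcpy.
        "$PERF" bench mem memcpy --size "$size" --nr_loops 100 || return 1
    done
}
```

For instance, `sweep_memcpy 4KB 1MB 64MB` would take three measurements. The
same pattern should carry over to *memset* by changing the suite name.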

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]