Home
last modified time | relevance | path

Searched hist:d41bc48bfab2076f7db88d079a3a3203dd9c4a54 (Results 1 – 4 of 4) sorted by relevance

/linux/tools/testing/selftests/bpf/progs/
H A Dtrigger_bench.cdiff d41bc48bfab2076f7db88d079a3a3203dd9c4a54 Tue Nov 16 02:30:41 CET 2021 Andrii Nakryiko <andrii@kernel.org> selftests/bpf: Add uprobe triggering overhead benchmarks

Add benchmark to measure overhead of uprobes and uretprobes. Also have
a baseline (no uprobe attached) benchmark.

On my dev machine, baseline benchmark can trigger 130M user_target()
invocations. When uprobe is attached, this falls to just 700K. With
uretprobe, we get down to 520K:

$ sudo ./bench trig-uprobe-base -a
Summary: hits 131.289 ± 2.872M/s

# UPROBE
$ sudo ./bench -a trig-uprobe-without-nop
Summary: hits 0.729 ± 0.007M/s

$ sudo ./bench -a trig-uprobe-with-nop
Summary: hits 1.798 ± 0.017M/s

# URETPROBE
$ sudo ./bench -a trig-uretprobe-without-nop
Summary: hits 0.508 ± 0.012M/s

$ sudo ./bench -a trig-uretprobe-with-nop
Summary: hits 0.883 ± 0.008M/s

So there is almost 2.5x performance difference between probing nop vs
non-nop instruction for entry uprobe. And 1.7x difference for uretprobe.

This means that non-nop uprobe overhead is around 1.4 microseconds for uprobe
and 2 microseconds for non-nop uretprobe.

For nop variants, uprobe and uretprobe overhead is down to 0.556 and
1.13 microseconds, respectively.

For comparison, just doing a very low-overhead syscall (with no BPF
programs attached anywhere) gives:

$ sudo ./bench trig-base -a
Summary: hits 4.830 ± 0.036M/s

So uprobes are about 2.67x slower than pure context switch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211116013041.4072571-1-andrii@kernel.org
/linux/tools/testing/selftests/bpf/benchs/
H A Dbench_trigger.cdiff d41bc48bfab2076f7db88d079a3a3203dd9c4a54 Tue Nov 16 02:30:41 CET 2021 Andrii Nakryiko <andrii@kernel.org> selftests/bpf: Add uprobe triggering overhead benchmarks

Add benchmark to measure overhead of uprobes and uretprobes. Also have
a baseline (no uprobe attached) benchmark.

On my dev machine, baseline benchmark can trigger 130M user_target()
invocations. When uprobe is attached, this falls to just 700K. With
uretprobe, we get down to 520K:

$ sudo ./bench trig-uprobe-base -a
Summary: hits 131.289 ± 2.872M/s

# UPROBE
$ sudo ./bench -a trig-uprobe-without-nop
Summary: hits 0.729 ± 0.007M/s

$ sudo ./bench -a trig-uprobe-with-nop
Summary: hits 1.798 ± 0.017M/s

# URETPROBE
$ sudo ./bench -a trig-uretprobe-without-nop
Summary: hits 0.508 ± 0.012M/s

$ sudo ./bench -a trig-uretprobe-with-nop
Summary: hits 0.883 ± 0.008M/s

So there is almost 2.5x performance difference between probing nop vs
non-nop instruction for entry uprobe. And 1.7x difference for uretprobe.

This means that non-nop uprobe overhead is around 1.4 microseconds for uprobe
and 2 microseconds for non-nop uretprobe.

For nop variants, uprobe and uretprobe overhead is down to 0.556 and
1.13 microseconds, respectively.

For comparison, just doing a very low-overhead syscall (with no BPF
programs attached anywhere) gives:

$ sudo ./bench trig-base -a
Summary: hits 4.830 ± 0.036M/s

So uprobes are about 2.67x slower than pure context switch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211116013041.4072571-1-andrii@kernel.org
/linux/tools/testing/selftests/bpf/
H A Dbench.cdiff d41bc48bfab2076f7db88d079a3a3203dd9c4a54 Tue Nov 16 02:30:41 CET 2021 Andrii Nakryiko <andrii@kernel.org> selftests/bpf: Add uprobe triggering overhead benchmarks

Add benchmark to measure overhead of uprobes and uretprobes. Also have
a baseline (no uprobe attached) benchmark.

On my dev machine, baseline benchmark can trigger 130M user_target()
invocations. When uprobe is attached, this falls to just 700K. With
uretprobe, we get down to 520K:

$ sudo ./bench trig-uprobe-base -a
Summary: hits 131.289 ± 2.872M/s

# UPROBE
$ sudo ./bench -a trig-uprobe-without-nop
Summary: hits 0.729 ± 0.007M/s

$ sudo ./bench -a trig-uprobe-with-nop
Summary: hits 1.798 ± 0.017M/s

# URETPROBE
$ sudo ./bench -a trig-uretprobe-without-nop
Summary: hits 0.508 ± 0.012M/s

$ sudo ./bench -a trig-uretprobe-with-nop
Summary: hits 0.883 ± 0.008M/s

So there is almost 2.5x performance difference between probing nop vs
non-nop instruction for entry uprobe. And 1.7x difference for uretprobe.

This means that non-nop uprobe overhead is around 1.4 microseconds for uprobe
and 2 microseconds for non-nop uretprobe.

For nop variants, uprobe and uretprobe overhead is down to 0.556 and
1.13 microseconds, respectively.

For comparison, just doing a very low-overhead syscall (with no BPF
programs attached anywhere) gives:

$ sudo ./bench trig-base -a
Summary: hits 4.830 ± 0.036M/s

So uprobes are about 2.67x slower than pure context switch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211116013041.4072571-1-andrii@kernel.org
H A DMakefilediff d41bc48bfab2076f7db88d079a3a3203dd9c4a54 Tue Nov 16 02:30:41 CET 2021 Andrii Nakryiko <andrii@kernel.org> selftests/bpf: Add uprobe triggering overhead benchmarks

Add benchmark to measure overhead of uprobes and uretprobes. Also have
a baseline (no uprobe attached) benchmark.

On my dev machine, baseline benchmark can trigger 130M user_target()
invocations. When uprobe is attached, this falls to just 700K. With
uretprobe, we get down to 520K:

$ sudo ./bench trig-uprobe-base -a
Summary: hits 131.289 ± 2.872M/s

# UPROBE
$ sudo ./bench -a trig-uprobe-without-nop
Summary: hits 0.729 ± 0.007M/s

$ sudo ./bench -a trig-uprobe-with-nop
Summary: hits 1.798 ± 0.017M/s

# URETPROBE
$ sudo ./bench -a trig-uretprobe-without-nop
Summary: hits 0.508 ± 0.012M/s

$ sudo ./bench -a trig-uretprobe-with-nop
Summary: hits 0.883 ± 0.008M/s

So there is almost 2.5x performance difference between probing nop vs
non-nop instruction for entry uprobe. And 1.7x difference for uretprobe.

This means that non-nop uprobe overhead is around 1.4 microseconds for uprobe
and 2 microseconds for non-nop uretprobe.

For nop variants, uprobe and uretprobe overhead is down to 0.556 and
1.13 microseconds, respectively.

For comparison, just doing a very low-overhead syscall (with no BPF
programs attached anywhere) gives:

$ sudo ./bench trig-base -a
Summary: hits 4.830 ± 0.036M/s

So uprobes are about 2.67x slower than pure context switch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211116013041.4072571-1-andrii@kernel.org