#
4dedcb1b |
| 28-Jan-2024 |
Mark Johnston <markj@FreeBSD.org> |
libc/amd64: Disable ASAN for amd64_archlevel.c
The code in this file runs before the sanitizer can initialize its shadow map.
Fixes: ad2fac552c3f ("lib/libc/amd64: add archlevel-based simd dispatch
libc/amd64: Disable ASAN for amd64_archlevel.c
The code in this file runs before the sanitizer can initialize its shadow map.
Fixes: ad2fac552c3f ("lib/libc/amd64: add archlevel-based simd dispatch framework")
show more ...
|
#
fb197a4f |
| 06-Dec-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add memrchr() scalar, baseline implementation
The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be i
lib/libc/amd64/string: add memrchr() scalar, baseline implementation
The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be improved by using the same algorithm as for memchr, but it would have been a lot more complicated.
The baseline implementation is similar to timingsafe_memcmp. It's slightly slower than memchr() due to the more complicated main loop, but I don't think that can be significantly improved.
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42925
show more ...
|
#
ea7b1377 |
| 04-Dec-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy()
This picks up the accelerated implementation of memccpy().
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 mo
lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy()
This picks up the accelerated implementation of memccpy().
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902
show more ...
|
#
fc0e38a7 |
| 02-Dec-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add memccpy scalar, baseline implementation
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced implementation of memccpy for amd64. A scalar implementation
lib/libc/amd64/string: add memccpy scalar, baseline implementation
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced implementation of memccpy for amd64. A scalar implementation calling into memchr and memcpy to do the job is provided, too.
Please note that this code does not behave exactly the same as the C implementation of memccpy for overlapping inputs. However, overlapping inputs are not allowed for this function by ISO/IEC 9899:1999 and neither has the C implementation any code to deal with the possibility. It just proceeds byte-by-byte, which may or may not do the expected thing for some overlaps. We do not document whether overlapping inputs are supported in memccpy(3).
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902
show more ...
|
#
2b7b03b7 |
| 29-Nov-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strlcat() through strlcpy()
This should pick up our optimised memchr(), strlen(), and strlcpy() when strlcat() is called.
Tested by: developers@, exp-run Approved b
lib/libc/amd64/string: implement strlcat() through strlcpy()
This should pick up our optimised memchr(), strlen(), and strlcpy() when strlcat() is called.
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863
show more ...
|
#
74d6cfad |
| 12-Nov-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strlcpy scalar, baseline implementation
Somewhat similar to stpncpy, but different in that we need to compute the full source length even if the buffer is shorter than the
lib/libc/amd64/string: add strlcpy scalar, baseline implementation
Somewhat similar to stpncpy, but different in that we need to compute the full source length even if the buffer is shorter than the source.
strlcat is implemented as a simple wrapper around strlcpy. The scalar implementation of strlcpy just calls into strlen() and memcpy() to do the job.
Perf-wise we're very close to stpncpy. The code is slightly slower as it needs to carry on with finding the source string length even if the buffer ends before the string.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863
show more ...
|
Revision tags: release/14.0.0 |
|
#
e19d46c8 |
| 09-Nov-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strncpy() by calling stpncpy()
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 27578
lib/libc/amd64/string: implement strncpy() by calling stpncpy()
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519
show more ...
|
#
90253d49 |
| 30-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add stpncpy scalar, baseline implementation
This was surprisingly annoying to get right, despite being such a simple function. A scalar implementation is also provided, it ju
lib/libc/amd64/string: add stpncpy scalar, baseline implementation
This was surprisingly annoying to get right, despite being such a simple function. A scalar implementation is also provided, it just calls into our optimised memchr(), memcpy(), and memset() routines to carry out its job.
I'm quite happy with the performance. glibc only beats us for very long strings, likely due to the use of AVX-512. The scalar implementation just calls into our optimised memchr(), memcpy(), and memset() routines, so it has a high overhead to begin with but then performs ok for the amount of effort that went into it. Still beats the old C code, except for very short strings.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519
show more ...
|
#
fd2ecd91 |
| 24-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps. On amd64, we now have an optimised implementation of strcspn(), so instead
lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps. On amd64, we now have an optimised implementation of strcspn(), so instead of implementing the inner loop manually, just call into the optimised routine.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42346
show more ...
|
#
2ed514a2 |
| 12-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strrchr scalar, baseline implementation
The baseline implementation is very straightforward, while the scalar implementation suffers from register pressure and the need to
lib/libc/amd64/string: add strrchr scalar, baseline implementation
The baseline implementation is very straightforward, while the scalar implementation suffers from register pressure and the need to use SWAR techniques similar to those used for strchr().
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42217
show more ...
|
#
14289e97 |
| 28-Sep-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strncmp scalar, baseline implementation
The scalar implementation is fairly straightforward and merely unrolled four times. The baseline implementation closely follows D4
lib/libc/amd64/string: add strncmp scalar, baseline implementation
The scalar implementation is fairly straightforward and merely unrolled four times. The baseline implementation closely follows D41971 with appropriate extensions and extra code paths to pay attention to string length.
Performance is quite good. We beat both glibc (except for very long strings, but they likely use AVX which we don't) and Bionic (except for medium-sized aligned strings, where we are still in the same ballpark).
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42122
show more ...
|
#
f4fc317c |
| 25-Sep-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strpbrk() through strcspn()
This lets us use our optimised strcspn() routine for strpbrk() calls.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-r
lib/libc/amd64/string: implement strpbrk() through strcspn()
This lets us use our optimised strcspn() routine for strpbrk() calls.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D41980
show more ...
|
#
5048c1b8 |
| 15-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
Conceptually very similar to timingsafe_bcmp(), but with comparison logic inspired by Elijah Stone's fancy memcmp. A baseline (
lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
Conceptually very similar to timingsafe_bcmp(), but with comparison logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE) implementation was omitted this time as I was not able to get it to perform adequately. Best I got was 8% over the scalar version for long inputs, but slower for short inputs.
Sponsored by: The FreeBSD Foundation Approved by: security (cperciva) Inspired by: https://github.com/moon-chilled/fancy-memcmp Differential Revision: https://reviews.freebsd.org/D41696
show more ...
|
#
76c2b331 |
| 30-Aug-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations
Very straightforward and similar to memcmp(3). The code has been written to use only instructions specified as having d
lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations
Very straightforward and similar to memcmp(3). The code has been written to use only instructions specified as having data operand independent timing by Intel.
Sponsored by: The FreeBSD Foundation Approved by: security (cperciva) Differential Revision: https://reviews.freebsd.org/D41673
show more ...
|
#
33173728 |
| 08-Sep-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strnlen(3) trough memchr(3)
Now that we have an optimised memchr(3), we can use it to implement strnlen(3) with better perofrmance.
Sponsored by: The FreeBSD Founda
lib/libc/amd64/string: implement strnlen(3) trough memchr(3)
Now that we have an optimised memchr(3), we can use it to implement strnlen(3) with better perofrmance.
Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41598
show more ...
|
#
de12a689 |
| 24-Aug-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add memchr(3) scalar, baseline implementation
This is conceptually similar to strchr(3), but there are slight changes to account for the buffer having an explicit buffer lengt
lib/libc/amd64/string: add memchr(3) scalar, baseline implementation
This is conceptually similar to strchr(3), but there are slight changes to account for the buffer having an explicit buffer length.
Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41598
show more ...
|
#
7084133c |
| 21-Aug-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strspn(3) scalar, x86-64-v2 implementation
This is conceptually very similar to the strcspn(3) implementations from D41557, but we can't do the fast paths the same way.
S
lib/libc/amd64/string: add strspn(3) scalar, x86-64-v2 implementation
This is conceptually very similar to the strcspn(3) implementations from D41557, but we can't do the fast paths the same way.
Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41567
show more ...
|
#
474408bb |
| 13-Aug-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strcspn(3) scalar, x86-64-v2 implementation
This changeset adds both a scalar and an x86-64-v2 implementation of the strcspn(3) function to libc. A baseline implementation
lib/libc/amd64/string: add strcspn(3) scalar, x86-64-v2 implementation
This changeset adds both a scalar and an x86-64-v2 implementation of the strcspn(3) function to libc. A baseline implementation does not appear to be feasible given the requirements of the function.
The scalar implementation is similar to the generic libc implementation, but expands the bit set into a byte set to reduce latency, improving performance. This approach could probably be backported to the generic C version to benefit other platforms.
The x86-64-v2 implementation is built around the infamous pcmpistri instruction. An alternative implementation based on the Muła/Langdale algorithm [1] was prototyped, but performed worse than the pcmpistri approach except for sets of more than 16 characters with long input strings.
All implementations provide special cases for the empty set (reduces to strlen as well as single-character sets (reduces to strchr). The x86-64-v2 kernel falls back to the scalar implementation for sets of more than 32 characters. This limit could be raised by additional multiples of 16 through the use of additional pcmpistri code paths, but I consider this case to be too rare to be of importance.
[1]: http://0x80.pl/articles/simd-byte-lookup.html
Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41557
show more ...
|
#
9fbea870 |
| 05-Jul-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string/stpcpy.S: add baseline implementation
This commit adds a baseline implementation of stpcpy(3) for amd64. It performs quite well in comparison to the previous scalar implementat
lib/libc/amd64/string/stpcpy.S: add baseline implementation
This commit adds a baseline implementation of stpcpy(3) for amd64. It performs quite well in comparison to the previous scalar implementation as well as agains bionic and glibc (though glibc is faster for very long strings). Fiddle with the Makefile to also have strcpy(3) call into the optimised stpcpy(3) code, fixing an oversight from D9841.
Sponsored by: The FreeBSD Foundation Reviewed by: imp ngie emaste Approved by: mjg kib Fixes: D9841 Differential Revision: https://reviews.freebsd.org/D41349
show more ...
|
#
d0b2dbfa |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
|
#
61f4c4d3 |
| 30-Jun-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strchrnul implementations (scalar, baseline)
A lot better than the generic (pre) implementaion. We do not beat glibc for long strings, likely due to glibc switching to AV
lib/libc/amd64/string: add strchrnul implementations (scalar, baseline)
A lot better than the generic (pre) implementaion. We do not beat glibc for long strings, likely due to glibc switching to AVX once the input is sufficiently long. X86-64-v3 and v4 implementations may be added at a future time.
os: FreeBSD arch: amd64 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ strchrnul_pre.out │ strchrnul_scalar.out │ strchrnul_baseline.out │ │ sec/op │ sec/op vs base │ sec/op vs base │ Short 129.68µ ± 3% 59.91µ ± 1% -53.80% (p=0.000 n=20) 44.37µ ± 1% -65.79% (p=0.000 n=20) Mid 21.15µ ± 0% 19.30µ ± 0% -8.76% (p=0.000 n=20) 12.30µ ± 0% -41.85% (p=0.000 n=20) Long 13.772µ ± 0% 11.028µ ± 0% -19.92% (p=0.000 n=20) 3.285µ ± 0% -76.15% (p=0.000 n=20) geomean 33.55µ 23.36µ -30.37% 12.15µ -63.80%
│ strchrnul_pre.out │ strchrnul_scalar.out │ strchrnul_baseline.out │ │ B/s │ B/s vs base │ B/s vs base │ Short 919.3Mi ± 3% 1989.7Mi ± 1% +116.45% (p=0.000 n=20) 2686.8Mi ± 1% +192.28% (p=0.000 n=20) Mid 5.505Gi ± 0% 6.033Gi ± 0% +9.60% (p=0.000 n=20) 9.466Gi ± 0% +71.97% (p=0.000 n=20) Long 8.453Gi ± 0% 10.557Gi ± 0% +24.88% (p=0.000 n=20) 35.441Gi ± 0% +319.26% (p=0.000 n=20) geomean 3.470Gi 4.983Gi +43.62% 9.584Gi +176.22%
For comparison, glibc on the same machine:
│ strchrnul_glibc.out │ │ sec/op │ Short 49.73µ ± 0% Mid 14.60µ ± 0% Long 1.237µ ± 0% geomean 9.646µ
│ strchrnul_glibc.out │ │ B/s │ Short 2.341Gi ± 0% Mid 7.976Gi ± 0% Long 94.14Gi ± 0% geomean 12.07Gi
Sponsored by: The FreeBSD Foundation Approved by: mjg Differential Revision: https://reviews.freebsd.org/D41333
show more ...
|
#
ad2fac55 |
| 04-Aug-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64: add archlevel-based simd dispatch framework
Add a framework for selecting from one of multiple implementations of a function based on amd64 architecture level (cf. amd64 SysV ABI sup
lib/libc/amd64: add archlevel-based simd dispatch framework
Add a framework for selecting from one of multiple implementations of a function based on amd64 architecture level (cf. amd64 SysV ABI supplement).
Sponsored by: The FreeBSD Foundation Approved by: kib Reviewed by: jrtc27 Differential Revision: https://reviews.freebsd.org/D40693
show more ...
|
Revision tags: release/13.2.0, release/12.4.0, release/13.1.0 |
|
#
fbc002cb |
| 25-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
amd64: bring back asm bcmp, shared with memcmp
Turns out clang converts "memcmp(foo, bar, len) == 0" and similar to bcmp calls.
Reviewed by: emaste (previous version), jhb (previous version) Differ
amd64: bring back asm bcmp, shared with memcmp
Turns out clang converts "memcmp(foo, bar, len) == 0" and similar to bcmp calls.
Reviewed by: emaste (previous version), jhb (previous version) Differential Revision: https://reviews.freebsd.org/D34673
show more ...
|
#
5fc3cc27 |
| 12-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
amd64: make bcmp in libc just call memcmp
Preferably bcmp would just alias memcmp but there is build magic which makes this problematic.
Reviewed by: jhb Differential Revision: https://reviews.fre
amd64: make bcmp in libc just call memcmp
Preferably bcmp would just alias memcmp but there is build magic which makes this problematic.
Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D28846
show more ...
|
Revision tags: release/12.3.0, release/13.0.0 |
|
#
7f06b217 |
| 21-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
amd64: import asm strlen into libc
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28845
|