| 32d1f188 | 03-Feb-2026 |
Andrew Turner <andrew@FreeBSD.org> |
libc/aarch64: Add memset for a 64 byte dc zva
On arm64 we can use the "dc zva" instruction to zero memory. The CPU tells software if the instruction is implemented, and if so the size and alignment it will use.
When the size is 64 bytes, the Arm Optimized Routines implementation of memset can use dc zva to zero memory, and it has a build flag to skip the size check.
Use this flag to build a version of memset that will be used when this assumption is true.
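As a rough illustration of the check that the build flag skips: the CPU reports the dc zva parameters in the DCZID_EL0 register, where bits [3:0] hold log2 of the block size in 4-byte words and bit 4 prohibits the instruction. The helper names below are hypothetical and the sketch decodes a raw register value passed in as an argument, so it is not the code from this commit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode a raw DCZID_EL0 value.
 * Returns the dc zva block size in bytes, or 0 if the DZP bit
 * (bit 4) says the instruction must not be used.
 */
static unsigned int
sketch_zva_block_bytes(uint64_t dczid)
{
    if (dczid & (1u << 4))
        return (0);
    /* BS (bits 3:0) is log2 of the block size in 4-byte words. */
    return (4u << (dczid & 0xf));
}

/* Hypothetical predicate: may the 64-byte memset variant be selected? */
static int
sketch_can_use_zva64(uint64_t dczid)
{
    return (sketch_zva_block_bytes(dczid) == 64);
}
```

On real hardware the value would come from reading DCZID_EL0; a variant built with the check skipped is only safe when a predicate like this holds.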
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D54776
|
| 3f224333 | 09-Dec-2024 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/aarch64/string: add timingsafe_memcmp() assembly implementation
A port of the amd64 implementation with some slight changes due to differences in instructions provided by aarch64.
No ASIMD for the same reason as the amd64 code: it's just not particularly suitable for this application.
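For readers unfamiliar with the function, here is a minimal scalar sketch of what a timing-safe memcmp has to do: visit every byte and pick up the sign of the first difference without branching on the data. This is plain C written for illustration under that assumption; the function name is hypothetical and it is not the assembly from this commit.

```c
#include <stddef.h>

/* Sketch only: returns -1, 0 or 1 without data-dependent branches. */
int
sketch_timingsafe_memcmp(const void *b1, const void *b2, size_t len)
{
    const unsigned char *p1 = b1, *p2 = b2;
    int res = 0;
    unsigned int done = 0;  /* becomes 1 after the first differing byte */

    for (size_t i = 0; i < len; i++) {
        unsigned int x = p1[i], y = p2[i];
        /* Unsigned wraparound sets bit 8 exactly when x < y (or y < x). */
        unsigned int lt = ((x - y) >> 8) & 1;
        unsigned int gt = ((y - x) >> 8) & 1;
        int cmp = (int)gt - (int)lt;
        int keep = (int)done - 1;  /* all ones until a difference is seen */

        res |= cmp & keep;
        done |= lt | gt;
    }
    return (res);
}
```

The loop always runs for len iterations, so the running time does not reveal where the buffers first differ.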
Event: EuroBSDcon 2024
Approved by: security (cperciva)
Reviewed by: getz, cperciva
Differential Revision: https://reviews.freebsd.org/D46758
|
| f2c98669 | 09-Dec-2024 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/aarch64/string: add ASIMD-enhanced timingsafe_bcmp implementation
A straightforward port of the amd64 implementation.
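The contract of timingsafe_bcmp can be sketched in scalar C as follows: accumulate the XOR of every byte pair and only test the accumulator at the end. The function name is hypothetical and this is an illustration of the semantics, not the ASIMD code from this commit.

```c
#include <stddef.h>

/* Sketch only: returns 0 if the buffers are equal, 1 otherwise. */
int
sketch_timingsafe_bcmp(const void *b1, const void *b2, size_t len)
{
    const unsigned char *p1 = b1, *p2 = b2;
    unsigned char acc = 0;

    /* No early exit: every byte pair is always examined. */
    for (size_t i = 0; i < len; i++)
        acc |= p1[i] ^ p2[i];
    return (acc != 0);
}
```

Because only equality is reported, the bytes can be combined in any order, which is presumably what makes a vectorized version a natural fit here.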
Approved by: security (cperciva)
Reviewed by: getz, cperciva
Event: EuroBSDcon 2024
Differential Revision: https://reviews.freebsd.org/D46757
|
| 79e01e7e | 28-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add bcopy & bzero wrapper
This patch enables the use of SIMD-enhanced functions to implement bcopy and bzero.
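A wrapper of this kind can be pictured as below, assuming bcopy forwards to memmove (note the swapped argument order) and bzero forwards to memset. The sketch_ names are hypothetical and the actual wrappers in this commit may be written differently.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: bcopy takes (src, dst), memmove takes (dst, src). */
void
sketch_bcopy(const void *src, void *dst, size_t len)
{
    memmove(dst, src, len);  /* overlap-safe, as bcopy requires */
}

/* Sketch only: bzero is memset with a zero fill byte. */
void
sketch_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```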
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46459
|
| 3863fec1 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strlen SIMD implementation
Adds a SIMD-enhanced strlen for AArch64. It takes inspiration from the amd64 implementation, but I struggled to get the performance I had hoped for on cores like the Graviton3 when compared to the existing implementation from Arm Optimized Routines.
See the DR for benchmark results.
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D45623
|
| 5ebd4d0d | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add memcpy SIMD implementation
I noticed that we have a SIMD optimized memcpy in the arm-optimized-routines in /contrib.
This patch ensures we use the SIMD variant as opposed to the scalar-optimized variant.
Benchmarks are generated by fuz's strperf utility.
See the DR for benchmark results.
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46251
|
| bea89d03 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strlcat SIMD implementation
This patch requires D46243 as it depends on strlcpy being labeled __strlcpy.
It's a direct copy from the amd64 string functions using memchr and strlcpy to implement strlcat.
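The composition described above can be sketched like this: memchr finds the end of the destination within the buffer, and strlcpy copies the source into the remaining space. The local sketch_strlcpy is a hypothetical stand-in so the example builds without libc's strlcpy; the committed code calls __strlcpy instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy, used only by this sketch. */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size != 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return (srclen);
}

/* Sketch only: strlcat built from memchr and strlcpy. */
size_t
sketch_strlcat(char *dst, const char *src, size_t size)
{
    char *end = memchr(dst, '\0', size);

    /* No terminator within size bytes: nothing can be appended. */
    if (end == NULL)
        return (size + strlen(src));

    size_t dlen = (size_t)(end - dst);

    return (dlen + sketch_strlcpy(end, src, size - dlen));
}
```

Bounding the memchr by size is what keeps the function from reading past the buffer when dst is not terminated.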
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46272
|
| 3dc54291 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strncat SIMD implementation
This patch requires D46170 as it depends on memccpy being labeled __memccpy.
It's a direct copy from the amd64 string functions.
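A sketch of how strncat can be expressed through memccpy: copy at most n bytes while looking for the terminator, and add the terminator by hand if memccpy did not reach it. The function name is hypothetical and the public memccpy stands in for the internal __memccpy label mentioned above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* for memccpy on strict C compilers */
#endif
#include <stddef.h>
#include <string.h>

/* Sketch only: strncat built on strlen and memccpy. */
char *
sketch_strncat(char *dst, const char *src, size_t n)
{
    size_t len = strlen(dst);
    /* Copies up to n bytes, stopping after the first NUL byte. */
    char *end = memccpy(dst + len, src, '\0', n);

    /* NULL means n bytes were copied without meeting a NUL. */
    if (end == NULL)
        dst[len + n] = '\0';
    return (dst);
}
```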
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46292
|
| bad17991 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add memccpy SIMD implementation
This changeset includes a port of the SIMD implementation of memccpy for amd64 to AArch64.
Performance is significantly better than the scalar implementation except for short strings.
Benchmark results are as usual generated by the strperf utility written by fuz.
See the DR for benchmark results.
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46170
|
| 25c485e1 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strncmp SIMD implementation
This changeset includes a port of the SIMD implementation of strncmp for amd64 to AArch64.
It is based on D45839 with added handling for the limit.
An extended unit test for strncmp is currently being written to make sure the bounds checks for page crossings work as expected.
Performance is significantly better than the existing implementation from the Arm Optimized Routines repository.
Benchmark results are generated by the strperf utility by fuz.
See the DR for benchmark results.
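On the page-crossing bounds checks mentioned above: the commit message does not show how they are done, so the following is only a generic sketch of the idea. It assumes a 4096-byte page and a 16-byte vector load, and both the constants and the helper name are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */
#define SKETCH_VEC_SIZE  16u    /* assumed width of one vector load */

/*
 * Sketch only: would a SKETCH_VEC_SIZE-byte load starting at addr
 * touch the following page?
 */
static int
sketch_load_crosses_page(uintptr_t addr)
{
    uintptr_t off = addr & (SKETCH_PAGE_SIZE - 1);

    return (off > SKETCH_PAGE_SIZE - SKETCH_VEC_SIZE);
}
```

A check along these lines matters because reading past the end of a string into an unmapped page would fault, even though the extra bytes are never used.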
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D45943
|
| 756b7fc8 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strlcpy SIMD implementation
This changeset includes a port of the SIMD implementation of strlcpy for amd64 to AArch64.
It is based on memccpy (D46170) with some minor differences.
Performance is significantly better than the scalar implementation.
Benchmark results are as usual generated by the strperf utility written by fuz.
See the DR for benchmark results.
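To show why strlcpy and memccpy are so closely related, here is a C-level sketch that expresses strlcpy through memccpy with a NUL search byte. The commit itself derives the assembly from the memccpy assembly rather than calling memccpy, so this only illustrates the shared structure; the function name is hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* for memccpy on strict C compilers */
#endif
#include <stddef.h>
#include <string.h>

/* Sketch only: strlcpy semantics expressed through memccpy. */
size_t
sketch_strlcpy_via_memccpy(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return (strlen(src));

    char *end = memccpy(dst, src, '\0', size);

    /* Terminator copied: the length excludes the NUL itself. */
    if (end != NULL)
        return ((size_t)(end - dst) - 1);

    /* Truncated: terminate, then count the part that did not fit. */
    dst[size - 1] = '\0';
    return (size + strlen(src + size));
}
```

The main difference from memccpy is the return value: strlcpy must always report the full source length, which is where the extra strlen on the truncated path comes from.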
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46243
|
| 79287d78 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: strcat enable use of SIMD
Call into SIMD strlen and stpcpy for an optimized strcat. Port of D42600 for amd64.
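The composition is small enough to sketch in C: strlen finds the end of the destination and stpcpy copies the source there. The function name is hypothetical; the point is that strcat inherits whatever speed the underlying strlen and stpcpy provide.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* for stpcpy on strict C compilers */
#endif
#include <string.h>

/* Sketch only: strcat expressed through strlen and stpcpy. */
char *
sketch_strcat(char *dst, const char *src)
{
    stpcpy(dst + strlen(dst), src);
    return (dst);
}
```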
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46417
|
| 89b38723 | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add optimized strpbrk & strsep implementations
These are direct copies from the amd64 string functions using the optimized strcspn from D46398.
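Both functions reduce to one strcspn call plus a little bookkeeping, roughly as sketched below. The sketch_ names are hypothetical and the standard strcspn stands in for the optimized one.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: first byte of s that is in set, or NULL. */
char *
sketch_strpbrk(const char *s, const char *set)
{
    size_t n = strcspn(s, set);

    return (s[n] != '\0' ? (char *)s + n : NULL);
}

/* Sketch only: split off the next token, as strsep does. */
char *
sketch_strsep(char **stringp, const char *delim)
{
    char *s = *stringp;

    if (s == NULL)
        return (NULL);

    size_t n = strcspn(s, delim);

    if (s[n] == '\0') {
        *stringp = NULL;      /* last token */
    } else {
        s[n] = '\0';          /* cut at the delimiter */
        *stringp = s + n + 1;
    }
    return (s);
}
```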
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46399
|
| f2bd390a | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strcspn optimized implementation
This is a port of the scalar-optimized variant of strcspn for amd64 to aarch64.
It utilizes a LUT to speed up the function; a SIMD variant is still under development.
Performance benchmarks are as usual generated by strperf.
See the DR for benchmark results.
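The LUT approach mentioned above can be sketched in C as a 256-entry table indexed by byte value, so that each input byte costs one table lookup instead of a scan of the reject set. This is an illustration of the idea only; the function name is hypothetical and the committed code is assembly.

```c
#include <stddef.h>

/* Sketch only: strcspn with a 256-entry lookup table. */
size_t
sketch_strcspn(const char *s, const char *set)
{
    unsigned char lut[256] = { 0 };
    const unsigned char *p;

    /* Mark the terminator so the scan stops at the end of s. */
    lut[0] = 1;
    for (p = (const unsigned char *)set; *p != '\0'; p++)
        lut[*p] = 1;

    /* One lookup per input byte. */
    p = (const unsigned char *)s;
    while (!lut[*p])
        p++;
    return ((size_t)(p - (const unsigned char *)s));
}
```

Building the table costs a fixed amount up front, so the technique mainly pays off on longer inputs or larger sets.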
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46398
|
| b91003ac | 26-Aug-2024 |
Getz Mikalsen <getz@FreeBSD.org> |
lib/libc/aarch64/string: add strspn optimized implementation
This is a port of the scalar-optimized variant of strspn for amd64 to aarch64.
It utilizes a LUT to speed up the function; a SIMD variant is still under development.
See the DR for benchmark results.
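For strspn the table is used the other way round: bytes in the accept set are marked, and the scan continues while the lookup succeeds. A minimal C sketch under that assumption, with a hypothetical function name:

```c
#include <stddef.h>

/* Sketch only: strspn with a 256-entry lookup table. */
size_t
sketch_strspn(const char *s, const char *set)
{
    unsigned char lut[256] = { 0 };
    const unsigned char *p;

    for (p = (const unsigned char *)set; *p != '\0'; p++)
        lut[*p] = 1;

    /* lut[0] stays 0, so the terminator of s ends the scan. */
    p = (const unsigned char *)s;
    while (lut[*p])
        p++;
    return ((size_t)(p - (const unsigned char *)s));
}
```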
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46396
|