fb197a4f | 06-Dec-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add memrchr() scalar, baseline implementation
The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be i
lib/libc/amd64/string: add memrchr() scalar, baseline implementation
The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be improved by using the same algorithm as for memchr, but it would have been a lot more complicated.
The baseline implementation is similar to timingsafe_memcmp. It's slightly slower than memchr() due to the more complicated main loop, but I don't think that can be significantly improved.
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42925
show more ...
|
ea7b1377 | 04-Dec-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy()
This picks up the accelerated implementation of memccpy().
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 mo
lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy()
This picks up the accelerated implementation of memccpy().
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902
show more ...
|
fc0e38a7 | 02-Dec-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add memccpy scalar, baseline implementation
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced implementation of memccpy for amd64. A scalar implementation
lib/libc/amd64/string: add memccpy scalar, baseline implementation
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced implementation of memccpy for amd64. A scalar implementation calling into memchr and memcpy to do the job is provided, too.
Please note that this code does not behave exactly the same as the C implementation of memccpy for overlapping inputs. However, overlapping inputs are not allowed for this function by ISO/IEC 9899:1999 and neither has the C implementation any code to deal with the possibility. It just proceeds byte-by-byte, which may or may not do the expected thing for some overlaps. We do not document whether overlapping inputs are supported in memccpy(3).
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902
show more ...
|
2b7b03b7 | 29-Nov-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strlcat() through strlcpy()
This should pick up our optimised memchr(), strlen(), and strlcpy() when strlcat() is called.
Tested by: developers@, exp-run Approved b
lib/libc/amd64/string: implement strlcat() through strlcpy()
This should pick up our optimised memchr(), strlen(), and strlcpy() when strlcat() is called.
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863
show more ...
|
74d6cfad | 12-Nov-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strlcpy scalar, baseline implementation
Somewhat similar to stpncpy, but different in that we need to compute the full source length even if the buffer is shorter than the
lib/libc/amd64/string: add strlcpy scalar, baseline implementation
Somewhat similar to stpncpy, but different in that we need to compute the full source length even if the buffer is shorter than the source.
strlcat is implemented as a simple wrapper around strlcpy. The scalar implementation of strlcpy just calls into strlen() and memcpy() to do the job.
Perf-wise we're very close to stpncpy. The code is slightly slower as it needs to carry on with finding the source string length even if the buffer ends before the string.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863
show more ...
|
aff9143a | 14-Nov-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string/strcat.S: enable use of SIMD
strcat has a bespoke scalar assembly implementation we inherited from NetBSD. While it performs well, it is better to call into our SIMD implement
lib/libc/amd64/string/strcat.S: enable use of SIMD
strcat has a bespoke scalar assembly implementation we inherited from NetBSD. While it performs well, it is better to call into our SIMD implementations if any SIMD features are available at all. So do that and implement strcat() by calling into strlen() and strcpy() if these are available.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Reviison: https://reviews.freebsd.org/D42600
show more ...
|
e19d46c8 | 09-Nov-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strncpy() by calling stpncpy()
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 27578
lib/libc/amd64/string: implement strncpy() by calling stpncpy()
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519
show more ...
|
90253d49 | 30-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add stpncpy scalar, baseline implementation
This was surprisingly annoying to get right, despite being such a simple function. A scalar implementation is also provided, it ju
lib/libc/amd64/string: add stpncpy scalar, baseline implementation
This was surprisingly annoying to get right, despite being such a simple function. A scalar implementation is also provided, it just calls into our optimised memchr(), memcpy(), and memset() routines to carry out its job.
I'm quite happy with the performance. glibc only beats us for very long strings, likely due to the use of AVX-512. The scalar implementation just calls into our optimised memchr(), memcpy(), and memset() routines, so it has a high overhead to begin with but then performs ok for the amount of effort that went into it. Still beats the old C code, except for very short strings.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519
show more ...
|
fd2ecd91 | 24-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps. On amd64, we now have an optimised implementation of strcspn(), so instead
lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps. On amd64, we now have an optimised implementation of strcspn(), so instead of implementing the inner loop manually, just call into the optimised routine.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42346
show more ...
|
2ed514a2 | 12-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strrchr scalar, baseline implementation
The baseline implementation is very straightforward, while the scalar implementation suffers from register pressure and the need to
lib/libc/amd64/string: add strrchr scalar, baseline implementation
The baseline implementation is very straightforward, while the scalar implementation suffers from register pressure and the need to use SWAR techniques similar to those used for strchr().
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42217
show more ...
|
14289e97 | 28-Sep-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add strncmp scalar, baseline implementation
The scalar implementation is fairly straightforward and merely unrolled four times. The baseline implementation closely follows D4
lib/libc/amd64/string: add strncmp scalar, baseline implementation
The scalar implementation is fairly straightforward and merely unrolled four times. The baseline implementation closely follows D41971 with appropriate extensions and extra code paths to pay attention to string length.
Performance is quite good. We beat both glibc (except for very long strings, but they likely use AVX which we don't) and Bionic (except for medium-sized aligned strings, where we are still in the same ballpark).
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42122
show more ...
|
f4fc317c | 25-Sep-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strpbrk() through strcspn()
This lets us use our optimised strcspn() routine for strpbrk() calls.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-r
lib/libc/amd64/string: implement strpbrk() through strcspn()
This lets us use our optimised strcspn() routine for strpbrk() calls.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D41980
show more ...
|
5048c1b8 | 15-Oct-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
Conceptually very similar to timingsafe_bcmp(), but with comparison logic inspired by Elijah Stone's fancy memcmp. A baseline (
lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
Conceptually very similar to timingsafe_bcmp(), but with comparison logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE) implementation was omitted this time as I was not able to get it to perform adequately. Best I got was 8% over the scalar version for long inputs, but slower for short inputs.
Sponsored by: The FreeBSD Foundation Approved by: security (cperciva) Inspired by: https://github.com/moon-chilled/fancy-memcmp Differential Revision: https://reviews.freebsd.org/D41696
show more ...
|
b2618b65 | 10-Sep-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string/memchr.S: fix behaviour with overly long buffers
When memchr(buf, c, len) is called with a phony len (say, SIZE_MAX), buf + len overflows and we have buf + len < buf. This con
lib/libc/amd64/string/memchr.S: fix behaviour with overly long buffers
When memchr(buf, c, len) is called with a phony len (say, SIZE_MAX), buf + len overflows and we have buf + len < buf. This confuses the implementation and makes it return incorrect results. Neverthless we must support this case as memchr() is guaranteed to work even with phony buffer lengths, as long as a match is found before the buffer actually ends.
Sponsored by: The FreeBSD Foundation Reported by: yuri, des Tested by: des Approved by: mjg (blanket, via IRC) MFC after: 1 week MFC to: stable/14 PR: 273652
show more ...
|
33173728 | 08-Sep-2023 |
Robert Clausecker <fuz@FreeBSD.org> |
lib/libc/amd64/string: implement strnlen(3) trough memchr(3)
Now that we have an optimised memchr(3), we can use it to implement strnlen(3) with better perofrmance.
Sponsored by: The FreeBSD Founda
lib/libc/amd64/string: implement strnlen(3) trough memchr(3)
Now that we have an optimised memchr(3), we can use it to implement strnlen(3) with better perofrmance.
Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41598
show more ...
|