History log of /freebsd/lib/libc/amd64/string/Makefile.inc (Results 1 – 25 of 43)
Revision  Date  Author  Comments
# 4dedcb1b 28-Jan-2024 Mark Johnston <markj@FreeBSD.org>

libc/amd64: Disable ASAN for amd64_archlevel.c

The code in this file runs before the sanitizer can initialize its
shadow map.

Fixes: ad2fac552c3f ("lib/libc/amd64: add archlevel-based simd dispatch framework")


# fb197a4f 06-Dec-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add memrchr() scalar, baseline implementation

The scalar implementation is fairly simplistic and only performs
slightly better than the generic C implementation. It could be
improved by using the same algorithm as for memchr, but it would
have been a lot more complicated.

The baseline implementation is similar to timingsafe_memcmp. It's
slightly slower than memchr() due to the more complicated main
loop, but I don't think that can be significantly improved.

Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42925


# ea7b1377 04-Dec-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy()

This picks up the accelerated implementation of memccpy().

Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42902
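
The strncat() change described in this entry can be sketched in portable C. This is only an illustration under assumptions, not the committed code: the name sketch_strncat is hypothetical, and the sketch relies on the POSIX memccpy() contract (copy up to and including the first matching byte, return NULL if none was found within the limit).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: strncat() expressed as strlen() plus memccpy().
 * Appends at most n bytes of src to dst and always NUL-terminates.
 */
char *
sketch_strncat(char *restrict dst, const char *restrict src, size_t n)
{
	char *tail = dst + strlen(dst);	/* where appending starts */

	/* Copy up to and including the NUL, but no more than n bytes. */
	if (memccpy(tail, src, '\0', n) == NULL)
		tail[n] = '\0';	/* no NUL within n bytes: terminate by hand */

	return (dst);
}
```

As with strncat() itself, the caller must make sure dst has room for strlen(dst) + n + 1 bytes.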


# fc0e38a7 02-Dec-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add memccpy scalar, baseline implementation

Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced
implementation of memccpy for amd64. A scalar implementation calling
into memchr and memcpy to do the job is provided, too.

Please note that this code does not behave exactly the same as the C
implementation of memccpy for overlapping inputs. However, overlapping
inputs are not allowed for this function by ISO/IEC 9899:1999 and neither
has the C implementation any code to deal with the possibility. It just
proceeds byte-by-byte, which may or may not do the expected thing for
some overlaps. We do not document whether overlapping inputs are
supported in memccpy(3).

Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42902
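
The scalar approach mentioned in this entry (memccpy built from memchr and memcpy) might look roughly like the following C sketch. The name sketch_memccpy is hypothetical and the committed routine may differ; like the original, the sketch assumes the inputs do not overlap.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: memccpy() built from memchr() and memcpy().
 * Copies bytes from src to dst up to and including the first byte
 * equal to c, looking at no more than len bytes.
 */
void *
sketch_memccpy(void *restrict dst, const void *restrict src, int c, size_t len)
{
	const unsigned char *hit = memchr(src, c, len);

	if (hit == NULL) {
		/* c does not occur: copy everything, report "not found". */
		memcpy(dst, src, len);
		return (NULL);
	}

	/* Copy through the matching byte and point just past its copy. */
	size_t n = (size_t)(hit - (const unsigned char *)src) + 1;
	memcpy(dst, src, n);
	return ((unsigned char *)dst + n);
}
```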


# 2b7b03b7 29-Nov-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: implement strlcat() through strlcpy()

This should pick up our optimised memchr(), strlen(), and strlcpy()
when strlcat() is called.

Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42863


# 74d6cfad 12-Nov-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add strlcpy scalar, baseline implementation

Somewhat similar to stpncpy, but different in that we need to compute
the full source length even if the buffer is shorter than the source.

strlcat is implemented as a simple wrapper around strlcpy. The scalar
implementation of strlcpy just calls into strlen() and memcpy() to do
the job.

Perf-wise we're very close to stpncpy. The code is slightly slower as
it needs to carry on with finding the source string length even if the
buffer ends before the string.

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42863
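
A minimal C sketch of the two ideas in this entry: a scalar strlcpy made of strlen() and memcpy(), and strlcat as a thin wrapper around it. The sketch_ names are hypothetical and the code is an assumption about the general shape, not the committed implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strlcpy() from strlen() and memcpy(). */
size_t
sketch_strlcpy(char *restrict dst, const char *restrict src, size_t dstsize)
{
	size_t srclen = strlen(src);	/* full length, even if it will not fit */

	if (dstsize != 0) {
		size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

/* Hypothetical sketch: strlcat() as a wrapper around the routine above. */
size_t
sketch_strlcat(char *restrict dst, const char *restrict src, size_t dstsize)
{
	const char *nul = memchr(dst, '\0', dstsize);

	if (nul == NULL)	/* dst is not terminated within dstsize */
		return (dstsize + strlen(src));

	size_t dstlen = (size_t)(nul - dst);

	return (dstlen + sketch_strlcpy(dst + dstlen, src, dstsize - dstlen));
}
```

Both functions return the length the result would have had with unlimited space, so a return value greater than or equal to dstsize signals truncation.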


Revision tags: release/14.0.0
# e19d46c8 09-Nov-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: implement strncpy() by calling stpncpy()

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42519


# 90253d49 30-Oct-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add stpncpy scalar, baseline implementation

This was surprisingly annoying to get right, despite being such a simple
function. A scalar implementation is also provided; it just calls into
our optimised memchr(), memcpy(), and memset() routines to carry out its
job.

I'm quite happy with the performance. glibc only beats us for very long
strings, likely due to the use of AVX-512. The scalar implementation
just calls into our optimised memchr(), memcpy(), and memset() routines,
so it has a high overhead to begin with but then performs ok for the
amount of effort that went into it. Still beats the old C code, except
for very short strings.

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42519
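
The scalar strategy described in this entry (stpncpy via memchr, memcpy, and memset) can be sketched in C as below, together with the strncpy() wrapper idea from commit e19d46c8. The sketch_ names are hypothetical; this is an illustration of the approach, not the committed code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: stpncpy() from memchr(), memcpy(), and memset(). */
char *
sketch_stpncpy(char *restrict dst, const char *restrict src, size_t n)
{
	const char *nul = memchr(src, '\0', n);
	size_t len = nul != NULL ? (size_t)(nul - src) : n;

	memcpy(dst, src, len);		/* the string bytes */
	memset(dst + len, 0, n - len);	/* zero-fill the remainder */
	return (dst + len);		/* first NUL written, or dst + n */
}

/* Hypothetical sketch: strncpy() only differs in its return value. */
char *
sketch_strncpy(char *restrict dst, const char *restrict src, size_t n)
{
	sketch_stpncpy(dst, src, n);
	return (dst);
}
```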


# fd2ecd91 24-Oct-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: implement strsep() through strcspn()

The strsep() function is basically strcspn() with extra steps.
On amd64, we now have an optimised implementation of strcspn(),
so instead of implementing the inner loop manually, just call
into the optimised routine.

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42346
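
To illustrate "strcspn() with extra steps", here is a C sketch of strsep() whose inner loop is replaced by a strcspn() call. The name sketch_strsep is hypothetical and the committed code may be organised differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strsep() with strcspn() doing the scanning. */
char *
sketch_strsep(char **stringp, const char *delim)
{
	char *token = *stringp;

	if (token == NULL)
		return (NULL);

	/* Length of the leading run that contains no delimiter. */
	size_t len = strcspn(token, delim);

	if (token[len] == '\0') {
		*stringp = NULL;		/* last token */
	} else {
		token[len] = '\0';		/* cut at the delimiter */
		*stringp = token + len + 1;	/* resume just after it */
	}
	return (token);
}
```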


# 2ed514a2 12-Oct-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add strrchr scalar, baseline implementation

The baseline implementation is very straightforward, while the scalar
implementation suffers from register pressure and the need to use SWAR
techniques similar to those used for strchr().

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42217
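
For readers unfamiliar with SWAR ("SIMD within a register"), the following C sketch shows one widely known formulation for testing eight bytes at once inside a 64-bit word. Whether the assembly in this commit uses this exact formulation is an assumption; the function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_ONES	UINT64_C(0x0101010101010101)
#define SKETCH_HIGHS	UINT64_C(0x8080808080808080)

/* Nonzero if and only if at least one byte of v is zero. */
static inline uint64_t
sketch_has_zero_byte(uint64_t v)
{
	return ((v - SKETCH_ONES) & ~v & SKETCH_HIGHS);
}

/* Nonzero if and only if at least one byte of v equals c. */
static inline uint64_t
sketch_has_byte(uint64_t v, unsigned char c)
{
	/* XOR with c repeated in every byte turns matches into zero bytes. */
	return (sketch_has_zero_byte(v ^ (SKETCH_ONES * c)));
}
```

A strrchr-style routine has to track the last match as well as the terminator, which is presumably where the register pressure mentioned above comes from.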


# 14289e97 28-Sep-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add strncmp scalar, baseline implementation

The scalar implementation is fairly straightforward and merely unrolled
four times. The baseline implementation closely follows D41971 with
appropriate extensions and extra code paths to pay attention to string
length.

Performance is quite good. We beat both glibc (except for very long
strings, but they likely use AVX which we don't) and Bionic (except for
medium-sized aligned strings, where we are still in the same ballpark).

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D42122


# f4fc317c 25-Sep-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: implement strpbrk() through strcspn()

This lets us use our optimised strcspn() routine for strpbrk() calls.

Sponsored by: The FreeBSD Foundation
Tested by: developers@, exp-run
Approved by: mjg
MFC after: 1 month
MFC to: stable/14
PR: 275785
Differential Revision: https://reviews.freebsd.org/D41980
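
The relationship between the two functions is simple enough to show in a few lines of C. The name sketch_strpbrk is hypothetical; the sketch only illustrates the idea of delegating to strcspn().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strpbrk() expressed through strcspn(). */
char *
sketch_strpbrk(const char *s, const char *charset)
{
	/* Number of leading bytes that are not in charset. */
	size_t len = strcspn(s, charset);

	/* Stopping at the terminator means nothing matched. */
	return (s[len] != '\0' ? (char *)s + len : NULL);
}
```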


# 5048c1b8 15-Oct-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation

Conceptually very similar to timingsafe_bcmp(), but with comparison
logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE)
implementation was omitted this time as I was not able to get it to
perform adequately. Best I got was 8% over the scalar version for
long inputs, but slower for short inputs.

Sponsored by: The FreeBSD Foundation
Approved by: security (cperciva)
Inspired by: https://github.com/moon-chilled/fancy-memcmp
Differential Revision: https://reviews.freebsd.org/D41696


# 76c2b331 30-Aug-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations

Very straightforward and similar to memcmp(3). The code has
been written to use only instructions specified as having
data operand independent timing by Intel.

Sponsored by: The FreeBSD Foundation
Approved by: security (cperciva)
Differential Revision: https://reviews.freebsd.org/D41673
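
The general idea behind a timing-safe comparison can be sketched in C: accumulate all byte differences instead of returning at the first mismatch, so the amount of work does not depend on the data. The name sketch_timingsafe_bcmp is hypothetical, and a C compiler gives no guarantee about instruction timing, which is presumably one reason the committed version is written in assembly.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: returns 0 if the buffers are equal, 1 otherwise.
 * Every byte is always examined; there is no early exit.
 */
int
sketch_timingsafe_bcmp(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p1 = b1, *p2 = b2;
	unsigned char diff = 0;

	for (size_t i = 0; i < len; i++)
		diff |= (unsigned char)(p1[i] ^ p2[i]);

	return (diff != 0);
}
```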


# 33173728 08-Sep-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: implement strnlen(3) through memchr(3)

Now that we have an optimised memchr(3), we can use it to implement
strnlen(3) with better performance.

Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41598
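
In C, the reduction of strnlen(3) to memchr(3) is a short sketch. The name sketch_strnlen is hypothetical; the committed code may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strnlen() as a search for NUL within maxlen bytes. */
size_t
sketch_strnlen(const char *s, size_t maxlen)
{
	const char *nul = memchr(s, '\0', maxlen);

	/* No terminator within the limit means the limit is the answer. */
	return (nul != NULL ? (size_t)(nul - s) : maxlen);
}
```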


# de12a689 24-Aug-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add memchr(3) scalar, baseline implementation

This is conceptually similar to strchr(3), but there are
slight changes to account for the buffer having an explicit
buffer length.

Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41598


# 7084133c 21-Aug-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add strspn(3) scalar, x86-64-v2 implementation

This is conceptually very similar to the strcspn(3) implementations
from D41557, but we can't do the fast paths the same way.

Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41567


# 474408bb 13-Aug-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add strcspn(3) scalar, x86-64-v2 implementation

This changeset adds both a scalar and an x86-64-v2 implementation
of the strcspn(3) function to libc. A baseline implementation does not
appear to be feasible given the requirements of the function.

The scalar implementation is similar to the generic libc implementation,
but expands the bit set into a byte set to reduce latency, improving
performance. This approach could probably be backported to the generic
C version to benefit other platforms.

The x86-64-v2 implementation is built around the infamous pcmpistri
instruction. An alternative implementation based on the Muła/Langdale
algorithm [1] was prototyped, but performed worse than the pcmpistri
approach except for sets of more than 16 characters with long input
strings.

All implementations provide special cases for the empty set (reduces to
strlen) as well as single-character sets (reduces to strchr). The
x86-64-v2 kernel falls back to the scalar implementation for sets of
more than 32 characters. This limit could be raised by additional
multiples of 16 through the use of additional pcmpistri code paths, but
I consider this case to be too rare to be of importance.

[1]: http://0x80.pl/articles/simd-byte-lookup.html

Sponsored by: The FreeBSD Foundation
Approved by: mjg
MFC after: 1 week
MFC to: stable/14
Differential Revision: https://reviews.freebsd.org/D41557
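
The byte-set idea from the scalar implementation, along with the two special cases, can be sketched in portable C. The name sketch_strcspn is hypothetical and this is only an illustration of the approach described in the message. A byte-per-character table needs a single load per input byte, whereas a bit set also needs a shift and a mask, which is presumably the latency saving the message refers to.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strcspn() with a 256-entry byte set. */
size_t
sketch_strcspn(const char *s, const char *set)
{
	/* Special cases named in the commit message. */
	if (set[0] == '\0')
		return (strlen(s));	/* empty set: plain strlen */
	if (set[1] == '\0') {		/* one character: strchr */
		const char *hit = strchr(s, set[0]);

		return (hit != NULL ? (size_t)(hit - s) : strlen(s));
	}

	/* One byte per possible character value instead of one bit. */
	unsigned char reject[256] = { 0 };

	reject[0] = 1;			/* NUL always ends the scan */
	for (const unsigned char *p = (const unsigned char *)set; *p != '\0'; p++)
		reject[*p] = 1;

	const unsigned char *q = (const unsigned char *)s;

	while (!reject[*q])
		q++;
	return ((size_t)(q - (const unsigned char *)s));
}
```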


# 9fbea870 05-Jul-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string/stpcpy.S: add baseline implementation

This commit adds a baseline implementation of stpcpy(3) for amd64.
It performs quite well in comparison to the previous scalar implementation
as well as against bionic and glibc (though glibc is faster for very long
strings). Fiddle with the Makefile to also have strcpy(3) call into the
optimised stpcpy(3) code, fixing an oversight from D9841.

Sponsored by: The FreeBSD Foundation
Reviewed by: imp ngie emaste
Approved by: mjg kib
Fixes: D9841
Differential Revision: https://reviews.freebsd.org/D41349


# d0b2dbfa 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line sh pattern

Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/


# 61f4c4d3 30-Jun-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64/string: add strchrnul implementations (scalar, baseline)

A lot better than the generic (pre) implementation. We do not beat glibc
for long strings, likely due to glibc switching to AVX once the input is
sufficiently long. X86-64-v3 and v4 implementations may be added at a
future time.

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
         │ strchrnul_pre.out │         strchrnul_scalar.out          │        strchrnul_baseline.out         │
         │      sec/op       │    sec/op      vs base                │    sec/op      vs base                │
Short         129.68µ ± 3%       59.91µ ± 1%  -53.80% (p=0.000 n=20)     44.37µ ± 1%  -65.79% (p=0.000 n=20)
Mid            21.15µ ± 0%       19.30µ ± 0%   -8.76% (p=0.000 n=20)     12.30µ ± 0%  -41.85% (p=0.000 n=20)
Long          13.772µ ± 0%      11.028µ ± 0%  -19.92% (p=0.000 n=20)     3.285µ ± 0%  -76.15% (p=0.000 n=20)
geomean        33.55µ            23.36µ       -30.37%                    12.15µ       -63.80%

         │ strchrnul_pre.out │          strchrnul_scalar.out          │         strchrnul_baseline.out         │
         │        B/s        │     B/s        vs base                 │     B/s        vs base                 │
Short         919.3Mi ± 3%     1989.7Mi ± 1%  +116.45% (p=0.000 n=20)   2686.8Mi ± 1%  +192.28% (p=0.000 n=20)
Mid           5.505Gi ± 0%      6.033Gi ± 0%    +9.60% (p=0.000 n=20)    9.466Gi ± 0%   +71.97% (p=0.000 n=20)
Long          8.453Gi ± 0%     10.557Gi ± 0%   +24.88% (p=0.000 n=20)   35.441Gi ± 0%  +319.26% (p=0.000 n=20)
geomean       3.470Gi           4.983Gi        +43.62%                   9.584Gi       +176.22%

For comparison, glibc on the same machine:

         │ strchrnul_glibc.out │
         │       sec/op        │
Short          49.73µ ± 0%
Mid            14.60µ ± 0%
Long           1.237µ ± 0%
geomean        9.646µ

         │ strchrnul_glibc.out │
         │         B/s         │
Short         2.341Gi ± 0%
Mid           7.976Gi ± 0%
Long          94.14Gi ± 0%
geomean       12.07Gi

Sponsored by: The FreeBSD Foundation
Approved by: mjg
Differential Revision: https://reviews.freebsd.org/D41333


# ad2fac55 04-Aug-2023 Robert Clausecker <fuz@FreeBSD.org>

lib/libc/amd64: add archlevel-based simd dispatch framework

Add a framework for selecting from one of multiple implementations
of a function based on amd64 architecture level (cf. amd64 SysV
ABI supplement).

Sponsored by: The FreeBSD Foundation
Approved by: kib
Reviewed by: jrtc27
Differential Revision: https://reviews.freebsd.org/D40693
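
The selection idea can be illustrated with a small C sketch: each function has a list of variants ordered from most to least demanding, and the first one the CPU's level satisfies is chosen. Everything here is hypothetical (the enum, the struct, the sketch_ names, and passing the level in as a parameter); how the real framework detects the CPU and binds the chosen routine is not described in this log.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical level names: a scalar fallback plus the psABI levels. */
enum sketch_level { SK_SCALAR, SK_BASELINE, SK_V2, SK_V3, SK_V4 };

typedef size_t (*sketch_strlen_fn)(const char *);

struct sketch_variant {
	enum sketch_level min_level;	/* lowest level that can run fn */
	sketch_strlen_fn fn;
};

static size_t
sketch_strlen_scalar(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		n++;
	return (n);
}

static size_t
sketch_strlen_baseline(const char *s)
{
	return (strlen(s));	/* stand-in for a SIMD routine */
}

/* Ordered from most to least demanding. */
static const struct sketch_variant sketch_strlen_variants[] = {
	{ SK_BASELINE, sketch_strlen_baseline },
	{ SK_SCALAR, sketch_strlen_scalar },
};

/* Pick the first variant whose requirement the given level meets. */
sketch_strlen_fn
sketch_select_strlen(enum sketch_level cpu_level)
{
	size_t count = sizeof(sketch_strlen_variants) /
	    sizeof(sketch_strlen_variants[0]);

	for (size_t i = 0; i < count; i++)
		if (sketch_strlen_variants[i].min_level <= cpu_level)
			return (sketch_strlen_variants[i].fn);
	return (sketch_strlen_scalar);
}
```

In this sketch a CPU at level v3 with no v3-specific variant simply gets the baseline routine, which matches the idea of choosing the best available implementation for the detected level.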


Revision tags: release/13.2.0, release/12.4.0, release/13.1.0
# fbc002cb 25-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

amd64: bring back asm bcmp, shared with memcmp

Turns out clang converts "memcmp(foo, bar, len) == 0" and similar to
bcmp calls.

Reviewed by: emaste (previous version), jhb (previous version)
Differential Revision: https://reviews.freebsd.org/D34673


# 5fc3cc27 12-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

amd64: make bcmp in libc just call memcmp

Preferably bcmp would just alias memcmp but there is build magic which
makes this problematic.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28846


Revision tags: release/12.3.0, release/13.0.0
# 7f06b217 21-Feb-2021 Mateusz Guzik <mjg@FreeBSD.org>

amd64: import asm strlen into libc

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28845

