| #
e3babebf |
| 22-Apr-2026 |
Ard Biesheuvel <ardb@kernel.org> |
lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
Tweak the NEON intrinsics crc64 code written for arm64 so it can be built for 32-bit ARM as well. The only workaround needed is t
lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
Tweak the NEON intrinsics crc64 code written for arm64 so it can be built for 32-bit ARM as well. The only workaround needed is to provide alternatives for vmull_p64() and vmull_high_p64() on Clang, which only defines those when building for the AArch64 or arm64ec ISA. Use the same helpers for GCC too, to avoid doubling the size of the test/validation matrix.
KUnit benchmark results (Cortex-A53 @ 1 Ghz)
Before:
# crc64_nvme_benchmark: len=1: 35 MB/s # crc64_nvme_benchmark: len=16: 78 MB/s # crc64_nvme_benchmark: len=64: 87 MB/s # crc64_nvme_benchmark: len=127: 88 MB/s # crc64_nvme_benchmark: len=128: 88 MB/s # crc64_nvme_benchmark: len=200: 89 MB/s # crc64_nvme_benchmark: len=256: 89 MB/s # crc64_nvme_benchmark: len=511: 89 MB/s # crc64_nvme_benchmark: len=512: 89 MB/s # crc64_nvme_benchmark: len=1024: 90 MB/s # crc64_nvme_benchmark: len=3173: 90 MB/s # crc64_nvme_benchmark: len=4096: 90 MB/s # crc64_nvme_benchmark: len=16384: 90 MB/s
After:
# crc64_nvme_benchmark: len=1: 32 MB/s # crc64_nvme_benchmark: len=16: 76 MB/s # crc64_nvme_benchmark: len=64: 71 MB/s # crc64_nvme_benchmark: len=127: 88 MB/s # crc64_nvme_benchmark: len=128: 618 MB/s # crc64_nvme_benchmark: len=200: 542 MB/s # crc64_nvme_benchmark: len=256: 920 MB/s # crc64_nvme_benchmark: len=511: 836 MB/s # crc64_nvme_benchmark: len=512: 1261 MB/s # crc64_nvme_benchmark: len=1024: 1531 MB/s # crc64_nvme_benchmark: len=3173: 1731 MB/s # crc64_nvme_benchmark: len=4096: 1851 MB/s # crc64_nvme_benchmark: len=16384: 1858 MB/s
Don't bother with big-endian, as it doesn't work correctly on Clang, and is barely used these days.
Note that ARM disables preemption and softirq processing when using kernel mode SIMD, so take care not to hog the CPU for too long.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://patch.msgid.link/20260422171655.3437334-15-ardb+git@google.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
061cef5f |
| 22-Apr-2026 |
Ard Biesheuvel <ardb@kernel.org> |
lib/crc: Turn NEON intrinsics crc64 implementation into common code
Move and rename the CRC64 NEON intrinsics implementation source file and rename the function name to reflect that it is NEON code
lib/crc: Turn NEON intrinsics crc64 implementation into common code
Move and rename the CRC64 NEON intrinsics implementation source file and rename the function name to reflect that it is NEON code that can be shared. This will be wired up for 32-bit ARM in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://patch.msgid.link/20260422171655.3437334-14-ardb+git@google.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
f956dc81 |
| 30-Mar-2026 |
Ard Biesheuvel <ardb@kernel.org> |
lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
Use the existing CC_FPU_CFLAGS and CC_NO_FPU_CFLAGS to pass the appropriate compiler command line options for building kernel mode NEON
lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
Use the existing CC_FPU_CFLAGS and CC_NO_FPU_CFLAGS to pass the appropriate compiler command line options for building kernel mode NEON intrinsics code. This is tidier, and will make it easier to reuse the code for 32-bit ARM.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260330144630.33026-9-ardb@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
63432fd6 |
| 29-Mar-2026 |
Demian Shulhan <demyansh@gmail.com> |
lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and
lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR software implementation is slow, which creates a bottleneck in NVMe and other storage subsystems.
The acceleration is implemented using C intrinsics (<arm_neon.h>) rather than raw assembly for better readability and maintainability.
Key highlights of this implementation: - Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency spikes on large buffers. - Pre-calculates and loads fold constants via vld1q_u64() to minimize register spilling. - Benchmarks show the break-even point against the generic implementation is around 128 bytes. The PMULL path is enabled only for len >= 128.
Performance results (kunit crc_benchmark on Cortex-A72): - Generic (len=4096): ~268 MB/s - PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)
Signed-off-by: Demian Shulhan <demyansh@gmail.com> Link: https://lore.kernel.org/r/20260329074338.1053550-1-demyansh@gmail.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
b10749d8 |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: x86: Migrate optimized CRC code into lib/crc/
Move the x86-optimized CRC code from arch/x86/lib/crc* into its new location in lib/crc/x86/, and wire it up in the new way. This new way of o
lib/crc: x86: Migrate optimized CRC code into lib/crc/
Move the x86-optimized CRC code from arch/x86/lib/crc* into its new location in lib/crc/x86/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
9b2d720e |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: sparc: Migrate optimized CRC code into lib/crc/
Move the sparc-optimized CRC code from arch/sparc/lib/crc* into its new location in lib/crc/sparc/, and wire it up in the new way. This new
lib/crc: sparc: Migrate optimized CRC code into lib/crc/
Move the sparc-optimized CRC code from arch/sparc/lib/crc* into its new location in lib/crc/sparc/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
2374bf23 |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: s390: Migrate optimized CRC code into lib/crc/
Move the s390-optimized CRC code from arch/s390/lib/crc* into its new location in lib/crc/s390/, and wire it up in the new way. This new way
lib/crc: s390: Migrate optimized CRC code into lib/crc/
Move the s390-optimized CRC code from arch/s390/lib/crc* into its new location in lib/crc/s390/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
b5943815 |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: riscv: Migrate optimized CRC code into lib/crc/
Move the riscv-optimized CRC code from arch/riscv/lib/crc* into its new location in lib/crc/riscv/, and wire it up in the new way. This new
lib/crc: riscv: Migrate optimized CRC code into lib/crc/
Move the riscv-optimized CRC code from arch/riscv/lib/crc* into its new location in lib/crc/riscv/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
190c253d |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
Move the powerpc-optimized CRC code from arch/powerpc/lib/crc* into its new location in lib/crc/powerpc/, and wire it up in the new way. T
lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
Move the powerpc-optimized CRC code from arch/powerpc/lib/crc* into its new location in lib/crc/powerpc/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
2b7531b2 |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: arm64: Migrate optimized CRC code into lib/crc/
Move the arm64-optimized CRC code from arch/arm64/lib/crc* into its new location in lib/crc/arm64/, and wire it up in the new way. This new
lib/crc: arm64: Migrate optimized CRC code into lib/crc/
Move the arm64-optimized CRC code from arch/arm64/lib/crc* into its new location in lib/crc/arm64/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
530b304f |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: arm: Migrate optimized CRC code into lib/crc/
Move the arm-optimized CRC code from arch/arm/lib/crc* into its new location in lib/crc/arm/, and wire it up in the new way. This new way of o
lib/crc: arm: Migrate optimized CRC code into lib/crc/
Move the arm-optimized CRC code from arch/arm/lib/crc* into its new location in lib/crc/arm/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
0bcfca56 |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
Rework how lib/crc/ supports arch-optimized code. First, instead of the arch-optimized CRC code being in arch/$(SRCARCH)/lib/, it wil
lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
Rework how lib/crc/ supports arch-optimized code. First, instead of the arch-optimized CRC code being in arch/$(SRCARCH)/lib/, it will now be in lib/crc/$(SRCARCH)/. Second, the API functions (e.g. crc32c()), arch-optimized functions (e.g. crc32c_arch()), and generic functions (e.g. crc32c_base()) will now be part of a single module for each CRC type, allowing better inlining and dead code elimination. The second change is made possible by the first.
As an example, consider CONFIG_CRC32=m on x86. We'll now have just crc32.ko instead of both crc32-x86.ko and crc32.ko. The two modules were already coupled together and always both got loaded together via direct symbol dependency, so the separation provided no benefit.
Note: later I'd like to apply the same design to lib/crypto/ too, where often the API functions are out-of-line so this will work even better. In those cases, for each algorithm we currently have 3 modules all coupled together, e.g. libsha256.ko, libsha256-generic.ko, and sha256-x86.ko. We should have just one, inline things properly, and rely on the compiler's dead code elimination to decide the inclusion of the generic code instead of manually setting it via kconfig.
Having arch-specific code outside arch/ was somewhat controversial when Zinc proposed it back in 2018. But I don't think the concerns are warranted. It's better from a technical perspective, as it enables the improvements mentioned above. This model is already successfully used in other places in the kernel such as lib/raid6/. The community of each architecture still remains free to work on the code, even if it's not in arch/. At the time there was also a desire to put the library code in the same files as the old-school crypto API, but that was a mistake; now that the library is separate, that's no longer a constraint either.
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-3-ebiggers@kernel.org Link: https://lore.kernel.org/r/20250612054514.142728-1-ebiggers@kernel.org Link: https://lore.kernel.org/r/20250621012221.4351-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
89a51591 |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: Move files into lib/crc/
Move all CRC files in lib/ into a subdirectory lib/crc/ to keep them from cluttering up the main lib/ directory.
Reviewed-by: "Martin K. Petersen" <martin.petersen
lib/crc: Move files into lib/crc/
Move all CRC files in lib/ into a subdirectory lib/crc/ to keep them from cluttering up the main lib/ directory.
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|