History log of /linux/lib/crc/x86/crc32.h (Results 1 – 5 of 5)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.17-rc2
# 8d2b0853 11-Aug-2025 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Updating drm-misc-fixes to the state of v6.17-rc1. Begins a new release
cycle.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.17-rc1
# a578dd09 29-Jul-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:

- Reorganize the architecture-optimized CRC code

It now lives in l

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:

- Reorganize the architecture-optimized CRC code

It now lives in lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/,
and it is no longer artificially split into separate generic and arch
modules. This allows better inlining and dead code elimination

The generic CRC code is also no longer exported, simplifying the API.
(This mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/,
which can be found in the "Crypto library updates" pull request)

- Improve crc32c() performance on newer x86_64 CPUs on long messages by
enabling the VPCLMULQDQ optimized code

- Simplify the crypto_shash wrappers for crc32_le() and crc32c()

Register just one shash algorithm for each that uses the (fully
optimized) library functions, instead of unnecessarily providing
direct access to the generic CRC code

- Remove unused and obsolete drivers for hardware CRC engines

- Remove CRC-32 combination functions that are no longer used

- Add kerneldoc for crc32_le(), crc32_be(), and crc32c()

- Convert the crc32() macro to an inline function

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (26 commits)
lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial
lib/crc: x86: Reorganize crc-pclmul static_call initialization
lib/crc: crc64: Add include/linux/crc64.h to kernel-api.rst
lib/crc: crc32: Change crc32() from macro to inline function and remove cast
nvmem: layouts: Switch from crc32() to crc32_le()
lib/crc: crc32: Document crc32_le(), crc32_be(), and crc32c()
lib/crc: Explicitly include <linux/export.h>
lib/crc: Remove ARCH_HAS_* kconfig symbols
lib/crc: x86: Migrate optimized CRC code into lib/crc/
lib/crc: sparc: Migrate optimized CRC code into lib/crc/
lib/crc: s390: Migrate optimized CRC code into lib/crc/
lib/crc: riscv: Migrate optimized CRC code into lib/crc/
lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
lib/crc: mips: Migrate optimized CRC code into lib/crc/
lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
lib/crc: arm64: Migrate optimized CRC code into lib/crc/
lib/crc: arm: Migrate optimized CRC code into lib/crc/
lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
lib/crc: Move files into lib/crc/
lib/crc32: Remove unused combination support
...

show more ...


Revision tags: v6.16, v6.16-rc7
# 118da22e 20-Jul-2025 Eric Biggers <ebiggers@kernel.org>

lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial

Improve crc32c() performance on lengths >= 512 bytes by using
crc32_lsb_vpclmul_avx512() instead of crc32c_x86_3way(), when the C

lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial

Improve crc32c() performance on lengths >= 512 bytes by using
crc32_lsb_vpclmul_avx512() instead of crc32c_x86_3way(), when the CPU
supports VPCLMULQDQ and has a "good" implementation of AVX-512. For now
that means AMD Zen 4 and later, and Intel Sapphire Rapids and later.
Pass crc32_lsb_vpclmul_avx512() the table of constants needed to make it
use the CRC-32C polynomial.

Rationale: VPCLMULQDQ performance has improved on newer CPUs, making
crc32_lsb_vpclmul_avx512() faster than crc32c_x86_3way(), even though
crc32_lsb_vpclmul_avx512() is designed for generic 32-bit CRCs and does
not utilize x86_64's dedicated CRC-32C instructions.

Performance results for len=4096 using crc_kunit:

CPU Before (MB/s) After (MB/s)
====================== ============= ============
AMD Zen 4 (Genoa) 19868 28618
AMD Zen 5 (Ryzen AI 9 365) 24080 46940
AMD Zen 5 (Turin) 29566 58468
Intel Sapphire Rapids 22340 73794
Intel Emerald Rapids 24696 78666

Performance results for len=512 using crc_kunit:

CPU Before (MB/s) After (MB/s)
====================== ============= ============
AMD Zen 4 (Genoa) 7251 7758
AMD Zen 5 (Ryzen AI 9 365) 17481 19135
AMD Zen 5 (Turin) 21332 25424
Intel Sapphire Rapids 18886 29312
Intel Emerald Rapids 19675 29045

That being said, in the above benchmarks the ZMM registers are "warm",
so they don't quite tell the whole story. While significantly improved
from older Intel CPUs, Intel still has ~2000 ns of ZMM warm-up time
where 512-bit instructions execute 4 times more slowly than they
normally do. In contrast, AMD does better and has virtually zero ZMM
warm-up time (at most ~60 ns). Thus, while this change is always
beneficial on AMD, strictly speaking there are cases in which it is not
beneficial on Intel, e.g. a small number of 512-byte messages with
"cold" ZMM registers. But typically, it is beneficial even on Intel.

Note that on AMD Zen 3--5, crc32c() performance could be further
improved with implementations that interleave crc32q and VPCLMULQDQ
instructions. Unfortunately, it appears that a different such
implementation would be optimal on *each* of these microarchitectures.
Such improvements are left for future work. This commit just improves
the way that we choose the implementations we already have.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250719224938.126512-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

show more ...


# 110628e5 20-Jul-2025 Eric Biggers <ebiggers@kernel.org>

lib/crc: x86: Reorganize crc-pclmul static_call initialization

Reorganize the crc-pclmul static_call initialization to place more of
the logic in the *_mod_init_arch() functions instead of in the
IN

lib/crc: x86: Reorganize crc-pclmul static_call initialization

Reorganize the crc-pclmul static_call initialization to place more of
the logic in the *_mod_init_arch() functions instead of in the
INIT_CRC_PCLMUL macro. This provides the flexibility to do more than a
single static_call update for each CPU feature check. Right away,
optimize crc64_mod_init_arch() to check the CPU features just once
instead of twice, doing both the crc64_msb and crc64_lsb static_call
updates together. A later commit will also use this to initialize an
additional static_key when crc32_lsb_vpclmul_avx512() is enabled.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250719224938.126512-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

show more ...


Revision tags: v6.16-rc6, v6.16-rc5, v6.16-rc4, v6.16-rc3, v6.16-rc2, v6.16-rc1
# b10749d8 07-Jun-2025 Eric Biggers <ebiggers@kernel.org>

lib/crc: x86: Migrate optimized CRC code into lib/crc/

Move the x86-optimized CRC code from arch/x86/lib/crc* into its new
location in lib/crc/x86/, and wire it up in the new way. This new way
of o

lib/crc: x86: Migrate optimized CRC code into lib/crc/

Move the x86-optimized CRC code from arch/x86/lib/crc* into its new
location in lib/crc/x86/, and wire it up in the new way. This new way
of organizing the CRC code eliminates the need to artificially split the
code for each CRC variant into separate arch and generic modules,
enabling better inlining and dead code elimination. For more details,
see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".

Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20250607200454.73587-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

show more ...