Revision tags: v6.17-rc2
# 8d2b0853 | 11-Aug-2025 | Thomas Zimmermann <tzimmermann@suse.de>
Merge drm/drm-fixes into drm-misc-fixes
Updating drm-misc-fixes to the state of v6.17-rc1. This begins a new release cycle.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Revision tags: v6.17-rc1
# a578dd09 | 29-Jul-2025 | Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
- Reorganize the architecture-optimized CRC code
It now lives in lib/crc/$(SRCARCH)/ rather than arch/$(SRCARCH)/lib/, and it is no longer artificially split into separate generic and arch modules. This allows better inlining and dead code elimination (a sketch of the new layout follows this list).
The generic CRC code is also no longer exported, simplifying the API. (This mirrors the similar changes to SHA-1 and SHA-2 in lib/crypto/, which can be found in the "Crypto library updates" pull request)
- Improve crc32c() performance on newer x86_64 CPUs on long messages by enabling the VPCLMULQDQ optimized code
- Simplify the crypto_shash wrappers for crc32_le() and crc32c()
Register just one shash algorithm for each that uses the (fully optimized) library functions, instead of unnecessarily providing direct access to the generic CRC code (a wrapper sketch follows the commit list below)
- Remove unused and obsolete drivers for hardware CRC engines
- Remove CRC-32 combination functions that are no longer used
- Add kerneldoc for crc32_le(), crc32_be(), and crc32c()
- Convert the crc32() macro to an inline function
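As a concrete illustration of the first and last bullets above, the new layout lets the generic code include the arch override directly instead of calling through a separate module. A minimal sketch, assuming illustrative file and symbol names rather than the exact ones in lib/crc/:

```c
/* Sketch only: file and symbol names are illustrative. */

/* lib/crc/crc32.c -- generic side */
#include <linux/crc32.h>
#include <linux/export.h>

u32 crc32_le_base(u32 crc, const u8 *p, size_t len);	/* table-driven code */

#ifdef CONFIG_CRC_OPTIMIZATIONS
#include "x86/crc32.h"		/* provides an inline crc32_le_arch() */
#else
static inline u32 crc32_le_arch(u32 crc, const u8 *p, size_t len)
{
	return crc32_le_base(crc, p, len);
}
#endif

u32 crc32_le(u32 crc, const void *p, size_t len)
{
	return crc32_le_arch(crc, p, len);
}
EXPORT_SYMBOL(crc32_le);

/* include/linux/crc32.h -- the crc32() macro becomes an inline function */
static inline u32 crc32(u32 crc, const void *p, size_t len)
{
	return crc32_le(crc, p, len);
}
```

Because crc32_le_arch() is an inline visible at the call site, the compiler can inline the fast path and discard the table-driven fallback wherever it is unreachable, which is what the old arch/generic module split prevented.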
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (26 commits)
  lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial
  lib/crc: x86: Reorganize crc-pclmul static_call initialization
  lib/crc: crc64: Add include/linux/crc64.h to kernel-api.rst
  lib/crc: crc32: Change crc32() from macro to inline function and remove cast
  nvmem: layouts: Switch from crc32() to crc32_le()
  lib/crc: crc32: Document crc32_le(), crc32_be(), and crc32c()
  lib/crc: Explicitly include <linux/export.h>
  lib/crc: Remove ARCH_HAS_* kconfig symbols
  lib/crc: x86: Migrate optimized CRC code into lib/crc/
  lib/crc: sparc: Migrate optimized CRC code into lib/crc/
  lib/crc: s390: Migrate optimized CRC code into lib/crc/
  lib/crc: riscv: Migrate optimized CRC code into lib/crc/
  lib/crc: powerpc: Migrate optimized CRC code into lib/crc/
  lib/crc: mips: Migrate optimized CRC code into lib/crc/
  lib/crc: loongarch: Migrate optimized CRC code into lib/crc/
  lib/crc: arm64: Migrate optimized CRC code into lib/crc/
  lib/crc: arm: Migrate optimized CRC code into lib/crc/
  lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/
  lib/crc: Move files into lib/crc/
  lib/crc32: Remove unused combination support
  ...
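For the shash simplification mentioned above, a rough sketch of a single shash algorithm backed by the library function; the driver name here is hypothetical, and the seed (setkey) handling the real crc32c driver needs is omitted for brevity:

```c
#include <crypto/internal/hash.h>
#include <linux/crc32.h>
#include <linux/unaligned.h>

static int crc32c_lib_init(struct shash_desc *desc)
{
	*(u32 *)shash_desc_ctx(desc) = ~0;	/* default seed */
	return 0;
}

static int crc32c_lib_update(struct shash_desc *desc, const u8 *data,
			     unsigned int len)
{
	u32 *crcp = shash_desc_ctx(desc);

	*crcp = crc32c(*crcp, data, len);	/* fully optimized library call */
	return 0;
}

static int crc32c_lib_final(struct shash_desc *desc, u8 *out)
{
	put_unaligned_le32(~*(u32 *)shash_desc_ctx(desc), out);
	return 0;
}

static struct shash_alg crc32c_lib_alg = {
	.digestsize		= sizeof(u32),
	.descsize		= sizeof(u32),
	.init			= crc32c_lib_init,
	.update			= crc32c_lib_update,
	.final			= crc32c_lib_final,
	.base.cra_name		= "crc32c",
	.base.cra_driver_name	= "crc32c-lib",	/* illustrative name */
	.base.cra_blocksize	= 1,
};

/* registered once via crypto_register_shash(&crc32c_lib_alg) */
```

Since crc32c() already dispatches to the best available implementation, one wrapper like this replaces the separate "generic" and "arch" shash drivers.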
Revision tags: v6.16, v6.16-rc7
# 118da22e | 20-Jul-2025 | Eric Biggers <ebiggers@kernel.org>
lib/crc: x86/crc32c: Enable VPCLMULQDQ optimization where beneficial
Improve crc32c() performance on lengths >= 512 bytes by using crc32_lsb_vpclmul_avx512() instead of crc32c_x86_3way(), when the CPU supports VPCLMULQDQ and has a "good" implementation of AVX-512. For now that means AMD Zen 4 and later, and Intel Sapphire Rapids and later. Pass crc32_lsb_vpclmul_avx512() the table of constants needed to make it use the CRC-32C polynomial.
Rationale: VPCLMULQDQ performance has improved on newer CPUs, making crc32_lsb_vpclmul_avx512() faster than crc32c_x86_3way(), even though crc32_lsb_vpclmul_avx512() is designed for generic 32-bit CRCs and does not utilize x86_64's dedicated CRC-32C instructions.
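Putting the two paragraphs above together, the selection amounts to a length plus CPU-feature gate in front of the existing 3-way code. A sketch, where the static-key and constant-table names are illustrative (crc32_lsb_vpclmul_avx512() and crc32c_x86_3way() are the functions named in this commit):

```c
#include <linux/jump_label.h>
#include <linux/types.h>

/* Functions named in this commit; prototypes sketched here. */
u32 crc32_lsb_vpclmul_avx512(u32 crc, const u8 *p, size_t len,
			     const void *consts);
u32 crc32c_x86_3way(u32 crc, const u8 *p, size_t len);

static DEFINE_STATIC_KEY_FALSE(have_vpclmul_avx512);
extern const u8 crc32c_vpclmul_consts[];  /* CRC-32C constant table (illustrative) */

static u32 crc32c_arch(u32 crc, const u8 *p, size_t len)
{
	/*
	 * The 512-bit path only wins on longer messages, and only on CPUs
	 * with a "good" AVX-512 implementation (Zen 4+, Sapphire Rapids+);
	 * the static key is enabled accordingly at init time.
	 */
	if (len >= 512 && static_branch_likely(&have_vpclmul_avx512))
		return crc32_lsb_vpclmul_avx512(crc, p, len,
						crc32c_vpclmul_consts);
	return crc32c_x86_3way(crc, p, len);	/* crc32q-based 3-way code */
}
```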
Performance results for len=4096 using crc_kunit:
CPU                         Before (MB/s)  After (MB/s)
==========================  =============  ============
AMD Zen 4 (Genoa)                   19868         28618
AMD Zen 5 (Ryzen AI 9 365)          24080         46940
AMD Zen 5 (Turin)                   29566         58468
Intel Sapphire Rapids               22340         73794
Intel Emerald Rapids                24696         78666
Performance results for len=512 using crc_kunit:
CPU                         Before (MB/s)  After (MB/s)
==========================  =============  ============
AMD Zen 4 (Genoa)                    7251          7758
AMD Zen 5 (Ryzen AI 9 365)          17481         19135
AMD Zen 5 (Turin)                   21332         25424
Intel Sapphire Rapids               18886         29312
Intel Emerald Rapids                19675         29045
That being said, in the above benchmarks the ZMM registers are "warm", so they don't quite tell the whole story. Though much improved over older Intel CPUs, Intel CPUs still have ~2000 ns of ZMM warm-up time during which 512-bit instructions execute 4 times more slowly than they normally do. In contrast, AMD does better and has virtually zero ZMM warm-up time (at most ~60 ns). Thus, while this change is always beneficial on AMD, strictly speaking there are cases in which it is not beneficial on Intel, e.g. a small number of 512-byte messages with "cold" ZMM registers. But typically, it is beneficial even on Intel.
Note that on AMD Zen 3--5, crc32c() performance could be further improved with implementations that interleave crc32q and VPCLMULQDQ instructions. Unfortunately, it appears that a different such implementation would be optimal on *each* of these microarchitectures. Such improvements are left for future work. This commit just improves the way that we choose the implementations we already have.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250719224938.126512-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
# 110628e5 | 20-Jul-2025 | Eric Biggers <ebiggers@kernel.org>
lib/crc: x86: Reorganize crc-pclmul static_call initialization
Reorganize the crc-pclmul static_call initialization to place more of the logic in the *_mod_init_arch() functions instead of in the INIT_CRC_PCLMUL macro. This provides the flexibility to do more than a single static_call update for each CPU feature check.

Right away, optimize crc64_mod_init_arch() to check the CPU features just once instead of twice, doing both the crc64_msb and crc64_lsb static_call updates together. A later commit will also use this to initialize an additional static_key when crc32_lsb_vpclmul_avx512() is enabled.
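A minimal sketch of the reshaped crc64_mod_init_arch() described here, with an illustrative feature test and callee names (only crc64_msb and crc64_lsb are taken from the commit text):

```c
#include <asm/cpufeature.h>
#include <linux/static_call.h>

u64 crc64_msb_pclmul(u64 crc, const u8 *p, size_t len);	/* illustrative */
u64 crc64_lsb_pclmul(u64 crc, const u8 *p, size_t len);	/* illustrative */

static void __init crc64_mod_init_arch(void)
{
	if (!boot_cpu_has(X86_FEATURE_PCLMULQDQ))	/* illustrative test */
		return;
	/* One feature check now drives both static_call updates. */
	static_call_update(crc64_msb, crc64_msb_pclmul);
	static_call_update(crc64_lsb, crc64_lsb_pclmul);
}
```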
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250719224938.126512-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Revision tags: v6.16-rc6, v6.16-rc5, v6.16-rc4, v6.16-rc3, v6.16-rc2, v6.16-rc1
# b10749d8 | 07-Jun-2025 | Eric Biggers <ebiggers@kernel.org>
lib/crc: x86: Migrate optimized CRC code into lib/crc/
Move the x86-optimized CRC code from arch/x86/lib/crc* into its new location in lib/crc/x86/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
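Complementing the layout sketch under the merge summary above, "wire it up in the new way" means the arch side becomes an inline hook in lib/crc/x86/ that the generic code calls directly. A sketch with illustrative names:

```c
/* lib/crc/x86/crc32.h -- sketch; symbol names are illustrative */
#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(have_pclmulqdq);

u32 crc32_le_pclmul(u32 crc, const u8 *p, size_t len);	/* PCLMULQDQ code */

static inline u32 crc32_le_arch(u32 crc, const u8 *p, size_t len)
{
	/*
	 * This lives in a header included by the generic lib/crc code,
	 * so the compiler can inline the hook and drop whichever branch
	 * is dead for the current configuration.
	 */
	if (static_branch_likely(&have_pclmulqdq))
		return crc32_le_pclmul(crc, p, len);
	return crc32_le_base(crc, p, len);	/* generic fallback */
}
```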
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>