| #
a354b8de |
| 19-May-2026 |
Dmitry Antipov <dmantipov@yandex.ru> |
riscv: add platform-specific double word shifts for riscv32
Add riscv32-specific '__ashldi3()', '__ashrdi3()', and '__lshrdi3()'. Initially it was intended to fix the following link error observed
riscv: add platform-specific double word shifts for riscv32
Add riscv32-specific '__ashldi3()', '__ashrdi3()', and '__lshrdi3()'. Initially it was intended to fix the following link error observed when building EFI-enabled kernel with CONFIG_EFI_STUB=y and CONFIG_EFI_GENERIC_STUB=y:
riscv32-linux-gnu-ld: ./drivers/firmware/efi/libstub/lib-cmdline.stub.o: in function `__efistub_.L49': __efistub_cmdline.c:(.init.text+0x1f2): undefined reference to `__efistub___ashldi3' riscv32-linux-gnu-ld: __efistub_cmdline.c:(.init.text+0x202): undefined reference to `__efistub___lshrdi3'
Reported at [1] trying to build https://patchew.org/linux/20260212164413.889625-1-dmantipov@yandex.ru, tested with 'qemu-system-riscv32 -M virt' only.
Link: https://lore.kernel.org/20260519172259.908980-7-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603041925.KLKqpK6N-lkp@intel.com [1] Suggested-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Charlie Jenkins <thecharlesjenkins@gmail.com> Assisted-by: Gemini:gemini-3.1-pro-preview sashiko Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andriy Shevchenko <andriy.shevchenko@intel.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
feff82eb |
| 24-Apr-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley: "There is one significant change outside arch/riscv in this
Merge tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley: "There is one significant change outside arch/riscv in this pull request: the addition of a set of KUnit tests for strlen(), strnlen(), and strrchr().
Otherwise, the most notable changes are to add some RISC-V-specific string function implementations, to remove XIP kernel support, to add hardware error exception handling, and to optimize our runtime unaligned access speed testing.
A few comments on the motivation for removing XIP support. It's been broken in the RISC-V kernel for months. The code is not easy to maintain. Furthermore, for XIP support to truly be useful for RISC-V, we think that compile-time feature switches would need to be added for many of the RISC-V ISA features and microarchitectural properties that are currently implemented with runtime patching. No one has stepped forward to take responsibility for that work, so many of us think it's best to remove it until clear use cases and champions emerge.
Summary:
- Add Kunit correctness testing and microbenchmarks for strlen(), strnlen(), and strrchr()
- Add RISC-V-specific strnlen(), strchr(), strrchr() implementations
- Add hardware error exception handling
- Clean up and optimize our unaligned access probe code
- Enable HAVE_IOREMAP_PROT to be able to use generic_access_phys()
- Remove XIP kernel support
- Warn when addresses outside the vmemmap range are passed to vmemmap_populate()
- Update the ACPI FADT revision check to warn if it's not at least ACPI v6.6, which is when key RISC-V-specific tables were added to the specification
- Increase COMMAND_LINE_SIZE to 2048 to match ARM64, x86, PowerPC, etc.
- Make kaslr_offset() a static inline function, since there's no need for it to show up in the symbol table
- Add KASLR offset and SATP to the VMCOREINFO ELF notes to improve kdump support
- Add Makefile cleanup rule for vdso_cfi copied source files, and add a .gitignore for the build artifacts in that directory
- Remove some redundant ifdefs that check Kconfig macros
- Add missing SPDX license tag to the CFI selftest
- Simplify UTS_MACHINE assignment in the RISC-V Makefile
- Clarify some unclear comments and remove some superfluous comments
- Fix various English typos across the RISC-V codebase"
* tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits) riscv: Remove support for XIP kernel riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access() riscv: Split out compare_unaligned_access() riscv: Reuse measure_cycles() in check_vector_unaligned_access() riscv: Split out measure_cycles() for reuse riscv: Clean up & optimize unaligned scalar access probe riscv: lib: add strrchr() implementation riscv: lib: add strchr() implementation riscv: lib: add strnlen() implementation lib/string_kunit: extend benchmarks to strnlen() and chr searches lib/string_kunit: add performance benchmark for strlen() lib/string_kunit: add correctness test for strrchr() lib/string_kunit: add correctness test for strnlen() lib/string_kunit: add correctness test for strlen() riscv: vdso_cfi: Add .gitignore for build artifacts riscv: vdso_cfi: Add clean rule for copied sources riscv: enable HAVE_IOREMAP_PROT riscv: mm: WARN_ON() for bad addresses in vmemmap_populate() riscv: acpi: update FADT revision check to 6.6 riscv: add hardware error trap handler support ...
show more ...
|
| #
bef64bcb |
| 04-Apr-2026 |
Feng Jiang <jiangfeng@kylinos.cn> |
riscv: lib: add strrchr() implementation
Add an assembly implementation of strrchr() for RISC-V.
This implementation minimizes instruction count and avoids unnecessary memory access to the stack. T
riscv: lib: add strrchr() implementation
Add an assembly implementation of strrchr() for RISC-V.
This implementation minimizes instruction count and avoids unnecessary memory access to the stack. The performance benefits are most visible on small workloads (1-16 bytes) where the architectural savings in function overhead outweigh the execution time of the scan loop.
Benchmark results (QEMU TCG, rv64): Length | Original (MB/s) | Optimized (MB/s) | Improvement -------|-----------------|------------------|------------ 1 B | 20 | 21 | +5.0% 7 B | 111 | 120 | +8.1% 16 B | 189 | 199 | +5.3% 512 B | 361 | 382 | +5.8% 4096 B | 388 | 391 | +0.8%
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn> Tested-by: Joel Stanley <joel@jms.id.au> Link: https://patch.msgid.link/20260130025018.172925-9-jiangfeng@kylinos.cn Signed-off-by: Paul Walmsley <pjw@kernel.org>
show more ...
|
| #
adf54213 |
| 04-Apr-2026 |
Feng Jiang <jiangfeng@kylinos.cn> |
riscv: lib: add strchr() implementation
Add an assembly implementation of strchr() for RISC-V.
By eliminating stack frame management (prologue/epilogue) and optimizing the function entries, the ass
riscv: lib: add strchr() implementation
Add an assembly implementation of strchr() for RISC-V.
By eliminating stack frame management (prologue/epilogue) and optimizing the function entries, the assembly version provides significant relative gains for short strings where the fixed overhead of the C function is most prominent. As string length increases, performance converges with the generic C implementation.
Benchmark results (QEMU TCG, rv64): Length | Original (MB/s) | Optimized (MB/s) | Improvement -------|-----------------|------------------|------------ 1 B | 21 | 22 | +4.8% 7 B | 113 | 121 | +7.1% 16 B | 195 | 202 | +3.6% 512 B | 376 | 389 | +3.5% 4096 B | 394 | 393 | -0.3%
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn> Tested-by: Joel Stanley <joel@jms.id.au> Link: https://patch.msgid.link/20260130025018.172925-8-jiangfeng@kylinos.cn Signed-off-by: Paul Walmsley <pjw@kernel.org>
show more ...
|
| #
5ba15d41 |
| 04-Apr-2026 |
Feng Jiang <jiangfeng@kylinos.cn> |
riscv: lib: add strnlen() implementation
Add an optimized strnlen() implementation for RISC-V. This version includes a generic optimization and a Zbb-powered optimization using the 'orc.b' instructi
riscv: lib: add strnlen() implementation
Add an optimized strnlen() implementation for RISC-V. This version includes a generic optimization and a Zbb-powered optimization using the 'orc.b' instruction, derived from the strlen() implementation.
Benchmark results (QEMU TCG, rv64): Length | Original (MB/s) | Optimized (MB/s) | Improvement -------|-----------------|------------------|------------ 16 B | 179 | 309 | +72.6% 512 B | 347 | 1562 | +350.1% 4096 B | 356 | 1878 | +427.5%
Suggested-by: Qingfang Deng <dqfext@gmail.com> Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn> Link: https://patch.msgid.link/20260130025018.172925-7-jiangfeng@kylinos.cn Signed-off-by: Paul Walmsley <pjw@kernel.org>
show more ...
|
| #
5265d55b |
| 27-Mar-2026 |
Christoph Hellwig <hch@lst.de> |
riscv: move the XOR code to lib/raid/
Move the optimized XOR into lib/raid and include it it in xor.ko instead of always building it into the main kernel image.
Link: https://lkml.kernel.org/r/2026
riscv: move the XOR code to lib/raid/
Move the optimized XOR into lib/raid and include it it in xor.ko instead of always building it into the main kernel image.
Link: https://lkml.kernel.org/r/20260327061704.3707577-17-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Tested-by: Eric Biggers <ebiggers@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
13150742 |
| 29-Jul-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request
Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request for 6.17. The main focus this cycle is on reorganizing the SHA-1 and SHA-2 code, providing high-quality library APIs for SHA-1 and SHA-2 including HMAC support, and establishing conventions for lib/crypto/ going forward:
- Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares most of the SHA-512 code) into lib/crypto/. This includes both the generic and architecture-optimized code. Greatly simplify how the architecture-optimized code is integrated. Add an easy-to-use library API for each SHA variant, including HMAC support. Finally, reimplement the crypto_shash support on top of the library API.
- Apply the same reorganization to the SHA-256 code (and also SHA-224 which shares most of the SHA-256 code). This is a somewhat smaller change, due to my earlier work on SHA-256. But this brings in all the same additional improvements that I made for SHA-1 and SHA-512.
There are also some smaller changes:
- Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For these algorithms it's just a move, not a full reorganization yet.
- Fix the MIPS chacha-core.S to build with the clang assembler.
- Fix the Poly1305 functions to work in all contexts.
- Fix a performance regression in the x86_64 Poly1305 code.
- Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
Note that since the new organization of the SHA code is much simpler, the diffstat of this pull request is negative, despite the addition of new fully-documented library APIs for multiple SHA and HMAC-SHA variants.
These APIs will allow further simplifications across the kernel as users start using them instead of the old-school crypto API. (I've already written a lot of such conversion patches, removing over 1000 more lines of code. But most of those will target 6.18 or later)"
* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits) lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils lib/crypto: x86/sha1-ni: Convert to use rounds macros lib/crypto: x86/sha1-ni: Minor optimizations and cleanup crypto: sha1 - Remove sha1_base.h lib/crypto: x86/sha1: Migrate optimized code into library lib/crypto: sparc/sha1: Migrate optimized code into library lib/crypto: s390/sha1: Migrate optimized code into library lib/crypto: powerpc/sha1: Migrate optimized code into library lib/crypto: mips/sha1: Migrate optimized code into library lib/crypto: arm64/sha1: Migrate optimized code into library lib/crypto: arm/sha1: Migrate optimized code into library crypto: sha1 - Use same state format as legacy drivers crypto: sha1 - Wrap library and add HMAC support lib/crypto: sha1: Add HMAC support lib/crypto: sha1: Add SHA-1 library functions lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() crypto: x86/sha1 - Rename conflicting symbol lib/crypto: sha2: Add hmac_sha*_init_usingrawkey() lib/crypto: arm/poly1305: Remove unneeded empty weak function lib/crypto: x86/poly1305: Fix performance regression on short messages ...
show more ...
|
| #
b5943815 |
| 07-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crc: riscv: Migrate optimized CRC code into lib/crc/
Move the riscv-optimized CRC code from arch/riscv/lib/crc* into its new location in lib/crc/riscv/, and wire it up in the new way. This new
lib/crc: riscv: Migrate optimized CRC code into lib/crc/
Move the riscv-optimized CRC code from arch/riscv/lib/crc* into its new location in lib/crc/riscv/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/".
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
daed4fcf |
| 19-Jun-2025 |
Eric Biggers <ebiggers@kernel.org> |
lib/crypto: riscv: Move arch/riscv/lib/crypto/ into lib/crypto/
Move the contents of arch/riscv/lib/crypto/ into lib/crypto/riscv/.
The new code organization makes a lot more sense for how this cod
lib/crypto: riscv: Move arch/riscv/lib/crypto/ into lib/crypto/
Move the contents of arch/riscv/lib/crypto/ into lib/crypto/riscv/.
The new code organization makes a lot more sense for how this code actually works and is developed. In particular, it makes it possible to build each algorithm as a single module, with better inlining and dead code elimination. For a more detailed explanation, see the patchset which did this for the CRC library code: https://lore.kernel.org/r/20250607200454.73587-1-ebiggers@kernel.org/. Also see the patchset which did this for SHA-512: https://lore.kernel.org/linux-crypto/20250616014019.415791-1-ebiggers@kernel.org/
This is just a preparatory commit, which does the move to get the files into their new location but keeps them building the same way as before. Later commits will make the actual improvements to the way the arch-optimized code is integrated for each algorithm.
Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Palmer Dabbelt <palmer@dabbelt.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20250619191908.134235-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
show more ...
|
| #
d604877c |
| 22-Apr-2025 |
Eric Biggers <ebiggers@google.com> |
crypto: riscv - move library functions to arch/riscv/lib/crypto/
Continue disentangling the crypto library functions from the generic crypto infrastructure by moving the riscv ChaCha library functio
crypto: riscv - move library functions to arch/riscv/lib/crypto/
Continue disentangling the crypto library functions from the generic crypto infrastructure by moving the riscv ChaCha library functions into a new directory arch/riscv/lib/crypto/ that does not depend on CRYPTO. This mirrors the distinction between crypto/ and lib/crypto/.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
| #
511484fa |
| 16-Feb-2025 |
Eric Biggers <ebiggers@google.com> |
riscv/crc64: add Zbc optimized CRC64 functions
Wire up crc64_be_arch() and crc64_nvme_arch() for 64-bit RISC-V using crc-clmul-template.h. This greatly improves the performance of these CRCs on Zbc
riscv/crc64: add Zbc optimized CRC64 functions
Wire up crc64_be_arch() and crc64_nvme_arch() for 64-bit RISC-V using crc-clmul-template.h. This greatly improves the performance of these CRCs on Zbc-capable CPUs in 64-bit kernels.
These optimized CRC64 functions are not yet supported in 32-bit kernels, since crc-clmul-template.h assumes that the CRC fits in an unsigned long. That implementation limitation could be addressed, but it would add a fair bit of complexity, so it has been omitted for now.
Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250216225530.306980-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
| #
8bf3e178 |
| 16-Feb-2025 |
Eric Biggers <ebiggers@google.com> |
riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function
Wire up crc_t10dif_arch() for RISC-V using crc-clmul-template.h. This greatly improves CRC-T10DIF performance on Zbc-capable CPUs.
Tested-by
riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function
Wire up crc_t10dif_arch() for RISC-V using crc-clmul-template.h. This greatly improves CRC-T10DIF performance on Zbc-capable CPUs.
Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250216225530.306980-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
| #
72acff5f |
| 16-Feb-2025 |
Eric Biggers <ebiggers@google.com> |
riscv/crc32: reimplement the CRC32 functions using new template
Delete the previous Zbc optimized CRC32 code, and re-implement it using the new template. The new implementation is more optimized an
riscv/crc32: reimplement the CRC32 functions using new template
Delete the previous Zbc optimized CRC32 code, and re-implement it using the new template. The new implementation is more optimized and shares more code among CRC variants.
Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250216225530.306980-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
| #
d36cebe0 |
| 02-Dec-2024 |
Eric Biggers <ebiggers@google.com> |
lib/crc32: improve support for arch-specific overrides
Currently the CRC32 library functions are defined as weak symbols, and the arm64 and riscv architectures override them.
This method of arch-sp
lib/crc32: improve support for arch-specific overrides
Currently the CRC32 library functions are defined as weak symbols, and the arm64 and riscv architectures override them.
This method of arch-specific overrides has the limitation that it only works when both the base and arch code is built-in. Also, it makes the arch-specific code be silently not used if it is accidentally built with lib-y instead of obj-y; unfortunately the RISC-V code does this.
This commit reorganizes the code to have explicit *_arch() functions that are called when they are enabled, similar to how some of the crypto library code works (e.g. chacha_crypt() calls chacha_crypt_arch()).
Make the existing kconfig choice for the CRC32 implementation also control whether the arch-optimized implementation (if one is available) is enabled or not. Make it enabled by default if CRC32 is also enabled.
The result is that arch-optimized CRC32 library functions will be included automatically when appropriate, but it is now possible to disable them. They can also now be built as a loadable module if the CRC32 library functions happen to be used only by loadable modules, in which case the arch and base CRC32 modules will be automatically loaded via direct symbol dependency when appropriate.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20241202010844.144356-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
| #
58ff5371 |
| 01-Aug-2024 |
Samuel Holland <samuel.holland@sifive.com> |
riscv: Omit optimized string routines when using KASAN
The optimized string routines are implemented in assembly, so they are not instrumented for use with KASAN. Fall back to the C version of the r
riscv: Omit optimized string routines when using KASAN
The optimized string routines are implemented in assembly, so they are not instrumented for use with KASAN. Fall back to the C version of the routines in order to improve KASAN coverage. This fixes the kasan_strings() unit test.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240801033725.28816-2-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
a43fe27d |
| 21-Jun-2024 |
Xiao Wang <xiao.w.wang@intel.com> |
riscv: Optimize crc32 with Zbc extension
As suggested by the B-ext spec, the Zbc (carry-less multiplication) instructions can be used to accelerate CRC calculations. Currently, the crc32 is the most
riscv: Optimize crc32 with Zbc extension
As suggested by the B-ext spec, the Zbc (carry-less multiplication) instructions can be used to accelerate CRC calculations. Currently, the crc32 is the most widely used crc function inside kernel, so this patch focuses on the optimization of just the crc32 APIs.
Compared with the current table-lookup based optimization, Zbc based optimization can also achieve large stride during CRC calculation loop, meantime, it avoids the memory access latency of the table-lookup based implementation and it reduces memory footprint.
If Zbc feature is not supported in a runtime environment, then the table-lookup based implementation would serve as fallback via alternative mechanism.
By inspecting the vmlinux built by gcc v12.2.0 with default optimization level (-O2), we can see below instruction count change for each 8-byte stride in the CRC32 loop:
rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13) rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)
The compile target CPU is little endian, extra effort is needed for byte swapping for the crc32_be API, thus, the instruction count change is not as significant as that in the *_le cases.
This patch is tested on QEMU VM with the kernel CRC32 selftest for both rv64 and rv32. Running the CRC32 selftest on a real hardware (SpacemiT K1) with Zbc extension shows 65% and 125% performance improvement respectively on crc32_test() and crc32c_test().
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
c6408684 |
| 18-Jan-2024 |
Palmer Dabbelt <palmer@rivosinc.com> |
Merge patch series "riscv: Add fine-tuned checksum functions"
Charlie Jenkins <charlie@rivosinc.com> says:
Each architecture generally implements fine-tuned checksum functions to leverage the instr
Merge patch series "riscv: Add fine-tuned checksum functions"
Charlie Jenkins <charlie@rivosinc.com> says:
Each architecture generally implements fine-tuned checksum functions to leverage the instruction set. This patch adds the main checksum functions that are used in networking. Tested on QEMU, this series allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
This patch takes heavy use of the Zbb extension using alternatives patching.
To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT.
I have attempted to make these functions as optimal as possible, but I have not ran anything on actual riscv hardware. My performance testing has been limited to inspecting the assembly, running the algorithms on x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function so even though it is possible to read 64 bits at a time on compatible hardware, the bottleneck becomes the clean up and setup code so loading 32 bits at a time is actually faster.
* b4-shazam-merge: kunit: Add tests for csum_ipv6_magic and ip_fast_csum riscv: Add checksum library riscv: Add checksum header riscv: Add static key for misaligned accesses asm-generic: Improve csum_fold
Link: https://lore.kernel.org/r/20240108-optimize_checksum-v15-0-1c50de5f2167@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
a04c192e |
| 09-Jan-2024 |
Charlie Jenkins <charlie@rivosinc.com> |
riscv: Add checksum library
Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit will load from the buffer in groups of 32 bits, and when compiled for 64-bit will load in groups of 6
riscv: Add checksum library
Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit will load from the buffer in groups of 32 bits, and when compiled for 64-bit will load in groups of 64 bits.
Additionally provide riscv optimized implementation of csum_ipv6_magic.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Xiao Wang <xiao.w.wang@intel.com> Link: https://lore.kernel.org/r/20240108-optimize_checksum-v15-4-1c50de5f2167@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
c2a658d4 |
| 15-Jan-2024 |
Andy Chiu <andy.chiu@sifive.com> |
riscv: lib: vectorize copy_to_user/copy_from_user
This patch utilizes Vector to perform copy_to_user/copy_from_user. If Vector is available and the size of copy is large enough for Vector to perform
riscv: lib: vectorize copy_to_user/copy_from_user
This patch utilizes Vector to perform copy_to_user/copy_from_user. If Vector is available and the size of copy is large enough for Vector to perform better than scalar, then direct the kernel to do Vector copies for userspace. Though the best programming practice for users is to reduce the copy, this provides a faster variant when copies are inevitable.
The optimal size for using Vector, copy_to_user_thres, is only a heuristic for now. We can add DT parsing if people feel the need of customizing it.
The exception fixup code of the __asm_vector_usercopy must fallback to the scalar one because accessing user pages might fault, and must be sleepable. Current kernel-mode Vector does not allow tasks to be preemptible, so we must disactivate Vector and perform a scalar fallback in such case.
The original implementation of Vector operations comes from https://github.com/sifive/sifive-libc, which we agree to contribute to Linux kernel.
Co-developed-by: Jerry Shih <jerry.shih@sifive.com> Signed-off-by: Jerry Shih <jerry.shih@sifive.com> Co-developed-by: Nick Knight <nick.knight@sifive.com> Signed-off-by: Nick Knight <nick.knight@sifive.com> Suggested-by: Guo Ren <guoren@kernel.org> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240115055929.4736-6-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
c5674d00 |
| 15-Jan-2024 |
Greentime Hu <greentime.hu@sifive.com> |
riscv: Add vector extension XOR implementation
This patch adds support for vector optimized XOR and it is tested in qemu.
Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com> Signed-off-by: Han
riscv: Add vector extension XOR implementation
This patch adds support for vector optimized XOR and it is tested in qemu.
Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com> Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com> Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240115055929.4736-4-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
ab0f7746 |
| 24-Feb-2023 |
Andrew Jones <ajones@ventanamicro.com> |
RISC-V: Use Zicboz in clear_page when available
Using memset() to zero a 4K page takes 563 total instructions, where 20 are branches. clear_page(), with Zicboz and a 64 byte block size, takes 169 to
RISC-V: Use Zicboz in clear_page when available
Using memset() to zero a 4K page takes 563 total instructions, where 20 are branches. clear_page(), with Zicboz and a 64 byte block size, takes 169 total instructions, where 4 are branches and 33 are nops. Even though the block size is a variable, thanks to alternatives, we can still implement a Duff device without having to do any preliminary calculations. This is achieved by using the alternatives' cpufeature value (the upper 16 bits of patch_id). The value used is the maximum zicboz block size order accepted at the patch site. This enables us to stop patching / unrolling when 4K bytes have been zeroed (we would loop and continue after 4K if the page size would be larger)
For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to only loop a few times and larger block sizes to not loop at all. Since cbo.zero doesn't take an offset, we also need an 'add' after each instruction, making the loop body 112 to 160 bytes. Hopefully this is small enough to not cause icache misses.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230224162631.405473-7-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
56e0790c |
| 13-Jan-2023 |
Heiko Stuebner <heiko.stuebner@vrull.eu> |
RISC-V: add infrastructure to allow different str* implementations
Depending on supported extensions on specific RISC-V cores, optimized str* functions might make sense.
This adds basic infrastruct
RISC-V: add infrastructure to allow different str* implementations
Depending on supported extensions on specific RISC-V cores, optimized str* functions might make sense.
This adds basic infrastructure to allow patching the function calls via alternatives later on.
The Linux kernel provides standard implementations for string functions but when architectures want to extend them, they need to provide their own.
The added generic string functions are done in assembler (taken from disassembling the main-kernel functions for now) to allow us to control the used registers and extend them with optimized variants.
This doesn't override the compiler's use of builtin replacements. So still first of all the compiler will select if a builtin will be better suitable i.e. for known strings. For all regular cases we will want to later select possible optimized variants and in the worst case fall back to the generic implemention added with this change.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230113212301.3534711-2-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
ee55ff80 |
| 17-Dec-2020 |
Guo Ren <guoren@linux.alibaba.com> |
riscv: Add support for function error injection
Inspired by the commit 42d038c4fb00 ("arm64: Add support for function error injection"), this patch supports function error injection for riscv.
This
riscv: Add support for function error injection
Inspired by the commit 42d038c4fb00 ("arm64: Add support for function error injection"), this patch supports function error injection for riscv.
This patch mainly support two functions: one is regs_set_return_value() which is used to overwrite the return value; the another function is override_function_with_return() which is to override the probed function returning and jump to its caller.
Test log: cd /sys/kernel/debug/fail_function echo sys_clone > inject echo 100 > probability echo 1 > interval ls / [ 313.176875] FAULT_INJECTION: forcing a failure. [ 313.176875] name fail_function, interval 1, probability 100, space 0, times 1 [ 313.184357] CPU: 0 PID: 87 Comm: sh Not tainted 5.8.0-rc5-00007-g6a758cc #117 [ 313.187616] Call Trace: [ 313.189100] [<ffffffe0002036b6>] walk_stackframe+0x0/0xc2 [ 313.191626] [<ffffffe00020395c>] show_stack+0x40/0x4c [ 313.193927] [<ffffffe000556c60>] dump_stack+0x7c/0x96 [ 313.194795] [<ffffffe0005522e8>] should_fail+0x140/0x142 [ 313.195923] [<ffffffe000299ffc>] fei_kprobe_handler+0x2c/0x5a [ 313.197687] [<ffffffe0009e2ec4>] kprobe_breakpoint_handler+0xb4/0x18a [ 313.200054] [<ffffffe00020357e>] do_trap_break+0x36/0xca [ 313.202147] [<ffffffe000201bca>] ret_from_exception+0x0/0xc [ 313.204556] [<ffffffe000201bbc>] ret_from_syscall+0x0/0x2 -sh: can't fork: Invalid argument
Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
show more ...
|
| #
04091d6c |
| 30-Nov-2020 |
Nylon Chen <nylon7@andestech.com> |
riscv: provide memmove implementation
The memmove used by the kernel feature like KASAN.
Signed-off-by: Nick Hu <nickhu@andestech.com> Signed-off-by: Nick Hu <nick650823@gmail.com> Signed-off-by: N
riscv: provide memmove implementation
The memmove used by the kernel feature like KASAN.
Signed-off-by: Nick Hu <nickhu@andestech.com> Signed-off-by: Nick Hu <nick650823@gmail.com> Signed-off-by: Nylon Chen <nylon7@andestech.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
show more ...
|
| #
11129e8e |
| 07-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
riscv: use memcpy based uaccess for nommu again
This reverts commit adccfb1a805ea84d2db38eb53032533279bdaa97.
Now that the generic uaccess by mempcy code handles unaligned addresses the generic cod
riscv: use memcpy based uaccess for nommu again
This reverts commit adccfb1a805ea84d2db38eb53032533279bdaa97.
Now that the generic uaccess by mempcy code handles unaligned addresses the generic code can be used for all RISC-V CPUs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
show more ...
|