History log of /linux/arch/x86/crypto/Makefile (Results 1 – 25 of 791)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 37b33c68 23-Jan-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:

- Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to b

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:

- Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
directly accessible via the library API, instead of requiring the
crypto API. This is much simpler and more efficient.

- Convert some users such as ext4 to use the CRC32 library API instead
of the crypto API. More conversions like this will come later.

- Add a KUnit test that tests and benchmarks multiple CRC variants.
Remove older, less-comprehensive tests that are made redundant by
this.

- Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
volunteering to maintain it. I have additional cleanups and
optimizations planned for future cycles.

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits)
MAINTAINERS: add entry for CRC library
powerpc/crc: delete obsolete crc-vpmsum_test.c
lib/crc32test: delete obsolete crc32test.c
lib/crc16_kunit: delete obsolete crc16_kunit.c
lib/crc_kunit.c: add KUnit test suite for CRC library functions
powerpc/crc-t10dif: expose CRC-T10DIF function through lib
arm64/crc-t10dif: expose CRC-T10DIF function through lib
arm/crc-t10dif: expose CRC-T10DIF function through lib
x86/crc-t10dif: expose CRC-T10DIF function through lib
crypto: crct10dif - expose arch-optimized lib function
lib/crc-t10dif: add support for arch overrides
lib/crc-t10dif: stop wrapping the crypto API
scsi: target: iscsi: switch to using the crc32c library
f2fs: switch to using the crc32 library
jbd2: switch to using the crc32c library
ext4: switch to using the crc32c library
lib/crc32: make crc32c() go directly to lib
bcachefs: Explicitly select CRYPTO from BCACHEFS_FS
x86/crc32: expose CRC32 functions through lib
x86/crc32: update prototype for crc32_pclmul_le_16()
...

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2
# ed4bc981 02-Dec-2024 Eric Biggers <ebiggers@google.com>

x86/crc-t10dif: expose CRC-T10DIF function through lib

Move the x86 CRC-T10DIF assembly code into the lib directory and wire it
up to the library interface. This allows it to be used without going

x86/crc-t10dif: expose CRC-T10DIF function through lib

Move the x86 CRC-T10DIF assembly code into the lib directory and wire it
up to the library interface. This allows it to be used without going
through the crypto API. It remains usable via the crypto API too via
the shash algorithms that use the library interface. Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20241202012056.209768-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>

show more ...


# 55d1ecce 02-Dec-2024 Eric Biggers <ebiggers@google.com>

x86/crc32: expose CRC32 functions through lib

Move the x86 CRC32 assembly code into the lib directory and wire it up
to the library interface. This allows it to be used without going
through the cr

x86/crc32: expose CRC32 functions through lib

Move the x86 CRC32 assembly code into the lib directory and wire it up
to the library interface. This allows it to be used without going
through the crypto API. It remains usable via the crypto API too via
the shash algorithms that use the library interface. Thus all the
arch-specific "shash" code becomes unnecessary and is removed.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>

show more ...


Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1
# 36ec807b 20-Sep-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.12 merge window.


Revision tags: v6.11, v6.11-rc7
# f057b572 06-Sep-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'ib/6.11-rc6-matrix-keypad-spitz' into next

Bring in changes removing support for platform data from matrix-keypad
driver.


Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1
# 3daee2e4 16-Jul-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.10' into next

Sync up with mainline to bring in device_for_each_child_node_scoped()
and other newer APIs.


# 66e72a01 29-Jul-2024 Jerome Brunet <jbrunet@baylibre.com>

Merge tag 'v6.11-rc1' into clk-meson-next

Linux 6.11-rc1


# ee057c8c 14-Aug-2024 Steven Rostedt <rostedt@goodmis.org>

Merge tag 'v6.11-rc3' into trace/ring-buffer/core

The "reserve_mem" kernel command line parameter has been pulled into
v6.11. Merge the latest -rc3 to allow the persistent ring buffer memory to
be a

Merge tag 'v6.11-rc3' into trace/ring-buffer/core

The "reserve_mem" kernel command line parameter has been pulled into
v6.11. Merge the latest -rc3 to allow the persistent ring buffer memory to
be able to be mapped at the address specified by the "reserve_mem" command
line parameter.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

show more ...


# c8faf11c 30-Jul-2024 Tejun Heo <tj@kernel.org>

Merge tag 'v6.11-rc1' into for-6.12

Linux 6.11-rc1


# ed7171ff 16-Aug-2024 Lucas De Marchi <lucas.demarchi@intel.com>

Merge drm/drm-next into drm-xe-next

Get drm-xe-next on v6.11-rc2 and synchronized with drm-intel-next for
the display side. This resolves the current conflict for the
enable_display module parameter

Merge drm/drm-next into drm-xe-next

Get drm-xe-next on v6.11-rc2 and synchronized with drm-intel-next for
the display side. This resolves the current conflict for the
enable_display module parameter and allows further pending refactors.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

show more ...


# 5c61f598 12-Aug-2024 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Get drm-misc-next to the state of v6.11-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 3663e2c4 01-Aug-2024 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync with v6.11-rc1 in general, and specifically get the new
BACKLIGHT_POWER_ constants for power states.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 4436e6da 02-Aug-2024 Thomas Gleixner <tglx@linutronix.de>

Merge branch 'linus' into x86/mm

Bring x86 and selftests up to date


# a1ff5a7d 30-Jul-2024 Maxime Ripard <mripard@kernel.org>

Merge drm/drm-fixes into drm-misc-fixes

Let's start the new drm-misc-fixes cycle by bringing in 6.11-rc1.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


# c434e25b 19-Jul-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
"API:
- Test setkey in no-SIMD context
- Add skcipher speed test f

Merge tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
"API:
- Test setkey in no-SIMD context
- Add skcipher speed test for user-specified algorithm

Algorithms:
- Add x25519 support on ppc64le
- Add VAES and AVX512 / AVX10 optimized AES-GCM on x86
- Remove sm2 algorithm

Drivers:
- Add Allwinner H616 support to sun8i-ce
- Use DMA in stm32
- Add Exynos850 hwrng support to exynos"

* tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (81 commits)
hwrng: core - remove (un)register_miscdev()
crypto: lib/mpi - delete unnecessary condition
crypto: testmgr - generate power-of-2 lengths more often
crypto: mxs-dcp - Ensure payload is zero when using key slot
hwrng: Kconfig - Do not enable by default CN10K driver
crypto: starfive - Fix nent assignment in rsa dec
crypto: starfive - Align rsa input data to 32-bit
crypto: qat - fix unintentional re-enabling of error interrupts
crypto: qat - extend scope of lock in adf_cfg_add_key_value_param()
Documentation: qat: fix auto_reset attribute details
crypto: sun8i-ce - add Allwinner H616 support
crypto: sun8i-ce - wrap accesses to descriptor address fields
dt-bindings: crypto: sun8i-ce: Add compatible for H616
hwrng: core - Fix wrong quality calculation at hw rng registration
hwrng: exynos - Enable Exynos850 support
hwrng: exynos - Add SMC based TRNG operation
hwrng: exynos - Implement bus clock control
hwrng: exynos - Use devm_clk_get_enabled() to get the clock
hwrng: exynos - Improve coding style
dt-bindings: rng: Add Exynos850 support to exynos-trng
...

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# e6e758fa 03-Jun-2024 Eric Biggers <ebiggers@google.com>

crypto: x86/aes-gcm - rewrite the AES-NI optimized AES-GCM

Rewrite the AES-NI implementations of AES-GCM, taking advantage of
things I learned while writing the VAES-AVX10 implementations. This is

crypto: x86/aes-gcm - rewrite the AES-NI optimized AES-GCM

Rewrite the AES-NI implementations of AES-GCM, taking advantage of
things I learned while writing the VAES-AVX10 implementations. This is
a complete rewrite that reduces the AES-NI GCM source code size by about
70% and the binary code size by about 95%, while not regressing
performance and in fact improving it significantly in many cases.

The following summarizes the state before this patch:

- The aesni-intel module registered algorithms "generic-gcm-aesni" and
"rfc4106-gcm-aesni" with the crypto API that actually delegated to one
of three underlying implementations according to the CPU capabilities
detected at runtime: AES-NI, AES-NI + AVX, or AES-NI + AVX2.

- The AES-NI + AVX and AES-NI + AVX2 assembly code was in
aesni-intel_avx-x86_64.S and consisted of 2804 lines of source and
257 KB of binary. This massive binary size was not really
appropriate, and depending on the kconfig it could take up over 1% the
size of the entire vmlinux. The main loops did 8 blocks per
iteration. The AVX code minimized the use of carryless multiplication
whereas the AVX2 code did not. The "AVX2" code did not actually use
AVX2; the check for AVX2 was really a check for Intel Haswell or later
to detect support for fast carryless multiplication. The long source
length was caused by factors such as significant code duplication.

- The AES-NI only assembly code was in aesni-intel_asm.S and consisted
of 1501 lines of source and 15 KB of binary. The main loops did 4
blocks per iteration and minimized the use of carryless multiplication
by using Karatsuba multiplication and a multiplication-less reduction.

- The assembly code was contributed in 2010-2013. Maintenance has been
sporadic and most design choices haven't been revisited.

- The assembly function prototypes and the corresponding glue code were
separate from and were not consistent with the new VAES-AVX10 code I
recently added. The older code had several issues such as not
precomputing the GHASH key powers, which hurt performance.

This rewrite achieves the following goals:

- Much shorter source and binary sizes. The assembly source shrinks
from 4300 lines to 1130 lines, and it produces about 9 KB of binary
instead of 272 KB. This is achieved via a better designed AES-GCM
implementation that doesn't excessively unroll the code and instead
prioritizes the parts that really matter. Sharing the C glue code
with the VAES-AVX10 implementations also saves 250 lines of C source.

- Improve performance on most (possibly all) CPUs on which this code
runs, for most (possibly all) message lengths. Benchmark results are
given in Tables 1 and 2 below.

- Use the same function prototypes and glue code as the new VAES-AVX10
algorithms. This fixes some issues with the integration of the
assembly and results in some significant performance improvements,
primarily on short messages. Also, the AVX and non-AVX
implementations are now registered as separate algorithms with the
crypto API, which makes them both testable by the self-tests.

- Keep support for AES-NI without AVX (for Westmere, Silvermont,
Goldmont, and Tremont), but unify the source code with AES-NI + AVX.
Since 256-bit vectors cannot be used without VAES anyway, this is made
feasible by just using the non-VEX coded form of most instructions.

- Use a unified approach where the main loop does 8 blocks per iteration
and uses Karatsuba multiplication to save one pclmulqdq per block but
does not use the multiplication-less reduction. This strikes a good
balance across the range of CPUs on which this code runs.

- Don't spam the kernel log with an informational message on every boot.

The following tables summarize the improvement in AES-GCM throughput on
various CPU microarchitectures as a result of this patch:

Table 1: AES-256-GCM encryption throughput improvement,
CPU microarchitecture vs. message length in bytes:

| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
-------------------+-------+-------+-------+-------+-------+-------+
Intel Broadwell | 2% | 8% | 11% | 18% | 31% | 26% |
Intel Skylake | 1% | 4% | 7% | 12% | 26% | 19% |
Intel Cascade Lake | 3% | 8% | 10% | 18% | 33% | 24% |
AMD Zen 1 | 6% | 12% | 6% | 15% | 27% | 24% |
AMD Zen 2 | 8% | 13% | 13% | 19% | 26% | 28% |
AMD Zen 3 | 8% | 14% | 13% | 19% | 26% | 25% |

| 300 | 200 | 64 | 63 | 16 |
-------------------+-------+-------+-------+-------+-------+
Intel Broadwell | 35% | 29% | 45% | 55% | 54% |
Intel Skylake | 25% | 19% | 28% | 33% | 27% |
Intel Cascade Lake | 36% | 28% | 39% | 49% | 54% |
AMD Zen 1 | 27% | 22% | 23% | 29% | 26% |
AMD Zen 2 | 32% | 24% | 22% | 25% | 31% |
AMD Zen 3 | 30% | 24% | 22% | 23% | 26% |

Table 2: AES-256-GCM decryption throughput improvement,
CPU microarchitecture vs. message length in bytes:

| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
-------------------+-------+-------+-------+-------+-------+-------+
Intel Broadwell | 3% | 8% | 11% | 19% | 32% | 28% |
Intel Skylake | 3% | 4% | 7% | 13% | 28% | 27% |
Intel Cascade Lake | 3% | 9% | 11% | 19% | 33% | 28% |
AMD Zen 1 | 15% | 18% | 14% | 20% | 36% | 33% |
AMD Zen 2 | 9% | 16% | 13% | 21% | 26% | 27% |
AMD Zen 3 | 8% | 15% | 12% | 18% | 23% | 23% |

| 300 | 200 | 64 | 63 | 16 |
-------------------+-------+-------+-------+-------+-------+
Intel Broadwell | 36% | 31% | 40% | 51% | 53% |
Intel Skylake | 28% | 21% | 23% | 30% | 30% |
Intel Cascade Lake | 36% | 29% | 36% | 47% | 53% |
AMD Zen 1 | 35% | 31% | 32% | 35% | 36% |
AMD Zen 2 | 31% | 30% | 27% | 38% | 30% |
AMD Zen 3 | 27% | 23% | 24% | 32% | 26% |

The above numbers are percentage improvements in single-thread
throughput, so e.g. an increase from 3000 MB/s to 3300 MB/s would be
listed as 10%. They were collected by directly measuring the Linux
crypto API performance using a custom kernel module. Note that indirect
benchmarks (e.g. 'cryptsetup benchmark' or benchmarking dm-crypt I/O)
include more overhead and won't see quite as much of a difference. All
these benchmarks used an associated data length of 16 bytes. Note that
AES-GCM is almost always used with short associated data lengths.

I didn't test Intel CPUs before Broadwell, AMD CPUs before Zen 1, or
Intel low-power CPUs, as these weren't readily available to me.
However, based on the design of the new code and the available
information about these other CPU microarchitectures, I wouldn't expect
any significant regressions, and there's a good chance performance is
improved just as it is above.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


# b06affb1 03-Jun-2024 Eric Biggers <ebiggers@google.com>

crypto: x86/aes-gcm - add VAES and AVX512 / AVX10 optimized AES-GCM

Add implementations of AES-GCM for x86_64 CPUs that support VAES (vector
AES), VPCLMULQDQ (vector carryless multiplication), and e

crypto: x86/aes-gcm - add VAES and AVX512 / AVX10 optimized AES-GCM

Add implementations of AES-GCM for x86_64 CPUs that support VAES (vector
AES), VPCLMULQDQ (vector carryless multiplication), and either AVX512 or
AVX10. There are two implementations, sharing most source code: one
using 256-bit vectors and one using 512-bit vectors. This patch
improves AES-GCM performance by up to 162%; see Tables 1 and 2 below.

I wrote the new AES-GCM assembly code from scratch, focusing on
correctness, performance, code size (both source and binary), and
documenting the source. The new assembly file aes-gcm-avx10-x86_64.S is
about 1200 lines including extensive comments, and it generates less
than 8 KB of binary code. The main loop does 4 vectors at a time, with
the AES and GHASH instructions interleaved. Any remainder is handled
using a simple 1 vector at a time loop, with masking.

Several VAES + AVX512 implementations of AES-GCM exist from Intel,
including one in OpenSSL and one proposed for inclusion in Linux in 2021
(https://lore.kernel.org/linux-crypto/1611386920-28579-6-git-send-email-megha.dey@intel.com/).
These aren't really suitable to be used, though, due to the massive
amount of binary code generated (696 KB for OpenSSL, 200 KB for Linux)
and well as the significantly larger amount of assembly source (4978
lines for OpenSSL, 1788 lines for Linux). Also, Intel's code does not
support 256-bit vectors, which makes it not usable on future
AVX10/256-only CPUs, and also not ideal for certain Intel CPUs that have
downclocking issues. So I ended up starting from scratch. Usually my
much shorter code is actually slightly faster than Intel's AVX512 code,
though it depends on message length and on which of Intel's
implementations is used; for details, see Tables 3 and 4 below.

To facilitate potential integration into other projects, I've
dual-licensed aes-gcm-avx10-x86_64.S under Apache-2.0 OR BSD-2-Clause,
the same as the recently added RISC-V crypto code.

The following two tables summarize the performance improvement over the
existing AES-GCM code in Linux that uses AES-NI and AVX2:

Table 1: AES-256-GCM encryption throughput improvement,
CPU microarchitecture vs. message length in bytes:

| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
----------------------+-------+-------+-------+-------+-------+-------+
Intel Ice Lake | 42% | 48% | 60% | 62% | 70% | 69% |
Intel Sapphire Rapids | 157% | 145% | 162% | 119% | 96% | 96% |
Intel Emerald Rapids | 156% | 144% | 161% | 115% | 95% | 100% |
AMD Zen 4 | 103% | 89% | 78% | 56% | 54% | 54% |

| 300 | 200 | 64 | 63 | 16 |
----------------------+-------+-------+-------+-------+-------+
Intel Ice Lake | 66% | 48% | 49% | 70% | 53% |
Intel Sapphire Rapids | 80% | 60% | 41% | 62% | 38% |
Intel Emerald Rapids | 79% | 60% | 41% | 62% | 38% |
AMD Zen 4 | 51% | 35% | 27% | 32% | 25% |

Table 2: AES-256-GCM decryption throughput improvement,
CPU microarchitecture vs. message length in bytes:

| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
----------------------+-------+-------+-------+-------+-------+-------+
Intel Ice Lake | 42% | 48% | 59% | 63% | 67% | 71% |
Intel Sapphire Rapids | 159% | 145% | 161% | 125% | 102% | 100% |
Intel Emerald Rapids | 158% | 144% | 161% | 124% | 100% | 103% |
AMD Zen 4 | 110% | 95% | 80% | 59% | 56% | 54% |

| 300 | 200 | 64 | 63 | 16 |
----------------------+-------+-------+-------+-------+-------+
Intel Ice Lake | 67% | 56% | 46% | 70% | 56% |
Intel Sapphire Rapids | 79% | 62% | 39% | 61% | 39% |
Intel Emerald Rapids | 80% | 62% | 40% | 58% | 40% |
AMD Zen 4 | 49% | 36% | 30% | 35% | 28% |

The above numbers are percentage improvements in single-thread
throughput, so e.g. an increase from 4000 MB/s to 6000 MB/s would be
listed as 50%. They were collected by directly measuring the Linux
crypto API performance using a custom kernel module. Note that indirect
benchmarks (e.g. 'cryptsetup benchmark' or benchmarking dm-crypt I/O)
include more overhead and won't see quite as much of a difference. All
these benchmarks used an associated data length of 16 bytes. Note that
AES-GCM is almost always used with short associated data lengths.

The following two tables summarize how the performance of my code
compares with Intel's AVX512 AES-GCM code, both the version that is in
OpenSSL and the version that was proposed for inclusion in Linux.
Neither version exists in Linux currently, but these are alternative
AES-GCM implementations that could be chosen instead of mine. I
collected the following numbers on Emerald Rapids using a userspace
benchmark program that calls the assembly functions directly.

I've also included a comparison with Cloudflare's AES-GCM implementation
from https://boringssl-review.googlesource.com/c/boringssl/+/65987/3.

Table 3: VAES-based AES-256-GCM encryption throughput in MB/s,
implementation name vs. message length in bytes:

| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
---------------------+-------+-------+-------+-------+-------+-------+
This implementation | 14171 | 12956 | 12318 | 9588 | 7293 | 6449 |
AVX512_Intel_OpenSSL | 14022 | 12467 | 11863 | 9107 | 5891 | 6472 |
AVX512_Intel_Linux | 13954 | 12277 | 11530 | 8712 | 6627 | 5898 |
AVX512_Cloudflare | 12564 | 11050 | 10905 | 8152 | 5345 | 5202 |

| 300 | 200 | 64 | 63 | 16 |
---------------------+-------+-------+-------+-------+-------+
This implementation | 4939 | 3688 | 1846 | 1821 | 738 |
AVX512_Intel_OpenSSL | 4629 | 4532 | 2734 | 2332 | 1131 |
AVX512_Intel_Linux | 4035 | 2966 | 1567 | 1330 | 639 |
AVX512_Cloudflare | 3344 | 2485 | 1141 | 1127 | 456 |

Table 4: VAES-based AES-256-GCM decryption throughput in MB/s,
implementation name vs. message length in bytes:

| 16384 | 4096 | 4095 | 1420 | 512 | 500 |
---------------------+-------+-------+-------+-------+-------+-------+
This implementation | 14276 | 13311 | 13007 | 11086 | 8268 | 8086 |
AVX512_Intel_OpenSSL | 14067 | 12620 | 12421 | 9587 | 5954 | 7060 |
AVX512_Intel_Linux | 14116 | 12795 | 11778 | 9269 | 7735 | 6455 |
AVX512_Cloudflare | 13301 | 12018 | 11919 | 9182 | 7189 | 6726 |

| 300 | 200 | 64 | 63 | 16 |
---------------------+-------+-------+-------+-------+-------+
This implementation | 6454 | 5020 | 2635 | 2602 | 1079 |
AVX512_Intel_OpenSSL | 5184 | 5799 | 2957 | 2545 | 1228 |
AVX512_Intel_Linux | 4394 | 4247 | 2235 | 1635 | 922 |
AVX512_Cloudflare | 4289 | 3851 | 1435 | 1417 | 574 |

So, usually my code is actually slightly faster than Intel's code,
though the OpenSSL implementation has a slight edge on messages shorter
than 256 bytes in this microbenchmark. (This also holds true when doing
the same tests on AMD Zen 4.) It can be seen that the large code size
(up to 94x larger!) of the Intel implementations doesn't seem to bring
much benefit, so starting from scratch with much smaller code, as I've
done, seems appropriate. The performance of my code on messages shorter
than 256 bytes could be improved through a limited amount of unrolling,
but it's unclear it would be worth it, given code size considerations
(e.g. caches) that don't get measured in microbenchmarks.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


# afeea275 04-Jul-2024 Maxime Ripard <mripard@kernel.org>

Merge drm-misc-next-2024-07-04 into drm-misc-next-fixes

Let's start the drm-misc-next-fixes cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


# d754ed28 19-Jun-2024 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to v6.10-rc3.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 89aa02ed 12-Jun-2024 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-xe-next

Needed to get tracing cleanup and add mmio tracing series.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 92815da4 12-Jun-2024 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Merge remote-tracking branch 'drm-misc/drm-misc-next' into HEAD

Merge drm-misc-next tree into the msm-next tree in order to be able to
use HDMI connector framework for the MSM HDMI driver.


# 375c4d15 27-May-2024 Maxime Ripard <mripard@kernel.org>

Merge drm/drm-next into drm-misc-next

Let's start the new release cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


# 0c8ea05e 04-Jul-2024 Peter Zijlstra <peterz@infradead.org>

Merge branch 'tip/x86/cpu'

The Lunarlake patches rely on the new VFM stuff.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>


# 594ce0b8 10-Jun-2024 Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Merge topic branches 'clkdev' and 'fixes' into for-linus


# f73a058b 28-May-2024 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

v6.10-rc1 is released, forward from v6.9

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


12345678910>>...32