zstd: Import upstream v1.5.7

In addition to keeping the kernel's copy of zstd up to date, this update
was requested by Intel to expose upstream's APIs that allow QAT to accelerate
the LZ match finding stage of Zstd.

This patch is imported from the upstream tag v1.5.7-kernel [0], which is signed
with upstream's signing key EF8FE99528B52FFD [1]. It was imported from upstream
using these commands:

  export ZSTD=/path/to/repo/zstd/
  export LINUX=/path/to/repo/linux/
  cd "$ZSTD/contrib/linux-kernel"
  git checkout v1.5.7-kernel
  make import LINUX="$LINUX"

This patch has been tested on x86-64, and has been boot tested with
a zstd compressed kernel & initramfs on i386 and aarch64. I benchmarked
the patch on x86-64 with gcc-14.2.1 on an Intel i9-9900K by measuring the
performance of compressed filesystem reads and writes.

Component, Level, Size delta, C. time delta, D. time delta
Btrfs    ,     1,     +0.00%,         -6.1%,         +1.4%
Btrfs    ,     3,     +0.00%,         -9.8%,         +3.0%
Btrfs    ,     5,     +0.00%,         +1.7%,         +1.4%
Btrfs    ,     7,     +0.00%,         -1.9%,         +2.7%
Btrfs    ,     9,     +0.00%,         -3.4%,         +3.7%
Btrfs    ,    15,     +0.00%,         -0.3%,         +3.6%
SquashFS ,     1,     +0.00%,           N/A,         +1.9%

The major changes that impact the kernel use cases for each version are:

v1.5.7: https://github.com/facebook/zstd/releases/tag/v1.5.7
* Add zstd_compress_sequences_and_literals() for use by Intel's QAT driver
  to implement Zstd compression acceleration in the kernel.
* Fix an underflow bug in 32-bit builds that can cause data corruption when
  processing more than 4GB of data with a single `ZSTD_CCtx` object, when an
  input crosses the 4GB boundary. I don't believe this impacts any current
  kernel use cases, because the `ZSTD_CCtx` is typically reconstructed
  between compressions.
* Levels 1-4 see 5-10% compression speed improvements for inputs smaller
  than 128KB.

v1.5.6: https://github.com/facebook/zstd/releases/tag/v1.5.6
* Improved compression ratio for the highest compression levels. I don't
  expect these to see much use, however, due to their slow speeds.

v1.5.5: https://github.com/facebook/zstd/releases/tag/v1.5.5
* Fix a rare corruption bug that can trigger on levels 13 and above.
* Improve compression speed of levels 5-11 on incompressible data.

v1.5.4: https://github.com/facebook/zstd/releases/tag/v1.5.4
* Improve compression speed of levels 5-11 on ARM.
* Improve dictionary compression speed.

Signed-off-by: Nick Terrell <terrelln@fb.com>

lib: zstd: Backport fix for in-place decompression

Backport the relevant part of upstream commit 5b266196 [0].

This fixes in-place decompression for x86-64 kernel decompression. It
uses a bound of 131072 + (uncompressed_size >> 8), which can be violated
after upstream commit 6a7ede3d [1], as zstd can use part of the output
buffer as temporary storage, and without this patch needs a bound of
~262144.

The fix is for zstd to detect that the input and output buffers overlap,
so that zstd knows it can't use the overlapping portion of the output
buffer as temporary storage. If the margin is not large enough, this will
ensure that zstd will fail the decompression, rather than overwriting
part of the input data, and causing corruption.

This fix has been landed upstream and is in release v1.5.4. That commit
also adds unit and fuzz tests to verify that the margin we use is
respected, and correct. That means that the fix is well tested upstream.

I have not been able to reproduce the potential bug in x86-64 kernel
decompression locally, nor have I received reports of failures to
decompress the kernel. It is possible that compression saves enough
space to make it very hard for the issue to appear.

I've boot tested the zstd compressed kernel on x86-64 and i386 with this
patch, which uses in-place decompression, and sanity tested zstd compression
in btrfs / squashfs to make sure that we don't see any issues, but other
uses of zstd shouldn't be affected, because they don't use in-place
decompression.

Thanks to Vasily Gorbik <gor@linux.ibm.com> for debugging a related issue
on s390, which was triggered by the same commit, but was a bug in how
__decompress() was called [2]. And to Sasha Levin <sashal@kernel.org>
for the CC alerting me of the issue.

[0] https://github.com/facebook/zstd/commit/5b266196a41e6a15e21bd4f0eeab43b938db1d90
[1] https://github.com/facebook/zstd/commit/6a7ede3dfccbf3e0a5928b4224a039c260dcff72
[2] https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-here.call-01675030179-ext-9637@work.hours

CC: Vasily Gorbik <gor@linux.ibm.com>
CC: Heiko Carstens <hca@linux.ibm.com>
CC: Sasha Levin <sashal@kernel.org>
CC: Yann Collet <cyan@fb.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>

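The margin arithmetic in the commit above can be sketched in a few lines of C. This is only an illustration of the formula quoted in the message, not kernel or zstd code: the function names are hypothetical, and the layout (compressed input placed at the very end of a buffer of size output + margin) is an assumption about how an in-place caller would arrange its memory.

```c
#include <stddef.h>

/* Margin quoted in the commit message: 131072 + (uncompressed_size >> 8). */
size_t inplace_margin(size_t uncompressed_size)
{
    return (size_t)131072 + (uncompressed_size >> 8);
}

/* Total buffer size for the assumed in-place layout: output plus margin. */
size_t inplace_buffer_size(size_t uncompressed_size)
{
    return uncompressed_size + inplace_margin(uncompressed_size);
}

/*
 * Offset of the compressed input when it is placed at the very end of
 * the buffer (assumed layout). Returns (size_t)-1 if it does not fit.
 */
size_t inplace_input_offset(size_t uncompressed_size, size_t compressed_size)
{
    size_t total = inplace_buffer_size(uncompressed_size);

    if (compressed_size > total)
        return (size_t)-1;
    return total - compressed_size;
}

/* Nonzero if the half-open ranges [a, a+a_len) and [b, b+b_len) intersect. */
int ranges_overlap(size_t a, size_t a_len, size_t b, size_t b_len)
{
    return a < b + b_len && b < a + a_len;
}
```

For a 16 MiB image the formula gives 131072 + 65536 = 196608 bytes, which is below the ~262144 bound mentioned in the message. That is the kind of case where overlap detection matters, since the output range starting at offset 0 and the input range at the end of the buffer do intersect.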
lib: zstd: Fix -Wstringop-overflow warning

Fix the following -Wstringop-overflow warning when building with GCC 11+:

lib/zstd/decompress/huf_decompress.c: In function ‘HUF_readDTableX2_wksp’:
lib/zstd/decompress/huf_decompress.c:700:5: warning: ‘HUF_fillDTableX2.constprop’ accessing 624 bytes in a region of size 52 [-Wstringop-overflow=]
  700 |     HUF_fillDTableX2(dt, maxTableLog,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  701 |                    wksp->sortedSymbol, sizeOfSort,
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  702 |                    wksp->rankStart0, wksp->rankVal, maxW,
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  703 |                    tableLog+1,
      |                    ~~~~~~~~~~~
  704 |                    wksp->calleeWksp, sizeof(wksp->calleeWksp) / sizeof(U32));
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/zstd/decompress/huf_decompress.c:700:5: note: referencing argument 6 of type ‘U32 (*)[13]’ {aka ‘unsigned int (*)[13]’}
lib/zstd/decompress/huf_decompress.c:571:13: note: in a call to function ‘HUF_fillDTableX2.constprop’
  571 | static void HUF_fillDTableX2(HUF_DEltX2* DTable, const U32 targetLog,
      |             ^~~~~~~~~~~~~~~~

by using pointer notation instead of array notation.

This is one of the last remaining warnings to be fixed before globally
enabling -Wstringop-overflow.

Co-developed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>

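The difference between array notation and pointer notation for a 2D table parameter can be sketched as below. This is not the actual patch: the typedef and function names are hypothetical, and the 12-row dimension is inferred only from the sizes in the warning (624 / 52). The point is that C adjusts both parameter spellings to the same type, `U32 (*)[13]`, so switching notation leaves behavior unchanged and affects only what the compiler assumes about the size of the object behind the pointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t U32;
typedef U32 row_t[13];      /* one row: 13 * 4 = 52 bytes */
typedef row_t table_t[12];  /* whole table: 12 rows = 624 bytes (assumed) */

/* Array notation: parameter spelled as a whole table. */
U32 sum_array_notation(table_t table)
{
    U32 sum = 0;

    for (size_t r = 0; r < 12; r++)
        for (size_t c = 0; c < 13; c++)
            sum += table[r][c];
    return sum;
}

/* Pointer notation: the same adjusted type, spelled as a pointer to rows. */
U32 sum_pointer_notation(row_t *table)
{
    U32 sum = 0;

    for (size_t r = 0; r < 12; r++)
        for (size_t c = 0; c < 13; c++)
            sum += table[r][c];
    return sum;
}

/* Fills a table and checks that both spellings read the same data. */
int notations_agree(void)
{
    table_t table;

    for (size_t r = 0; r < 12; r++)
        for (size_t c = 0; c < 13; c++)
            table[r][c] = (U32)(r * 13 + c);
    return sum_array_notation(table) == sum_pointer_notation(table);
}
```

The two sizes, 52 and 624 bytes, match the "accessing 624 bytes in a region of size 52" figures in the diagnostic, which is consistent with the compiler comparing a whole-table access against a single row.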
zstd: import upstream v1.5.2

Updates the kernel's zstd library to v1.5.2, the latest zstd release.
The upstream tag it is updated to is `v1.5.2-kernel`, which contains
several cherry-picked commits on top of the v1.5.2 release which are
required for the kernel update. I will create this tag once the PR is
ready to merge, until then reference the temporary upstream branch
`v1.5.2-kernel-cherrypicks`.

I plan to submit this patch as part of the v6.2 merge window.

I've done basic build testing & testing on x86-64, i386, and aarch64.
I'm merging these patches into my `zstd-next` branch, which is pulled
into `linux-next` for further testing.

I've benchmarked BtrFS with zstd compression on an x86-64 machine, and
saw these results. Decompression speed is a small win across the board.
The lower compression levels 1-4 see both compression speed and
compression ratio wins. The higher compression levels see a small
compression speed loss and about neutral ratio. I expect the lower
compression levels to be used much more heavily than the high
compression levels, so this should be a net win.

Level   CTime   DTime   Ratio
1       -2.95%  -1.1%   -0.7%
3       -3.5%   -1.2%   -0.5%
5       +3.7%   -1.0%   +0.0%
7       +3.2%   -0.9%   +0.0%
9       -4.3%   -0.8%   +0.1%

Signed-off-by: Nick Terrell <terrelln@fb.com>

lib: zstd: Fix comment typo

The word `when' is duplicated on line 999; remove one.

Signed-off-by: Xin Gao <gaoxin@cdjrlc.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>

lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical

A new warning in clang warns that there is an instance where boolean
expressions are being used with bitwise operators instead of logical
ones:

lib/zstd/decompress/huf_decompress.c:890:25: warning: use of bitwise '&' with boolean operands [-Wbitwise-instead-of-logical]
                        (BIT_reloadDStreamFast(&bitD1) == BIT_DStream_unfinished)
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

zstd does this frequently to help with performance, as logical operators
have branches whereas bitwise ones do not.

To fix this warning in other cases, the expressions were placed on
separate lines with the '&=' operator; however, this particular instance
was moved away from that so that it could be surrounded by LIKELY, which
is a macro for __builtin_expect(), to help with a performance
regression, according to upstream zstd pull #1973.

Aside from switching to logical operators, which is likely undesirable
in this instance, or disabling the warning outright, the solution is
casting one of the expressions to an integer type to make it clear to
clang that the author knows what they are doing. Add a cast to U32 to
silence the warning. The first U32 cast is to silence an instance of
-Wshorten-64-to-32 because __builtin_expect() returns long so it cannot
be moved.

Link: https://github.com/ClangBuiltLinux/linux/issues/1486
Link: https://github.com/facebook/zstd/pull/1973
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>

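The cast technique described above can be illustrated with a small, self-contained sketch. The enum and function names here are hypothetical stand-ins, not the zstd bitstream API; the sketch only shows that a bitwise `&` of two comparisons, with one operand cast to an integer type, yields the same truth value as `&&` while always evaluating both sides.

```c
#include <stdint.h>

typedef uint32_t U32;

/* Hypothetical stand-in for a stream status enum. */
enum stream_status { STREAM_UNFINISHED = 0, STREAM_DONE = 1 };

/* Logical form: short-circuits, so the second test may be skipped. */
int both_unfinished_logical(enum stream_status a, enum stream_status b)
{
    return (a == STREAM_UNFINISHED) && (b == STREAM_UNFINISHED);
}

/*
 * Bitwise form: both comparisons are always evaluated. Casting one
 * operand to U32 makes the integer intent explicit, which is the kind
 * of cast the commit message describes for silencing the warning.
 */
int both_unfinished_bitwise(enum stream_status a, enum stream_status b)
{
    return (int)((U32)(a == STREAM_UNFINISHED) & (b == STREAM_UNFINISHED));
}
```

Each comparison evaluates to 0 or 1, so the bitwise result is also 0 or 1 and the two functions agree for every input. The difference is only in code generation and in how the compiler diagnoses the expression.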
lib: zstd: Upgrade to latest upstream zstd version 1.4.10

Upgrade to the latest upstream zstd version 1.4.10.

This patch is 100% generated from upstream zstd commit 20821a46f412 [0].

This patch is very large because it is transitioning from the custom
kernel zstd to using upstream directly. The new zstd follows upstream's
file structure, which is different. Future update patches will be much
smaller because they will only contain the changes from one upstream
zstd release.

As an aid for review I've created a commit [1] that shows the diff
between upstream zstd as-is (which doesn't compile), and the zstd
code imported in this patch. The version of zstd in this patch is
generated from upstream with changes applied by automation to replace
upstream's libc dependencies, remove unnecessary portability macros,
replace `/**` comments with `/*` comments, and use the kernel's xxhash
instead of bundling it.

The benefits of this patch are as follows:
1. Using upstream directly with automated script to generate kernel
   code. This allows us to update the kernel every upstream release, so
   the kernel gets the latest bug fixes and performance improvements,
   and doesn't get 3 years out of date again. The automation and the
   translated code are tested every upstream commit to ensure it
   continues to work.
2. Upgrades from a custom zstd based on 1.3.1 to 1.4.10, getting 3 years
   of performance improvements and bug fixes. On x86_64 I've measured
   15% faster BtrFS and SquashFS decompression+read speeds, 35% faster
   kernel decompression, and 30% faster ZRAM decompression+read speeds.
3. Zstd-1.4.10 supports negative compression levels, which allow zstd to
   match or subsume lzo's performance.
4. Maintains the same kernel-specific wrapper API, so no callers have to
   be modified with zstd version updates.

One concern that was brought up was stack usage. Upstream zstd had
already removed most of its heavy stack usage functions, but I just
removed the last functions that allocate arrays on the stack. I've
measured the high water mark for both compression and decompression
before and after this patch. Decompression is approximately neutral,
using about 1.2KB of stack space. Compression levels up to 3 regressed
from 1.4KB -> 1.6KB, and higher compression levels regressed from 1.5KB
-> 2KB. We've added unit tests upstream to prevent further regression.
I believe that this is a reasonable increase, and if it does end up
causing problems, this commit can be cleanly reverted, because it only
touches zstd.

I chose the bulk update instead of replaying upstream commits because
there have been ~3500 upstream commits since the 1.3.1 release, zstd
wasn't ready to be used in the kernel as-is before a month ago, and not
all upstream zstd commits build. The bulk update preserves bisectability
because bugs can be bisected to the zstd version update. At that point
the update can be reverted, and we can work with upstream to find and
fix the bug.

Note that upstream zstd release 1.4.10 doesn't exist yet. I have cut a
staging branch at 20821a46f412 [0] and will apply any changes requested
to the staging branch. Once we're ready to merge this update I will cut
a zstd release at the commit we merge, so we have a known zstd release
in the kernel.

The implementation of the kernel API is contained in
zstd_compress_module.c and zstd_decompress_module.c.

[0] https://github.com/facebook/zstd/commit/20821a46f4122f9abd7c7b245d28162dde8129c9
[1] https://github.com/terrelln/linux/commit/e0fa481d0e3df26918da0a13749740a1f6777574

Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested-by: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>