Revision tags: v6.3

# e1f2750e | 20-Apr-2023 | Linus Torvalds <torvalds@linux-foundation.org>
x86: remove 'zerorest' argument from __copy_user_nocache()
Every caller passes in zero, meaning they don't want any partial copy to zero the remainder of the destination buffer.
Which is just as well, because the implementation of that function didn't actually even look at that argument, and wasn't even aware it existed, although some misleading comments did mention it still.
The 'zerorest' thing is a historical artifact of how "copy_from_user()" worked, in that it would zero the rest of the kernel buffer that it copied into.
That zeroing still exists, but it's long since been moved to generic code, and the raw architecture-specific code doesn't do it. See _copy_from_user() in lib/usercopy.c for this all.
However, while __copy_user_nocache() shares some history and superficial other similarities with copy_from_user(), it is in many ways also very different.
In particular, while the code makes it *look* similar to the generic user copy functions that can copy both to and from user space, and take faults on both reads and writes as a result, __copy_user_nocache() does no such thing at all.
__copy_user_nocache() always copies to kernel space, and will never take a page fault on the destination. What *can* happen, though, is that the non-temporal stores take a machine check because one of the use cases is for writing to stable memory, and any memory errors would then take synchronous faults.
So __copy_user_nocache() does look a lot like copy_from_user(), but has faulting behavior that is more akin to our old copy_in_user() (which no longer exists, but copied from user space to user space and could fault on both source and destination).
And it very much does not have the "zero the end of the destination buffer" semantics, since a problem with the destination buffer is very possibly the very source of the partial copy.
So this whole thing was just a confusing historical artifact from having shared some code with a completely different function with completely different use cases.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
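
[Editor's sketch: the interface change amounts to dropping the dead parameter. Prototypes paraphrased; only the 'zerorest' removal is from the commit itself.]

    /* before: 'zerorest' was accepted but never read by the implementation */
    long __copy_user_nocache(void *dst, const void __user *src,
                             unsigned size, int zerorest);

    /* after: callers just pass dst/src/size */
    long __copy_user_nocache(void *dst, const void __user *src,
                             unsigned size);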

# 427fda2c | 17-Apr-2023 | Linus Torvalds <torvalds@linux-foundation.org>
x86: improve on the non-rep 'copy_user' function
The old 'copy_user_generic_unrolled' function was oddly implemented for largely historical reasons: it had been largely based on the uncached copy case, which has some other concerns.
For example, the __copy_user_nocache() function uses 'movnti' for the destination stores, and those want the destination to be aligned. In contrast, the regular copy function doesn't really care, and trying to align things only complicates matters.
Also, like the clear_user function, the copy function had some odd handling of the repeat counts, complicating the exception handling for no really good reason. So as with clear_user, just write it to keep all the byte counts in the %rcx register, exactly like the 'rep movs' functionality that this replaces.
Unlike a real 'rep movs', we do allow for this to trash a few temporary registers to not have to unnecessarily save/restore registers on the stack.
And like the clearing case, rename this to what it now clearly is: 'rep_movs_alternative', and make it one coherent function, so that it shows up as such in profiles (instead of the odd split between "copy_user_generic_unrolled" and "copy_user_short_string", the latter of which was not about strings at all, and which was shared with the uncached case).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
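
[Editor's sketch of the register convention the rewrite settles on. Grossly simplified: a byte-at-a-time loop with none of the unrolling or exception handling of the real rep_movs_alternative; the function name is hypothetical.]

    /*
     * Calling convention, matching 'rep movs':
     *   %rdi = destination, %rsi = source, %rcx = byte count.
     * On return, %rcx holds the number of bytes NOT copied (0 on success).
     */
    SYM_FUNC_START(movs_fallback_sketch)
    .Lcopy_byte:
    	testq %rcx, %rcx
    	jz .Lcopy_done
    	movb (%rsi), %al	/* copy one byte... */
    	movb %al, (%rdi)
    	incq %rsi		/* ...advance both pointers... */
    	incq %rdi
    	decq %rcx		/* ...and count down in %rcx, like 'rep movs' */
    	jmp .Lcopy_byte
    .Lcopy_done:
    	RET
    SYM_FUNC_END(movs_fallback_sketch)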

Revision tags: v6.3-rc7

# 8c9b6a88 | 16-Apr-2023 | Linus Torvalds <torvalds@linux-foundation.org>
x86: improve on the non-rep 'clear_user' function
The old version was oddly written to have the repeat count in multiple registers. So instead of taking advantage of %rax being zero, it had some sub-counts in it. All just for a "single word clearing" loop, which isn't even efficient to begin with.
So get rid of those games, and just keep all the state in the same registers we got it in (and that we should return things in). That not only makes this act much more like 'rep stos' (which this function is replacing), but makes it much easier to actually do the obvious loop unrolling.
Also rename the function from the now nonsensical 'clear_user_original' to what it now clearly is: 'rep_stos_alternative'.
End result: if we don't have a fast 'rep stosb', at least we can have a fast fallback for it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
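
[Editor's sketch: schematically, the state now stays where 'rep stos' wants it. Again simplified, without the real unrolled quadword loops or fault handling; the function name is hypothetical.]

    /*
     * Calling convention, matching 'rep stos':
     *   %rdi = destination, %rcx = byte count, %rax = 0 (the fill value).
     * On return, %rcx holds the number of bytes NOT cleared.
     */
    SYM_FUNC_START(stos_fallback_sketch)
    .Lclear_byte:
    	testq %rcx, %rcx
    	jz .Lclear_done
    	movb %al, (%rdi)	/* store a zero byte from %al */
    	incq %rdi
    	decq %rcx		/* count stays in %rcx throughout */
    	jmp .Lclear_byte
    .Lclear_done:
    	RET
    SYM_FUNC_END(stos_fallback_sketch)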

# 577e6a7f | 16-Apr-2023 | Linus Torvalds <torvalds@linux-foundation.org>
x86: inline the 'rep movs' in user copies for the FSRM case
This does the same thing for the user copies as commit 0db7058e8e23 ("x86/clear_user: Make it faster") did for clear_user(). In other words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set, avoiding the function call entirely.
In order to do that, it makes the calling convention for the out-of-line case ("copy_user_generic_unrolled") match the 'rep movs' calling convention, although it does also end up clobbering a number of additional registers.
Also, to simplify code sharing in the low-level assembly with the __copy_user_nocache() function (that uses the normal C calling convention), we end up with a kind of mixed return value for the low-level asm code: it will return the result in both %rcx (to work as an alternative for the 'rep movs' case), _and_ in %rax (for the nocache case).
We could avoid this by wrapping __copy_user_nocache() callers in an inline asm, but since the cost is just an extra register copy, it's probably not worth it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
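
[Editor's sketch of the resulting inline call site, paraphrased from the commit; the exact constraint and clobber lists may differ.]

    static __always_inline __must_check unsigned long
    copy_user_generic(void *to, const void *from, unsigned long len)
    {
    	stac();
    	/*
    	 * With X86_FEATURE_FSRM, 'rep movsb' runs inline; otherwise the
    	 * out-of-line fallback is called with the same %rdi/%rsi/%rcx
    	 * convention, returning the uncopied count in %rcx.
    	 */
    	asm volatile(
    		"1:\n\t"
    		ALTERNATIVE("rep movsb",
    			    "call copy_user_generic_unrolled",
    			    ALT_NOT(X86_FEATURE_FSRM))
    		"2:\n"
    		_ASM_EXTABLE_UA(1b, 2b)
    		: "+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
    		: : "memory", "rax", "rdx", "r8");
    	clac();
    	return len;	/* bytes NOT copied */
    }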

# 3639a535 | 15-Apr-2023 | Linus Torvalds <torvalds@linux-foundation.org>
x86: move stac/clac from user copy routines into callers
This is preparatory work for inlining the 'rep movs' case, but also a cleanup. The __copy_user_nocache() function was mis-used by the rdma code to do uncached kernel copies that don't actually want user copies at all, and as a result don't want the stac/clac either.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
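
[Editor's sketch of the resulting split. Illustrative only; every name other than __copy_user_nocache(), stac() and clac() is made up.]

    /* user-source copy: the caller now brackets the raw copy with SMAP */
    static inline unsigned long
    user_nocache_copy(void *dst, const void __user *src, unsigned len)
    {
    	unsigned long ret;

    	stac();		/* open the user-access window */
    	ret = __copy_user_nocache(dst, src, len);
    	clac();		/* close it again */
    	return ret;
    }

    /* kernel-to-kernel uncached copy (the rdma case): no stac/clac at all,
     * since __copy_user_nocache() itself no longer touches SMAP state */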

# d2c95f9d | 15-Apr-2023 | Linus Torvalds <torvalds@linux-foundation.org>
x86: don't use REP_GOOD or ERMS for user memory clearing
The modern target to use is FSRS (Fast Short REP STOS), and the other cases should only be used for bigger areas (ie mainly things like page clearing).
Note! This changes the conditional for the inlining from FSRM ("fast short rep movs") to FSRS ("fast short rep stos").
We'll have a separate fixup for AMD microarchitectures that have a good 'rep stosb' yet do not set the new Intel-specific FSRS bit (because FSRM was there first).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
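
[Editor's sketch: the user-memory clearing path now keys off FSRS. Paraphrased; at this point the out-of-line fallback was still named clear_user_original.]

    static __always_inline __must_check unsigned long
    __clear_user(void __user *addr, unsigned long size)
    {
    	stac();
    	asm volatile(
    		"1:\n\t"
    		ALTERNATIVE("rep stosb",
    			    "call clear_user_original",
    			    ALT_NOT(X86_FEATURE_FSRS))
    		"2:\n"
    		_ASM_EXTABLE_UA(1b, 2b)
    		: "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
    		: "a" (0)	/* %rax = 0, the fill value */
    		: "memory");
    	clac();
    	return size;	/* bytes NOT cleared */
    }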

# adfcf423 | 15-Apr-2023 | Linus Torvalds <torvalds@linux-foundation.org>
x86: don't use REP_GOOD or ERMS for user memory copies
The modern target to use is FSRM (Fast Short REP MOVS), and the other cases should only be used for bigger areas (ie mainly things like page clearing).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revision tags: v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3

# b9da86e3 | 16-Mar-2023 | Ira Weiny <ira.weiny@intel.com>
x86/uaccess: Remove memcpy_page_flushcache()
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray callbacks") removed the calls to memcpy_page_flushcache().
In addition, memcpy_page_flushcache() uses the deprecated kmap_atomic().
Remove the unused x86 memcpy_page_flushcache() implementation and also get rid of one more kmap_atomic() user.
[ dhansen: tweak changelog ]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20221230-kmap-x86-v1-1-15f1ecccab50%40intel.com
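
[Editor's note: for context, converting a kmap_atomic() user to the preferred kmap_local_page() generally looks like the generic sketch below. This is not code from the commit; the function names are invented.]

    #include <linux/highmem.h>
    #include <linux/string.h>

    /* old style: kmap_atomic() disables preemption/migration as a side effect */
    static void copy_to_page_old(struct page *page, size_t off,
    			     const void *src, size_t len)
    {
    	char *to = kmap_atomic(page);
    	memcpy_flushcache(to + off, src, len);
    	kunmap_atomic(to);
    }

    /* preferred: kmap_local_page() keeps the mapping CPU-local, no such side effects */
    static void copy_to_page_new(struct page *page, size_t off,
    			     const void *src, size_t len)
    {
    	char *to = kmap_local_page(page);
    	memcpy_flushcache(to + off, src, len);
    	kunmap_local(to);
    }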

Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1

# 4f2c0a4a | 14-Dec-2022 | Nick Terrell <terrelln@fb.com>
Merge branch 'main' into zstd-linus

# e291c116 | 12-Dec-2022 | Dmitry Torokhov <dmitry.torokhov@gmail.com>
Merge branch 'next' into for-linus
Prepare input updates for 6.2 merge window.

Revision tags: v6.1, v6.1-rc8, v6.1-rc7

# 29583dfc | 21-Nov-2022 | Thomas Zimmermann <tzimmermann@suse.de>
Merge drm/drm-next into drm-misc-next-fixes
Backmerging to update drm-misc-next-fixes for the final phase of the release cycle.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>

Revision tags: v6.1-rc6

# 002c6ca7 | 14-Nov-2022 | Rodrigo Vivi <rodrigo.vivi@intel.com>
Merge drm/drm-next into drm-intel-next
Catch up on 6.1-rc cycle in order to solve the intel_backlight conflict on linux-next.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Revision tags: v6.1-rc5, v6.1-rc4

# d93618da | 04-Nov-2022 | Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Merge drm/drm-next into drm-intel-gt-next
Needed to bring in v6.1-rc1 which contains commit f683b9d61319 ("i915: use the VMA iterator") which is needed for series https://patchwork.freedesktop.org/series/110083/.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Revision tags: v6.1-rc3, v6.1-rc2

# 14e77332 | 22-Oct-2022 | Nick Terrell <terrelln@fb.com>
Merge branch 'main' into zstd-next

# 1aca5ce0 | 20-Oct-2022 | Thomas Zimmermann <tzimmermann@suse.de>
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get v6.1-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>

# 008f05a7 | 19-Oct-2022 | Mark Brown <broonie@kernel.org>
ASoC: jz4752b: Capture fixes
Merge series from Siarhei Volkau <lis8215@gmail.com>:
The patchset fixes:
 - Line In path stays powered off during capturing or bypass to mixer.
 - incorrectly represented dB values in alsamixer, et al.
 - incorrectly represented Capture input selector in alsamixer's Playback tab.
 - wrong control selected as Capture Master.

# a140a6a2 | 18-Oct-2022 | Maxime Ripard <maxime@cerno.tech>
Merge drm/drm-next into drm-misc-next
Let's kick-off this release cycle.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>

# c29a017f | 17-Oct-2022 | Dmitry Torokhov <dmitry.torokhov@gmail.com>
Merge tag 'v6.1-rc1' into next
Merge with mainline to bring in the latest changes to twl4030 driver.

# 8048b835 | 17-Oct-2022 | Andrew Morton <akpm@linux-foundation.org>
Merge branch 'master' into mm-hotfixes-stable

Revision tags: v6.1-rc1

# dfd2d876 | 10-Oct-2022 | Johannes Berg <johannes.berg@intel.com>
Merge remote-tracking branch 'wireless/main' into wireless-next
Pull in wireless/main content since some new code would otherwise conflict with it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

# 7db99f01 | 04-Oct-2022 | Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- Print the CPU number at segfault time.
The number printed is not always accurate (preemption is enabled at that time) but the print string contains "likely" and after a lot of back'n'forth on this, this was the consensus that was reached. See thread at [1].
- After a *lot* of testing and polishing, finally the clear_user() improvements to inline REP; STOSB by default
Link: https://lore.kernel.org/r/5d62c1d0-7425-d5bb-ecb5-1dc3b4d7d245@intel.com [1]
* tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Print likely CPU at segfault time
  x86/clear_user: Make it faster

Revision tags: v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1

# 0db7058e | 24-May-2022 | Borislav Petkov <bp@suse.de>
x86/clear_user: Make it faster
Based on a patch by Mark Hemment <markhemm@googlemail.com> and incorporating very sane suggestions from Linus.
The point here is to have the default case with FSRM - which is supposed to be the majority of x86 hw out there - if not now then soon - be directly inlined into the instruction stream so that no function call overhead is taking place.
Drop the early clobbers from the @size and @addr operands as those are not needed anymore since we have single instruction alternatives.
The benchmarks I ran would show very small improvements and a PF benchmark would even show weird things like slowdowns with higher core counts.
So for a ~6m running the git test suite, the function gets called under 700K times, all from padzero():
  <...>-2536 [006] ..... 261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900
  <...>-2536 [006] ..... 261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160
  <...>-2537 [008] ..... 261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850
  <...>-2537 [008] ..... 261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900
  ...
which is around 1%-ish of the total time and which is consistent with the benchmark numbers.
So Mel gave me the idea to simply measure how fast the function becomes. I.e.:
    start = rdtsc_ordered();
    ret   = __clear_user(to, n);
    end   = rdtsc_ordered();
Computing the mean average of all the samples collected during the test suite run then shows some improvement:
clear_user_original: Amean: 9219.71 (Sum: 6340154910, samples: 687674)
fsrm: Amean: 8030.63 (Sum: 5522277720, samples: 687652)
That's on Zen3.
The situation looks a lot more confusing on Intel:

Icelake:

clear_user_original:
  Amean: 19679.4 (Sum: 13652560764, samples: 693750)
  Amean: 19743.7 (Sum: 13693470604, samples: 693562)

(I ran it twice just to be sure.)

ERMS:
  Amean: 20374.3 (Sum: 13910601024, samples: 682752)
  Amean: 20453.7 (Sum: 14186223606, samples: 693576)

FSRM:
  Amean: 20458.2 (Sum: 13918381386, samples: 680331)
The original microbenchmark which people were complaining about:
for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=65536; done 2>&1 | grep copied
  32207011840 bytes (32 GB, 30 GiB) copied, 1 s, 32.2 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.93069 s, 35.6 GB/s
  37597741056 bytes (38 GB, 35 GiB) copied, 1 s, 37.6 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.78017 s, 38.6 GB/s
  62020124672 bytes (62 GB, 58 GiB) copied, 2 s, 31.0 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 2.13716 s, 32.2 GB/s
  60010004480 bytes (60 GB, 56 GiB) copied, 1 s, 60.0 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.14129 s, 60.2 GB/s
  53212086272 bytes (53 GB, 50 GiB) copied, 1 s, 53.2 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.28398 s, 53.5 GB/s
  55698259968 bytes (56 GB, 52 GiB) copied, 1 s, 55.7 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.22507 s, 56.1 GB/s
  55306092544 bytes (55 GB, 52 GiB) copied, 1 s, 55.3 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.23647 s, 55.6 GB/s
  54387539968 bytes (54 GB, 51 GiB) copied, 1 s, 54.4 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.25693 s, 54.7 GB/s
  50566529024 bytes (51 GB, 47 GiB) copied, 1 s, 50.6 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.35096 s, 50.9 GB/s
  58308165632 bytes (58 GB, 54 GiB) copied, 1 s, 58.3 GB/s
  68719476736 bytes (69 GB, 64 GiB) copied, 1.17394 s, 58.5 GB/s
Now the same thing with smaller buffers:
for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=8192; done 2>&1 | grep copied
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28485 s, 30.2 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276112 s, 31.1 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.29136 s, 29.5 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.283803 s, 30.3 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.306503 s, 28.0 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.349169 s, 24.6 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276912 s, 31.0 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.265356 s, 32.4 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28464 s, 30.2 GB/s
  8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.242998 s, 35.3 GB/s
is also not conclusive because it all depends on the buffer sizes, their alignments and when the microcode detects that cachelines can be aggregated properly and copied in bigger sizes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/CAHk-=wh=Mu_EYhtOmPn6AxoQZyEh-4fo2Zx3G7rBv1g7vwoKiw@mail.gmail.com
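
[Editor's sketch: a hypothetical reconstruction of the measurement harness described above, producing trace lines like the padzero samples quoted earlier. Names are invented; the commit does not include the instrumentation itself.]

    static noinline unsigned long timed_clear_user(void __user *to, unsigned long n)
    {
    	u64 start, end;
    	unsigned long ret;

    	start = rdtsc_ordered();
    	ret = __clear_user(to, n);
    	end = rdtsc_ordered();

    	/* one sample per call; the mean is computed over the whole run */
    	trace_printk("to: 0x%px, size: %lu, cycles: %llu\n",
    		     to, n, end - start);
    	return ret;
    }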

Revision tags: v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1

# 762f99f4 | 15-Jan-2022 | Dmitry Torokhov <dmitry.torokhov@gmail.com>
Merge branch 'next' into for-linus
Prepare input updates for 5.17 merge window.

Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5

# 86329873 | 09-Dec-2021 | Daniel Lezcano <daniel.lezcano@linaro.org>
Merge branch 'reset/of-get-optional-exclusive' of git://git.pengutronix.de/pza/linux into timers/drivers/next
"Add optional variant of of_reset_control_get_exclusive(). If the requested reset is not
Merge branch 'reset/of-get-optional-exclusive' of git://git.pengutronix.de/pza/linux into timers/drivers/next
"Add optional variant of of_reset_control_get_exclusive(). If the requested reset is not specified in the device tree, this function returns NULL instead of an error."
This dependency is needed for the Generic Timer Module (a.k.a OSTM) support for RZ/G2L.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

# 5d8dfaa7 | 09-Dec-2021 | Dmitry Torokhov <dmitry.torokhov@gmail.com>
Merge tag 'v5.15' into next
Sync up with the mainline to get the latest APIs and DT bindings.