History log of /linux/arch/x86/include/asm/uaccess_64.h (Results 51 – 75 of 677)
Revision Date Author Comments
Revision tags: v6.3
# e1f2750e 20-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86: remove 'zerorest' argument from __copy_user_nocache()

Every caller passes in zero, meaning they don't want any partial copy to
zero the remainder of the destination buffer.

Which is just as well, because the implementation of that function
didn't actually even look at that argument and wasn't even aware it
existed, although some misleading comments still mentioned it.

The 'zerorest' thing is a historical artifact of how "copy_from_user()"
worked, in that it would zero the rest of the kernel buffer that it
copied into.

That zeroing still exists, but it has long since been moved to generic
code, and the raw architecture-specific code doesn't do it. See
_copy_from_user() in lib/usercopy.c for the details.

However, while __copy_user_nocache() shares some history and other
superficial similarities with copy_from_user(), it is in many ways
also very different.

In particular, while the code makes it *look* similar to the generic
user copy functions that can copy both to and from user space, and take
faults on both reads and writes as a result, __copy_user_nocache() does
no such thing at all.

__copy_user_nocache() always copies to kernel space, and will never take
a page fault on the destination. What *can* happen, though, is that the
non-temporal stores take a machine check because one of the use cases is
for writing to stable memory, and any memory errors would then take
synchronous faults.

So __copy_user_nocache() does look a lot like copy_from_user(), but has
faulting behavior that is more akin to our old copy_in_user() (which no
longer exists, but copied from user space to user space and could fault
on both source and destination).

And it very much does not have the "zero the end of the destination
buffer" semantics, since a problem with the destination buffer may
well be the very source of the partial copy.

So this whole thing was just a confusing historical artifact from having
shared some code with a completely different function with completely
different use cases.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
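
[Editor's note: a minimal sketch of the generic tail-zeroing referenced
above, modeled on _copy_from_user() in lib/usercopy.c; instrumentation
and fault-injection hooks are omitted for brevity.]

    unsigned long _copy_from_user(void *to, const void __user *from,
                                  unsigned long n)
    {
        unsigned long res = n;          /* bytes not yet copied */

        might_fault();
        if (likely(access_ok(from, n)))
            res = raw_copy_from_user(to, from, n);
        if (unlikely(res))              /* partial copy: zero the tail */
            memset(to + (n - res), 0, res);
        return res;                     /* bytes left uncopied */
    }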



# 427fda2c 17-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86: improve on the non-rep 'copy_user' function

The old 'copy_user_generic_unrolled' function was oddly implemented for
largely historical reasons: it had been based on the uncached copy
case, which has some other concerns.

For example, the __copy_user_nocache() function uses 'movnti' for the
destination stores, and those want the destination to be aligned. In
contrast, the regular copy function doesn't really care, and trying to
align things only complicates matters.

Also, like the clear_user function, the copy function had some odd
handling of the repeat counts, complicating the exception handling for
no really good reason. So as with clear_user, just write it to keep all
the byte counts in the %rcx register, exactly like the 'rep movs'
functionality that this replaces.

Unlike a real 'rep movs', we do allow this to trash a few temporary
registers, so that we don't have to save/restore registers on the
stack unnecessarily.

And like the clearing case, rename this to what it now clearly is:
'rep_movs_alternative', and make it one coherent function, so that it
shows up as such in profiles (instead of the odd split between
"copy_user_generic_unrolled" and "copy_user_short_string", the latter of
which was not about strings at all, and which was shared with the
uncached case).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
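
[Editor's note: a C model of the 'rep movs' register contract the new
function mimics; illustrative only, not the actual assembly. All state
stays in the registers it arrives in (%rdi, %rsi, %rcx), and the return
value is the byte count left uncopied.]

    static unsigned long rep_movs_model(void *to, const void *from,
                                        unsigned long len)
    {
        unsigned char *d = to;
        const unsigned char *s = from;

        /* In the real asm, a fault on a load or store exits this loop
         * early, leaving the remaining count in %rcx. */
        while (len) {
            *d++ = *s++;
            len--;
        }
        return len;     /* 0 on success, the remainder after a fault */
    }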



Revision tags: v6.3-rc7
# 8c9b6a88 16-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86: improve on the non-rep 'clear_user' function

The old version was oddly written to have the repeat count in multiple
registers. So instead of taking advantage of %rax being zero, it had
some sub-counts in it. All just for a "single word clearing" loop,
which isn't even efficient to begin with.

So get rid of those games, and just keep all the state in the same
registers we got it in (and that we should return things in). That not
only makes this act much more like 'rep stos' (which this function is
replacing), but makes it much easier to actually do the obvious loop
unrolling.

Also rename the function from the now nonsensical 'clear_user_original'
to what it now clearly is: 'rep_stos_alternative'.

End result: if we don't have a fast 'rep stosb', at least we can have a
fast fallback for it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
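
[Editor's note: a C model of the contract described above; illustrative
only. The count stays in one register (%rcx), which also makes the
obvious word-at-a-time unrolling straightforward.]

    static unsigned long rep_stos_model(void *to, unsigned long len)
    {
        unsigned char *d = to;

        /* Word-at-a-time clearing; x86 tolerates unaligned stores, and
         * a fault in the real asm exits early with the remaining count
         * still in %rcx. */
        while (len >= 8) {
            *(unsigned long *)d = 0;
            d += 8;
            len -= 8;
        }
        while (len) {
            *d++ = 0;
            len--;
        }
        return len;     /* 0 on success, bytes left uncleared on fault */
    }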



# 577e6a7f 16-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86: inline the 'rep movs' in user copies for the FSRM case

This does the same thing for the user copies as commit 0db7058e8e23
("x86/clear_user: Make it faster") did for clear_user(). In other
words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set,
avoiding the function call entirely.

In order to do that, it makes the calling convention for the out-of-line
case ("copy_user_generic_unrolled") match the 'rep movs' calling
convention, although it does also end up clobbering a number of
additional registers.

Also, to simplify code sharing in the low-level assembly with the
__copy_user_nocache() function (that uses the normal C calling
convention), we end up with a kind of mixed return value for the
low-level asm code: it will return the result in both %rcx (to work as
an alternative for the 'rep movs' case), _and_ in %rax (for the nocache
case).

We could avoid this by wrapping __copy_user_nocache() callers in an
inline asm, but since the cost is just an extra register copy, it's
probably not worth it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
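
[Editor's note: a sketch of the shape this inlining takes, using the
kernel's ALTERNATIVE machinery; the exact labels, constraints (e.g.
ASM_CALL_CONSTRAINT) and symbol names in the real uaccess_64.h differ
slightly.]

    static __always_inline unsigned long
    copy_user_generic(void *to, const void *from, unsigned long len)
    {
        stac();
        asm volatile(
            "1:\n\t"
            ALTERNATIVE("rep movsb",
                        "call copy_user_generic_unrolled",
                        ALT_NOT(X86_FEATURE_FSRM))
            "2:\n"
            _ASM_EXTABLE_UA(1b, 2b)
            : "+c" (len), "+D" (to), "+S" (from)
            : : "memory", "rax");
        clac();
        return len;     /* remaining bytes, as 'rep movs' leaves them */
    }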



# 3639a535 15-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86: move stac/clac from user copy routines into callers

This is preparatory work for inlining the 'rep movs' case, but also a
cleanup. The __copy_user_nocache() function was mis-used by the rdma
code to do uncached kernel copies that don't actually want user copies
at all, and as a result don't want the stac/clac either.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
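
[Editor's note: with stac/clac hoisted into the callers, a user-copy
wrapper brackets the raw routine itself; raw_copy_bytes() below is a
hypothetical stand-in for the low-level copy.]

    static inline unsigned long
    copy_user_wrapper(void *to, const void __user *from, unsigned long len)
    {
        stac();         /* open the user-access window (SMAP) */
        len = raw_copy_bytes(to, (__force const void *)from, len);
        clac();         /* close it again */
        return len;
    }

The rdma-style uncached kernel copy can then call the raw routine
directly, with no stac/clac at all.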



# d2c95f9d 15-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86: don't use REP_GOOD or ERMS for user memory clearing

The modern target to use is FSRS (Fast Short REP STOS), and the other
cases should only be used for bigger areas (ie mainly things like page
clearing).

Note! This changes the conditional for the inlining from FSRM ("fast
short rep movs") to FSRS ("fast short rep stos").

We'll have a separate fixup for AMD microarchitectures that have a good
'rep stosb' yet do not set the new Intel-specific FSRS bit (because FSRM
was there first).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
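
[Editor's note: the same ALTERNATIVE pattern as for copies, keyed on
FSRS instead of FSRM; a simplified sketch, with the out-of-line
fallback shown under its later name rep_stos_alternative.]

    static __always_inline unsigned long
    __clear_user(void __user *addr, unsigned long size)
    {
        stac();
        asm volatile(
            "1:\n\t"
            ALTERNATIVE("rep stosb",
                        "call rep_stos_alternative",
                        ALT_NOT(X86_FEATURE_FSRS))
            "2:\n"
            _ASM_EXTABLE_UA(1b, 2b)
            : "+c" (size), "+D" (addr)
            : "a" (0)   /* %rax = 0, the value 'rep stosb' stores */
            : "memory");
        clac();
        return size;    /* bytes left uncleared */
    }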



# adfcf423 15-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86: don't use REP_GOOD or ERMS for user memory copies

The modern target to use is FSRM (Fast Short REP MOVS), and the other
cases should only be used for bigger areas (ie mainly things like page
clearing).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>



Revision tags: v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3
# b9da86e3 16-Mar-2023 Ira Weiny <ira.weiny@intel.com>

x86/uaccess: Remove memcpy_page_flushcache()

Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray
callbacks") removed the calls to memcpy_page_flushcache().

In addition, memcpy_page_flushcache() uses the deprecated
kmap_atomic().

Remove the unused x86 memcpy_page_flushcache() implementation and
also get rid of one more kmap_atomic() user.

[ dhansen: tweak changelog ]

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20221230-kmap-x86-v1-1-15f1ecccab50%40intel.com



Revision tags: v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1
# 4f2c0a4a 14-Dec-2022 Nick Terrell <terrelln@fb.com>

Merge branch 'main' into zstd-linus


# e291c116 12-Dec-2022 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.2 merge window.


Revision tags: v6.1, v6.1-rc8, v6.1-rc7
# 29583dfc 21-Nov-2022 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next-fixes

Backmerging to update drm-misc-next-fixes for the final phase
of the release cycle.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.1-rc6
# 002c6ca7 14-Nov-2022 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Catch up on 6.1-rc cycle in order to solve the intel_backlight
conflict on linux-next.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


Revision tags: v6.1-rc5, v6.1-rc4
# d93618da 04-Nov-2022 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Needed to bring in v6.1-rc1 which contains commit f683b9d61319 ("i915: use the VMA iterator")
which is needed for series https://patchwork.freedesktop.org/series/110083/.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>



Revision tags: v6.1-rc3, v6.1-rc2
# 14e77332 22-Oct-2022 Nick Terrell <terrelln@fb.com>

Merge branch 'main' into zstd-next


# 1aca5ce0 20-Oct-2022 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Backmerging to get v6.1-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 008f05a7 19-Oct-2022 Mark Brown <broonie@kernel.org>

ASoC: jz4752b: Capture fixes

Merge series from Siarhei Volkau <lis8215@gmail.com>:

The patchset fixes:
- Line In path stays powered off during capturing or
  bypass to mixer.
- incorrectly represented dB values in alsamixer, et al.
- incorrectly represented Capture input selector in alsamixer
  in Playback tab.
- wrong control selected as Capture Master



# a140a6a2 18-Oct-2022 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-next into drm-misc-next

Let's kick-off this release cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


# c29a017f 17-Oct-2022 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.1-rc1' into next

Merge with mainline to bring in the latest changes to twl4030 driver.


# 8048b835 17-Oct-2022 Andrew Morton <akpm@linux-foundation.org>

Merge branch 'master' into mm-hotfixes-stable


Revision tags: v6.1-rc1
# dfd2d876 10-Oct-2022 Johannes Berg <johannes.berg@intel.com>

Merge remote-tracking branch 'wireless/main' into wireless-next

Pull in wireless/main content since some new code would
otherwise conflict with it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>



# 7db99f01 04-Oct-2022 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

- Print the CPU number at segfault time.

The number printed is not always accurate (preemption is enabled at
that time), but the print string contains "likely" and, after a lot of
back'n'forth on this, that was the consensus reached. See the thread
at [1].

- After a *lot* of testing and polishing, the clear_user()
improvements finally inline REP; STOSB by default.

Link: https://lore.kernel.org/r/5d62c1d0-7425-d5bb-ecb5-1dc3b4d7d245@intel.com [1]

* tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Print likely CPU at segfault time
x86/clear_user: Make it faster



Revision tags: v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1
# 0db7058e 24-May-2022 Borislav Petkov <bp@suse.de>

x86/clear_user: Make it faster

Based on a patch by Mark Hemment <markhemm@googlemail.com> and
incorporating very sane suggestions from Linus.

The point here is to have the default case with FSRM - which is supposed
to be the majority of x86 hw out there - if not now then soon - be
directly inlined into the instruction stream so that no function call
overhead is taking place.

Drop the early clobbers from the @size and @addr operands as those are
not needed anymore since we have single instruction alternatives.

The benchmarks I ran showed very small improvements, and a PF
benchmark even showed weird things like slowdowns at higher core
counts.

So for a ~6 min run of the git test suite, the function gets called under
700K times, all from padzero():

<...>-2536 [006] ..... 261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900
<...>-2536 [006] ..... 261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160
<...>-2537 [008] ..... 261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850
<...>-2537 [008] ..... 261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900
...

which is around 1%-ish of the total time and which is consistent with
the benchmark numbers.

So Mel gave me the idea to simply measure how fast the function becomes.
I.e.:

start = rdtsc_ordered();
ret = __clear_user(to, n);
end = rdtsc_ordered();

Computing the mean of all the samples collected during the test
suite run then shows some improvement:

clear_user_original:
Amean: 9219.71 (Sum: 6340154910, samples: 687674)

fsrm:
Amean: 8030.63 (Sum: 5522277720, samples: 687652)

That's on Zen3.

The situation looks a lot more confusing on Intel:

Icelake:

clear_user_original:
Amean: 19679.4 (Sum: 13652560764, samples: 693750)
Amean: 19743.7 (Sum: 13693470604, samples: 693562)

(I ran it twice just to be sure.)

ERMS:
Amean: 20374.3 (Sum: 13910601024, samples: 682752)
Amean: 20453.7 (Sum: 14186223606, samples: 693576)

FSRM:
Amean: 20458.2 (Sum: 13918381386, samples: 680331)

The original microbenchmark which people were complaining about:

for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=65536; done 2>&1 | grep copied
32207011840 bytes (32 GB, 30 GiB) copied, 1 s, 32.2 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.93069 s, 35.6 GB/s
37597741056 bytes (38 GB, 35 GiB) copied, 1 s, 37.6 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.78017 s, 38.6 GB/s
62020124672 bytes (62 GB, 58 GiB) copied, 2 s, 31.0 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 2.13716 s, 32.2 GB/s
60010004480 bytes (60 GB, 56 GiB) copied, 1 s, 60.0 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.14129 s, 60.2 GB/s
53212086272 bytes (53 GB, 50 GiB) copied, 1 s, 53.2 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.28398 s, 53.5 GB/s
55698259968 bytes (56 GB, 52 GiB) copied, 1 s, 55.7 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.22507 s, 56.1 GB/s
55306092544 bytes (55 GB, 52 GiB) copied, 1 s, 55.3 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.23647 s, 55.6 GB/s
54387539968 bytes (54 GB, 51 GiB) copied, 1 s, 54.4 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.25693 s, 54.7 GB/s
50566529024 bytes (51 GB, 47 GiB) copied, 1 s, 50.6 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.35096 s, 50.9 GB/s
58308165632 bytes (58 GB, 54 GiB) copied, 1 s, 58.3 GB/s
68719476736 bytes (69 GB, 64 GiB) copied, 1.17394 s, 58.5 GB/s

Now the same thing with smaller buffers:

for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=8192; done 2>&1 | grep copied
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28485 s, 30.2 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276112 s, 31.1 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.29136 s, 29.5 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.283803 s, 30.3 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.306503 s, 28.0 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.349169 s, 24.6 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276912 s, 31.0 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.265356 s, 32.4 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28464 s, 30.2 GB/s
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.242998 s, 35.3 GB/s

is also not conclusive because it all depends on the buffer sizes,
their alignments, and on when the microcode detects that cachelines
can be aggregated properly and copied in bigger sizes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/CAHk-=wh=Mu_EYhtOmPn6AxoQZyEh-4fo2Zx3G7rBv1g7vwoKiw@mail.gmail.com
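
[Editor's note: the measurement above, filled out into a self-contained
form; padzero_probe() is a hypothetical hook name, and trace_printk()
stands in for whatever tracing was actually used.]

    static void padzero_probe(void __user *to, unsigned long n)
    {
        unsigned long ret;
        u64 start, end;

        start = rdtsc_ordered();    /* serializing TSC read */
        ret = __clear_user(to, n);
        end = rdtsc_ordered();

        trace_printk("to: %px, size: %lu, cycles: %llu\n",
                     to, n, end - start);
        (void)ret;                  /* bytes left uncleared, if any */
    }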



Revision tags: v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1
# 762f99f4 15-Jan-2022 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 5.17 merge window.


Revision tags: v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5
# 86329873 09-Dec-2021 Daniel Lezcano <daniel.lezcano@linaro.org>

Merge branch 'reset/of-get-optional-exclusive' of git://git.pengutronix.de/pza/linux into timers/drivers/next

"Add optional variant of of_reset_control_get_exclusive(). If the
requested reset is not specified in the device tree, this function
returns NULL instead of an error."

This dependency is needed for the Generic Timer Module (a.k.a OSTM)
support for RZ/G2L.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>



# 5d8dfaa7 09-Dec-2021 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v5.15' into next

Sync up with the mainline to get the latest APIs and DT bindings.

