History log of /freebsd/sys/amd64/include/pcpu.h (Results 1 – 25 of 212)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 2730f429 03-Jul-2024 Ryan Libby <rlibby@FreeBSD.org>

amd64 pcpu: fix clobbers, suppress warnings, and clean up

These changes mostly apply to the !__SEG_GS section, which is no longer
the normal compilation path. They're made to be consistent with cha

amd64 pcpu: fix clobbers, suppress warnings, and clean up

These changes mostly apply to the !__SEG_GS section, which is no longer
the normal compilation path. They're made to be consistent with changes
to i386.

- Add missing cc clobber to __PCPU_ADD (which is currently unused).
- Allow the compiler the opportunity to marginally improve code
generation from __PCPU_PTR by letting it figure out how to do the add
(also removing the addition fixes a missing cc clobber).
- Quiet gcc -Warray-bounds by using constant operands instead of bogus
memory references.
- Remove the struct __s __s temporaries, just cast through the type.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45827

show more ...


Revision tags: release/14.1.0, release/13.3.0, release/14.0.0
# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 2596008a 08-Jul-2023 John Baldwin <jhb@FreeBSD.org>

amd64 pcpu.h: Add missing 'do' from do-while loop around __PCPU_SET.

Reported by: mjg
Diagnosed by: jrtc27


# 2329393c 07-Jul-2023 John Baldwin <jhb@FreeBSD.org>

amd64: Use __seg_gs to implement per-CPU data accesses.

This makes use of the alternate address space support in both GCC and
clang to access per-CPU data as accesses relative to GS:. The
original

amd64: Use __seg_gs to implement per-CPU data accesses.

This makes use of the alternate address space support in both GCC and
clang to access per-CPU data as accesses relative to GS:. The
original motivation for this is that it quiets verbose warnings from
GCC 12. However, this version is also much easier to read and
allows the compiler to generate better code (e.g. the compiler can
use a GS: memory operand directly in other instructions such as IMUL
and CMP rather than always MOVing to a temporary register).

The one caveat is that the current approach is very inefficient at -O0
since the compiler expects to load the 0 base offset from a global
variable instead of assuming it is 0 (even with the const).

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40647

show more ...


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix

show more ...


# 42f722e7 01-May-2023 Konstantin Belousov <kib@FreeBSD.org>

amd64: store pcids pmap data in pcpu zone

This change eliminates the struct pmap_pcid array embedded into struct
pmap and sized by MAXCPU, which would bloat with MAXCPU increase. Also
it removes fa

amd64: store pcids pmap data in pcpu zone

This change eliminates the struct pmap_pcid array embedded into struct
pmap and sized by MAXCPU, which would bloat with MAXCPU increase. Also
it removes false sharing of cache lines, since the array elements are
mostly locally accessed by corresponding CPUs.

Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890

show more ...


Revision tags: release/13.2.0, release/12.4.0
# cde70e31 11-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

amd64: for small cores, use (big hammer) INVPCID_CTXGLOB instead of INVLPG

A hypothetical CPU bug makes invalidation of global PTEs using INVLPG
in pcid mode unreliable, it seems. The workaround is

amd64: for small cores, use (big hammer) INVPCID_CTXGLOB instead of INVLPG

A hypothetical CPU bug makes invalidation of global PTEs using INVLPG
in pcid mode unreliable, it seems. The workaround is applied for all
CPUs with small cores, since we do not know the scope of the issue, and
the right fix.

Reviewed by: alc (previous version)
Discussed with: emaste, markj
Tested by: karels
PR: 261169, 266145
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770

show more ...


# 45ac7755 21-Dec-2022 Konstantin Belousov <kib@FreeBSD.org>

amd64: identify small cores

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770


# f6fada5e 13-Jun-2022 Brooks Davis <brooks@FreeBSD.org>

amd64: -m32 support for machine/pcpu.h

Install the i386 pcpu.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header and should not be required, but
procsta

amd64: -m32 support for machine/pcpu.h

Install the i386 pcpu.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header and should not be required, but
procstat's zfs support includes this with _KERNEL defined.

Reviewed by: jhb, imp

show more ...


Revision tags: release/13.1.0
# 3d6f4411 12-Apr-2022 John Baldwin <jhb@FreeBSD.org>

Remove checks for <sys/cdefs.h> being included.

These files no longer depend on the macros required when these checks
were added.

PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential

Remove checks for <sys/cdefs.h> being included.

These files no longer depend on the macros required when these checks
were added.

PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34804

show more ...


# 5ab33279 12-Apr-2022 John Baldwin <jhb@FreeBSD.org>

Remove checks for __GNUCLIKE___TYPEOF assuming it is always true.

All supported compilers (modern versions of GCC and clang) support
this.

PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Dif

Remove checks for __GNUCLIKE___TYPEOF assuming it is always true.

All supported compilers (modern versions of GCC and clang) support
this.

PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34798

show more ...


# 56f5947a 12-Apr-2022 John Baldwin <jhb@FreeBSD.org>

Remove checks for __GNUCLIKE_ASM assuming it is always true.

All supported compilers (modern versions of GCC and clang) support
this.

Many places didn't have an #else so would just silently do the

Remove checks for __GNUCLIKE_ASM assuming it is always true.

All supported compilers (modern versions of GCC and clang) support
this.

Many places didn't have an #else so would just silently do the wrong
thing. Ancient versions of icc (the original motivation for this) are
no longer a compiler FreeBSD supports.

PR: 263102 (exp-run)
Reviewed by: brooks, imp
Differential Revision: https://reviews.freebsd.org/D34797

show more ...


Revision tags: release/12.3.0, release/13.0.0
# d22883d7 10-Mar-2021 Jason A. Harmening <jah@FreeBSD.org>

Remove PCPU_INC

e4b8deb22227 removed the last in-tree uses of PCPU_INC(). Its
potential benefit is also practically nonexistent. Non-x86
platforms already implement it as PCPU_ADD(..., 1), and acc

Remove PCPU_INC

e4b8deb22227 removed the last in-tree uses of PCPU_INC(). Its
potential benefit is also practically nonexistent. Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction. The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.

[0]: https://www.agner.org/optimize/instruction_tables.pdf

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29308

show more ...


# e4b8deb2 25-Feb-2021 Jason A. Harmening <jah@FreeBSD.org>

amd64 pmap: convert to counter(9), add PV and pagetable page counts

This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters. Per discussion
w

amd64 pmap: convert to counter(9), add PV and pagetable page counts

This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters. Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.

The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios. However, this change does add two new counters that
are available without PV_STATS. pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively. These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28923

show more ...


Revision tags: release/12.2.0
# c7aa572c 31-Jul-2020 Glen Barber <gjb@FreeBSD.org>

MFH

Sponsored by: Rubicon Communications, LLC (netgate.com)


# 3ec7e169 18-Jul-2020 Konstantin Belousov <kib@FreeBSD.org>

amd64 pmap: microoptimize local shootdowns for PCID PTI configurations

When pmap operates in PTI mode, we must reload %cr3 on return to
userspace. In non-PCID mode the reload always flushes all non

amd64 pmap: microoptimize local shootdowns for PCID PTI configurations

When pmap operates in PTI mode, we must reload %cr3 on return to
userspace. In non-PCID mode the reload always flushes all non-global
TLB entries and we take advantage of it by only invalidating the KPT
TLB entries (there is no cached UPT entries at all).

In PCID mode, we flush both KPT and UPT TLB explicitly, but we can
take advantage of the fact that PCID mode command to reload %cr3
includes a flag to flush/not flush target TLB. In particular, we can
avoid the flush for UPT, instead record that load of pc_ucr3 into %cr3
on return to usermode should be flushing. This is done by providing
either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is
automatically reset to all-1s on return to usermode.

Similarly, we can avoid flushing UPT TLB on context switch, replacing
it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID
PTI ifunc, leaving only 4 cases instead of 6. This trick is also
applicable both to the TLB shootdown IPI handlers, since handlers
interrupt the target thread.

But then we need to check pc_curpmap in handlers, and this would
reopen the same race for INVPCID machines as was fixed in r306350 for
non-INVPCID. To not introduce the same bug, unconditionally do
spinlock_enter() in pmap_activate().

Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25483

show more ...


# e2c0e292 16-Jul-2020 Glen Barber <gjb@FreeBSD.org>

MFH

Sponsored by: Rubicon Communications, LLC (netgate.com)


# dc43978a 14-Jul-2020 Konstantin Belousov <kib@FreeBSD.org>

amd64: allow parallel shootdown IPIs

Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently fr

amd64: allow parallel shootdown IPIs

Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently from other CPUs. Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu. After that IPI is sent to all targets
which scan for zeroed scoreboard generation words. Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set. Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.

Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.

The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.

Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.

Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
this is fine because we in fact do not send IPIs from interrupt
handlers. More for !x2APIC mode where ICR access for write requires
two registers write, we disable interrupts around it. If considered
incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
cache line, to reduce ping-pong.

Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510

show more ...


Revision tags: release/11.4.0
# 44e86fbd 13-Feb-2020 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r357662 through r357854.


# 2318ed25 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

amd64: provide custom zpcpu set/add/sub routines

Note that clobbers are highly overzealous, can be cleaned up later.


# fb886947 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

amd64: store per-cpu allocations subtracted by __pcpu

This eliminates a runtime subtraction from counter_u64_add.

before:
mov 0x4f00ed(%rip),%rax # 0xffffffff80c01788 <numfullpathfail4>
s

amd64: store per-cpu allocations subtracted by __pcpu

This eliminates a runtime subtraction from counter_u64_add.

before:
mov 0x4f00ed(%rip),%rax # 0xffffffff80c01788 <numfullpathfail4>
sub 0x808ff6(%rip),%rax # 0xffffffff80f1a698 <__pcpu>
addq $0x1,%gs:(%rax)

after:
mov 0x4f02fd(%rip),%rax # 0xffffffff80c01788 <numfullpathfail4>
addq $0x1,%gs:(%rax)

Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D23570

show more ...


# a7af4a3e 12-Nov-2019 Konstantin Belousov <kib@FreeBSD.org>

amd64: move GDT into PCPU area.

Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22302


# 98158c75 10-Nov-2019 Konstantin Belousov <kib@FreeBSD.org>

amd64: move common_tss into pcpu.

This saves some memory, around 256K I think. It removes some code,
e.g. KPTI does not need to specially map common_tss anymore. Also,
common_tss become domain-loc

amd64: move common_tss into pcpu.

This saves some memory, around 256K I think. It removes some code,
e.g. KPTI does not need to specially map common_tss anymore. Also,
common_tss become domain-local.

Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22231

show more ...


Revision tags: release/12.1.0
# c5c3ba6b 03-Sep-2019 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r351317 through r351731.


# a2a0f906 29-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Centralize __pcpu definitions.

Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources. The issue is that the definition is MD, but
it cannot be provided by machine/pcp

Centralize __pcpu definitions.

Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources. The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu. There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.

To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source. Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.

Reported and tested by: lwhsu, bcran
Reviewed by: imp, markj (previous version)
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21418

show more ...


123456789