#
2730f429 |
| 03-Jul-2024 |
Ryan Libby <rlibby@FreeBSD.org> |
amd64 pcpu: fix clobbers, suppress warnings, and clean up
These changes mostly apply to the !__SEG_GS section, which is no longer the normal compilation path. They're made to be consistent with changes to i386.
- Add missing cc clobber to __PCPU_ADD (which is currently unused).
- Allow the compiler the opportunity to marginally improve code generation from __PCPU_PTR by letting it figure out how to do the add (also, removing the addition fixes a missing cc clobber).
- Quiet gcc -Warray-bounds by using constant operands instead of bogus memory references.
- Remove the struct __s __s temporaries; just cast through the type.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45827
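As a rough illustration of the clobber and constant-operand points above: a GS-relative add written as inline asm rewrites the flags, so "cc" belongs in the clobber list, and passing the pcpu field offset as a constant operand avoids the bogus memory reference that tripped -Warray-bounds. The macro below is a hedged sketch, not the real __PCPU_ADD:

    /*
     * Illustrative only: the offset is a compile-time constant ("i"
     * operand, printed raw via %c0); the add modifies memory the asm
     * does not describe and rewrites the flags, hence the "memory"
     * and "cc" clobbers.
     */
    #define __PCPU_ADD_SKETCH(offset, val)                            \
        __asm __volatile("addq %1,%%gs:%c0"                           \
            : : "i" (offset), "ri" ((long)(val)) : "memory", "cc")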
|
Revision tags: release/14.1.0, release/13.3.0, release/14.0.0 |
|
#
95ee2897 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
#
2596008a |
| 08-Jul-2023 |
John Baldwin <jhb@FreeBSD.org> |
amd64 pcpu.h: Add missing 'do' from do-while loop around __PCPU_SET.
Reported by: mjg
Diagnosed by: jrtc27
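For context, the construct being repaired is the usual statement-macro idiom; a hedged sketch (not the actual __PCPU_SET body):

    /*
     * Wrapping a multi-statement macro body in do { } while (0) makes
     * it behave as a single statement after if/else; the bug was a
     * missing leading "do".
     */
    #define __PCPU_SET_SKETCH(ptr, val) do {                          \
        *(ptr) = (val);                                               \
    } while (0)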
|
#
2329393c |
| 07-Jul-2023 |
John Baldwin <jhb@FreeBSD.org> |
amd64: Use __seg_gs to implement per-CPU data accesses.
This makes use of the alternate address space support in both GCC and clang to access per-CPU data as accesses relative to GS:. The original motivation for this is that it quiets verbose warnings from GCC 12. However, this version is also much easier to read and allows the compiler to generate better code (e.g. the compiler can use a GS: memory operand directly in other instructions such as IMUL and CMP rather than always MOVing to a temporary register).
The one caveat is that the current approach is very inefficient at -O0 since the compiler expects to load the 0 base offset from a global variable instead of assuming it is 0 (even with the const).
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40647
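A minimal sketch of the __seg_gs technique, with an invented pcpu layout and a zero base offset (the real header derives its accessors from struct pcpu and, per the caveat above, keeps the 0 base in a const global):

    #if defined(__SEG_GS)
    #include <stdint.h>

    struct pcpu_sketch {
        uint32_t pc_cpuid;
        uint64_t pc_cnt;
    };

    /* GS-relative view of the per-CPU region, assumed based at offset 0. */
    #define PCPU_SKETCH ((__seg_gs struct pcpu_sketch *)0)

    static inline uint32_t
    pcpu_sketch_cpuid(void)
    {
        /* Compiles to a plain %gs:-relative load, e.g. movl %gs:0,%eax. */
        return (PCPU_SKETCH->pc_cpuid);
    }

    static inline void
    pcpu_sketch_add(uint64_t n)
    {
        /* The compiler may use addq with a %gs: memory operand directly. */
        PCPU_SKETCH->pc_cnt += n;
    }
    #endif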
|
#
4d846d26 |
| 10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
|
#
42f722e7 |
| 01-May-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: store pcids pmap data in pcpu zone
This change eliminates the struct pmap_pcid array embedded into struct pmap and sized by MAXCPU, which would bloat with MAXCPU increase. Also it removes false sharing of cache lines, since the array elements are mostly locally accessed by corresponding CPUs.
Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
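A hedged sketch of the allocation pattern this describes, with invented zone and function names (the real change lives in the amd64 pmap):

    /* Needs <sys/param.h>, <sys/pcpu.h> and <vm/uma.h>. */

    /* A UMA_ZONE_PCPU zone hands out one element per CPU, placed in the
     * per-CPU pages, so each CPU mostly touches its own cache lines. */
    static uma_zone_t pmap_pcid_zone_sketch;

    static void
    pmap_pcid_zone_create_sketch(void)
    {
        pmap_pcid_zone_sketch = uma_zcreate("pcid sketch",
            sizeof(struct pmap_pcid), NULL, NULL, NULL, NULL,
            UMA_ALIGN_PTR, UMA_ZONE_PCPU);
    }

    /* Per-pmap state is then one pointer (from uma_zalloc_pcpu()) instead
     * of a MAXCPU-sized array embedded in struct pmap. */
    static struct pmap_pcid *
    pmap_pcid_elem_sketch(void *pm_pcidp, u_int cpuid)
    {
        return (zpcpu_get_cpu(pm_pcidp, cpuid));
    }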
|
Revision tags: release/13.2.0, release/12.4.0 |
|
#
cde70e31 |
| 11-Oct-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: for small cores, use (big hammer) INVPCID_CTXGLOB instead of INVLPG
A hypothetical CPU bug seems to make invalidation of global PTEs using INVLPG in PCID mode unreliable. The workaround is applied to all CPUs with small cores, since we know neither the scope of the issue nor the right fix.
Reviewed by: alc (previous version)
Discussed with: emaste, markj
Tested by: karels
PR: 261169, 266145
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770
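Sketched below is the shape of the workaround; the predicate name is invented (the real code keys off the small-core identification from the "amd64: identify small cores" entry below) and the descriptor handling is simplified:

    /* Needs <machine/cpufunc.h> for invlpg()/invpcid() and INVPCID_*. */
    static bool small_core_invlpg_workaround_sketch;   /* hypothetical flag */

    static void
    pmap_invlpg_sketch(vm_offset_t va)
    {
        struct invpcid_descr d;

        if (small_core_invlpg_workaround_sketch) {
            /* Big hammer: flush everything, including global entries. */
            bzero(&d, sizeof(d));
            invpcid(&d, INVPCID_CTXGLOB);
        } else {
            /* Targeted single-page invalidation. */
            invlpg(va);
        }
    }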
|
#
45ac7755 |
| 21-Dec-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: identify small cores
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770
|
#
f6fada5e |
| 13-Jun-2022 |
Brooks Davis <brooks@FreeBSD.org> |
amd64: -m32 support for machine/pcpu.h
Install the i386 pcpu.h under /usr/include/i386 on amd64 and include it when targeting i386.
This is a kernel-only header and should not be required, but procstat's zfs support includes this with _KERNEL defined.
Reviewed by: jhb, imp
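The usual shape of such -m32 support is an early redirect in the amd64 header; a hedged sketch (the real file may differ in detail):

    /* sys/amd64/include/pcpu.h (sketch): when building i386 code on
     * amd64 with -m32, defer to the i386 header installed under
     * /usr/include/i386. */
    #ifdef __i386__
    #include <i386/pcpu.h>
    #else
    /* ... native amd64 definitions ... */
    #endif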
|
Revision tags: release/13.1.0 |
|
#
3d6f4411 |
| 12-Apr-2022 |
John Baldwin <jhb@FreeBSD.org> |
Remove checks for <sys/cdefs.h> being included.
These files no longer depend on the macros required when these checks were added.
PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34804
|
#
5ab33279 |
| 12-Apr-2022 |
John Baldwin <jhb@FreeBSD.org> |
Remove checks for __GNUCLIKE___TYPEOF assuming it is always true.
All supported compilers (modern versions of GCC and clang) support this.
PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34798
|
#
56f5947a |
| 12-Apr-2022 |
John Baldwin <jhb@FreeBSD.org> |
Remove checks for __GNUCLIKE_ASM assuming it is always true.
All supported compilers (modern versions of GCC and clang) support this.
Many places didn't have an #else, so they would just silently do the wrong thing. Ancient versions of icc (the original motivation for this) are no longer compilers that FreeBSD supports.
PR: 263102 (exp-run)
Reviewed by: brooks, imp
Differential Revision: https://reviews.freebsd.org/D34797
|
Revision tags: release/12.3.0, release/13.0.0 |
|
#
d22883d7 |
| 10-Mar-2021 |
Jason A. Harmening <jah@FreeBSD.org> |
Remove PCPU_INC
e4b8deb22227 removed the last in-tree uses of PCPU_INC(). Its potential benefit is also practically nonexistent. Non-x86 platforms already implement it as PCPU_ADD(..., 1), and according to [0] there are no recent x86 processors for which the 'inc' instruction provides a performance benefit over the equivalent memory-operand form of the 'add' instruction. The only remaining benefit of 'inc' is smaller instruction size, which in this case is inconsequential given the limited number of per-CPU data consumers.
[0]: https://www.agner.org/optimize/instruction_tables.pdf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29308
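For reference, the generic spelling mentioned above makes the removal purely mechanical; a sketch:

    /* What PCPU_INC amounted to where it was defined this way; callers
     * now just use the add form directly. */
    #define PCPU_INC(member)    PCPU_ADD(member, 1)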
|
#
e4b8deb2 |
| 25-Feb-2021 |
Jason A. Harmening <jah@FreeBSD.org> |
amd64 pmap: convert to counter(9), add PV and pagetable page counts
This change converts most of the counters in the amd64 pmap from global atomics to scalable counter(9) counters. Per discussion with kib@, it also removes the handrolled per-CPU PCID save count as it isn't considered generally useful.
The bulk of these counters remain guarded by PV_STATS, as it seems unlikely that they will be useful outside of very specific debugging scenarios. However, this change does add two new counters that are available without PV_STATS. pt_page_count and pv_page_count track the number of active physical-to-virtual list pages and page table pages, respectively. These will be useful in evaluating the memory footprint of pmap structures under various workloads, which will help to guide future changes in this area.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28923
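A hedged sketch of the conversion pattern: a globally shared atomic becomes a counter(9) counter, so each CPU bumps its own slot and the slots are only summed when read. The counter name is invented:

    #include <sys/types.h>
    #include <sys/systm.h>
    #include <sys/counter.h>

    COUNTER_U64_DEFINE_EARLY(pv_entry_allocs_sketch);

    static void
    pv_alloc_note_sketch(void)
    {
        /* Replaces an atomic add on a single shared variable: no shared
         * cache line is written on the fast path. */
        counter_u64_add(pv_entry_allocs_sketch, 1);
    }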
|
Revision tags: release/12.2.0 |
|
#
c7aa572c |
| 31-Jul-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
#
3ec7e169 |
| 18-Jul-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64 pmap: microoptimize local shootdowns for PCID PTI configurations
When pmap operates in PTI mode, we must reload %cr3 on return to userspace. In non-PCID mode the reload always flushes all non-global TLB entries, and we take advantage of that by invalidating only the KPT TLB entries (there are no cached UPT entries at all).
In PCID mode, we flush both KPT and UPT TLB entries explicitly, but we can take advantage of the fact that the PCID-mode %cr3 reload includes a flag to flush or not flush the target TLB. In particular, we can avoid the flush for the UPT and instead record that the load of pc_ucr3 into %cr3 on return to usermode should flush. This is done by providing either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is automatically reset to all-1s on return to usermode.
Similarly, we can avoid flushing the UPT TLB on context switch, replacing the flush by setting pc_ucr3_load_mask. This unifies the INVPCID and non-INVPCID PTI ifuncs, leaving only 4 cases instead of 6. This trick is also applicable to the TLB shootdown IPI handlers, since handlers interrupt the target thread.
But then we need to check pc_curpmap in the handlers, and this would reopen for INVPCID machines the same race that was fixed in r306350 for non-INVPCID. To avoid reintroducing that bug, unconditionally do spinlock_enter() in pmap_activate().
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25483
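In C pseudocode the pc_ucr3_load_mask mechanism reads roughly as below; the real work happens in the PTI trampoline assembly, so the helper is illustrative and the concrete mask constants are deliberately left out:

    /* pc_ucr3 and pc_ucr3_load_mask are the per-CPU fields named above. */
    static inline void
    pti_load_ucr3_sketch(void)
    {
        uint64_t cr3val;

        /* An all-1s mask keeps this %cr3 load non-flushing; the
         * alternate mask makes it flush the user page-table TLB
         * entries. */
        cr3val = PCPU_GET(ucr3) & PCPU_GET(ucr3_load_mask);
        load_cr3(cr3val);

        /* The mask is reset to all-1s for the next return to usermode. */
        PCPU_SET(ucr3_load_mask, ~0ul);
    }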
|
#
e2c0e292 |
| 16-Jul-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
#
dc43978a |
| 14-Jul-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: allow parallel shootdown IPIs
Stop using smp_ipi_mtx to protect global shootdown state, and move/multiply the global state into pcpu. Now each CPU can initiate a shootdown IPI independently of other CPUs. The initiator enters a critical section, fills its local PCPU shootdown info (pc_smp_tlb_XXX), then clears the scoreboard generation at location (cpu, my_cpuid) for each target CPU. After that, an IPI is sent to all targets, which scan for zeroed scoreboard generation words. Upon finding such a word, the shootdown data is read from the corresponding CPU's pcpu and the generation is set. Meanwhile, the initiator loops waiting for all zeroed generations in the scoreboard to be updated.
The initiator does not disable interrupts, which should keep non-invalidation IPIs from deadlocking; it only needs to disable preemption to pin itself to its instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in the handler. This is safe because the target CPU cannot return to userspace before the handler finishes. In principle only an NMI can preempt the handler, but an NMI would see the kernel handler frame and not touch the not-yet-invalidated user page table.
Handlers loop until they no longer see zeroed scoreboard generations. This, together with the hardware keeping one pending IPI in the LAPIC IRR, should prevent lost shootdowns.
Notes:
1. The code does not protect writes to the LAPIC ICR with exclusion. I believe this is fine because we in fact do not send IPIs from interrupt handlers. Moreover, in !x2APIC mode, where an ICR write requires two register writes, we disable interrupts around it. If this is considered incorrect, I can add a per-CPU spinlock around ipi_send().
2. Scoreboard lines owned by a given target CPU can be padded to the cache line, to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
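A hedged sketch of the initiator side of the scoreboard handshake; the array layout, names, and the elided IPI send are simplified stand-ins for the real pmap/mp_machdep code:

    /* Needs <sys/param.h>, <sys/systm.h>, <sys/smp.h>, <sys/pcpu.h>. */

    /* scoreboard_sketch[target][initiator]: zeroed by the initiator and
     * rewritten with a non-zero generation by the target's IPI handler
     * once it has copied the initiator's pc_smp_tlb_* arguments. */
    static uint32_t scoreboard_sketch[MAXCPU][MAXCPU];

    static void
    shootdown_initiate_sketch(cpuset_t targets)
    {
        u_int cpu, self;

        critical_enter();               /* pin to this CPU's pcpu data */
        self = PCPU_GET(cpuid);
        /* ... fill this CPU's pc_smp_tlb_* shootdown arguments here ... */
        CPU_FOREACH(cpu)
            if (CPU_ISSET(cpu, &targets))
                atomic_store_rel_32(&scoreboard_sketch[cpu][self], 0);
        /* ... send the invalidation IPI to 'targets' here ... */
        CPU_FOREACH(cpu)
            if (CPU_ISSET(cpu, &targets))
                while (atomic_load_acq_32(&scoreboard_sketch[cpu][self]) == 0)
                    cpu_spinwait();
        critical_exit();
    }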
|
Revision tags: release/11.4.0 |
|
#
44e86fbd |
| 13-Feb-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r357662 through r357854.
|
#
2318ed25 |
| 12-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
amd64: provide custom zpcpu set/add/sub routines
Note that the clobbers are highly overzealous; they can be cleaned up later.
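A hedged sketch of what such a routine looks like; the real macros dispatch on operand size and operate on per-CPU bases already biased by __pcpu (see the entry below), and, as noted, the clobber list is deliberately generous:

    /* GS-relative add on a per-CPU object whose base is pre-biased so
     * that %gs: addressing lands on this CPU's copy. */
    static inline void
    zpcpu_add_long_sketch(long *base, long n)
    {
        __asm __volatile("addq %1,%%gs:(%0)"
            :
            : "r" (base), "ri" (n)
            : "memory", "cc");
    }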
|
#
fb886947 |
| 12-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
amd64: store per-cpu allocations subtracted by __pcpu
This eliminates a runtime subtraction from counter_u64_add.
before:
    mov    0x4f00ed(%rip),%rax    # 0xffffffff80c01788 <numfullpathfail4>
    sub    0x808ff6(%rip),%rax    # 0xffffffff80f1a698 <__pcpu>
    addq   $0x1,%gs:(%rax)
after:
    mov    0x4f02fd(%rip),%rax    # 0xffffffff80c01788 <numfullpathfail4>
    addq   $0x1,%gs:(%rax)
Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D23570
|
#
a7af4a3e |
| 12-Nov-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: move GDT into PCPU area.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22302
|
#
98158c75 |
| 10-Nov-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: move common_tss into pcpu.
This saves some memory, around 256K I think. It removes some code; e.g., KPTI no longer needs to specially map common_tss. Also, common_tss becomes domain-local.
Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22231
|
Revision tags: release/12.1.0 |
|
#
c5c3ba6b |
| 03-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r351317 through r351731.
|
#
a2a0f906 |
| 29-Aug-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Centralize __pcpu definitions.
Many extern struct pcpu <something>__pcpu declarations were copied/pasted across the sources. The issue is that the definition is MD, but it cannot be provided by machine/pcpu.h because the actual struct pcpu is defined in sys/pcpu.h, later than the inclusion of machine/pcpu.h. This forced the copying whenever other code needed direct access to __pcpu. There is no way around it, since machine/pcpu.h supplies part of the struct pcpu fields.
To work around the problem, add a new machine/pcpu_aux.h header, which supplies any needed MD definitions after the struct pcpu definition is complete. This allows removing the copies of __pcpu spread around the source. Also, on x86 it makes it possible to remove workarounds like OFFSETOF_CURTHREAD or clang-specific warning suppressions.
Reported and tested by: lwhsu, bcran
Reviewed by: imp, markj (previous version)
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21418
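A hedged sketch of the header layering this introduces; the field names are invented and the includes are schematic:

    /* machine/pcpu.h: contributes only the MD fields, as a macro, since
     * struct pcpu does not exist yet at this point. */
    #define PCPU_MD_FIELDS                                            \
        char    pc_monitorbuf_sketch[128];                            \
        void    *pc_md_private_sketch

    /* sys/pcpu.h: builds the full structure from MI and MD parts ... */
    struct pcpu {
        struct thread   *pc_curthread;
        u_int           pc_cpuid;
        PCPU_MD_FIELDS;
    };

    /* ... and only afterwards pulls in machine/pcpu_aux.h, which can use
     * the now-complete type, e.g. to declare the MD __pcpu object. */
    #include <machine/pcpu_aux.h>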
|