| #
2f89b7f7 |
| 24-Mar-2026 |
Besar Wicaksono <bwicaksono@nvidia.com> |
perf: add NVIDIA Tegra410 C2C PMU
Adds NVIDIA C2C PMU support in Tegra410 SOC. This PMU is used to measure memory latency between the SOC and device memory, e.g GPU Memory (GMEM), CXL Memory, or mem
perf: add NVIDIA Tegra410 C2C PMU
Adds NVIDIA C2C PMU support in Tegra410 SOC. This PMU is used to measure memory latency between the SOC and device memory, e.g GPU Memory (GMEM), CXL Memory, or memory on remote Tegra410 SOC.
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
429b7638 |
| 24-Mar-2026 |
Besar Wicaksono <bwicaksono@nvidia.com> |
perf: add NVIDIA Tegra410 CPU Memory Latency PMU
Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC. The PMU is used to measure latency between the edge of the Unified Coherence Fabric to th
perf: add NVIDIA Tegra410 CPU Memory Latency PMU
Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC. The PMU is used to measure latency between the edge of the Unified Coherence Fabric to the local system DRAM.
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
bad11557 |
| 09-Sep-2025 |
Koichi Okuno <fj2767dz@fujitsu.com> |
perf: Fujitsu: Add the Uncore PMU driver
This adds a new dynamic PMU to the Perf Events framework to program and control the Uncore PMUs in Fujitsu chips.
This driver exports formatting and event i
perf: Fujitsu: Add the Uncore PMU driver
This adds a new dynamic PMU to the Perf Events framework to program and control the Uncore PMUs in Fujitsu chips.
This driver exports formatting and event information to sysfs so it can be used by the perf user space tools with the syntaxes:
perf stat -e pci_iod0_pci0/ea-pci/ ls perf stat -e pci_iod0_pci0/event=0x80/ ls perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls perf stat -e mac_iod0_mac0_ch0/event=0x80/ ls
FUJITSU-MONAKA PMU Events Specification v1.1 URL: https://github.com/fujitsu/FUJITSU-MONAKA
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Koichi Okuno <fj2767dz@fujitsu.com> Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
58074a0f |
| 11-Jun-2025 |
Rob Herring (Arm) <robh@kernel.org> |
perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
The ARMv9.2 architecture introduces the optional Branch Record Buffer Extension (BRBE), which records information about bra
perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
The ARMv9.2 architecture introduces the optional Branch Record Buffer Extension (BRBE), which records information about branches as they are executed into set of branch record registers. BRBE is similar to x86's Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer (BHRB).
BRBE supports filtering by exception level and can filter just the source or target address if excluded to avoid leaking privileged addresses. The h/w filter would be sufficient except when there are multiple events with disjoint filtering requirements. In this case, BRBE is configured with a union of all the events' desired branches, and then the recorded branches are filtered based on each event's filter. For example, with one event capturing kernel events and another event capturing user events, BRBE will be configured to capture both kernel and user branches. When handling event overflow, the branch records have to be filtered by software to only include kernel or user branch addresses for that event. In contrast, x86 simply configures LBR using the last installed event which seems broken.
It is possible on x86 to configure branch filter such that no branches are ever recorded (e.g. -j save_type). For BRBE, events with a configuration that will result in no samples are rejected.
Recording branches in KVM guests is not supported like x86. However, perf on x86 allows requesting branch recording in guests. The guest events are recorded, but the resulting branches are all from the host. For BRBE, events with branch recording and "exclude_host" set are rejected. Requiring "exclude_guest" to be set did not work. The default for the perf tool does set "exclude_guest" if no exception level options are specified. However, specifying kernel or user events defaults to including both host and guest. In this case, only host branches are recorded.
BRBE can support some additional exception branch types compared to x86. On x86, all exceptions other than syscalls are recorded as IRQ. With BRBE, it is possible to better categorize these exceptions. One limitation relative to x86 is we cannot distinguish a syscall return from other exception returns. So all exception returns are recorded as ERET type. The FIQ branch type is omitted as the only FIQ user is Apple platforms which don't support BRBE. The debug branch types are omitted as there is no clear need for them.
BRBE records are invalidated whenever events are reconfigured, a new task is scheduled in, or after recording is paused (and the records have been recorded for the event). The architecture allows branch records to be invalidated by the PE under implementation defined conditions. It is expected that these conditions are rare.
Cc: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-4-e7775563036e@kernel.org [will: Fix sparse warnings about mixed declarations and code. Fix C99 comment syntax.] Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
70cbcb28 |
| 17-Apr-2025 |
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> |
perf: Do not enable by default during compile testing
Enabling the compile test should not cause automatic enabling of all drivers, but only allow to choose to compile them.
Signed-off-by: Krzyszto
perf: Do not enable by default during compile testing
Enabling the compile test should not cause automatic enabling of all drivers, but only allow to choose to compile them.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250417074650.81561-1-krzysztof.kozlowski@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
e1dce564 |
| 28-Oct-2024 |
Gowthami Thiagarajan <gthiagarajan@marvell.com> |
perf/marvell: Marvell PEM performance monitor support
PCI Express Interface PMU includes various performance counters to monitor the data that is transmitted over the PCIe link. The counters track v
perf/marvell: Marvell PEM performance monitor support
PCI Express Interface PMU includes various performance counters to monitor the data that is transmitted over the PCIe link. The counters track various inbound and outbound transactions which includes separate counters for posted/non-posted/completion TLPs. Also, inbound and outbound memory read requests along with their latencies can also be monitored. Address Translation Services(ATS)events such as ATS Translation, ATS Page Request, ATS Invalidation along with their corresponding latencies are also supported.
The performance counters are 64 bits wide.
For instance, perf stat -e ib_tlp_pr <workload> tracks the inbound posted TLPs for the workload.
Co-developed-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com> Link: https://lore.kernel.org/r/20241028055309.17893-1-gthiagarajan@marvell.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
4d5a7680 |
| 04-Sep-2024 |
Robin Murphy <robin.murphy@arm.com> |
perf: Add driver for Arm NI-700 interconnect PMU
The Arm NI-700 Network-on-Chip Interconnect has a relatively straightforward design with a hierarchy of voltage, power, and clock domains, where each
perf: Add driver for Arm NI-700 interconnect PMU
The Arm NI-700 Network-on-Chip Interconnect has a relatively straightforward design with a hierarchy of voltage, power, and clock domains, where each clock domain then contains a number of interface units and a PMU which can monitor events thereon. As such, it begets a relatively straightforward driver to interface those PMUs with perf.
Even more so than with arm-cmn, users will require detailed knowledge of the wider system topology in order to meaningfully analyse anything, since the interconnect itself cannot know what lies beyond the boundary of each inscrutably-numbered interface. Given that, for now they are also expected to refer to the NI-700 documentation for the relevant event IDs to provide as well. An identifier is implemented so we can come back and add jevents if anyone really wants to.
Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/9933058d0ab8138c78a61cd6852ea5d5ff48e393.1725470837.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
8d75537b |
| 27-Jun-2024 |
Rob Herring (Arm) <robh@kernel.org> |
perf/arm: Move 32-bit PMU drivers to drivers/perf/
It is preferred to put drivers under drivers/ rather than under arch/. The PMU drivers also depend on arm_pmu.c, so it's better to place them all t
perf/arm: Move 32-bit PMU drivers to drivers/perf/
It is preferred to put drivers under drivers/ rather than under arch/. The PMU drivers also depend on arm_pmu.c, so it's better to place them all together.
Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-3-c9784b4f4065@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
c150b809 |
| 22-Mar-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for various vector-accelerated crypto routines
Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for various vector-accelerated crypto routines
- Hibernation is now enabled for portable kernel builds
- mmap_rnd_bits_max is larger on systems with larger VAs
- Support for fast GUP
- Support for membarrier-based instruction cache synchronization
- Support for the Andes hart-level interrupt controller and PMU
- Some cleanups around unaligned access speed probing and Kconfig settings
- Support for ACPI LPI and CPPC
- Various cleanus related to barriers
- A handful of fixes
* tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits) riscv: Fix syscall wrapper for >word-size arguments crypto: riscv - add vector crypto accelerated AES-CBC-CTS crypto: riscv - parallelize AES-CBC decryption riscv: Only flush the mm icache when setting an exec pte riscv: Use kcalloc() instead of kzalloc() riscv/barrier: Add missing space after ',' riscv/barrier: Consolidate fence definitions riscv/barrier: Define RISCV_FULL_BARRIER riscv/barrier: Define __{mb,rmb,wmb} RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ cpufreq: Move CPPC configs to common Kconfig and add RISC-V ACPI: RISC-V: Add CPPC driver ACPI: Enable ACPI_PROCESSOR for RISC-V ACPI: RISC-V: Add LPI driver cpuidle: RISC-V: Move few functions to arch/riscv riscv: Introduce set_compat_task() in asm/compat.h riscv: Introduce is_compat_thread() into compat.h riscv: add compile-time test into is_compat_task() riscv: Replace direct thread flag check with is_compat_task() riscv: Improve arch_get_mmap_end() macro ...
show more ...
|
| #
1d63d1d9 |
| 18-Mar-2024 |
Conor Dooley <conor.dooley@microchip.com> |
perf: starfive: fix 64-bit only COMPILE_TEST condition
ARCH_STARFIVE is not restricted to 64-bit platforms, so while Will's addition of a 64-bit only condition satisfied the build robots doing COMPI
perf: starfive: fix 64-bit only COMPILE_TEST condition
ARCH_STARFIVE is not restricted to 64-bit platforms, so while Will's addition of a 64-bit only condition satisfied the build robots doing COMPILE_TEST builds, Palmer ran into the same problems with writeq() being undefined during regular rv32 builds.
Promote the dependency on 64-bit to its own `depends on` so that the driver can never be included in 32-bit builds.
Reported-by: Palmer Dabbelt <palmer@rivosinc.com> Fixes: c2b24812f7bc ("perf: starfive: Add StarLink PMU support") Fixes: f0dbc6d0de38 ("perf: starfive: Only allow COMPILE_TEST for 64-bit architectures") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Acked-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Link: https://lore.kernel.org/r/20240318-emphatic-rally-f177a4fe1bdc@spud Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
show more ...
|
| #
bc969d6c |
| 22-Feb-2024 |
Yu Chien Peter Lin <peterlin@andestech.com> |
perf: RISC-V: Introduce Andes PMU to support perf event sampling
Assign riscv_pmu_irq_num the value of (256 + 18) for the custome PMU and add SSCOUNTOVF and SIP alternatives to ALT_SBI_PMU_OVERFLOW(
perf: RISC-V: Introduce Andes PMU to support perf event sampling
Assign riscv_pmu_irq_num the value of (256 + 18) for the custome PMU and add SSCOUNTOVF and SIP alternatives to ALT_SBI_PMU_OVERFLOW() and ALT_SBI_PMU_OVF_CLEAR_PENDING() macros, respectively.
To make use of Andes PMU extension, "xandespmu" needs to be appended to the riscv,isa-extensions for each cpu node in device-tree, and make sure CONFIG_ANDES_CUSTOM_PMU is enabled.
Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Charles Ci-Jyun Wu <dminus@andestech.com> Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com> Co-developed-by: Locus Wei-Han Chen <locus84@andestech.com> Signed-off-by: Locus Wei-Han Chen <locus84@andestech.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240222083946.3977135-8-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
| #
f0dbc6d0 |
| 05-Mar-2024 |
Will Deacon <will@kernel.org> |
perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
The kbuild robot exploded while wasting its time building the Starfive PMU driver for the 32-bit PA-RISC and Hexagon architectures.
perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
The kbuild robot exploded while wasting its time building the Starfive PMU driver for the 32-bit PA-RISC and Hexagon architectures.
Adjust the Kconfig dependencies so that COMPILE_TEST is only applicable for 64-bit architectures (which implement writeq()).
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
c2b24812 |
| 29-Feb-2024 |
Ji Sheng Teoh <jisheng.teoh@starfivetech.com> |
perf: starfive: Add StarLink PMU support
This patch adds support for StarFive's StarLink PMU (Performance Monitor Unit). StarLink PMU integrates one or more CPU cores with a shared L3 memory system.
perf: starfive: Add StarLink PMU support
This patch adds support for StarFive's StarLink PMU (Performance Monitor Unit). StarLink PMU integrates one or more CPU cores with a shared L3 memory system. The PMU supports overflow interrupt, up to 16 programmable 64bit event counters, and an independent 64bit cycle counter. StarLink PMU is accessed via MMIO.
Example Perf stat output: [root@user]# perf stat -a -e /starfive_starlink_pmu/cycles/ \ -e /starfive_starlink_pmu/read_miss/ \ -e /starfive_starlink_pmu/read_hit/ \ -e /starfive_starlink_pmu/release_request/ \ -e /starfive_starlink_pmu/write_hit/ \ -e /starfive_starlink_pmu/write_miss/ \ -e /starfive_starlink_pmu/write_request/ \ -e /starfive_starlink_pmu/writeback/ \ -e /starfive_starlink_pmu/read_request/ \ -- openssl speed rsa2048 Doing 2048 bits private rsa's for 10s: 5 2048 bits private RSA's in 2.84s Doing 2048 bits public rsa's for 10s: 169 2048 bits public RSA's in 2.42s version: 3.0.11 built on: Tue Sep 19 13:02:31 2023 UTC options: bn(64,64) CPUINFO: N/A sign verify sign/s verify/s rsa 2048 bits 0.568000s 0.014320s 1.8 69.8 ///////// Performance counter stats for 'system wide':
649991998 starfive_starlink_pmu/cycles/ 1009690 starfive_starlink_pmu/read_miss/ 1079750 starfive_starlink_pmu/read_hit/ 2089405 starfive_starlink_pmu/release_request/ 129 starfive_starlink_pmu/write_hit/ 70 starfive_starlink_pmu/write_miss/ 194 starfive_starlink_pmu/write_request/ 150080 starfive_starlink_pmu/writeback/ 2089423 starfive_starlink_pmu/read_request/
27.062755678 seconds time elapsed
Signed-off-by: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Link: https://lore.kernel.org/r/20240229072720.3987876-2-jisheng.teoh@starfivetech.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
af9597ad |
| 08-Dec-2023 |
Shuai Xue <xueshuai@linux.alibaba.com> |
drivers/perf: add DesignWare PCIe PMU driver
This commit adds the PCIe Performance Monitoring Unit (PMU) driver support for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express Core c
drivers/perf: add DesignWare PCIe PMU driver
This commit adds the PCIe Performance Monitoring Unit (PMU) driver support for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express Core controller IP which provides statistics feature. The PMU is a PCIe configuration space register block provided by each PCIe Root Port in a Vendor-Specific Extended Capability named RAS D.E.S (Debug, Error injection, and Statistics).
To facilitate collection of statistics the controller provides the following two features for each Root Port:
- one 64-bit counter for Time Based Analysis (RX/TX data throughput and time spent in each low-power LTSSM state) and - one 32-bit counter for Event Counting (error and non-error events for a specified lane)
Note: There is no interrupt for counter overflow.
This driver adds PMU devices for each PCIe Root Port. And the PMU device is named based the BDF of Root Port. For example,
30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
the PMU device name for this Root Port is dwc_rootport_3018.
Example usage of counting PCIe RX TLP data payload (Units of bytes)::
$# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
average RX bandwidth can be calculated like this:
PCIe TX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/20231208025652.87192-5-xueshuai@linux.alibaba.com [will: Fix sparse error due to use of uninitialised 'vsec' symbol in dwc_pcie_match_des_cap()] Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
7c3f204e |
| 06-Jul-2023 |
Vincent Whitchurch <vincent.whitchurch@axis.com> |
perf/smmuv3: Remove build dependency on ACPI
This driver supports working without ACPI since commit 3f7be43561766 ("perf/smmuv3: Add devicetree support"), so remove the build dependency.
Signed-off
perf/smmuv3: Remove build dependency on ACPI
This driver supports working without ACPI since commit 3f7be43561766 ("perf/smmuv3: Add devicetree support"), so remove the build dependency.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230706-smmuv3-pmu-noacpi-v1-1-7083ef189158@axis.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
d25f0025 |
| 01-Jul-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dan Williams: "The highlights in terms of new functionality are support for the standard CXL
Merge tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dan Williams: "The highlights in terms of new functionality are support for the standard CXL Performance Monitor definition that appeared in CXL 3.0, support for device sanitization (wiping all data from a device), secure-erase (re-keying encryption of user data), and support for firmware update. The firmware update support is notable as it reuses the simple sysfs_upload interface to just cat(1) a blob to a sysfs file and pipe that to the device.
Additionally there are a substantial number of cleanups and reorganizations to get ready for RCH error handling (RCH == Restricted CXL Host == current shipping hardware generation / pre CXL-2.0 topologies) and type-2 (accelerator / vendor specific) devices.
For vendor specific devices they implement a subset of what the generic type-3 (generic memory expander) driver expects. As a result the rework decouples optional infrastructure from the core driver context.
For RCH topologies, where the specification working group did not want to confuse pre-CXL-aware operating systems, many of the standard registers are hidden which makes support standard bus features like AER (PCIe Advanced Error Reporting) difficult. The rework arranges for the driver to help the PCI-AER core. Bjorn is on board with this direction but a late regression disocvery means the completion of this functionality needs to cook a bit longer, so it is code reorganizations only for now.
Summary:
- Add infrastructure for supporting background commands along with support for device sanitization and firmware update
- Introduce a CXL performance monitoring unit driver based on the common definition in the specification.
- Land some preparatory cleanup and refactoring for the anticipated arrival of CXL type-2 (accelerator devices) and CXL RCH (CXL-v1.1 topology) error handling.
- Rework CPU cache management with respect to region configuration (device hotplug or other dynamic changes to memory interleaving)
- Fix region reconfiguration vs CXL decoder ordering rules"
* tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (51 commits) cxl: Fix one kernel-doc comment cxl/pci: Use correct flag for sanitize polling docs: perf: Minimal introduction the the CXL PMU device and driver perf: CXL Performance Monitoring Unit driver tools/testing/cxl: add firmware update emulation to CXL memdevs tools/testing/cxl: Use named effects for the Command Effect Log tools/testing/cxl: Fix command effects for inject/clear poison cxl: add a firmware update mechanism using the sysfs firmware loader cxl/test: Add Secure Erase opcode support cxl/mem: Support Secure Erase cxl/test: Add Sanitize opcode support cxl/mem: Wire up Sanitization support cxl/mbox: Add sanitization handling machinery cxl/mem: Introduce security state sysfs file cxl/mbox: Allow for IRQ_NONE case in the isr Revert "cxl/port: Enable the HDM decoder capability for switch ports" cxl/memdev: Formalize endpoint port linkage cxl/pci: Unconditionally unmask 256B Flit errors cxl/region: Manage decoder target_type at decoder-attach time cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEM ...
show more ...
|
| #
5d7107c7 |
| 26-May-2023 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
perf: CXL Performance Monitoring Unit driver
CXL rev 3.0 introduces a standard performance monitoring hardware block to CXL. Instances are discovered using CXL Register Locator DVSEC entries. Each C
perf: CXL Performance Monitoring Unit driver
CXL rev 3.0 introduces a standard performance monitoring hardware block to CXL. Instances are discovered using CXL Register Locator DVSEC entries. Each CXL component may have multiple PMUs.
This initial driver supports a subset of types of counter. It supports counters that are either fixed or configurable, but requires that they support the ability to freeze and write value whilst frozen.
Development done with QEMU model which will be posted shortly.
Example:
$ perf stat -a -e cxl_pmu_mem0.0/h2d_req_snpcur/ -e cxl_pmu_mem0.0/h2d_req_snpdata/ -e cxl_pmu_mem0.0/clock_ticks/ sleep 1
Performance counter stats for 'system wide':
96,757,023,244,321 cxl_pmu_mem0.0/h2d_req_snpcur/ 96,757,023,244,365 cxl_pmu_mem0.0/h2d_req_snpdata/ 193,514,046,488,653 cxl_pmu_mem0.0/clock_ticks/
1.090539600 seconds time elapsed
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230526095824.16336-5-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
| #
55691f99 |
| 18-Apr-2023 |
Xu Yang <xu.yang_2@nxp.com> |
drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver
Add ddr performance monitor support for i.MX93.
There are 11 counters for ddr performance events. - Counter 0 is a 64-bit counte
drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver
Add ddr performance monitor support for i.MX93.
There are 11 counters for ddr performance events. - Counter 0 is a 64-bit counter that counts only clock cycles. - Counter 1-10 are 32-bit counters that can monitor counter-specific events in addition to counting reference events.
For example: perf stat -a -e imx9_ddr0/ddrc_pm_1,counter=1/,imx9_ddr0/ddrc_pm_2,counter=2/ ls
Besides, this ddr pmu support AXI filter capability. It's implemented as counter-specific events. It now supports read transaction, write transaction and read beat events which corresponding respecitively to counter 2, 3 and 4. axi_mask and axi_id need to be as event parameters.
For example: perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_trans_filt,counter=2,axi_mask=ID_MASK,axi_id=ID/ perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_wr_trans_filt,counter=3,axi_mask=ID_MASK,axi_id=ID/ perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_beat_filt,counter=4,axi_mask=ID_MASK,axi_id=ID/
Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230418102910.2065651-1-xu.yang_2@nxp.com [will: Remove redundant error message on platform_get_irq() failure] Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
009d6dc8 |
| 17-Mar-2023 |
Marc Zyngier <marc.zyngier@arm.com> |
ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM
The only thing stopping the PMUv3 driver from compiling on 32bit is the lack of defined system registers names and the handful of required h
ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM
The only thing stopping the PMUv3 driver from compiling on 32bit is the lack of defined system registers names and the handful of required helpers.
This is easily solved by providing the sysreg accessors and updating the Kconfig entry.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Co-developed-by: Zaid Al-Bassam <zalbassam@google.com> Signed-off-by: Zaid Al-Bassam <zalbassam@google.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230317195027.3746949-8-zalbassam@google.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
7755cec6 |
| 17-Mar-2023 |
Marc Zyngier <marc.zyngier@arm.com> |
arm64: perf: Move PMUv3 driver to drivers/perf
Having the ARM PMUv3 driver sitting in arch/arm64/kernel is getting in the way of being able to use perf on ARMv8 cores running a 32bit kernel, such as
arm64: perf: Move PMUv3 driver to drivers/perf
Having the ARM PMUv3 driver sitting in arch/arm64/kernel is getting in the way of being able to use perf on ARMv8 cores running a 32bit kernel, such as 32bit KVM guests.
This patch moves it into drivers/perf/arm_pmuv3.c, with an include file in include/linux/perf/arm_pmuv3.h. The only thing left in arch/arm64 is some mundane perf stuff.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Zaid Al-Bassam <zalbassam@google.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230317195027.3746949-2-zalbassam@google.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
9d33edb2 |
| 12-Dec-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem:
The bulk is
Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem:
The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains.
IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device.
There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way.
The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation.
At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller.
This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this:
|--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N
where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy.
While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive.
A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model.
The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this:
|--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per device:
|--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver.
There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well.
Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place"
* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
show more ...
|
| #
2016e211 |
| 21-Nov-2022 |
Jiucheng Xu <jiucheng.xu@amlogic.com> |
perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
Add support for Amlogic Meson G12 Series SOC - DDR bandwidth PMU driver framework and interfaces. The PMU can not only monitor the
perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
Add support for Amlogic Meson G12 Series SOC - DDR bandwidth PMU driver framework and interfaces. The PMU can not only monitor the total DDR bandwidth, but also individual IP module bandwidth.
Signed-off-by: Jiucheng Xu <jiucheng.xu@amlogic.com> Tested-by: Chris Healy <healych@amazon.com> Link: https://lore.kernel.org/r/20221121021602.3306998-1-jiucheng.xu@amlogic.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
13e7accb |
| 11-Nov-2022 |
Thomas Gleixner <tglx@linutronix.de> |
genirq: Get rid of GENERIC_MSI_IRQ_DOMAIN
Adjust to reality and remove another layer of pointless Kconfig indirection. CONFIG_GENERIC_MSI_IRQ is good enough to serve all purposes.
Signed-off-by: Th
genirq: Get rid of GENERIC_MSI_IRQ_DOMAIN
Adjust to reality and remove another layer of pointless Kconfig indirection. CONFIG_GENERIC_MSI_IRQ is good enough to serve all purposes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122014.524842979@linutronix.de
show more ...
|
| #
e37dfd65 |
| 11-Nov-2022 |
Besar Wicaksono <bwicaksono@nvidia.com> |
perf: arm_cspmu: Add support for ARM CoreSight PMU driver
Add support for ARM CoreSight PMU driver framework and interfaces. The driver provides generic implementation to operate uncore PMU based on
perf: arm_cspmu: Add support for ARM CoreSight PMU driver
Add support for ARM CoreSight PMU driver framework and interfaces. The driver provides generic implementation to operate uncore PMU based on ARM CoreSight PMU architecture. The driver also provides interface to get vendor/implementation specific information, for example event attributes and formating.
The specification used in this implementation can be found below: * ACPI Arm Performance Monitoring Unit table: https://developer.arm.com/documentation/den0117/latest * ARM Coresight PMU architecture: https://developer.arm.com/documentation/ihi0091/latest
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Link: https://lore.kernel.org/r/20221111222330.48602-2-bwicaksono@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
| #
e08d07dd |
| 27-Sep-2022 |
Geert Uytterhoeven <geert+renesas@glider.be> |
drivers/perf: ALIBABA_UNCORE_DRW_PMU should depend on ACPI
The Alibaba T-Head Yitian 710 DDR Sub-system Driveway PMU driver relies solely on ACPI for matching. Hence add a dependency on ACPI, to pr
drivers/perf: ALIBABA_UNCORE_DRW_PMU should depend on ACPI
The Alibaba T-Head Yitian 710 DDR Sub-system Driveway PMU driver relies solely on ACPI for matching. Hence add a dependency on ACPI, to prevent asking the user about this driver when configuring a kernel without ACPI support.
Fixes: cf7b61073e45 ("drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/2a4407bb598285660fa5e604e56823ddb12bb0aa.1664285774.git.geert+renesas@glider.be Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
show more ...
|