.. SPDX-License-Identifier: GPL-2.0

====================
Considering hardware
====================

:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The way a workload is handled can be influenced by the hardware it runs on.
Key components include the CPU, memory, and the buses that connect them.
These resources are shared among all applications on the system.
As a result, heavy utilization of one resource by a single application
can affect the deterministic handling of workloads in other applications.

Below is a brief overview.

System memory and cache
-----------------------

Main memory and the associated caches are the most common shared resources among
tasks in a system. One task can dominate the available caches, forcing another
task to wait until a cache line is written back to main memory before it can
proceed. The impact of this contention depends on the write patterns and the
size of the available caches. Larger caches may reduce stalls because more lines
can be buffered before being written back. Conversely, certain write patterns
may trigger the cache controller to flush many lines at once, causing
applications to stall until the operation completes.

This issue can be partly mitigated if applications do not share the same CPU
cache. The kernel is aware of the cache topology and exports this information to
user space through sysfs, under ``/sys/devices/system/cpu/cpu*/cache/``. Tools
such as **lstopo** from the Portable Hardware Locality (hwloc) project
(https://www.open-mpi.org/projects/hwloc/) can visualize the hierarchy.
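
As a minimal sketch, assuming the usual sysfs layout in which each
``cpuN/cache/indexM/`` directory provides ``level``, ``type`` and
``shared_cpu_list`` files, the information that **lstopo** displays can also be
read directly to find out which CPUs share a cache. The helper names are
hypothetical and not part of any existing tool.

.. code-block:: python

    from pathlib import Path

    # Hypothetical helpers; they assume the sysfs cache layout described above.


    def parse_cpu_list(text):
        """Expand a kernel CPU list such as "0-3,8,10-11" into a set of CPU numbers."""
        cpus = set()
        for part in text.strip().split(","):
            if not part:
                continue
            if "-" in part:
                first, last = part.split("-")
                cpus.update(range(int(first), int(last) + 1))
            else:
                cpus.add(int(part))
        return cpus


    def cache_domains(cpu_root="/sys/devices/system/cpu"):
        """Return each distinct cache as (level, type, frozenset of sharing CPUs)."""
        domains = set()
        for index_dir in Path(cpu_root).glob("cpu[0-9]*/cache/index[0-9]*"):
            level = int((index_dir / "level").read_text())
            cache_type = (index_dir / "type").read_text().strip()
            shared = parse_cpu_list((index_dir / "shared_cpu_list").read_text())
            # Every CPU sharing a cache reports the same instance; the set drops duplicates.
            domains.add((level, cache_type, frozenset(shared)))
        return sorted(domains, key=lambda d: (d[0], d[1], min(d[2])))

On a machine with a single L3 cache shared by four CPUs, the result would
contain an entry such as ``(3, "Unified", frozenset({0, 1, 2, 3}))``. CPUs for
workloads that should not disturb each other could then be picked from
different L2 (or L3) domains where the topology allows it.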

Avoiding shared L2 or L3 caches is not always possible. Even when cache sharing
is minimized, bottlenecks can still occur when accessing system memory. Memory
is used not only by the CPU but also by peripheral devices, such as graphics
cards or network adapters, which access it via DMA.

In some cases, cache and memory bottlenecks can be controlled if the hardware
provides the necessary support. On x86 systems, Intel offers Resource Director
Technology (RDT), which includes Cache Allocation Technology (CAT) for
partitioning caches among applications and Memory Bandwidth Allocation (MBA)
for limiting the memory bandwidth that applications can consume over the
interconnect. AMD provides similar functionality under Platform Quality of
Service (PQoS). On Arm64, the equivalent is Memory System Resource Partitioning
and Monitoring (MPAM).

These features can be configured through the Linux Resource Control interface
(resctrl). For details, see Documentation/filesystems/resctrl.rst.
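
The following is a minimal sketch of how such a partition might be set up
through resctrl, assuming the file system is already mounted at
``/sys/fs/resctrl`` and the hardware supports L3 CAT. The group name, masks and
CPU list are hypothetical; the valid mask width and minimum number of set bits
are reported under ``info/L3/`` (``cbm_mask`` and ``min_cbm_bits``). Intel
requires the bits of a capacity bitmask to be contiguous, which is why the
helper only builds contiguous masks.

.. code-block:: python

    from pathlib import Path

    # Hypothetical helpers; they assume resctrl is mounted at /sys/fs/resctrl.


    def contiguous_cbm(first_way, num_ways):
        """Build a contiguous capacity bitmask covering num_ways ways from first_way."""
        if first_way < 0 or num_ways <= 0:
            raise ValueError("need a non-negative start and at least one way")
        return ((1 << num_ways) - 1) << first_way


    def l3_schemata_line(masks):
        """Format {cache_id: mask} as a resctrl schemata line, e.g. 'L3:0=f;1=f0'."""
        return "L3:" + ";".join(
            f"{cache_id}={mask:x}" for cache_id, mask in sorted(masks.items())
        )


    def create_group(name, masks, cpu_list, root="/sys/fs/resctrl"):
        """Create a resource group, restrict its L3 ways and assign CPUs to it."""
        group = Path(root) / name
        group.mkdir(exist_ok=True)  # on resctrl, mkdir creates a new group
        (group / "schemata").write_text(l3_schemata_line(masks) + "\n")
        (group / "cpus_list").write_text(cpu_list + "\n")
        return group

For example, ``create_group("rt", {0: contiguous_cbm(4, 4)}, "2-3")`` would give
CPUs 2 and 3 ways 4 to 7 of the L3 cache with ID 0. The default group's mask
would also have to be reduced so that other tasks stop using those ways.
Individual tasks can be moved instead by writing their PIDs to the group's
``tasks`` file.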

The **perf** tool can be used to monitor cache behavior. It can count the
cache misses of an application and show how they change when different
workloads run on a neighboring CPU. The **perf c2c** subcommand goes further:
it helps identify cache-to-cache issues, where multiple CPU cores repeatedly
access and modify data on the same cache line.
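
One way to make such a comparison repeatable is to run ``perf stat`` in CSV
mode (``-x``) on the CPU of interest, once with an idle neighbor and once under
load, and compare the resulting miss ratios. The sketch below assumes that the
generic ``cache-references`` and ``cache-misses`` events are available (on some
hybrid CPUs they are reported with a PMU prefix) and that the caller is allowed
to count system-wide on that CPU. The helper names are hypothetical.

.. code-block:: python

    import subprocess

    # Hypothetical helpers; they assume generic cache events and perf privileges.


    def parse_perf_csv(text):
        """Parse `perf stat -x ,` output into {event name: count}."""
        counts = {}
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split(",")
            if len(fields) < 3:
                continue
            value, event = fields[0], fields[2]
            try:
                counts[event] = float(value)
            except ValueError:
                continue  # "<not counted>" or "<not supported>"
        return counts


    def miss_ratio(counts):
        """Fraction of cache references that missed, or None if unavailable."""
        refs = counts.get("cache-references")
        misses = counts.get("cache-misses")
        if not refs or misses is None:
            return None
        return misses / refs


    def measure_cpu(cpu, seconds=5):
        """Count cache events on one CPU for a fixed time (needs perf and privileges)."""
        cmd = ["perf", "stat", "-x", ",", "-e", "cache-references,cache-misses",
               "-C", str(cpu), "--", "sleep", str(seconds)]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return parse_perf_csv(result.stderr)  # perf stat prints counters to stderr

Calling ``miss_ratio(measure_cpu(2))`` with and without a memory-heavy job on a
neighboring CPU gives a rough indication of interference. For cache-line
contention itself, ``perf c2c record`` followed by ``perf c2c report`` is the
more targeted tool.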

Hardware buses
--------------

Real-time systems often need to access hardware directly to perform their work.
Any latency in this process is undesirable, as it can affect the outcome of the
task. For example, on an I/O bus, a changed output may not become immediately
visible but instead appear with variable delay depending on the latency of the
bus used for communication.
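
Because the worst case matters more than the average here, such delays are
best characterized by timing many round trips and looking at the tail of the
distribution. The following sketch is generic: the ``operation`` callable stands
in for a hypothetical device access (for example, toggling an output and reading
it back). The Python interpreter adds jitter of its own, so a production
measurement would normally be written in C and run with real-time priority.

.. code-block:: python

    import statistics
    import time


    def measure_latency(operation, iterations=10000):
        """Time `operation` repeatedly and summarize the observed latencies in ns.

        `operation` is a placeholder for a hypothetical device access.
        """
        if iterations <= 0:
            raise ValueError("iterations must be positive")
        samples = []
        for _ in range(iterations):
            start = time.perf_counter_ns()
            operation()
            samples.append(time.perf_counter_ns() - start)
        samples.sort()
        p99_index = min(len(samples) - 1, (len(samples) * 99) // 100)
        return {
            "min": samples[0],
            "median": statistics.median(samples),
            "p99": samples[p99_index],
            "max": samples[-1],
        }

A large gap between ``median`` and ``max`` is the variable delay described
above; the maximum, not the median, is what a real-time deadline has to
accommodate.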

A bus such as PCI is relatively simple because register accesses are routed
directly to the connected device. In the worst case, a read operation stalls the
CPU until the device responds.

A bus such as USB is more complex, involving multiple layers. A register read
or write is wrapped in a USB Request Block (URB), which is then sent by the
USB host controller to the device. Timing and latency are influenced by the
underlying USB bus. Requests cannot be sent immediately; they must align with
the next frame (or, for high-speed devices, microframe) boundary according to
the endpoint type and the host controller's scheduling rules, which adds
latency. For example, a network device connected via USB may still deliver
sufficient throughput, but the added latency when sending or receiving packets
may fail to meet the requirements of certain real-time use cases.
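
The scheduling granularity can be estimated from the endpoint descriptor. For
interrupt endpoints, USB 2.0 expresses ``bInterval`` in 1 ms frames at full
speed, and as an exponent of 125 µs microframes (2^(bInterval-1)) at high
speed; SuperSpeed uses the same exponent form. The sketch below applies these
rules under an idealized model in which a request can only start at the next
service slot. It ignores host controller overhead and bus contention, and the
function names are hypothetical.

.. code-block:: python

    # Hypothetical helpers for an idealized model of interrupt endpoint scheduling.


    def interrupt_interval_us(speed, b_interval):
        """Service interval of an interrupt endpoint in microseconds."""
        if speed == "full":
            # bInterval counts 1 ms frames directly.
            if not 1 <= b_interval <= 255:
                raise ValueError("bInterval must be 1..255 at full speed")
            return b_interval * 1000
        if speed in ("high", "super"):
            # bInterval is an exponent: 2**(bInterval - 1) microframes of 125 us.
            if not 1 <= b_interval <= 16:
                raise ValueError("bInterval must be 1..16 at high/super speed")
            return (1 << (b_interval - 1)) * 125
        raise ValueError(f"unknown speed: {speed!r}")


    def wait_for_next_slot_us(submit_time_us, interval_us):
        """Delay until the next slot, assuming slots at multiples of interval_us."""
        return -submit_time_us % interval_us

For example, a high-speed endpoint with ``bInterval = 4`` is serviced every
1000 µs, so in this model a request that just misses its slot waits almost a
full millisecond before anything is sent on the bus, regardless of the
throughput the device achieves.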

Power management can add further bus latency. For instance, PCIe with Active
State Power Management (ASPM) enabled can put the link between the device and
the host into a low-power state. Although this saves power, the link must
return to full operation before the device can be reached, which delays device
access and adds latency to responses. This issue is not limited to PCIe;
internal buses within a System-on-Chip (SoC) can also be affected by power
management mechanisms.
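
On Linux, the system-wide ASPM policy is exposed as a module parameter, with
the active choice shown in brackets in
``/sys/module/pcie_aspm/parameters/policy``. Selecting ``performance`` disables
ASPM link states; the same can be requested at boot with
``pcie_aspm.policy=performance``, or ASPM can be turned off entirely with
``pcie_aspm=off``. The sketch below only reads and writes that file. Writing
requires root and may be refused if the operating system does not control ASPM
on the platform; the helper names are hypothetical.

.. code-block:: python

    from pathlib import Path

    # Hypothetical helpers around the pcie_aspm "policy" parameter file.
    POLICY_FILE = "/sys/module/pcie_aspm/parameters/policy"


    def active_aspm_policy(text):
        """Return the bracketed entry, e.g. 'default' from '[default] performance ...'."""
        for word in text.split():
            if word.startswith("[") and word.endswith("]"):
                return word[1:-1]
        return None


    def read_aspm_policy(path=POLICY_FILE):
        return active_aspm_policy(Path(path).read_text())


    def set_aspm_policy(policy, path=POLICY_FILE):
        """Select a policy such as 'performance' (requires root)."""
        Path(path).write_text(policy)

Checking ``read_aspm_policy()`` before a latency measurement helps rule out
link power management as a source of outliers.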

Virtualization
--------------

In a virtualized environment such as KVM, each guest CPU is represented as a
thread on the host. If such a thread runs with real-time priority, the system
should be tested to confirm it can sustain this behavior over extended periods.
Because of its priority, the thread will not be preempted by lower-priority
threads (such as SCHED_OTHER), which may then receive no CPU time. This becomes
a problem when a lower-priority thread is pinned to a CPU already occupied by a
real-time task, because it cannot make progress. Even if a CPU has been
isolated, the system may still (accidentally) start a per-CPU thread on that
CPU.

Ensuring that a guest CPU goes idle is difficult, as it requires avoiding both
task scheduling and interrupt handling. Furthermore, if the guest CPU does go
idle but the guest system is booted with the option **idle=poll**, the guest
CPU will never enter an idle state and will instead spin until an event
arrives.
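
A minimal sketch of the host-side checks described above follows. It assumes
the vCPU thread IDs are already known (QEMU, for example, reports them per vCPU
through its QMP interface); the priority value and function names are
hypothetical. It pins a vCPU thread to a single host CPU with ``SCHED_FIFO``,
reports how much of each period the RT throttling sysctls
(``sched_rt_runtime_us`` and ``sched_rt_period_us``) leave to real-time tasks,
and checks a guest command line for **idle=poll**.

.. code-block:: python

    import os

    # Hypothetical helpers; vCPU thread IDs must be obtained from the VMM.


    def pin_vcpu_thread(tid, host_cpu, priority=50):
        """Pin one vCPU thread to a host CPU and run it as SCHED_FIFO (needs privileges)."""
        os.sched_setaffinity(tid, {host_cpu})  # Linux accepts a thread ID here
        os.sched_setscheduler(tid, os.SCHED_FIFO, os.sched_param(priority))


    def rt_bandwidth_fraction(runtime_us, period_us):
        """Share of each period usable by real-time tasks; None means unlimited."""
        if runtime_us < 0:
            return None  # -1 disables RT throttling entirely
        return runtime_us / period_us


    def read_rt_bandwidth(proc="/proc/sys/kernel"):
        with open(f"{proc}/sched_rt_runtime_us") as f:
            runtime = int(f.read())
        with open(f"{proc}/sched_rt_period_us") as f:
            period = int(f.read())
        return rt_bandwidth_fraction(runtime, period)


    def uses_idle_poll(cmdline):
        """True if a kernel command line (e.g. the guest's /proc/cmdline) has idle=poll."""
        return "idle=poll" in cmdline.split()

On kernels that use classic RT throttling, the default of 950000 µs out of every
1000000 µs makes ``rt_bandwidth_fraction`` return 0.95: a vCPU thread that never
sleeps is held back for the remaining 5% of each period, so lower-priority
threads can still run. Disabling throttling (a runtime of -1) removes that
margin, which is exactly the situation that should be tested over extended
periods.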

Device handling introduces additional considerations. Emulated PCI devices or
VirtIO devices require a counterpart on the host to complete requests. This
adds latency because the host must intercept each request and either process it
directly or schedule a thread to complete it. These delays can be avoided if
the required PCI device is passed directly through to the guest. Some devices,
such as networking or storage controllers, support the PCIe Single Root I/O
Virtualization (SR-IOV) feature. SR-IOV allows a single PCIe device to be
divided into multiple virtual functions, which can then be assigned to
different guests.
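
Virtual functions are typically created through the physical function's sysfs
directory, where ``sriov_totalvfs`` reports the maximum and writing to
``sriov_numvfs`` sets the current count. The kernel does not allow changing a
non-zero count directly to another non-zero value, so the sketch below resets
it to zero first. The device path in the example is hypothetical, and assigning
the resulting virtual functions to a guest (for instance via VFIO) is not shown.

.. code-block:: python

    from pathlib import Path

    # Hypothetical helper; pf_dir is the sysfs directory of the physical function.


    def set_num_vfs(pf_dir, num_vfs):
        """Create num_vfs virtual functions on the PCIe physical function at pf_dir."""
        pf = Path(pf_dir)
        total = int((pf / "sriov_totalvfs").read_text())
        if not 0 <= num_vfs <= total:
            raise ValueError(f"device supports 0..{total} VFs, not {num_vfs}")
        numvfs_file = pf / "sriov_numvfs"
        current = int(numvfs_file.read_text())
        if current == num_vfs:
            return current
        if current != 0 and num_vfs != 0:
            numvfs_file.write_text("0")  # disable existing VFs before changing the count
        numvfs_file.write_text(str(num_vfs))
        return num_vfs

For example, ``set_num_vfs("/sys/bus/pci/devices/0000:03:00.0", 4)`` would
create four virtual functions on a hypothetical adapter at that address; each
then appears as its own PCI device that can be passed through to a guest.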

Networking
----------

For low-latency networking, the full networking stack may be undesirable, as it
can introduce additional sources of delay. In this context, XDP (eXpress Data
Path) can be used as a shortcut to bypass much of the stack while still relying
on the kernel's network driver.

The requirements are that the network driver supports XDP, preferably using an
"skb pool", and that the application uses an XDP (AF_XDP) socket. Additional
configuration may involve BPF filters, tuning network queues, or configuring
qdiscs for time-based transmission. These techniques are often applied in
Time-Sensitive Networking (TSN) environments.
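
Time-based transmission generally means computing a launch time for each frame
and handing it to the kernel, for example through the ``SO_TXTIME`` socket
option together with the ETF qdisc, which uses a clock such as ``CLOCK_TAI``.
The arithmetic for a cyclic schedule is sketched below as a hypothetical
helper: frames are sent at a fixed offset within each cycle, and a lead time
leaves room for the stack (and any qdisc ``delta``) before the launch time.
Passing the value to the socket is not shown.

.. code-block:: python

    import time

    # Hypothetical helpers for computing launch times of a cyclic TSN stream.


    def next_launch_time_ns(now_ns, base_ns, period_ns, offset_ns=0, lead_ns=0):
        """Earliest slot base + k*period + offset that is not before now + lead."""
        if period_ns <= 0:
            raise ValueError("period must be positive")
        earliest = now_ns + lead_ns
        first_slot = base_ns + offset_ns
        if earliest <= first_slot:
            return first_slot
        cycles = -(-(earliest - first_slot) // period_ns)  # ceiling division
        return first_slot + cycles * period_ns


    def now_tai_ns():
        """Current CLOCK_TAI time (Linux, Python 3.9+)."""
        return time.clock_gettime_ns(time.CLOCK_TAI)

For a 1 ms cycle starting at time 0 with frames placed 100 µs into each cycle,
a call at 1,050,000 ns with no lead time yields 1,100,000 ns; with a lead time
of 200,000 ns, the next usable slot is 2,100,000 ns.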

Documenting all required steps is beyond the scope of this text. For detailed
guidance, see the TSN documentation at https://tsn.readthedocs.io.

Another useful resource is the Linux Real-Time Communication Testbench
(https://github.com/Linutronix/RTC-Testbench). The goal of this project is to
validate real-time network communication. It can be thought of as a
"cyclictest" for networking and also serves as a starting point for
application development.