.. SPDX-License-Identifier: GPL-2.0

======================================================================
Hardware Feedback Interface For Hetero Core Scheduling On AMD Platform
======================================================================

:Copyright: 2025 Advanced Micro Devices, Inc. All Rights Reserved.

:Author: Perry Yuan <perry.yuan@amd.com>
:Author: Mario Limonciello <mario.limonciello@amd.com>

Overview
--------

AMD heterogeneous core implementations comprise more than one architectural
class: the CPUs are built from cores with different efficiency and power
capabilities, namely performance-oriented *classic cores* and power-efficient
*dense cores*. Power management strategies must therefore be designed to
accommodate the complexities introduced by mixing core types. Heterogeneous
systems can also extend to more than two architectural classes. The purpose
of the scheduling feedback mechanism is to provide information to the
operating system scheduler in real time so that the scheduler can direct
threads to the optimal core.

The goal of AMD's heterogeneous architecture is to gain a power benefit by
sending background threads to the dense cores while sending high-priority
threads to the classic cores. From a performance perspective, sending
background threads to dense cores can free up power headroom and allow the
classic cores to optimally service demanding threads. Furthermore, the
area-optimized nature of the dense cores allows for a larger number of
physical cores. This improved core density has a positive impact on
multithreaded performance.

AMD Heterogeneous Core Driver
-----------------------------

The ``amd_hfi`` driver provides the operating system with performance and
energy efficiency capability data for each CPU in the system. The scheduler
can use the ranking data from the HFI driver to make task placement decisions.

Thread Classification and Ranking Table Interaction
----------------------------------------------------

A thread's classification is used as an index into the ranking table, which
describes an efficiency and performance ranking for each classification.

Threads are classified at runtime into enumerated classes. The classes
represent thread performance/power characteristics that may benefit from
special scheduling behaviors. The table below shows an example of thread
classification and, for each class, where a thread of that class should
preferably be scheduled. The real-time thread classification is consumed by
the operating system and is used to inform the scheduler of where the thread
should be placed.

Thread Classification Example Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+----------+----------------+-------------------------------+---------------------+---------+
| Class ID | Classification | Preferred scheduling behavior | Preemption priority | Counter |
+----------+----------------+-------------------------------+---------------------+---------+
| 0        | Default        | Performant                    | Highest             |         |
+----------+----------------+-------------------------------+---------------------+---------+
| 1        | Non-scalable   | Efficient                     | Lowest              | PMCx1A1 |
+----------+----------------+-------------------------------+---------------------+---------+
| 2        | I/O bound      | Efficient                     | Lowest              | PMCx044 |
+----------+----------------+-------------------------------+---------------------+---------+

The hardware classifies a thread each time the thread is switched out.
Threads that do not meet any hardware-specified criteria are classified as
"default".
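
The example table can be expressed as a small lookup from class ID to preferred
scheduling behavior. The sketch below is illustrative only: the enum and
function names are hypothetical, and treating an unrecognized class ID as
"default" is an assumption made for robustness rather than something this
document specifies.

.. code-block:: c

   /* Hypothetical names mirroring the example classification table. */
   enum hfi_class {
       HFI_CLASS_DEFAULT      = 0, /* no hardware criteria met */
       HFI_CLASS_NON_SCALABLE = 1,
       HFI_CLASS_IO_BOUND     = 2,
       HFI_CLASS_COUNT
   };

   enum hfi_preference {
       HFI_PREFER_PERFORMANT,
       HFI_PREFER_EFFICIENT,
   };

   /* Assumption: IDs outside the example table fall back to "default". */
   static enum hfi_class hfi_class_from_id(unsigned int id)
   {
       return id < HFI_CLASS_COUNT ? (enum hfi_class)id : HFI_CLASS_DEFAULT;
   }

   /* Preferred scheduling behavior column of the example table. */
   static enum hfi_preference hfi_preferred_behavior(enum hfi_class cls)
   {
       switch (cls) {
       case HFI_CLASS_NON_SCALABLE:
       case HFI_CLASS_IO_BOUND:
           return HFI_PREFER_EFFICIENT;
       case HFI_CLASS_DEFAULT:
       default:
           return HFI_PREFER_PERFORMANT;
       }
   }

For example, ``hfi_preferred_behavior(hfi_class_from_id(2))`` yields
``HFI_PREFER_EFFICIENT`` for an I/O-bound thread.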

AMD Hardware Feedback Interface
--------------------------------

The Hardware Feedback Interface provides the operating system with information
about the performance and energy efficiency of each CPU in the system. Each
capability is given as a unit-less quantity in the range [0-255]. A higher
performance value indicates higher performance capability, and a higher
efficiency value indicates greater energy efficiency. Energy efficiency and
performance are reported as separate capabilities in the shared-memory
ranking table.
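
Because performance and efficiency are reported separately, the same set of
CPUs can be ordered differently depending on the goal. The following is a
minimal sketch, assuming a hypothetical per-CPU record that holds both
capability values; it ranks CPUs with the C library's ``qsort()`` and is not
taken from the ``amd_hfi`` driver.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <stdlib.h>

   /* Hypothetical per-CPU record holding the two separate capabilities. */
   struct hfi_cpu_caps {
       int cpu;
       uint8_t perf; /* 0-255, higher = more performance capability */
       uint8_t eff;  /* 0-255, higher = more energy efficient */
   };

   /* Descending by performance capability; ties keep lower CPU first. */
   static int by_perf_desc(const void *a, const void *b)
   {
       const struct hfi_cpu_caps *x = a, *y = b;

       if (x->perf != y->perf)
           return (int)y->perf - (int)x->perf;
       return x->cpu - y->cpu;
   }

   /* Descending by efficiency capability; ties keep lower CPU first. */
   static int by_eff_desc(const void *a, const void *b)
   {
       const struct hfi_cpu_caps *x = a, *y = b;

       if (x->eff != y->eff)
           return (int)y->eff - (int)x->eff;
       return x->cpu - y->cpu;
   }

   /* Reorder caps[] in place so the most capable CPU for the chosen
    * metric comes first. */
   static void hfi_rank_cpus(struct hfi_cpu_caps *caps, size_t n, int want_eff)
   {
       qsort(caps, n, sizeof(*caps), want_eff ? by_eff_desc : by_perf_desc);
   }

With CPUs ``{0, 120, 200}``, ``{1, 255, 50}`` and ``{2, 120, 90}``, the
performance order is 1, 0, 2 while the efficiency order is 0, 2, 1.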

These capabilities may change at runtime as a result of changes in the
operating conditions of the system or the action of external factors. The
power management firmware is responsible for detecting events that require
the performance and efficiency rankings to be reordered. Table updates happen
relatively infrequently, on a time scale of seconds or longer.

The following events trigger a table update:

* Thermal Stress Events
* Silent Compute
* Extreme Low Battery Scenarios

The kernel or a userspace policy daemon can use these capabilities to modify
task placement decisions. For instance, if either the performance or the
energy efficiency capability of a given logical processor becomes zero, the
hardware is recommending that the operating system not schedule any tasks on
that processor, for performance or energy efficiency reasons respectively.
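
A policy that honors the zero-capability recommendation could filter CPUs as
in the sketch below. The structure and helper names are hypothetical; the only
rule taken from this document is that a zero value means "do not schedule
here" for the corresponding goal.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical per-CPU capability pair, values 0-255. */
   struct hfi_cpu_caps {
       uint8_t perf;
       uint8_t eff;
   };

   /* Zero performance capability: avoid this CPU for performance. */
   static bool hfi_cpu_ok_for_perf(const struct hfi_cpu_caps *c)
   {
       return c->perf != 0;
   }

   /* Zero efficiency capability: avoid this CPU for energy efficiency. */
   static bool hfi_cpu_ok_for_eff(const struct hfi_cpu_caps *c)
   {
       return c->eff != 0;
   }

   /* Count the CPUs the hardware still recommends for the requested goal. */
   static unsigned int hfi_count_usable(const struct hfi_cpu_caps *caps,
                                        unsigned int nr_cpus, bool want_eff)
   {
       unsigned int n = 0;

       for (unsigned int i = 0; i < nr_cpus; i++)
           if (want_eff ? hfi_cpu_ok_for_eff(&caps[i])
                        : hfi_cpu_ok_for_perf(&caps[i]))
               n++;
       return n;
   }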

Implementation details for Linux
--------------------------------

Thread scheduling is implemented in the following steps:

1. A thread is spawned and scheduled to the ideal core using the default
   heterogeneous scheduling policy.
2. The processor profiles the thread's execution and assigns it an enumerated
   classification ID. This classification is communicated to the OS through
   a logical-processor-scope MSR.
3. When the thread is context-switched out, the operating system consumes the
   workload (WL) classification from that logical-processor-scope MSR.
4. After consuming the WL classification and before switching in the new
   thread, the OS triggers the hardware to clear its history by writing to an
   MSR.
5. If, given the classification, the ranking table, and processor
   availability, the thread is not on its ideal processor, the OS then
   considers scheduling the thread on its ideal processor (if available).
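
Steps 3 to 5 can be sketched as follows. This is not the ``amd_hfi`` or
scheduler code: the two MSRs are simulated with ordinary variables, and
``struct task``, ``hfi_switch_out()`` and ``hfi_pick_cpu()`` are hypothetical
names. Real code would use the kernel's MSR accessors on the actual registers,
whose bit layout is not described here.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical simulation of the two logical-processor-scope MSRs:
    * one holds the hardware-assigned class ID, and writing the other
    * clears the hardware's classification history. */
   static uint64_t sim_class_msr;
   static uint64_t sim_history;

   static uint64_t read_class_msr(void) { return sim_class_msr; }
   static void clear_history_msr(void) { sim_history = 0; }

   struct task {
       unsigned int hfi_class; /* last classification consumed by the OS */
   };

   /* Steps 3 and 4: consume the outgoing task's WL classification, then
    * clear the hardware history before the next task is switched in. */
   static void hfi_switch_out(struct task *prev)
   {
       prev->hfi_class = (unsigned int)read_class_msr();
       clear_history_msr();
   }

   /* Step 5: move to the ideal CPU only if it differs and is available. */
   static int hfi_pick_cpu(int cur_cpu, int ideal_cpu,
                           const bool *cpu_available)
   {
       if (ideal_cpu != cur_cpu && cpu_available[ideal_cpu])
           return ideal_cpu;
       return cur_cpu;
   }

Reading the classification before clearing the history matters: reversing the
two calls would discard the data the OS is about to consume.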

Ranking Table
-------------

The ranking table is a shared memory region that is used to communicate the
performance and energy efficiency capabilities of each CPU in the system.

The ranking table holds rankings for each APIC ID in the system, with separate
performance and efficiency rankings for each workload classification.

.. kernel-doc:: drivers/platform/x86/amd/hfi/hfi.c
   :doc: amd_shmem_info
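
To illustrate how a class ID selects rankings, the sketch below uses a
hypothetical decoded row per APIC ID. It does not reproduce the shared-memory
layout documented in ``amd_shmem_info``; ``HFI_NR_CLASSES`` is set to 3 only to
match the example classification table, and the function name is illustrative.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define HFI_NR_CLASSES 3 /* assumption: matches the example table */

   /* Hypothetical decoded ranking table row for one APIC ID. */
   struct hfi_row {
       uint32_t apic_id;
       uint8_t perf[HFI_NR_CLASSES]; /* performance ranking per class */
       uint8_t eff[HFI_NR_CLASSES];  /* efficiency ranking per class */
   };

   /* Return the APIC ID with the highest ranking for class cls, or -1 if
    * the class is out of range or the table is empty. */
   static long hfi_ideal_apic(const struct hfi_row *rows, size_t nr_rows,
                              unsigned int cls, int want_eff)
   {
       long best = -1;
       int best_val = -1;

       if (cls >= HFI_NR_CLASSES)
           return -1;
       for (size_t i = 0; i < nr_rows; i++) {
           int val = want_eff ? rows[i].eff[cls] : rows[i].perf[cls];

           if (val > best_val) {
               best_val = val;
               best = (long)rows[i].apic_id;
           }
       }
       return best;
   }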

Ranking Table Update
--------------------

After updating the ranking table, the power management firmware issues a
platform interrupt to signal that the table is ready for the operating system
to consume. On receiving this interrupt, a CPU reads the new ranking table from
the shared memory region provided through the PCCT (Platform Communications
Channel Table), and the ``amd_hfi`` driver then parses the new table to provide
updated data for scheduling decisions.
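
One common kernel pattern for this kind of update, assumed here rather than
taken from the driver, is to do minimal work in the interrupt handler and copy
the table in deferred context. In the sketch below the shared memory is a plain
byte array, ``HFI_TABLE_BYTES`` is a hypothetical size, and the handler names
are illustrative.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   #define HFI_TABLE_BYTES 64 /* hypothetical size, not the real layout */

   struct hfi_state {
       bool update_pending;            /* real code needs atomics/locking */
       uint8_t table[HFI_TABLE_BYTES]; /* driver's private copy */
   };

   /* Interrupt context: only note that firmware published a new table. */
   static void hfi_platform_irq(struct hfi_state *st)
   {
       st->update_pending = true;
   }

   /* Deferred context: copy the table out of shared memory so it can be
    * parsed; returns true if a new table was consumed. */
   static bool hfi_consume_update(struct hfi_state *st, const uint8_t *shmem)
   {
       if (!st->update_pending)
           return false;
       st->update_pending = false;
       memcpy(st->table, shmem, sizeof(st->table));
       return true;
   }

Copying into a private buffer lets parsing proceed without racing against a
later firmware update of the shared region.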