.. SPDX-License-Identifier: GPL-2.0

=======================
Energy Model of devices
=======================

1. Overview
-----------

The Energy Model (EM) framework serves as an interface between drivers knowing
the power consumed by devices at various performance levels, and the kernel
subsystems willing to use that information to make energy-aware decisions.

The source of the information about the power consumed by devices can vary
greatly from one platform to another. These power costs can be estimated using
devicetree data in some cases. In others, the firmware will know better.
Alternatively, userspace might be best positioned. And so on. To avoid having
each and every client subsystem re-implement support for each and every
possible source of information on its own, the EM framework intervenes as an
abstraction layer which standardizes the format of power cost tables in the
kernel, thereby avoiding redundant work.

The power values might be expressed in micro-Watts or in an 'abstract scale'.
Multiple subsystems might use the EM and it is up to the system integrator to
check that the requirements for the power value scale types are met. An example
can be found in the Energy-Aware Scheduler documentation
Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
powercap, power values expressed in an 'abstract scale' might cause issues.
These subsystems are more interested in estimating the power used in the past,
thus the real micro-Watts might be needed. An example of these requirements can
be found in the Intelligent Power Allocation documentation in
Documentation/driver-api/thermal/power_allocator.rst.
Kernel subsystems might implement automatic detection to check whether EM
registered devices have an inconsistent scale (based on an EM internal flag).
An important thing to keep in mind is that when the power values are expressed
in an 'abstract scale', deriving real energy in micro-Joules is not possible.
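
As a rough illustration of such automatic detection, the following userspace C
sketch checks whether every registered domain uses the same power scale. It is
a model, not kernel code: the struct model_pd type and the MODEL_PD_MICROWATTS
bit are hypothetical stand-ins for the EM internal per-domain flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the EM internal "power in micro-Watts" flag */
#define MODEL_PD_MICROWATTS (1UL << 0)

struct model_pd {
	const char *name;
	unsigned long flags;
};

/*
 * Return true when all domains use the same power scale, i.e. either all
 * of them report micro-Watts or none of them does.
 */
static bool model_em_scale_consistent(const struct model_pd *pds, size_t n)
{
	bool first_uw;

	if (n == 0)
		return true;

	first_uw = pds[0].flags & MODEL_PD_MICROWATTS;
	for (size_t i = 1; i < n; i++) {
		bool uw = pds[i].flags & MODEL_PD_MICROWATTS;

		if (uw != first_uw)
			return false;
	}
	return true;
}
```

A subsystem that requires real micro-Watts, such as the thermal IPA, could
refuse to work or emit a warning when such a check fails.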

The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::

       +---------------+  +-----------------+  +---------------+
       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
       +---------------+  +-----------------+  +---------------+
               |                   | em_cpu_energy()   |
               |                   | em_cpu_get()      |
               +---------+         |         +---------+
                         |         |         |
                         v         v         v
                        +---------------------+
                        |    Energy Model     |
                        |     Framework       |
                        +---------------------+
                           ^       ^       ^
                           |       |       | em_dev_register_perf_domain()
                +----------+       |       +---------+
                |                  |                 |
        +---------------+  +---------------+  +--------------+
        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
        +---------------+  +---------------+  +--------------+
                ^                  ^                 ^
                |                  |                 |
        +--------------+   +---------------+  +--------------+
        | Device Tree  |   |   Firmware    |  |      ?       |
        +--------------+   +---------------+  +--------------+

In the case of CPU devices, the EM framework manages power cost tables per
'performance domain' in the system. A performance domain is a group of CPUs
whose performance is scaled together. Performance domains generally have a
1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
required to have the same micro-architecture. CPUs in different performance
domains can have different micro-architectures.
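
Since performance domains generally map 1-to-1 to CPUFreq policies, grouping
CPUs into domains amounts to grouping them by policy. The sketch below models
that in plain userspace C with one bitmask per domain. The policy_of_cpu[]
input and the model_build_pd_masks() helper are hypothetical and do not
correspond to any kernel API.

```c
#include <limits.h>

/*
 * policy_of_cpu[cpu] holds the ID of the CPUFreq policy that the CPU belongs
 * to (hypothetical input). CPUs sharing a policy land in the same domain.
 * On success, masks[pd] has one bit set per CPU of domain pd, pd_policy[pd]
 * records its policy ID, and the number of domains is returned.
 * Returns -1 if there are too many CPUs for one mask or too many domains.
 */
static int model_build_pd_masks(const int *policy_of_cpu, int nr_cpus,
				unsigned long *masks, int *pd_policy, int max_pd)
{
	int nr_pd = 0;

	if (nr_cpus > (int)(sizeof(unsigned long) * CHAR_BIT))
		return -1;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int pd;

		/* Look for a domain already created for this CPU's policy */
		for (pd = 0; pd < nr_pd; pd++)
			if (pd_policy[pd] == policy_of_cpu[cpu])
				break;

		/* First CPU of a new policy: open a new domain */
		if (pd == nr_pd) {
			if (nr_pd == max_pd)
				return -1;
			pd_policy[nr_pd] = policy_of_cpu[cpu];
			masks[nr_pd] = 0;
			nr_pd++;
		}
		masks[pd] |= 1UL << cpu;
	}
	return nr_pd;
}
```

For example, a system with four little CPUs in policy 0, three mid CPUs in
policy 4 and one big CPU in policy 7 yields three domains with masks 0x0F, 0x70
and 0x80.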

To better reflect power variation due to static power (leakage), the EM
supports runtime modifications of the power values. The mechanism relies on
RCU to free the modifiable EM perf_state table memory. Its user, the task
scheduler, also uses RCU to access this memory. The EM framework provides an
API for allocating and freeing the memory of the modifiable EM table.
The old memory is freed automatically using the RCU callback mechanism when
there are no owners left for the given EM runtime table instance. This is
tracked using the kref mechanism. The device driver which provided the new EM
at runtime should call the EM API to free it safely when it is no longer
needed. The EM framework will handle the clean-up when it is possible.

The kernel code which modifies the EM values is protected from concurrent
access by a mutex. Therefore, the device driver code must run in a sleeping
context when it tries to modify the EM.

With the runtime modifiable EM, we switch from a 'single EM, static during the
entire runtime' (system property) design to a 'single EM which can be changed
at runtime, e.g. according to the workload' (system and workload property)
design.

It is also possible to modify the CPU performance values for each EM
performance state. Thus, the full power and performance profile (which is an
exponential curve) can be changed, e.g. according to the workload or a system
property.


2. Core APIs
------------

2.1 Config options
^^^^^^^^^^^^^^^^^^

CONFIG_ENERGY_MODEL must be enabled to use the EM framework.


2.2 Registration of performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Registration of 'advanced' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'advanced' EM gets its name from the fact that the driver is allowed to
provide a more precise power model. It is not limited to a math formula
implemented in the framework (as in the 'simple' EM case). It can better
reflect the real power measurements performed for each performance state.
Thus, this registration method should be preferred when accounting for static
power (leakage) is important.

Drivers are expected to register performance domains into the EM framework by
calling the following API::

  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
		struct em_data_callback *cb, cpumask_t *cpus, bool microwatts);

Drivers must provide a callback function returning <frequency, power> tuples
for each performance state. The callback function provided by the driver is free
to fetch data from any relevant location (DT, firmware, ...), and by any means
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domain using the 'cpus' cpumask. For devices other than CPUs, the
'cpus' argument must be set to NULL.
It is important to set the last argument, 'microwatts', to the correct value.
Kernel subsystems which use the EM might rely on this flag to check whether all
EM devices use the same scale. If the scales differ, these subsystems might
decide to issue a warning or an error, stop working, or panic.
See Section 3.1 for an example of a driver implementing this callback, or
Section 2.5 for further documentation on this API.
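
The callback contract can be modelled in userspace: the callback rounds
("ceils") a requested frequency up to the next performance state and reports
its <frequency, power> tuple, and the caller steps just past each returned
frequency to discover the following state. This is a simplified sketch; the
model_active_power() and model_build_table() helpers and the OPP arrays are
hypothetical, and the EM framework's own table-building code may differ in its
details.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical OPP table of a device: frequencies in kHz, power in uW */
static const unsigned long model_opp_khz[] = { 500000, 1000000, 1500000 };
static const unsigned long model_opp_uw[] = { 100000, 300000, 400000 };
#define MODEL_NR_OPP (sizeof(model_opp_khz) / sizeof(model_opp_khz[0]))

/*
 * Callback in the style described above: ceil *khz to the lowest OPP at or
 * above it and report that OPP's <frequency, power> tuple.
 */
static int model_active_power(unsigned long *uw, unsigned long *khz)
{
	for (size_t i = 0; i < MODEL_NR_OPP; i++) {
		if (model_opp_khz[i] >= *khz) {
			*khz = model_opp_khz[i];
			*uw = model_opp_uw[i];
			return 0;
		}
	}
	return -EINVAL; /* no OPP at or above the requested frequency */
}

struct model_state {
	unsigned long khz;
	unsigned long uw;
};

/* Build nr_states entries in ascending frequency order via the callback. */
static int model_build_table(struct model_state *table, size_t nr_states)
{
	unsigned long khz = 0, prev_khz = 0;

	for (size_t i = 0; i < nr_states; i++, khz++) {
		unsigned long uw = 0;
		int ret = model_active_power(&uw, &khz);

		if (ret)
			return ret;
		/* Each call must return a strictly higher frequency */
		if (i > 0 && khz <= prev_khz)
			return -EINVAL;
		table[i].khz = khz;
		table[i].uw = uw;
		prev_khz = khz;
	}
	return 0;
}
```

Asking for more states than the device has makes the callback fail on the
extra request, which is why nr_states must match the real number of
performance states.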

Registration of EM using DT
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The EM can also be registered using the OPP framework and the
"operating-points-v2" information in DT. Each OPP entry in DT can be extended
with an "opp-microwatt" property containing the power value in micro-Watts.
This OPP DT property allows a platform to register EM power values which
reflect the total power (static + dynamic). These power values may come
directly from experiments and measurements.

Registration of 'artificial' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is an option to provide a custom callback for drivers missing detailed
knowledge about the power value of each performance state. The callback
.get_cost() is optional and provides the 'cost' values used by the EAS.
This is useful for platforms that only provide information on relative
efficiency between CPU types, where one could use the information to
create an abstract power model. But even an abstract power model can
sometimes be hard to fit in, given the input power value size restrictions.
The .get_cost() callback allows providing 'cost' values which reflect the
efficiency of the CPUs. This makes it possible to provide EAS information with
a different relation than the one that would be imposed by the EM internal
formulas calculating 'cost' values. To register an EM for such a platform, the
driver must set the flag 'microwatts' to 0, provide the .active_power()
callback and provide the .get_cost() callback. The EM framework handles such a
platform properly during registration and sets the EM_PERF_DOMAIN_ARTIFICIAL
flag for it. Other frameworks which use the EM should take special care to
test this flag and treat it properly.

Registration of 'simple' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'simple' EM is registered using the framework helper function
cpufreq_register_em_with_opp(). It implements a power model which is tied to
the math formula::

	Power = C * V^2 * f

where C is the dynamic power coefficient, V the voltage and f the frequency of
the performance state. The EM which is registered using this method might not
correctly reflect the physics of a real device, e.g. when static power
(leakage) is important.
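
As a worked example of the formula, the sketch below evaluates it in integer
arithmetic. The units are an assumption made for illustration only: C in
uW/(MHz * V^2), voltage in mV and frequency in MHz, so dividing by 10^6 (to
turn mV^2 into V^2) yields micro-Watts. The helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Evaluate Power = C * V^2 * f with integers. Assumed units (for this sketch
 * only): c in uW/(MHz * V^2), mv in mV, mhz in MHz. Because
 * V^2 = mV^2 / 10^6, dividing by 10^6 yields micro-Watts.
 */
static uint64_t model_dyn_power_uw(uint32_t c, uint32_t mv, uint32_t mhz)
{
	uint64_t p = (uint64_t)c * mv * mv * mhz;

	return p / 1000000;
}
```

With C = 100, 1000 mV and 1000 MHz the model gives 100000 uW (100 mW); at
800 mV and 500 MHz it drops to 32000 uW. The model has no static (leakage)
term, which is exactly the limitation mentioned above.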


2.3 Accessing performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two API functions which provide access to the energy model:
em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which takes
a device pointer as an argument. Which interface to use depends on the
subsystem, but for CPU devices both functions return the same performance
domain.

Subsystems interested in the energy model of a CPU can retrieve it using the
em_cpu_get() API. The performance domains and their initial energy model
tables are allocated once upon creation of the performance domains; a driver
may later replace the table at runtime (see Section 2.4).

The energy consumed by a performance domain can be estimated using the
em_cpu_energy() API. For CPU devices, the estimation assumes that the schedutil
CPUFreq governor is in use. Currently this calculation is not provided for
other types of devices.
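
To give an intuition for this kind of estimate, the following is a simplified
userspace model, not the kernel's em_cpu_energy(). It assumes a schedutil-like
25% headroom when choosing the performance state, picks the lowest state that
covers the busiest CPU, and treats each CPU as busy for util/performance of the
time at that state's power. All names, the headroom and the units are
assumptions of this sketch.

```c
#include <stddef.h>

/* Hypothetical performance state: capacity-like performance, power in uW */
struct model_ps {
	unsigned long performance;
	unsigned long power;
};

/*
 * Return the lowest state (states sorted by ascending performance, nr > 0)
 * whose performance covers max_util plus a 25% headroom, saturating at the
 * highest state.
 */
static size_t model_pick_state(const struct model_ps *st, size_t nr,
			       unsigned long max_util)
{
	unsigned long req = max_util + (max_util >> 2);

	for (size_t i = 0; i < nr; i++)
		if (st[i].performance >= req)
			return i;
	return nr - 1;
}

/*
 * Estimated average power (uW) of the whole domain: all CPUs run at the
 * state chosen for the busiest one, and each CPU is busy for
 * util / performance of the time at that state's power.
 */
static unsigned long model_pd_power(const struct model_ps *st, size_t nr,
				    const unsigned long *util, size_t nr_cpus)
{
	unsigned long max_util = 0, sum_util = 0;
	const struct model_ps *ps;

	for (size_t i = 0; i < nr_cpus; i++) {
		sum_util += util[i];
		if (util[i] > max_util)
			max_util = util[i];
	}

	ps = &st[model_pick_state(st, nr, max_util)];
	return ps->power * sum_util / ps->performance;
}
```

For instance, with states {256, 100000 uW}, {512, 300000 uW},
{768, 400000 uW} and {1024, 800000 uW} and per-CPU utilizations of 300 and
100, the requested performance is 375, the 512 state is selected, and the
estimated average power is 300000 * 400 / 512 = 234375 uW. Multiplying by a
time window turns this into energy.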

More details about the above APIs can be found in ``<linux/energy_model.h>``
or in Section 2.5.


2.4 Runtime modifications
^^^^^^^^^^^^^^^^^^^^^^^^^

Drivers willing to update the EM at runtime should use the following dedicated
function to allocate a new instance of the modified EM. The API is listed
below::

  struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd);

This allocates a structure which contains the new EM table together with the
RCU and kref fields needed by the EM framework. The 'struct em_perf_table'
contains the array 'struct em_perf_state state[]', which lists the performance
states in ascending order. That list must be populated by the device driver
which wants to update the EM. The list of frequencies can be taken from the
existing EM (created during boot). The content of each 'struct em_perf_state'
must be populated by the driver as well.

This is the API which performs the EM update, using an RCU pointer swap::

  int em_dev_update_perf_domain(struct device *dev,
			struct em_perf_table __rcu *new_table);

Drivers must provide a pointer to the allocated and initialized new EM
'struct em_perf_table'. That new EM will be safely used inside the EM framework
and will be visible to other sub-systems in the kernel (thermal, powercap).
The main design goal for this API is to be fast and to avoid extra calculations
or memory allocations at runtime. When pre-computed EMs are available in the
device driver, it should be possible to simply re-use them with low
performance overhead.

To free an EM provided earlier by the driver (e.g. when the module is
unloaded), call the following API::

  void em_table_free(struct em_perf_table __rcu *table);

It allows the EM framework to safely release the memory once no other
sub-system (e.g. EAS) is using it anymore.
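
The ownership rules above (the driver drops its reference with
em_table_free(), and the framework frees a table once nobody uses it) can be
modelled in userspace with a reference count and an atomic pointer swap. This
is only an analogy: the model_* names are hypothetical, C11 atomics stand in
for kref and the RCU pointer swap, and memory is released immediately instead
of after an RCU grace period.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Model of a runtime EM table; refcount plays the role of the kernel's kref */
struct model_table {
	atomic_int refcount;
	unsigned long power[4];
};

static int model_tables_released; /* for demonstration only */

static struct model_table *model_table_alloc(void)
{
	struct model_table *t = calloc(1, sizeof(*t));

	if (t)
		atomic_init(&t->refcount, 1); /* reference owned by the driver */
	return t;
}

static void model_table_put(struct model_table *t)
{
	/*
	 * Last reference gone: release the memory. The kernel additionally
	 * waits for an RCU grace period so that readers still inside
	 * rcu_read_lock() never see freed memory; this model frees at once.
	 */
	if (atomic_fetch_sub(&t->refcount, 1) == 1) {
		free(t);
		model_tables_released++;
	}
}

struct model_domain {
	_Atomic(struct model_table *) table;
};

/* Publish new_table and drop the domain's reference to the old one. */
static void model_domain_update(struct model_domain *pd,
				struct model_table *new_table)
{
	struct model_table *old;

	atomic_fetch_add(&new_table->refcount, 1); /* domain's reference */
	old = atomic_exchange(&pd->table, new_table);
	if (old)
		model_table_put(old);
}
```

A one-time update then looks like: allocate a table, publish it with
model_domain_update(), and drop the driver's reference with model_table_put().
The table stays alive because the domain still holds a reference, and it is
released only when a later update replaces it.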

To use the power values in other sub-systems (like thermal, powercap), call the
following API, which protects the reader and provides a consistent view of the
EM table data::

  struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd);

It returns a 'struct em_perf_state' pointer to an array of performance states
in ascending order.
This function must be called inside an RCU read-side critical section (after
rcu_read_lock()). When the EM table is not needed anymore, call
rcu_read_unlock(). In this way the EM safely uses the RCU read section
and protects the users. It also allows the EM framework to manage the memory
and free it. More details on how to use it can be found in the example driver
in Section 3.2.

There is a dedicated API for device drivers to calculate em_perf_state::cost
values::

  int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
                           int nr_states);

These 'cost' values from the EM are used in EAS. The new EM table should be
passed together with the number of entries and the device pointer. When the
cost values are computed successfully, the function returns 0. The function
also sets the inefficiency of each performance state correctly, updating
em_perf_state::flags accordingly. The new EM prepared this way can then be
passed to the em_dev_update_perf_domain() function, which makes it available
for use.
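
The idea behind the 'cost' values and the inefficiency flag can be sketched in
userspace as follows. It is not the kernel's em_dev_compute_costs(): the cost
formula (power scaled by 10 and divided by performance), the
MODEL_PS_INEFFICIENT bit and the structure layout are assumptions chosen for
illustration. Walking from the highest state down, a state is flagged
inefficient when some higher state is at least as cheap per unit of
performance.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical flag bit marking an inefficient performance state */
#define MODEL_PS_INEFFICIENT (1UL << 0)

struct model_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long performance;	/* capacity-like scale */
	unsigned long power;		/* uW */
	unsigned long cost;
	unsigned long flags;
};

/*
 * Fill in 'cost' (power per unit of performance, scaled by 10 to keep some
 * precision) and flag the states that are not cheaper than a faster one.
 * The table must be sorted by ascending frequency.
 */
static int model_compute_costs(struct model_perf_state *table, int nr_states)
{
	unsigned long prev_cost = ULONG_MAX;

	for (int i = nr_states - 1; i >= 0; i--) {
		if (!table[i].performance)
			return -EINVAL;

		table[i].cost = table[i].power * 10 / table[i].performance;

		if (table[i].cost >= prev_cost) {
			/* A faster state is at least as cheap: skip this one */
			table[i].flags |= MODEL_PS_INEFFICIENT;
		} else {
			table[i].flags &= ~MODEL_PS_INEFFICIENT;
			prev_cost = table[i].cost;
		}
	}
	return 0;
}
```

For example, with (performance, power) pairs of (256, 100000), (512, 300000),
(768, 400000) and (1024, 800000), the costs are 3906, 5859, 5208 and 7812; the
second state is flagged, since the third one delivers more performance at a
lower cost.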

More details about the above APIs can be found in ``<linux/energy_model.h>``
or in Section 3.2, which contains example code showing a simple implementation
of the update mechanism in a device driver.


2.5 Detailed description of this API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. kernel-doc:: include/linux/energy_model.h
   :internal:

.. kernel-doc:: kernel/power/energy_model.c
   :export:


3. Examples
-----------

3.1 Example driver with EM registration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The CPUFreq framework supports a dedicated callback for registering the EM for
a given CPU(s) 'policy' object: cpufreq_driver::register_em(). Drivers should
implement that callback properly, because the framework calls it at the right
time during setup.
This section provides a simple example of a CPUFreq driver registering a
performance domain in the Energy Model framework using the (fake) 'foo'
protocol. The driver implements an est_power() function to be provided to the
EM framework::

  -> drivers/cpufreq/foo_cpufreq.c

	static int est_power(struct device *dev, unsigned long *uW,
			     unsigned long *kHz)
	{
		long freq, power;

		/* Use the 'foo' protocol to ceil the frequency */
		freq = foo_get_freq_ceil(dev, *kHz);
		if (freq < 0)
			return freq;

		/* Estimate the power cost (in micro-Watts) at the relevant freq. */
		power = foo_estimate_power(dev, freq);
		if (power < 0)
			return power;

		/* Return the values to the EM framework */
		*uW = power;
		*kHz = freq;

		return 0;
	}

	static void foo_cpufreq_register_em(struct cpufreq_policy *policy)
	{
		struct em_data_callback em_cb = EM_DATA_CB(est_power);
		struct device *cpu_dev;
		int nr_opp, ret;

		cpu_dev = get_cpu_device(cpumask_first(policy->cpus));

		/* Find the number of OPPs for this policy */
		nr_opp = foo_get_nr_opp(policy);

		/*
		 * Register the new performance domain. The power values are
		 * in micro-Watts, hence 'microwatts' is set to true.
		 */
		ret = em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb,
						  policy->cpus, true);
		if (ret)
			dev_warn(cpu_dev, "EM: registration failed %d\n", ret);
	}

	static struct cpufreq_driver foo_cpufreq_driver = {
		.register_em = foo_cpufreq_register_em,
	};


3.2 Example driver with EM modification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section provides a simple example of a thermal driver modifying the EM.
The driver implements a foo_thermal_em_update() function. The driver is woken
up periodically to check the temperature and modify the EM data::

  -> drivers/soc/example/example_em_mod.c

	static void foo_get_new_em(struct foo_context *ctx)
	{
		struct em_perf_table __rcu *em_table;
		struct em_perf_state *table, *new_table;
		struct device *dev = ctx->dev;
		struct em_perf_domain *pd;
		unsigned long freq;
		int i, ret;

		pd = em_pd_get(dev);
		if (!pd)
			return;

		em_table = em_table_alloc(pd);
		if (!em_table)
			return;

		new_table = em_table->state;

		/* Read the frequencies of the current EM under RCU protection */
		rcu_read_lock();
		table = em_perf_state_from_pd(pd);
		for (i = 0; i < pd->nr_perf_states; i++) {
			freq = table[i].frequency;
			/* The fake 'foo' helper fills in the new state */
			foo_get_power_perf_values(dev, freq, &new_table[i]);
		}
		rcu_read_unlock();

		/* Calculate 'cost' values for EAS on the new table */
		ret = em_dev_compute_costs(dev, new_table, pd->nr_perf_states);
		if (ret) {
			dev_warn(dev, "EM: compute costs failed %d\n", ret);
			em_table_free(em_table);
			return;
		}

		ret = em_dev_update_perf_domain(dev, em_table);
		if (ret) {
			dev_warn(dev, "EM: update failed %d\n", ret);
			em_table_free(em_table);
			return;
		}

		/*
		 * Since it's a one-time update, drop the usage counter.
		 * The EM framework will later free the table when needed.
		 */
		em_table_free(em_table);
	}

	/*
	 * Function called periodically to check the temperature and
	 * update the EM if needed
	 */
	static void foo_thermal_em_update(struct foo_context *ctx)
	{
		struct device *dev = ctx->dev;

		ctx->temperature = foo_get_temp(dev, ctx);
		if (ctx->temperature < FOO_EM_UPDATE_TEMP_THRESHOLD)
			return;

		foo_get_new_em(ctx);
	}