.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=====================================================
User Interface for Resource Control feature (resctrl)
=====================================================

:Copyright: |copy| 2016 Intel Corporation
:Authors: - Fenghua Yu <fenghua.yu@intel.com>
          - Tony Luck <tony.luck@intel.com>
          - Vikas Shivappa <vikas.shivappa@intel.com>


Intel refers to this feature as Intel Resource Director Technology (Intel(R) RDT).
AMD refers to this feature as AMD Platform Quality of Service (AMD QoS).

This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
flag bits:

=============================================== ================================
RDT (Resource Director Technology) Allocation   "rdt_a"
CAT (Cache Allocation Technology)               "cat_l3", "cat_l2"
CDP (Code and Data Prioritization)              "cdp_l3", "cdp_l2"
CQM (Cache QoS Monitoring)                      "cqm_llc", "cqm_occup_llc"
MBM (Memory Bandwidth Monitoring)               "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation)               "mba"
SMBA (Slow Memory Bandwidth Allocation)         ""
BMEC (Bandwidth Monitoring Event Configuration) ""
=============================================== ================================

Historically, new features were made visible by default in /proc/cpuinfo.
This resulted in the feature flags becoming hard to parse by humans. Adding a new
flag to /proc/cpuinfo should be avoided if user space can obtain information
about the feature from resctrl's info directory.

To use the feature, mount the file system::

	# mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps][,debug]] /sys/fs/resctrl

Mount options are:

"cdp":
	Enable code/data prioritization in L3 cache allocations.
"cdpl2":
	Enable code/data prioritization in L2 cache allocations.
"mba_MBps":
	Enable the MBA Software Controller (mba_sc) to specify MBA
	bandwidth in MiBps.
"debug":
	Make debug files accessible. Available debug files are annotated with
	"Available only with debug option".

L2 and L3 CDP are controlled separately.

RDT features are orthogonal. A particular system may support only
monitoring, only control, or both monitoring and control. Cache
pseudo-locking is a unique way of using cache control to "pin" or
"lock" data in the cache. Details can be found in
"Cache Pseudo-Locking".

The mount succeeds if either allocation or monitoring is present, but
only those files and directories supported by the system will be created.
For more details on the behavior of the interface during monitoring
and allocation, see the "Resource alloc and monitor groups" section.

Info directory
==============

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.

Each subdirectory contains the following files with respect to
allocation:

Cache resource (L3/L2) subdirectory contains the following files
related to allocation:

"num_closids":
	The number of CLOSIDs which are valid for this
	resource. The kernel uses the smallest number of
	CLOSIDs of all enabled resources as the limit.
"cbm_mask":
	The bitmask which is valid for this resource.
	This mask is equivalent to 100%.
"min_cbm_bits":
	The minimum number of consecutive bits which
	must be set when writing a mask.

"shareable_bits":
	Bitmask of the resource that is shared with other
	executing entities (e.g. I/O). The user can use this
	when setting up exclusive cache partitions. Note that
	some platforms support devices that have their
	own settings for cache use which can override
	these bits.
"bit_usage":
	Annotated capacity bitmasks showing how all
	instances of the resource are used. The legend is:

	"0":
		Corresponding region is unused. When the system's
		resources have been allocated and a "0" is found
		in "bit_usage" it is a sign that resources are
		wasted.

	"H":
		Corresponding region is used by hardware only
		but available for software use. If a resource
		has bits set in "shareable_bits" but not all
		of these bits appear in the resource groups'
		schemata, then the bits appearing in
		"shareable_bits" but in no resource group will
		be marked as "H".
	"X":
		Corresponding region is available for sharing and
		used by hardware and software. These are the
		bits that appear in "shareable_bits" as
		well as a resource group's allocation.
	"S":
		Corresponding region is used by software
		and available for sharing.
	"E":
		Corresponding region is used exclusively by
		one resource group. No sharing allowed.
	"P":
		Corresponding region is pseudo-locked. No
		sharing allowed.
"sparse_masks":
	Indicates if non-contiguous 1s value in CBM is supported.

	"0":
		Only contiguous 1s value in CBM is supported.
	"1":
		Non-contiguous 1s value in CBM is supported.

Memory bandwidth (MB) subdirectory contains the following files
with respect to allocation:

"min_bandwidth":
	The minimum memory bandwidth percentage which
	the user can request.

"bandwidth_gran":
	The granularity in which the memory bandwidth
	percentage is allocated. The allocated
	b/w percentage is rounded off to the next
	control step available on the hardware. The
	available bandwidth control steps are:
	min_bandwidth + N * bandwidth_gran.

"delay_linear":
	Indicates if the delay scale is linear or
	non-linear. This field is purely informational.

"thread_throttle_mode":
	Indicator on Intel systems of how tasks running on threads
	of a physical core are throttled in cases where they
	request different memory bandwidth percentages:

	"max":
		the smallest percentage is applied
		to all threads
	"per-thread":
		bandwidth percentages are directly applied to
		the threads running on the core

If RDT monitoring is available there will be an "L3_MON" directory
with the following files:

"num_rmids":
	The number of RMIDs available. This is the
	upper bound for how many "CTRL_MON" + "MON"
	groups can be created.

"mon_features":
	Lists the monitoring events if
	monitoring is enabled for the resource.
	Example::

		# cat /sys/fs/resctrl/info/L3_MON/mon_features
		llc_occupancy
		mbm_total_bytes
		mbm_local_bytes

	If the system supports Bandwidth Monitoring Event
	Configuration (BMEC), then the bandwidth events will
	be configurable.
	The output will be::

		# cat /sys/fs/resctrl/info/L3_MON/mon_features
		llc_occupancy
		mbm_total_bytes
		mbm_total_bytes_config
		mbm_local_bytes
		mbm_local_bytes_config

"mbm_total_bytes_config", "mbm_local_bytes_config":
	Read/write files containing the configuration for the mbm_total_bytes
	and mbm_local_bytes events, respectively, when the Bandwidth
	Monitoring Event Configuration (BMEC) feature is supported.
	The event configuration settings are domain specific and affect
	all the CPUs in the domain. When either event configuration is
	changed, the bandwidth counters for all RMIDs of both events
	(mbm_total_bytes as well as mbm_local_bytes) are cleared for that
	domain. The next read for every RMID will report "Unavailable"
	and subsequent reads will report the valid value.

	The following event types are supported:

	==== ========================================================
	Bits Description
	==== ========================================================
	6    Dirty Victims from the QOS domain to all types of memory
	5    Reads to slow memory in the non-local NUMA domain
	4    Reads to slow memory in the local NUMA domain
	3    Non-temporal writes to non-local NUMA domain
	2    Non-temporal writes to local NUMA domain
	1    Reads to memory in the non-local NUMA domain
	0    Reads to memory in the local NUMA domain
	==== ========================================================

	By default, the mbm_total_bytes configuration is set to 0x7f to count
	all the event types and the mbm_local_bytes configuration is set to
	0x15 to count all the local memory events.

	Examples:

	* To view the current configuration:
	  ::

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	    0=0x7f;1=0x7f;2=0x7f;3=0x7f

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
	    0=0x15;1=0x15;2=0x15;3=0x15

	* To change the mbm_total_bytes to count only reads on domain 0,
	  bits 0, 1, 4 and 5 need to be set, which is 110011b in binary
	  (in hexadecimal 0x33):
	  ::

	    # echo "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	    0=0x33;1=0x7f;2=0x7f;3=0x7f

	* To change the mbm_local_bytes to count all the slow memory reads on
	  domains 0 and 1, bits 4 and 5 need to be set, which is 110000b
	  in binary (in hexadecimal 0x30):
	  ::

	    # echo "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
	    0=0x30;1=0x30;2=0x15;3=0x15

"max_threshold_occupancy":
	Read/write file that provides the largest value (in
	bytes) at which a previously used LLC_occupancy
	counter can be considered for re-use.
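
As a worked example of the event bit table, the shell sketch below turns a list of event bit numbers into the hexadecimal value written to "mbm_total_bytes_config" or "mbm_local_bytes_config". The function name "bmec_config" is hypothetical and not part of resctrl. It only does the arithmetic; the write to a real domain is shown as a comment because it needs a mounted resctrl with BMEC support.

```shell
# Hypothetical helper: build a BMEC event configuration value from event
# bit numbers (0-6, as listed in the event table).
bmec_config() {
	local mask=0 bit
	for bit in "$@"; do
		# Reject bits that the event table does not describe.
		if [ "$bit" -lt 0 ] || [ "$bit" -gt 6 ]; then
			echo "invalid event bit: $bit" >&2
			return 1
		fi
		mask=$((mask | (1 << bit)))
	done
	printf '0x%x\n' "$mask"
}

# Count only reads (bits 0, 1, 4 and 5) on domain 0:
#   echo "0=$(bmec_config 0 1 4 5)" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
```

With these inputs "bmec_config 0 1 4 5" prints 0x33 and "bmec_config 0 2 4" prints 0x15, the same values derived by hand in the examples.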

Finally, in the top level of the "info" directory there is a file
named "last_cmd_status". This is reset with every "command" issued
via the file system (making new directories or writing to any of the
control files). If the command was successful, it will read as "ok".
If the command failed, it will provide more information than can be
conveyed in the error returns from file operations. E.g.
::

	# echo L3:0=f7 > schemata
	bash: echo: write error: Invalid argument
	# cat info/last_cmd_status
	mask f7 has non-consecutive 1-bits

Resource alloc and monitor groups
=================================

Resource groups are represented as directories in the resctrl file
system. The default group is the root directory which, immediately
after mounting, owns all the tasks and cpus in the system and can make
full use of all resources.

On a system with RDT control features additional directories can be
created in the root directory that specify different amounts of each
resource (see "schemata" below). The root and these additional top level
directories are referred to as "CTRL_MON" groups below.
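
Creating a group is an ordinary mkdir, and a failed mkdir is explained in "info/last_cmd_status", so the two steps combine naturally. The sketch below is a hypothetical shell helper, not a resctrl tool. "RESCTRL_ROOT" is an assumed variable (defaulting to /sys/fs/resctrl) that lets the function be tried against an ordinary directory.

```shell
# Hypothetical helper: create a CTRL_MON or MON group and, on failure,
# print the reason the kernel left in info/last_cmd_status.
resctrl_mkgroup() {
	local root="${RESCTRL_ROOT:-/sys/fs/resctrl}"
	local group="$1"

	if mkdir "$root/$group" 2>/dev/null; then
		return 0
	fi
	echo "mkdir $group failed: $(cat "$root/info/last_cmd_status" 2>/dev/null)" >&2
	return 1
}

# Examples on a mounted resctrl:
#   resctrl_mkgroup p0                  # new CTRL_MON group
#   resctrl_mkgroup p0/mon_groups/m0    # new MON group under p0
```

On a real resctrl mount the kernel fills the new directory with files such as "tasks" and "cpus"; against a plain directory only the mkdir and the status lookup are exercised.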

On a system with RDT monitoring the root directory and other top level
directories contain a directory named "mon_groups" in which additional
directories can be created to monitor subsets of tasks in the CTRL_MON
group that is their ancestor. These are called "MON" groups in the rest
of this document.

Removing a directory will move all tasks and cpus owned by the group it
represents to the parent. Removing one of the created CTRL_MON groups
will automatically remove all MON groups below it.

Moving MON group directories to a new parent CTRL_MON group is supported
for the purpose of changing the resource allocations of a MON group
without impacting its monitoring data or assigned tasks. This operation
is not allowed for MON groups which monitor CPUs. No other move
operation is currently allowed other than simply renaming a CTRL_MON or
MON group.

All groups contain the following files:

"tasks":
	Reading this file shows the list of all tasks that belong to
	this group. Writing a task id to the file will add a task to the
	group. Multiple tasks can be added by separating the task ids
	with commas. Tasks will be assigned sequentially. Multiple
	failures are not supported.
	A single failure encountered while attempting to assign a task
	causes the operation to abort; tasks added before the failure
	remain in the group. Failures are logged to
	/sys/fs/resctrl/info/last_cmd_status.

	If the group is a CTRL_MON group the task is removed from
	whichever previous CTRL_MON group owned the task and also from
	any MON group that owned the task. If the group is a MON group,
	then the task must already belong to the CTRL_MON parent of this
	group. The task is removed from any previous MON group.

"cpus":
	Reading this file shows a bitmask of the logical CPUs owned by
	this group. Writing a mask to this file will add and remove
	CPUs to/from this group. As with the tasks file a hierarchy is
	maintained where MON groups may only include CPUs owned by the
	parent CTRL_MON group.
	When the resource group is in pseudo-locked mode this file will
	only be readable, reflecting the CPUs associated with the
	pseudo-locked region.

"cpus_list":
	Just like "cpus", only using ranges of CPUs instead of bitmasks.

When control is enabled all CTRL_MON groups will also contain:

"schemata":
	A list of all the resources available to this group.
345*7168ae33SJames Morse Each resource has its own line and format - see below for details. 346*7168ae33SJames Morse 347*7168ae33SJames Morse"size": 348*7168ae33SJames Morse Mirrors the display of the "schemata" file to display the size in 349*7168ae33SJames Morse bytes of each allocation instead of the bits representing the 350*7168ae33SJames Morse allocation. 351*7168ae33SJames Morse 352*7168ae33SJames Morse"mode": 353*7168ae33SJames Morse The "mode" of the resource group dictates the sharing of its 354*7168ae33SJames Morse allocations. A "shareable" resource group allows sharing of its 355*7168ae33SJames Morse allocations while an "exclusive" resource group does not. A 356*7168ae33SJames Morse cache pseudo-locked region is created by first writing 357*7168ae33SJames Morse "pseudo-locksetup" to the "mode" file before writing the cache 358*7168ae33SJames Morse pseudo-locked region's schemata to the resource group's "schemata" 359*7168ae33SJames Morse file. On successful pseudo-locked region creation the mode will 360*7168ae33SJames Morse automatically change to "pseudo-locked". 361*7168ae33SJames Morse 362*7168ae33SJames Morse"ctrl_hw_id": 363*7168ae33SJames Morse Available only with debug option. The identifier used by hardware 364*7168ae33SJames Morse for the control group. On x86 this is the CLOSID. 365*7168ae33SJames Morse 366*7168ae33SJames MorseWhen monitoring is enabled all MON groups will also contain: 367*7168ae33SJames Morse 368*7168ae33SJames Morse"mon_data": 369*7168ae33SJames Morse This contains a set of files organized by L3 domain and by 370*7168ae33SJames Morse RDT event. E.g. on a system with two L3 domains there will 371*7168ae33SJames Morse be subdirectories "mon_L3_00" and "mon_L3_01". Each of these 372*7168ae33SJames Morse directories have one file per event (e.g. "llc_occupancy", 373*7168ae33SJames Morse "mbm_total_bytes", and "mbm_local_bytes"). 
	In a MON group these files provide a readout of the current value
	of the event for all tasks in the group. In CTRL_MON groups these
	files provide the sum for all tasks in the CTRL_MON group and all
	tasks in its MON groups. Please see the example section for more
	details on usage.
	On systems with Sub-NUMA Cluster (SNC) enabled there are extra
	directories for each node (located within the "mon_L3_XX" directory
	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
	where "YY" is the node number.

"mon_hw_id":
	Available only with debug option. The identifier used by hardware
	for the monitor group. On x86 this is the RMID.

When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:

"mba_MBps_event":
	Reading this file shows which memory bandwidth event is used
	as input to the software feedback loop that keeps memory bandwidth
	below the value specified in the schemata file. Writing the
	name of one of the supported memory bandwidth events found in
	/sys/fs/resctrl/info/L3_MON/mon_features changes the input
	event.
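
The "cpus" and "cpus_list" files describe the same CPUs in two notations. The hypothetical shell helper below converts a list such as "0-3,8" into the hexadecimal mask form. It assumes every CPU number is below 63 so that the mask fits in shell arithmetic; on larger systems, writing the list to "cpus_list" avoids the conversion altogether.

```shell
# Hypothetical helper: convert a CPU list ("0-3,8") into a hex mask
# ("10f"). Assumes CPU numbers below 63.
cpulist_to_mask() {
	local mask=0 item lo hi cpu
	local IFS=,
	for item in $1; do
		lo=${item%-*}    # first CPU of a range, or the single CPU
		hi=${item#*-}    # last CPU of a range, or the same CPU again
		cpu=$lo
		while [ "$cpu" -le "$hi" ]; do
			mask=$((mask | (1 << cpu)))
			cpu=$((cpu + 1))
		done
	done
	printf '%x\n' "$mask"
}

# Example on a mounted resctrl:
#   echo "$(cpulist_to_mask 0-3,8)" > /sys/fs/resctrl/p0/cpus
```

For instance, "cpulist_to_mask 0-3,8" prints 10f: bits 0 to 3 give 0xf and bit 8 adds 0x100.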

Resource allocation rules
-------------------------

When a task is running the following rules define which resources are
available to it:

1) If the task is a member of a non-default group, then the schemata
   for that group is used.

2) Else if the task belongs to the default group, but is running on a
   CPU that is assigned to some specific group, then the schemata for the
   CPU's group is used.

3) Otherwise the schemata for the default group is used.

Resource monitoring rules
-------------------------
1) If a task is a member of a MON group, or non-default CTRL_MON group
   then RDT events for the task will be reported in that group.

2) If a task is a member of the default CTRL_MON group, but is running
   on a CPU that is assigned to some specific group, then the RDT events
   for the task will be reported in that group.

3) Otherwise RDT events for the task will be reported in the root level
   "mon_data" group.


Notes on cache occupancy monitoring and control
===============================================
When moving a task from one group to another you should remember that
this only affects *new* cache allocations by the task. E.g.
you may have a task in a monitor group showing 3 MB of cache occupancy.
If you move the task to a new group and immediately check the occupancy
of the old and new groups, you will likely see that the old group is
still showing 3 MB and the new group zero. When the task accesses
locations still in cache from before the move, the h/w does not update
any counters. On a busy system you will likely see the occupancy in the
old group go down as cache lines are evicted and re-used while the
occupancy in the new group rises as the task accesses memory and loads
into the cache are counted based on membership in the new group.

The same applies to cache allocation control. Moving a task to a group
with a smaller cache partition will not evict any cache lines. The
process may continue to use them from the old partition.

Hardware uses a CLOSID (Class of Service ID) and an RMID (Resource
Monitoring ID) to identify a control group and a monitoring group
respectively. Each resource group is mapped to these IDs based on the
kind of group. The numbers of CLOSIDs and RMIDs are limited by the
hardware, so the creation of a "CTRL_MON" directory may fail if we run
out of either CLOSIDs or RMIDs, and creation of a "MON" group may fail
if we run out of RMIDs.
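
The CLOSID and RMID limits above can be checked before creating groups. The sketch below is a hypothetical helper built on several assumptions: the default group already holds one CLOSID and one RMID, each additional CTRL_MON group needs one of each, each MON group needs one RMID, and no RMIDs are held on the limbo list described in the next section. It therefore gives an upper bound rather than a guarantee that mkdir will succeed.

```shell
# Hypothetical capacity check. Arguments:
#   $1 CLOSIDs available (smallest "num_closids" of the enabled resources)
#   $2 RMIDs available (info/L3_MON/num_rmids)
#   $3 CTRL_MON groups to create in addition to the default group
#   $4 MON groups to create in total
groups_fit() {
	local closids=$1 rmids=$2 ctrl=$3 mon=$4
	# The default group uses one CLOSID; each new CTRL_MON group needs one more.
	[ $((ctrl + 1)) -le "$closids" ] || return 1
	# The default group uses one RMID; every new CTRL_MON and MON group needs one.
	[ $((ctrl + mon + 1)) -le "$rmids" ] || return 1
}
```

For example, with 16 CLOSIDs and 256 RMIDs, "groups_fit 16 256 15 0" succeeds while "groups_fit 16 256 16 0" fails.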

max_threshold_occupancy - generic concepts
------------------------------------------

Note that an RMID, once freed, may not be immediately available for use
because the RMID is still tagging the cache lines of its previous user.
Hence such RMIDs are placed on a limbo list and checked later to see
whether the cache occupancy has gone down. If the system has many limbo
RMIDs that are not yet ready to be used, the user may see -EBUSY during
mkdir.

max_threshold_occupancy is a user configurable value to determine the
occupancy at which an RMID can be freed.

The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
for a subset of RMIDs that are not immediately available for allocation.
This can't be relied on to produce output every second; it may be necessary
to attempt to create an empty monitor group to force an update. Output may
only be produced if creation of a control or monitor group fails.

Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.
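
To make the line format concrete, the sketch below builds a line of the form "<resource>:<id>=<value>;<id>=<value>" and reads back the value for one instance. Both functions are hypothetical helpers; the instance IDs passed in are assumed to be the cache IDs reported by the system (see "Cache IDs"). The reader discards everything up to the first ':' so that any whitespace before the resource name does not matter.

```shell
# Hypothetical helper: build one schemata line, e.g. "L3:0=3;1=f".
schemata_line() {
	local res=$1 line="" pair
	shift
	for pair in "$@"; do
		line="${line:+$line;}$pair"    # join id=value pairs with ';'
	done
	printf '%s:%s\n' "$res" "$line"
}

# Hypothetical helper: print the value for instance ID $2 in line $1.
schemata_get() {
	local line=$1 id=$2 entry
	local IFS=';'
	for entry in ${line#*:}; do
		if [ "${entry%%=*}" = "$id" ]; then
			printf '%s\n' "${entry#*=}"
			return 0
		fi
	done
	return 1
}

# Example on a mounted resctrl:
#   schemata_line L3 0=3 1=f > /sys/fs/resctrl/p0/schemata
```

Here "schemata_line L3 0=3 1=f" prints "L3:0=3;1=f", and passing that line to "schemata_get" with ID 1 prints "f".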

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, and multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical cpus sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence; there may be gaps). To find the ID for each logical
CPU, look in /sys/devices/system/cpu/cpu*/cache/index*/id.

Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". Some Intel hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. Check /sys/fs/resctrl/info/{resource}/sparse_masks
to see whether non-contiguous 1s values are supported. On a system with
a 20-bit mask each bit represents 5% of the capacity of the cache.
You could partition the cache into four equal parts with masks:
0x1f, 0x3e0, 0x7c00, 0xf8000.

Notes on Sub-NUMA Cluster mode
==============================
When SNC mode is enabled, Linux may load balance tasks between Sub-NUMA
nodes much more readily than between regular NUMA nodes since the CPUs
on Sub-NUMA nodes share the same L3 cache and the system may report
the NUMA distance between Sub-NUMA nodes with a lower value than used
for regular NUMA nodes.

The top-level monitoring files in each "mon_L3_XX" directory provide
the sum of data across all SNC nodes sharing an L3 cache instance.
Users who bind tasks to the CPUs of a specific Sub-NUMA node can read
the "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes" in the
"mon_sub_L3_YY" directories to get node local data.

Memory bandwidth allocation is still performed at the L3 cache
level. I.e. throttling controls are applied to all SNC nodes.

L3 cache allocation bitmaps also apply to all SNC nodes. But note that
the amount of L3 cache represented by each bit is divided by the number
of SNC nodes per L3 cache. E.g. with a 100MB cache on a system with 10-bit
allocation masks each bit normally represents 10MB. With SNC mode enabled
with two SNC nodes per L3 cache, each bit only represents 5MB.
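
The mask rules from "Cache Bit Masks (CBM)" and the per-bit capacity arithmetic above can be checked with a short shell sketch. The function names are hypothetical. The caller passes in values read from "info/{resource}/cbm_mask", "min_cbm_bits" and "sparse_masks", while the cache size and the number of SNC nodes per L3 cache are assumed to be known to the user. For sparse masks, min_cbm_bits is only checked against the lowest run of set bits, which is an assumption about how that limit applies when gaps are allowed.

```shell
# Hypothetical helpers for cache bit masks; all values are passed in.

# Count the set bits in a mask.
cbm_popcount() {
	local v=$(($1)) n=0
	while [ "$v" -ne 0 ]; do
		n=$((n + (v & 1)))
		v=$((v >> 1))
	done
	echo "$n"
}

# cbm_valid MASK CBM_MASK MIN_CBM_BITS SPARSE_MASKS
# Returns 0 if MASK follows the rules described above.
cbm_valid() {
	local m=$(($1)) full=$(($2)) min=$3 sparse=$4 v run low
	# Every set bit must lie inside cbm_mask.
	[ $((m & ~full)) -eq 0 ] || return 1
	# Measure the lowest run of consecutive set bits against min_cbm_bits.
	v=$m
	while [ "$v" -ne 0 ] && [ $((v & 1)) -eq 0 ]; do
		v=$((v >> 1))
	done
	run=0
	while [ $((v & 1)) -eq 1 ]; do
		run=$((run + 1))
		v=$((v >> 1))
	done
	[ "$run" -ge "$min" ] || return 1
	# Without sparse masks all set bits must form one block: adding the
	# lowest set bit to a contiguous mask leaves a single set bit.
	if [ "$sparse" -eq 0 ] && [ "$m" -ne 0 ]; then
		low=$((m & -m))
		[ $(( (m + low) & (m + low - 1) )) -eq 0 ] || return 1
	fi
	return 0
}

# cbm_capacity MASK CACHE_SIZE CBM_WIDTH [SNC_NODES]
# Capacity covered by MASK, in the unit of CACHE_SIZE (integer division).
cbm_capacity() {
	local bits
	bits=$(cbm_popcount "$1")
	echo $(( bits * $2 / ($3 * ${4:-1}) ))
}
```

With a 4-bit "cbm_mask" of 0xf, "cbm_valid 0x6 0xf 1 0" succeeds while "cbm_valid 0x5 0xf 1 0" fails, and "cbm_capacity 0x1 100 10 2" prints 5, matching the 5MB-per-bit SNC example.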

Memory bandwidth Allocation and monitoring
==========================================

For Memory bandwidth resource, by default the user controls the resource
by indicating the percentage of total memory bandwidth.

The minimum bandwidth percentage value for each cpu model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the cpu model and can
be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are rounded
to the next control step available on the hardware.

The bandwidth throttling is a core specific mechanism on some Intel
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
sharing a core may result in both threads being throttled to use the
low bandwidth (see "thread_throttle_mode").

The fact that Memory bandwidth allocation (MBA) may be a core
specific mechanism whereas memory bandwidth monitoring (MBM) is done at
the package level may lead to confusion when users try to apply control
via the MBA and then monitor the bandwidth to see if the controls are
effective. Below are such scenarios:

1. User may *not* see an increase in actual bandwidth when percentage
   values are increased:

This can occur when aggregate L2 external bandwidth is more than L3
external bandwidth. Consider an SKL SKU with 24 cores on a package
where L2 external bandwidth is 10GBps (hence aggregate L2 external
bandwidth is 240GBps) and L3 external bandwidth is 100GBps. Now a workload
with '20 threads, having 50% bandwidth, each consuming 5GBps' consumes the
max L3 bandwidth of 100GBps although the percentage value specified is
only 50% << 100%. Hence increasing the bandwidth percentage will not yield
any more bandwidth. This is because although the L2 external bandwidth
still has capacity, the L3 external bandwidth is fully used. Also note
that this would be dependent on the number of cores the benchmark is run on.

2. Same bandwidth percentage may mean different actual bandwidth
   depending on # of threads:

For the same SKU in #1, a 'single thread, with 10% bandwidth' and '4
threads, with 10% bandwidth' can consume up to 10GBps and 40GBps
respectively, although they have the same percentage bandwidth of 10%.
This is simply because as threads start using more cores in an rdtgroup,
the actual bandwidth may increase or vary although the user-specified
bandwidth percentage is the same.

In order to mitigate this and make the interface more user-friendly,
resctrl added support for specifying the bandwidth in MiBps as well.
The kernel underneath uses a software feedback mechanism, the "Software
Controller (mba_sc)", which reads the actual bandwidth using MBM counters
and adjusts the memory bandwidth percentages to ensure::

  "actual bandwidth < user specified bandwidth".

By default, the schemata takes bandwidth percentage values, whereas
the user can switch to the "MBA software controller" mode using
the mount option 'mba_MBps'. The schemata format is specified in the
sections below.

L3 schemata file details (code and data prioritization disabled)
----------------------------------------------------------------
With CDP disabled the L3 schemata format is::

  L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 schemata file details (CDP enabled via mount option to resctrl)
------------------------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this::

  L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
  L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 schemata file details
------------------------
CDP is supported at L2 using the 'cdpl2' mount option. The schemata
format is either::

  L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

or::

  L2DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
  L2CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...


Memory bandwidth Allocation (default mode)
------------------------------------------

Memory b/w domain is L3 cache.
::

  MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

Memory bandwidth Allocation specified in MiBps
----------------------------------------------

Memory bandwidth domain is L3 cache.
::

  MB:<cache_id0>=bw_MiBps0;<cache_id1>=bw_MiBps1;...

Slow Memory Bandwidth Allocation (SMBA)
---------------------------------------
AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
CXL.memory is the only supported "slow" memory device. With the
support of SMBA, the hardware enables bandwidth allocation on
the slow memory devices. If there are multiple such devices in
the system, the throttling logic groups all the slow sources
together and applies the limit on them as a whole.

The presence of SMBA (with CXL.memory) is independent of the presence
of slow memory devices. If there are no such devices on the system, then
configuring SMBA will have no impact on the performance of the system.
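
The "MB" schemata formats shown above share the
"<name>:<cache_id>=<value>;..." layout used by the cache resources. The
sketch below, with hypothetical helper names, builds such a line and
applies the percentage rule from "Memory bandwidth Allocation and
monitoring" (control steps of min_bw + N * bw_gran, intermediate values
rounded up to the next step). Rejecting requests outside [min_bw, 100] is
an assumption about the valid range, and the rounding helper is only meant
for the default percentage mode, not for MiBps values::

  #include <stddef.h>
  #include <stdio.h>

  /* Hypothetical helper: round a requested MB percentage up to the next
   * control step min_bw + N * bw_gran. Returns -1 for requests outside
   * [min_bw, 100] (assumed valid range) or a zero granularity. */
  static int mb_round_to_step(unsigned int pct, unsigned int min_bw,
                              unsigned int bw_gran)
  {
      if (bw_gran == 0 || pct < min_bw || pct > 100)
          return -1;
      return (int)(min_bw + (pct - min_bw + bw_gran - 1) / bw_gran * bw_gran);
  }

  /* Hypothetical helper: write "<name>:<id0>=<v0>;<id1>=<v1>;..." into buf.
   * Returns the length written, or -1 if buf is too small. */
  static int schemata_line(char *buf, size_t len, const char *name,
                           const unsigned int *ids, const unsigned int *vals,
                           size_t n)
  {
      int w = snprintf(buf, len, "%s:", name);
      size_t used;

      if (w < 0 || (size_t)w >= len)
          return -1;
      used = (size_t)w;
      for (size_t i = 0; i < n; i++) {
          w = snprintf(buf + used, len - used, "%s%u=%u",
                       i ? ";" : "", ids[i], vals[i]);
          if (w < 0 || (size_t)w >= len - used)
              return -1;
          used += (size_t)w;
      }
      return (int)used;
  }

For example, with min_bw and bw_gran both 10, a request for 15% rounds to
20%, and schemata_line() with ids {0, 1} and values {50, 50} produces
"MB:0=50;1=50".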

The bandwidth domain for slow memory is L3 cache. Its schemata file
is formatted as:
::

  SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
on all domains. When writing you only need to specify those values
which you wish to change. E.g.
::

  # cat schemata
  L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
  # echo "L3DATA:2=3c0;" > schemata
  # cat schemata
  L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

Reading/writing the schemata file (on AMD systems)
--------------------------------------------------
Reading the schemata file will show the current bandwidth limit on all
domains. The allocated resources are in multiples of one eighth GB/s.
When writing to the file, you need to specify the cache id whose
bandwidth limit you wish to configure.
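
Since the AMD limits are expressed in units of one eighth GB/s, converting
a target rate into a schemata value is a multiplication by 8. The
following C sketch (hypothetical helper names; the unit size is taken from
the paragraph above) does the conversion in both directions and builds the
string to write for a single cache id, which works the same way for the
"MB" and "SMBA" resources::

  #include <stddef.h>
  #include <stdio.h>

  /* Unit size taken from the text above: one unit is 1/8 GB/s. */
  #define AMD_BW_UNITS_PER_GBPS 8u

  /* Hypothetical helper: whole GB/s -> schemata units. */
  static unsigned int amd_bw_units(unsigned int gbps)
  {
      return gbps * AMD_BW_UNITS_PER_GBPS;
  }

  /* Hypothetical helper: schemata units -> GB/s (may be fractional). */
  static double amd_bw_gbps(unsigned int units)
  {
      return (double)units / AMD_BW_UNITS_PER_GBPS;
  }

  /* Hypothetical helper: build the line written to "schemata" to limit one
   * cache id, e.g. "MB:1=16". Returns snprintf()'s result. */
  static int amd_bw_limit_line(char *buf, size_t len, const char *res,
                               unsigned int cache_id, unsigned int gbps)
  {
      return snprintf(buf, len, "%s:%u=%u", res, cache_id, amd_bw_units(gbps));
  }

amd_bw_limit_line(buf, sizeof(buf), "MB", 1, 2) produces "MB:1=16" (2GB/s
on cache id 1), the SMBA equivalent for 8GB/s is "SMBA:1=64", and
amd_bw_gbps(2048) shows that a value of 2048 corresponds to 256GB/s.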

For example, to allocate a 2GB/s limit on cache id 1:

::

  # cat schemata
  MB:0=2048;1=2048;2=2048;3=2048
  L3:0=ffff;1=ffff;2=ffff;3=ffff

  # echo "MB:1=16" > schemata
  # cat schemata
  MB:0=2048;1=  16;2=2048;3=2048
  L3:0=ffff;1=ffff;2=ffff;3=ffff

Reading/writing the schemata file (on AMD systems) with SMBA feature
--------------------------------------------------------------------
Reading and writing the schemata file is the same as without SMBA in
the above section.

For example, to allocate an 8GB/s limit on cache id 1:

::

  # cat schemata
  SMBA:0=2048;1=2048;2=2048;3=2048
    MB:0=2048;1=2048;2=2048;3=2048
    L3:0=ffff;1=ffff;2=ffff;3=ffff

  # echo "SMBA:1=64" > schemata
  # cat schemata
  SMBA:0=2048;1=  64;2=2048;3=2048
    MB:0=2048;1=2048;2=2048;3=2048
    L3:0=ffff;1=ffff;2=ffff;3=ffff

Cache Pseudo-Locking
====================
CAT enables a user to specify the amount of cache space that an
application can fill.
Cache pseudo-locking builds on the fact that a
CPU can still read and write data pre-allocated outside its current
allocated area on a cache hit. With cache pseudo-locking, data can be
preloaded into a reserved portion of cache that no application can
fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an
application can map it into its virtual address space and thus have
a region of memory with reduced average read latency.

The creation of a cache pseudo-locked region is triggered by a request
from the user that is accompanied by a schemata of the region
to be pseudo-locked. The cache pseudo-locked region is created as follows:

- Create a CAT allocation CLOSNEW with a CBM matching the schemata
  from the user of the cache region that will contain the pseudo-locked
  memory. This region must not overlap with any current CAT allocation/CLOS
  on the system and no future overlap with this cache region is allowed
  while the pseudo-locked region exists.
- Create a contiguous region of memory of the same size as the cache
  region.
- Flush the cache, disable hardware prefetchers, disable preemption.
- Make CLOSNEW the active CLOS and touch the allocated memory to load
  it into the cache.
- Set the previous CLOS as active.
- At this point the closid CLOSNEW can be released - the cache
  pseudo-locked region is protected as long as its CBM does not appear in
  any CAT allocation. Even though the cache pseudo-locked region will from
  this point on not appear in any CBM of any CLOS, an application running
  with any CLOS will be able to access the memory in the pseudo-locked
  region since the region continues to serve cache hits.
- The contiguous region of memory loaded into the cache is exposed to
  user-space as a character device.

Cache pseudo-locking increases the probability that data will remain
in the cache via carefully configuring the CAT feature and controlling
application behavior. There is no guarantee that data is placed in
cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
“locked” data from cache. Power management C-states may shrink or
power off cache. Deeper C-states will automatically be restricted on
pseudo-locked region creation.

It is required that an application using a pseudo-locked region runs
with affinity to the cores (or a subset of the cores) associated
with the cache on which the pseudo-locked region resides. A sanity check
within the code will not allow an application to map pseudo-locked memory
unless it runs with affinity to cores associated with the cache on which the
pseudo-locked region resides.
The sanity check is only done during the
initial mmap() handling; there is no enforcement afterwards and the
application itself needs to ensure it remains affine to the correct cores.

Pseudo-locking is accomplished in two stages:

1) During the first stage the system administrator allocates a portion
   of cache that should be dedicated to pseudo-locking. At this time an
   equivalent portion of memory is allocated, loaded into the allocated
   cache portion, and exposed as a character device.
2) During the second stage a user-space application maps (mmap()) the
   pseudo-locked memory into its address space.

Cache Pseudo-Locking Interface
------------------------------
A pseudo-locked region is created using the resctrl interface as follows:

1) Create a new resource group by creating a new directory in /sys/fs/resctrl.
2) Change the new resource group's mode to "pseudo-locksetup" by writing
   "pseudo-locksetup" to the "mode" file.
3) Write the schemata of the pseudo-locked region to the "schemata" file. All
   bits within the schemata should be "unused" according to the "bit_usage"
   file.

On successful pseudo-locked region creation the "mode" file will contain
"pseudo-locked" and a new character device with the same name as the resource
group will exist in /dev/pseudo_lock.
This character device can be mmap()'ed
by user space in order to obtain access to the pseudo-locked memory region.

An example of cache pseudo-locked region creation and usage can be found below.

Cache Pseudo-Locking Debugging Interface
----------------------------------------
The pseudo-locking debugging interface is enabled by default (if
CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl.

There is no explicit way for the kernel to test if a provided memory
location is present in the cache. The pseudo-locking debugging interface uses
the tracing infrastructure to provide two ways to measure cache residency of
the pseudo-locked region:

1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
   from these measurements are best visualized using a hist trigger (see
   example below). In this test the pseudo-locked region is traversed at
   a stride of 32 bytes while hardware prefetchers and preemption
   are disabled. This also provides a substitute visualization of cache
   hits and misses.
2) Cache hit and miss measurements using model specific precision counters if
   available. Depending on the levels of cache on the system the pseudo_lock_l2
   and pseudo_lock_l3 tracepoints are available.

When a pseudo-locked region is created a new debugfs directory is created for
it in debugfs as /sys/kernel/debug/resctrl/<newdir>.
A single
write-only file, pseudo_lock_measure, is present in this directory. The
measurement of the pseudo-locked region depends on the number written to this
debugfs file:

1:
  writing "1" to the pseudo_lock_measure file will trigger the latency
  measurement captured in the pseudo_lock_mem_latency tracepoint. See
  example below.
2:
  writing "2" to the pseudo_lock_measure file will trigger the L2 cache
  residency (cache hits and misses) measurement captured in the
  pseudo_lock_l2 tracepoint. See example below.
3:
  writing "3" to the pseudo_lock_measure file will trigger the L3 cache
  residency (cache hits and misses) measurement captured in the
  pseudo_lock_l3 tracepoint.

All measurements are recorded with the tracing infrastructure. This requires
the relevant tracepoints to be enabled before the measurement is triggered.

Example of latency debugging interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this example a pseudo-locked region named "newlock" was created.
Here is
how we can measure the latency in cycles of reading from this region and
visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS
is set::

  # :> /sys/kernel/tracing/trace
  # echo 'hist:keys=latency' > /sys/kernel/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
  # echo 1 > /sys/kernel/tracing/events/resctrl/pseudo_lock_mem_latency/enable
  # echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
  # echo 0 > /sys/kernel/tracing/events/resctrl/pseudo_lock_mem_latency/enable
  # cat /sys/kernel/tracing/events/resctrl/pseudo_lock_mem_latency/hist

  # event histogram
  #
  # trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
  #

  { latency:        456 } hitcount:          1
  { latency:         50 } hitcount:         83
  { latency:         36 } hitcount:         96
  { latency:         44 } hitcount:        174
  { latency:         48 } hitcount:        195
  { latency:         46 } hitcount:        262
  { latency:         42 } hitcount:        693
  { latency:         40 } hitcount:       3204
  { latency:         38 } hitcount:       3484

  Totals:
      Hits: 8192
      Entries: 9
      Dropped: 0

Example of cache hits/misses debugging
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this example a pseudo-locked region named "newlock" was created on the L2
cache of a platform. Here is how we can obtain details of the cache hits
and misses using the platform's precision counters.
::

  # :> /sys/kernel/tracing/trace
  # echo 1 > /sys/kernel/tracing/events/resctrl/pseudo_lock_l2/enable
  # echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
  # echo 0 > /sys/kernel/tracing/events/resctrl/pseudo_lock_l2/enable
  # cat /sys/kernel/tracing/trace

  # tracer: nop
  #
  #                              _-----=> irqs-off
  #                             / _----=> need-resched
  #                            | / _---=> hardirq/softirq
  #                            || / _--=> preempt-depth
  #                            ||| /     delay
  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
  #              | |       |   ||||       |         |
  pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0


Examples for RDT allocation usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1) Example 1

On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks, minimum b/w of 10% with a memory bandwidth
granularity of 10%.
::

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl
  # mkdir p0 p1
  # echo "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
  # echo "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Similarly, tasks that are under the control of group "p0" may use a
maximum memory b/w of 50% on socket 0 and 50% on socket 1.
Tasks in group "p1" may also use 50% memory b/w on both sockets.
Note that unlike cache masks, memory b/w cannot specify whether these
allocations can overlap or not. The allocations specify the maximum
b/w that the group may be able to use and the system admin can configure
the b/w accordingly.

If resctrl is using the software controller (mba_sc) then the user can
enter the max b/w in MB rather than the percentage values.
::

  # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
  # echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata

In the above example the tasks in "p1" and "p0" on socket 0 would use a max b/w
of 1024MB whereas on socket 1 they would use 500MB.

2) Example 2

Again two sockets, but this time with a more realistic 20-bit mask.

Two real time tasks pid=1234 running on processor 0 and pid=5678 running on
processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy
neighbors, each of the two real-time tasks exclusively occupies one quarter
of L3 cache on socket 0.
::

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
ordinary tasks::

  # echo "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata

Next we make a resource group for our first real time task and give
it access to the "top" 25% of the cache on socket 0.
::

  # mkdir p0
  # echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real time task into this resource group.
We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.
::

  # echo 1234 > p0/tasks
  # taskset -cp 1 1234

Ditto for the second real time task (with the remaining 25% of cache)::

  # mkdir p1
  # echo "L3:0=7c00;1=fffff" > p1/schemata
  # echo 5678 > p1/tasks
  # taskset -cp 2 5678

For the same 2 socket system with memory b/w resource and CAT L3 the
schemata would look like this (assuming min_bandwidth is 10 and
bandwidth_gran is 10):

For our first real time task this would request 20% memory b/w on socket 0.
::

  # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata

For our second real time task this would request another 20% memory b/w
on socket 0.
::

  # echo -e "L3:0=7c00;1=fffff\nMB:0=20;1=100" > p1/schemata

3) Example 3

A single socket system which has real-time tasks running on cores 4-7 and
non real-time workload assigned to cores 0-3.
The real-time tasks share text
and data, so a per-task association is not required and due to interaction
with the kernel it's desired that the kernel on these cores shares L3 with
the tasks.
::

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
cannot be used by ordinary tasks::

  # echo "L3:0=3ff\nMB:0=50" > schemata

Next we make a resource group for our real time cores and give it access
to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
socket 0.
::

  # mkdir p0
  # echo "L3:0=ffc00\nMB:0=50" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache. They should
also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
siblings and only the real time threads are scheduled on the cores 4-7.
::

  # echo F0 > p0/cpus

4) Example 4

The resource groups in previous examples were all in the default "shareable"
mode allowing sharing of their cache allocations.
If one resource group
configures a cache allocation then nothing prevents another resource group
from overlapping with that allocation.

In this example a new exclusive resource group will be created on an L2 CAT
system with two L2 cache instances that can be configured with an 8-bit
capacity bitmask. The new exclusive resource group will be configured to use
25% of each cache instance.
::

  # mount -t resctrl resctrl /sys/fs/resctrl/
  # cd /sys/fs/resctrl

First, we observe that the default group is configured to allocate to all L2
cache::

  # cat schemata
  L2:0=ff;1=ff

We could attempt to create the new resource group at this point, but it will
fail because of the overlap with the schemata of the default group::

  # mkdir p0
  # echo 'L2:0=0x3;1=0x3' > p0/schemata
  # cat p0/mode
  shareable
  # echo exclusive > p0/mode
  -sh: echo: write error: Invalid argument
  # cat info/last_cmd_status
  schemata overlaps

To ensure that there is no overlap with another resource group the default
resource group's schemata has to change, making it possible for the new
resource group to become exclusive.
::

  # echo 'L2:0=0xfc;1=0xfc' > schemata
  # echo exclusive > p0/mode
  # grep . p0/*
  p0/cpus:0
  p0/mode:exclusive
  p0/schemata:L2:0=03;1=03
  p0/size:L2:0=262144;1=262144

On creation, a new resource group will not overlap with an exclusive
resource group::

  # mkdir p1
  # grep . p1/*
  p1/cpus:0
  p1/mode:shareable
  p1/schemata:L2:0=fc;1=fc
  p1/size:L2:0=786432;1=786432

The bit_usage file reflects how the cache is used::

  # cat info/L2/bit_usage
  0=SSSSSSEE;1=SSSSSSEE

A resource group cannot be forced to overlap with an exclusive resource group::

  # echo 'L2:0=0x1;1=0x1' > p1/schemata
  -sh: echo: write error: Invalid argument
  # cat info/last_cmd_status
  overlaps with exclusive group

Example of Cache Pseudo-Locking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lock a portion of the L2 cache from cache id 1 using CBM 0x3. The
pseudo-locked region is exposed at /dev/pseudo_lock/newlock, which can be
provided to an application as an argument to mmap().
::

  # mount -t resctrl resctrl /sys/fs/resctrl/
  # cd /sys/fs/resctrl

Ensure that there are bits available that can be pseudo-locked. Since only
unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
removed from the default resource group's schemata::

  # cat info/L2/bit_usage
  0=SSSSSSSS;1=SSSSSSSS
  # echo 'L2:1=0xfc' > schemata
  # cat info/L2/bit_usage
  0=SSSSSSSS;1=SSSSSS00

Create a new resource group that will be associated with the pseudo-locked
region, indicate that it will be used for a pseudo-locked region, and
configure the requested pseudo-locked region capacity bitmask::

  # mkdir newlock
  # echo pseudo-locksetup > newlock/mode
  # echo 'L2:1=0x3' > newlock/schemata

On success, the resource group's mode changes to pseudo-locked, the
bit_usage file reflects the pseudo-locked region, and the character device
exposing the pseudo-locked region exists::

  # cat newlock/mode
  pseudo-locked
  # cat info/L2/bit_usage
  0=SSSSSSSS;1=SSSSSSPP
  # ls -l /dev/pseudo_lock/newlock
  crw------- 1 root root 243, 0 Apr  3 05:01 /dev/pseudo_lock/newlock

::

  /*
   * Example code to access one page of pseudo-locked cache region
   * from user space.
   */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/mman.h>

  /*
   * It is required that the application runs with affinity to only
   * cores associated with the pseudo-locked region. Here the cpu
   * is hardcoded for convenience of example.
   */
  static int cpuid = 2;

  int main(int argc, char *argv[])
  {
      cpu_set_t cpuset;
      long page_size;
      void *mapping;
      int dev_fd;
      int ret;

      page_size = sysconf(_SC_PAGESIZE);

      CPU_ZERO(&cpuset);
      CPU_SET(cpuid, &cpuset);
      ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
      if (ret < 0) {
          perror("sched_setaffinity");
          exit(EXIT_FAILURE);
      }

      dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
      if (dev_fd < 0) {
          perror("open");
          exit(EXIT_FAILURE);
      }

      mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     dev_fd, 0);
      if (mapping == MAP_FAILED) {
          perror("mmap");
          close(dev_fd);
          exit(EXIT_FAILURE);
      }

      /* Application interacts with pseudo-locked memory @mapping */

      ret = munmap(mapping, page_size);
      if (ret < 0) {
          perror("munmap");
          close(dev_fd);
          exit(EXIT_FAILURE);
      }

      close(dev_fd);
      exit(EXIT_SUCCESS);
  }

Locking between applications
----------------------------

Certain operations on the resctrl filesystem, composed of reads/writes
to/from multiple files, must be atomic.

As an example, the allocation of an exclusive reservation of L3 cache
involves:

  1. Read the cbmmasks from each directory or the per-resource "bit_usage"
  2. Find a contiguous set of bits in the global CBM bitmask that is not
     set in any of the directory cbmmasks
  3. Create a new directory
  4. Write the bits found in step 2 to the new directory's "schemata" file

If two applications attempt to allocate space concurrently, they can
end up allocating the same bits, so the reservations are shared instead of
exclusive.

To coordinate atomic operations on the resctrlfs and to avoid the problem
above, the following locking procedure is recommended:

Locking is based on flock, which is available in libc and also as the
flock(1) command for shell scripts.

Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) Release the lock with flock(LOCK_UN)

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) If successful, read the directory structure.
 C) Release the lock with flock(LOCK_UN)

Example with bash::

  # Atomically read directory structure
  $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl

  # Read directory contents and create new subdirectory

  $ cat create-dir.sh
  find /sys/fs/resctrl/ > output.txt
  mask=$(function-of output.txt)  # placeholder: derive a free schemata line
  mkdir /sys/fs/resctrl/newres/
  echo "$mask" > /sys/fs/resctrl/newres/schemata

  $ flock /sys/fs/resctrl/ ./create-dir.sh

Example with C::

  /*
   * Example code to take advisory locks
   * before accessing resctrl filesystem
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/file.h>
  #include <unistd.h>

  void resctrl_take_shared_lock(int fd)
  {
      int ret;

      /* take shared lock on resctrl filesystem */
      ret = flock(fd, LOCK_SH);
      if (ret) {
          perror("flock");
          exit(-1);
      }
  }

  void resctrl_take_exclusive_lock(int fd)
  {
      int ret;

      /* take exclusive lock on resctrl filesystem */
      ret = flock(fd, LOCK_EX);
      if (ret) {
          perror("flock");
          exit(-1);
      }
  }

  void resctrl_release_lock(int fd)
  {
      int ret;

      /* release lock on resctrl filesystem */
      ret = flock(fd, LOCK_UN);
      if (ret) {
          perror("flock");
          exit(-1);
      }
  }

  int main(void)
  {
      int fd;

      fd = open("/sys/fs/resctrl", O_RDONLY | O_DIRECTORY);
      if (fd == -1) {
          perror("open");
          exit(-1);
      }
      resctrl_take_shared_lock(fd);
      /* code to read directory contents */
      resctrl_release_lock(fd);

      resctrl_take_exclusive_lock(fd);
      /* code to read and write directory contents */
      resctrl_release_lock(fd);

      close(fd);
      return 0;
  }

Examples for RDT Monitoring along with allocation usage
=======================================================
Reading monitored data
----------------------
Reading an event file (for example, mon_data/mon_L3_00/llc_occupancy) shows
the current snapshot of LLC occupancy of the corresponding MON group or
CTRL_MON group.
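A minimal C sketch of reading one event file is shown here. It assumes, as
in the ``cat`` output of the examples in this section, that the file holds a
single decimal value followed by a newline, and it treats any other content
(for instance a status string in place of a number) as an error. The names
``resctrl_read_counter()`` and ``resctrl_parse_counter()`` are hypothetical
helpers for illustration, not part of any existing library::

  #include <ctype.h>
  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>

  /*
   * Parse the text of a resctrl event file, e.g. "16234000\n".
   * Returns 0 and stores the value on success, -1 if the text is not a
   * plain unsigned decimal number (optionally followed by one newline).
   */
  int resctrl_parse_counter(const char *buf, unsigned long long *val)
  {
      char *end;

      /* Reject empty input, signs, leading blanks and non-numeric text. */
      if (!isdigit((unsigned char)buf[0]))
          return -1;

      errno = 0;
      *val = strtoull(buf, &end, 10);
      if (errno == ERANGE)
          return -1;

      /* Allow a single trailing newline and nothing else. */
      if (*end == '\n')
          end++;
      return *end == '\0' ? 0 : -1;
  }

  /*
   * Read one event file such as
   * /sys/fs/resctrl/<group>/mon_data/mon_L3_00/llc_occupancy.
   * Returns 0 on success, -1 if the file cannot be read or parsed.
   */
  int resctrl_read_counter(const char *path, unsigned long long *val)
  {
      char buf[64];
      FILE *f = fopen(path, "r");

      if (!f)
          return -1;
      if (!fgets(buf, sizeof(buf), f)) {
          fclose(f);
          return -1;
      }
      fclose(f);
      return resctrl_parse_counter(buf, val);
  }

A caller could then use, for example,
``resctrl_read_counter("/sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy", &bytes)``
to obtain the occupancy of group p1 on L3 cache 0, in bytes. Because each
read returns only a snapshot, a tool that tracks occupancy over time needs to
read the file repeatedly.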

Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
------------------------------------------------------------------------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks::

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl
  # mkdir p0 p1
  # echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
  # echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
  # echo 5678 > p1/tasks
  # echo 5679 > p1/tasks

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Create monitor groups and assign a subset of tasks to each monitor group.
::

  # cd /sys/fs/resctrl/p1/mon_groups
  # mkdir m11 m12
  # echo 5678 > m11/tasks
  # echo 5679 > m12/tasks

Fetch the data (shown in bytes)::

  # cat m11/mon_data/mon_L3_00/llc_occupancy
  16234000
  # cat m11/mon_data/mon_L3_01/llc_occupancy
  14789000
  # cat m12/mon_data/mon_L3_00/llc_occupancy
  16789000

The parent CTRL_MON group shows the aggregated data.
::

  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
  31234000

Example 2 (Monitor a task from its creation)
--------------------------------------------
On a two socket machine (one L3 cache per socket)::

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl
  # mkdir p0 p1

An RMID is allocated to the group once it is created, and hence the <cmd>
below is monitored from its creation.
::

  # echo $$ > /sys/fs/resctrl/p1/tasks
  # <cmd>

Fetch the data::

  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
  31789000

Example 3 (Monitor without CAT support or before creating CAT groups)
---------------------------------------------------------------------

Assume a system like HSW has only CQM and no CAT support. In this case
resctrl will still mount, but it cannot create CTRL_MON directories.
However, the user can create different MON groups within the root group
and thereby monitor all tasks, including kernel threads.

This can also be used to profile a job's cache footprint before
assigning it to an allocation group.
::

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl
  # mkdir mon_groups/m01
  # mkdir mon_groups/m02

  # echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks
  # echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks

Monitor the groups separately and also get per-domain data. From the
output below it is apparent that the tasks are mostly doing work on
domain (socket) 0.
::

  # cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_00/llc_occupancy
  31234000
  # cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_01/llc_occupancy
  34555
  # cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_00/llc_occupancy
  31234000
  # cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_01/llc_occupancy
  32789


Example 4 (Monitor real time tasks)
-----------------------------------

Consider a single socket system that has real-time tasks running on cores
4-7 and non-real-time tasks on other CPUs. We want to monitor the cache
occupancy of the real-time threads on these cores.
::

  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl
  # mkdir p1

Move CPUs 4-7 over to p1::

  # echo f0 > p1/cpus

View the LLC occupancy snapshot::

  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
  11234000

Intel RDT Errata
================

Intel MBM Counters May Report System Memory Bandwidth Incorrectly
-----------------------------------------------------------------

Errata SKX99 for Skylake server and BDF102 for Broadwell server.

Problem: Intel Memory Bandwidth Monitoring (MBM) counters track metrics
according to the assigned Resource Monitor ID (RMID) for that logical
core. The IA32_QM_CTR register (MSR 0xC8E), used to report these
metrics, may report incorrect system bandwidth for certain RMID values.

Implication: Due to the errata, system memory bandwidth may not match
what is reported.

Workaround: MBM total and local readings are corrected according to the
following correction factor table:

+---------------+---------------+---------------+-----------------+
|core count     |rmid count     |rmid threshold |correction factor|
+---------------+---------------+---------------+-----------------+
|1              |8              |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|2              |16             |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|3              |24             |15             |0.969650         |
+---------------+---------------+---------------+-----------------+
|4              |32             |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|6              |48             |31             |0.969650         |
+---------------+---------------+---------------+-----------------+
|7              |56             |47             |1.142857         |
+---------------+---------------+---------------+-----------------+
|8              |64             |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|9              |72             |63             |1.185115         |
+---------------+---------------+---------------+-----------------+
|10             |80             |63             |1.066553         |
+---------------+---------------+---------------+-----------------+
|11             |88             |79             |1.454545         |
+---------------+---------------+---------------+-----------------+
|12             |96             |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|13             |104            |95             |1.230769         |
+---------------+---------------+---------------+-----------------+
|14             |112            |95             |1.142857         |
+---------------+---------------+---------------+-----------------+
|15             |120            |95             |1.066667         |
+---------------+---------------+---------------+-----------------+
|16             |128            |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|17             |136            |127            |1.254863         |
+---------------+---------------+---------------+-----------------+
|18             |144            |127            |1.185255         |
+---------------+---------------+---------------+-----------------+
|19             |152            |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|20             |160            |127            |1.066667         |
+---------------+---------------+---------------+-----------------+
|21             |168            |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|22             |176            |159            |1.454334         |
+---------------+---------------+---------------+-----------------+
|23             |184            |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|24             |192            |127            |0.969744         |
+---------------+---------------+---------------+-----------------+
|25             |200            |191            |1.280246         |
+---------------+---------------+---------------+-----------------+
|26             |208            |191            |1.230921         |
+---------------+---------------+---------------+-----------------+
|27             |216            |0              |1.000000         |
+---------------+---------------+---------------+-----------------+
|28             |224            |191            |1.143118         |
+---------------+---------------+---------------+-----------------+

If rmid > rmid threshold, MBM total and local values should be multiplied
by the correction factor.

See:

1. Erratum SKX99 in Intel Xeon Processor Scalable Family Specification Update:
   http://web.archive.org/web/20200716124958/https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html

2. Erratum BDF102 in Intel Xeon E5-2600 v4 Processor Product Family Specification Update:
   http://web.archive.org/web/20191125200531/https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v4-spec-update.pdf

3. The errata in Intel Resource Director Technology (Intel RDT) on 2nd Generation Intel Xeon Scalable Processors Reference Manual:
   https://software.intel.com/content/www/us/en/develop/articles/intel-resource-director-technology-rdt-reference-manual.html

for further information.
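The correction rule above can be sketched in C. This is an illustrative
sketch, not the kernel's implementation: the table and function names are
hypothetical, only the rows for core counts 1 to 7 are copied (the table has
no row for 5), and each correction factor is stored as a fixed-point value
scaled by 2^20 (an assumption of this sketch) so that the correction becomes
an integer multiply and shift::

  #include <stddef.h>
  #include <stdint.h>

  /* Scale a decimal correction factor by 2^20, rounding to nearest. */
  #define CF_FIXED(cf) ((uint64_t)(1048576 * (cf) + 0.5))

  struct mbm_cf_row {
      unsigned int core_count;     /* "core count" column */
      unsigned int rmid_threshold; /* "rmid threshold" column */
      uint64_t cf;                 /* "correction factor" * 2^20 */
  };

  /* First rows of the errata table only; not the complete table. */
  static const struct mbm_cf_row mbm_cf_rows[] = {
      { 1,  0, CF_FIXED(1.000000) },
      { 2,  0, CF_FIXED(1.000000) },
      { 3, 15, CF_FIXED(0.969650) },
      { 4,  0, CF_FIXED(1.000000) },
      { 6, 31, CF_FIXED(0.969650) },
      { 7, 47, CF_FIXED(1.142857) },
  };

  /*
   * Return the corrected MBM total or local reading for @rmid.
   * If rmid > rmid threshold, the reading is multiplied by the factor;
   * otherwise, or if @core_count has no row here, it is returned as is.
   */
  uint64_t mbm_correct(unsigned int core_count, unsigned int rmid, uint64_t raw)
  {
      size_t n = sizeof(mbm_cf_rows) / sizeof(mbm_cf_rows[0]);

      for (size_t i = 0; i < n; i++) {
          const struct mbm_cf_row *row = &mbm_cf_rows[i];

          if (row->core_count != core_count)
              continue;
          if (rmid > row->rmid_threshold)
              return (raw * row->cf) >> 20;
          break;
      }
      return raw;
  }

With these rows, a raw reading of 1000000 for RMID 16 on a system matching
the 3-core row is corrected to 969650, while RMID 15 is returned unchanged
because it does not exceed the threshold of 15. The multiply assumes that the
product of the raw value and the scaled factor fits in 64 bits, which holds
for raw values below roughly 2^43 with the factors included here.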