Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation.
In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which can eventually
be exploited as well. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest, with
one exception. The rationale behind this assumption is that the code
construct needed for exploiting MDS requires all of the following:

 - control over the load, in order to trigger a fault or assist

 - a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel

 - control over the pointer through which the disclosure gadget exposes
   the data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

The one exception is untrusted BPF. The functionality of untrusted BPF is
limited, but it needs to be thoroughly investigated whether it can be used
to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer
clearing: either the modified VERW instruction or the L1D Flush
command.
The latter is issued when L1TF mitigation is enabled so the extra
VERW can be avoided. If the CPU is not affected by L1TF then VERW needs to
be issued.

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS
which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing::

    mds_clear_cpu_buffers()

The macro CLEAR_CPU_BUFFERS can also be used in assembly code late in the
exit-to-user path. Other than EFLAGS.ZF, this macro doesn't clobber any
registers.

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

 ======= ============================================================
 off     Mitigation is disabled. Either the CPU is not affected or
         mds=off is supplied on the kernel command line

 full    Mitigation is enabled. CPU is affected and MD_CLEAR is
         advertised in CPUID.

 vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
         advertised in CPUID. That is mainly for virtualization
         scenarios where the host has the updated microcode but the
         hypervisor does not expose MD_CLEAR in CPUID. It's a best
         effort approach without guarantee.
 ======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

   When transitioning from kernel to user space the CPU buffers are flushed
   on affected CPUs when the mitigation is not disabled on the kernel
   command line. The mitigation is enabled through the feature flag
   X86_FEATURE_CLEAR_CPU_BUF.

   The mitigation is invoked just before transitioning to userspace after
   user registers are restored. This is done to minimize the window in
   which kernel data could be accessed after VERW, e.g. via an NMI after
   VERW.

   **Corner case not handled**

   Interrupts returning to the kernel don't clear CPU buffers since the
   exit-to-user path is expected to do that anyway. However, there could
   be a case where an NMI is generated in the kernel after the exit-to-user
   path has cleared the buffers. This case is not handled, and NMIs
   returning to the kernel don't clear CPU buffers, because:

   1. It is rare to get an NMI after VERW, but before returning to userspace.
   2. For an unprivileged user, there is no known way to make that NMI
      less rare or target it.
   3. It would take a large number of these precisely-timed NMIs to mount
      an actual attack. There's presumably not enough bandwidth.
   4. The NMI in question occurs after a VERW, i.e. when user state is
      restored and most interesting data is already scrubbed. What's left
      is only the data that the NMI touches, and that may or may not be of
      any interest.
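
How the mitigation mode relates to the exit-to-user clearing can be
modelled in a few lines of plain C. This is a user-space sketch under
stated assumptions, not kernel code: the enum and the functions
`mds_select_mode()`, `mds_clear_on_exit_to_user()` and `mds_mode_example()`
are hypothetical names made up for illustration. Only the three mode names
and the decision inputs (CPU affected, mds=off, MD_CLEAR in CPUID) are
taken from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three modes listed in the mode table. */
enum mds_mode { MDS_MODE_OFF, MDS_MODE_FULL, MDS_MODE_VMWERV };

/*
 * Pick a mode from the inputs the text names: whether the CPU is
 * affected, whether mds=off was given, and whether CPUID advertises
 * MD_CLEAR. Illustrative only; not the kernel's implementation.
 */
enum mds_mode mds_select_mode(bool cpu_affected, bool cmdline_off,
                              bool md_clear)
{
    if (!cpu_affected || cmdline_off)
        return MDS_MODE_OFF;

    /* Without MD_CLEAR, VERW is still issued on a best effort basis. */
    return md_clear ? MDS_MODE_FULL : MDS_MODE_VMWERV;
}

/* Exit-to-user clearing is active in every mode except "off". */
bool mds_clear_on_exit_to_user(enum mds_mode mode)
{
    return mode != MDS_MODE_OFF;
}

/* Usage example: an affected guest whose hypervisor hides MD_CLEAR. */
void mds_mode_example(void)
{
    enum mds_mode mode = mds_select_mode(true, false, false);

    assert(mode == MDS_MODE_VMWERV);
    assert(mds_clear_on_exit_to_user(mode));
}
```

The model deliberately ignores the NMI corner case described above, since
that case is left unhandled by design.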


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

   When a CPU goes idle and enters a C-State the CPU buffers need to be
   cleared on affected CPUs when SMT is active. This addresses the
   repartitioning of the store buffer when one of the Hyper-Threads enters
   a C-State.

   When SMT is inactive, i.e. either the CPU does not support it or all
   sibling threads are offline, CPU buffer clearing is not required.

   The idle clearing is enabled on CPUs which are only affected by MSBDS
   and not by any other MDS variant. The other MDS variants cannot be
   protected against cross Hyper-Thread attacks because the Fill Buffer and
   the Load Ports are shared. So on CPUs affected by other variants, the
   idle clearing would be a window dressing exercise and is therefore not
   activated.

   The invocation is controlled by the static key mds_idle_clear which is
   switched depending on the chosen mitigation mode and the SMT state of
   the system.

   The buffer clear is only invoked before entering the C-State to prevent
   stale data from the idling CPU from spilling to the Hyper-Thread
   sibling after the store buffer got repartitioned and all entries are
   available to the non idle sibling.

   When coming out of idle the store buffer is partitioned again so each
   sibling has half of it available. The CPU coming back from idle could
   then be speculatively exposed to contents of the sibling. The buffers
   are flushed either on exit to user space or on VMENTER so malicious
   code in user space or the guest cannot speculatively access them.

   The mitigation is hooked into all variants of halt()/mwait(), but does
   not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
   was superseded around 2010 by the intel_idle driver, which is
   preferred on all affected CPUs which are expected to gain the MD_CLEAR
   functionality in microcode. Apart from that, the IO-Port mechanism is a
   legacy interface which is only used on older systems which are either
   not affected or do not receive microcode updates anymore.
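
The conditions under which idle clearing is switched on can be summarised
as a small predicate. The following is a minimal user-space sketch, not the
kernel's code: `struct mds_idle_inputs`, its fields and
`mds_idle_clear_wanted()` are hypothetical names, and treating "mitigation
mode is not off" as a precondition is an assumption drawn from the
statement that the static key depends on the chosen mitigation mode and
the SMT state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the text refers to. */
struct mds_idle_inputs {
    bool mitigation_enabled; /* mode is full or vmwerv, i.e. not off */
    bool smt_active;         /* SMT supported and a sibling is online */
    bool msbds_only;         /* affected by MSBDS and no other variant */
};

/*
 * Decide whether buffers should be cleared before entering a C-State.
 * Illustrative model of the rules above, not the kernel implementation.
 */
bool mds_idle_clear_wanted(const struct mds_idle_inputs *in)
{
    if (!in->mitigation_enabled)
        return false;

    /* Without an active sibling nothing can observe the store buffer. */
    if (!in->smt_active)
        return false;

    /*
     * Fill buffers and load ports are shared between siblings, so on
     * CPUs with other variants idle clearing would not help.
     */
    return in->msbds_only;
}

/* Usage example: MSBDS-only CPU with SMT active and mitigation on. */
void mds_idle_example(void)
{
    struct mds_idle_inputs in = {
        .mitigation_enabled = true,
        .smt_active = true,
        .msbds_only = true,
    };

    assert(mds_idle_clear_wanted(&in));
}
```

In this model the predicate would be re-evaluated whenever the SMT state
changes, which mirrors the text's statement that the key is switched
depending on the SMT state of the system.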