.. SPDX-License-Identifier: GPL-2.0

Speculative Return Stack Overflow (SRSO)
========================================

This is a mitigation for the speculative return stack overflow (SRSO)
vulnerability found on AMD processors. The mechanism is by now the
well-known scenario of poisoning CPU functional units - the Branch Target
Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
tricking the elevated privilege domain (the kernel) into leaking
sensitive data.

AMD CPUs predict RET instructions using a Return Address Predictor (aka
Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
CALL instruction (i.e., an instruction predicted to be a CALL but which is
not actually a CALL) can create an entry in the RAP which may be used
to predict the target of a subsequent RET instruction.

The specific circumstances that lead to this vary by microarchitecture
but the concern is that an attacker can mis-train the CPU BTB to predict
non-architectural CALL instructions in kernel space and use this to
control the speculative target of a subsequent kernel RET, potentially
leading to information disclosure via a speculative side-channel.

The issue is tracked under CVE-2023-20569.

Affected processors
-------------------

AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
processors have not been investigated.

System information and options
------------------------------

First of all, it is required that the latest microcode be loaded for
mitigations to be effective.

The sysfs file showing SRSO mitigation status is:

  /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow

The possible values in this file are:

 * 'Not affected':

   The processor is not vulnerable.

 * 'Vulnerable':

   The processor is vulnerable and no mitigations have been applied.

 * 'Vulnerable: No microcode':

   The processor is vulnerable; no microcode extending IBPB
   functionality to address the vulnerability has been applied.

 * 'Vulnerable: Safe RET, no microcode':

   The "Safe RET" mitigation (see below) has been applied to protect the
   kernel, but the IBPB-extending microcode has not been applied. User
   space tasks may still be vulnerable.

 * 'Vulnerable: Microcode, no safe RET':

   Extended IBPB functionality microcode patch has been applied. It does
   not address User->Kernel and Guest->Host transitions protection but it
   does address User->User and VM->VM attack vectors.

   Note that User->User mitigation is controlled by how the IBPB aspect in
   the Spectre v2 mitigation is selected:

    * conditional IBPB:

      where each process can select whether it needs an IBPB issued
      around it (PR_SPEC_DISABLE/_ENABLE etc., see :doc:`spectre`)

    * strict:

      i.e., always on - by supplying spectre_v2_user=on on the kernel
      command line

   (spec_rstack_overflow=microcode)

 * 'Mitigation: Safe RET':

   Combined microcode/software mitigation. It complements the
   extended IBPB microcode patch functionality by addressing
   User->Kernel and Guest->Host transitions protection.

   Selected by default or by spec_rstack_overflow=safe-ret

 * 'Mitigation: IBPB':

   Similar protection as "safe RET" above but employs an IBPB barrier on
   privilege domain crossings (User->Kernel, Guest->Host).

   (spec_rstack_overflow=ibpb)

 * 'Mitigation: IBPB on VMEXIT':

   Mitigation addressing the cloud provider scenario - the Guest->Host
   transitions only.

   (spec_rstack_overflow=ibpb-vmexit)



In order to exploit the vulnerability, an attacker needs to:

 - gain local access on the machine

 - break kASLR

 - find gadgets in the running kernel in order to use them in the exploit

 - potentially create and pin an additional workload on the sibling
   thread, depending on the microarchitecture (not necessary on fam 0x19)

 - run the exploit

Considering the performance implications of each mitigation type, the
default one is 'Mitigation: Safe RET' which should take care of most
attack vectors, including the local User->Kernel one.

As always, the user is advised to keep her/his system up-to-date by
applying software updates regularly.

The default setting will be reevaluated when needed and especially when
new attack vectors appear.

As one can surmise, 'Mitigation: Safe RET' does come at the cost of some
performance depending on the workload. If one trusts her/his userspace
and does not want to suffer the performance impact, one can always
disable the mitigation with spec_rstack_overflow=off.

Similarly, 'Mitigation: IBPB' is another full mitigation type employing
an indirect branch prediction barrier after having applied the required
microcode patch for one's system. This mitigation also comes at
a performance cost.

Mitigation: Safe RET
--------------------

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. On Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_alias_untrain_ret() and the safe return
function srso_alias_safe_ret() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

On older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to the Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Checking the safe RET mitigation actually works
-----------------------------------------------

In case one wants to validate whether the SRSO safe RET mitigation works
on a kernel, one could use two performance counters

* PMC_0xc8 - Count of RET/RET lw retired
* PMC_0xc9 - Count of RET/RET lw retired mispredicted

and compare the number of RETs retired properly vs those retired
mispredicted, in kernel mode. Another way of specifying those events
is::

        # perf list ex_ret_near_ret

        List of pre-defined events (to be used in -e or -M):

        core:
          ex_ret_near_ret
               [Retired Near Returns]
          ex_ret_near_ret_mispred
               [Retired Near Returns Mispredicted]

Either the command using the event mnemonics::

        # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s

or using the raw PMC numbers::

        # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

should give the same amount. I.e., every RET retired should be
mispredicted::

        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

         Performance counter stats for 'sleep 10s':

                   137,167      cpu/event=0xc8,umask=0/k
                   137,173      cpu/event=0xc9,umask=0/k

              10.004110303 seconds time elapsed

               0.000000000 seconds user
               0.004462000 seconds sys

vs the case when the mitigation is disabled (spec_rstack_overflow=off)
or not functioning properly, usually showing a much smaller number of
mispredicted retired RETs vs the overall count of retired RETs during
a workload::

        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

         Performance counter stats for 'sleep 10s':

                   201,627      cpu/event=0xc8,umask=0/k
                     4,074      cpu/event=0xc9,umask=0/k

              10.003267252 seconds time elapsed

               0.002729000 seconds user
               0.000000000 seconds sys

Also, there is a selftest which performs the above; go to
tools/testing/selftests/x86/ and do::

        make srso
        ./srso
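For a quick manual comparison of the two counter values printed by perf
stat above, a small shell helper can turn them into a percentage. This is
only a sketch: the function name ret_mispredict_pct is hypothetical and not
part of the kernel tree or of perf, and it assumes the two counts are copied
by hand from the perf output (thousands separators are stripped).

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the kernel or with perf.
#
# Print the percentage (rounded down) of retired RETs that were
# mispredicted.
#   $1: retired RET count        (event 0xc8 / ex_ret_near_ret)
#   $2: mispredicted RET count   (event 0xc9 / ex_ret_near_ret_mispred)
ret_mispredict_pct() {
    # perf prints counts with thousands separators; drop the commas.
    retired=$(printf '%s' "$1" | tr -d ',')
    mispred=$(printf '%s' "$2" | tr -d ',')

    if [ "$retired" -le 0 ]; then
        echo "no retired RETs counted" >&2
        return 1
    fi

    # Integer arithmetic only; the two counters are sampled independently,
    # so the mispredicted count can slightly exceed the retired count.
    echo $(( mispred * 100 / retired ))
}
```

Called with the numbers from the two runs shown above,
``ret_mispredict_pct 137,167 137,173`` reports a value at 100, while
``ret_mispredict_pct 201,627 4,074`` reports a low single-digit value,
matching the expectation that nearly every kernel RET is mispredicted only
when safe RET is active.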