.. SPDX-License-Identifier: GPL-2.0

======================================================
Control-flow Enforcement Technology (CET) Shadow Stack
======================================================

CET Background
==============

Control-flow Enforcement Technology (CET) covers several related x86 processor
features that protect against control-flow hijacking attacks. CET can protect
both applications and the kernel.

CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
is a secondary stack allocated from memory that applications cannot modify
directly. When executing a CALL instruction, the processor pushes the return
address to both the normal stack and the shadow stack. Upon function return,
the processor pops the shadow stack copy and compares it to the normal stack
copy. If the two differ, the processor raises a control-protection fault. IBT
verifies that indirect CALL/JMP targets are intended targets, as marked by the
compiler with 'ENDBR' opcodes. Not all CPUs support both shadow stack and
indirect branch tracking. Today, the 64-bit kernel supports only userspace
shadow stack and kernel IBT.

Requirements to use Shadow Stack
================================

To use userspace shadow stack, you need hardware that supports it, a kernel
configured with it, and userspace libraries compiled with it.

The kernel Kconfig option is X86_USER_SHADOW_STACK. When it is compiled in,
userspace shadow stacks can be disabled at runtime with the kernel parameter
nousershstk.

To build a kernel with user shadow stack support, Binutils v2.29 or later, or
LLVM v6 or later, is required.

At run time, /proc/cpuinfo shows CET features if the processor supports
CET. The "user_shstk" flag means that userspace shadow stack is supported by
both the current kernel and the hardware.
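
As a minimal sketch, a program could look for the "user_shstk" flag in the
flags line of /proc/cpuinfo. The helper names below are hypothetical and not
part of any kernel or libc API; the sketch assumes the x86 "flags : ..." line
format, so on other architectures it simply reports the flag as absent.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /* Returns 1 if `flag` appears as a whole whitespace-separated token in
     * `flags`, 0 otherwise. */
    static int has_cpu_flag(const char *flags, const char *flag)
    {
        size_t n = strlen(flag);
        const char *p = flags;

        if (n == 0)
            return 0;
        while ((p = strstr(p, flag)) != NULL) {
            int starts = (p == flags || p[-1] == ' ' || p[-1] == '\t');
            char end = p[n];
            int ends = (end == '\0' || end == ' ' || end == '\t' || end == '\n');

            if (starts && ends)
                return 1;
            p++; /* Partial match only; keep searching. */
        }
        return 0;
    }

    /* Hypothetical helper: returns 1 if the first "flags" line of
     * /proc/cpuinfo lists `flag`, 0 if it does not (or there is no such
     * line), and -1 if /proc/cpuinfo cannot be opened. */
    static int cpuinfo_has_flag(const char *flag)
    {
        char line[8192]; /* assumed long enough for one flags line */
        FILE *f = fopen("/proc/cpuinfo", "r");
        int found = 0;

        if (f == NULL)
            return -1;
        while (fgets(line, sizeof(line), f) != NULL) {
            if (strncmp(line, "flags", 5) == 0) {
                const char *colon = strchr(line, ':');

                found = colon != NULL && has_cpu_flag(colon + 1, flag);
                break;
            }
        }
        fclose(f);
        return found;
    }

Matching whole tokens, rather than using a bare strstr(), avoids false
positives from any flag name that merely contains "user_shstk" as a substring.
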

Application Enabling
====================

An application's CET capability is marked in its ELF note and can be verified
from readelf/llvm-readelf output::

  readelf -n <application> | grep -a SHSTK
      properties: x86 feature: SHSTK

The kernel does not process these application markers directly. Applications
or loaders must enable CET features using the arch_prctl() interface described
below. Typically this is done in the dynamic loader or in static runtime
objects, as is the case in glibc.

Enabling arch_prctl() calls
===========================

ELF features should be enabled by the loader using the arch_prctl() calls
below. They are only supported in 64-bit user applications, and they operate
on features on a per-thread basis. The enablement status is inherited on
clone, so a feature enabled on the first thread propagates to every thread
that is later created in the application.

arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
    Enable a single feature specified in 'feature'. Only one feature can be
    operated on at a time.

arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
    Disable a single feature specified in 'feature'. Only one feature can be
    operated on at a time.

arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
    Lock features at their current enabled or disabled status. 'features'
    is a mask of all features to lock. All set bits are processed; unset bits
    are ignored. The mask is ORed with the existing value, so any feature
    whose bit is set here cannot be enabled or disabled afterwards.

arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
    Unlock features. 'features' is a mask of all features to unlock. All
    set bits are processed; unset bits are ignored. This only works via
    ptrace.

arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
    Copy the currently enabled features to the address passed in addr. The
    features are described using the same bits that the other operations
    accept in 'features'.
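
The sketch below queries ARCH_SHSTK_STATUS for the calling thread through the
raw arch_prctl syscall, since a libc wrapper for arch_prctl() may not be
available. The constant values are assumptions copied from the kernel's uapi
<asm/prctl.h> and are defined locally only when not already provided; the
helper names are hypothetical. It deliberately only queries status: enabling
shadow stack from inside a helper such as syscall() and then returning from it
would pop a return address that was never pushed onto the new shadow stack,
which would most likely fault.

.. code-block:: c

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* Assumed to match arch/x86/include/uapi/asm/prctl.h. */
    #ifndef ARCH_SHSTK_STATUS
    #define ARCH_SHSTK_STATUS 0x5005
    #endif
    #ifndef ARCH_SHSTK_SHSTK
    #define ARCH_SHSTK_SHSTK (1UL << 0)
    #endif
    #ifndef ARCH_SHSTK_WRSS
    #define ARCH_SHSTK_WRSS (1UL << 1)
    #endif

    /* Hypothetical helper: stores the calling thread's enabled features in
     * *features. Returns 0 on success, or -1 with errno set (for example
     * EINVAL on kernels built without shadow stack support). */
    static int shstk_status(unsigned long *features)
    {
    #if defined(__x86_64__) && defined(SYS_arch_prctl)
        /* STATUS only copies a value out, so going through the libc
         * syscall() wrapper does not disturb shadow stack state. */
        return (int)syscall(SYS_arch_prctl, (long)ARCH_SHSTK_STATUS, features);
    #else
        (void)features;
        errno = ENOSYS; /* not x86-64: no arch_prctl shadow stack interface */
        return -1;
    #endif
    }

    /* Hypothetical helper: formats a feature mask as space-separated names,
     * e.g. "shstk wrss", or "" when nothing is enabled. */
    static void shstk_feature_names(unsigned long features, char *buf, size_t len)
    {
        int shstk = (features & ARCH_SHSTK_SHSTK) != 0;
        int wrss = (features & ARCH_SHSTK_WRSS) != 0;

        snprintf(buf, len, "%s%s%s", shstk ? "shstk" : "",
                 (shstk && wrss) ? " " : "", wrss ? "wrss" : "");
    }

A caller would pass the mask filled in by shstk_status() to
shstk_feature_names() to print, for example, "shstk wrss".
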

The return values are as follows. On success, 0 is returned. On error, errno
can be::

  -EPERM    if any of the passed features are locked.
  -ENOTSUPP if the feature is not supported by the hardware or kernel.
  -EINVAL   if the arguments are invalid (nonexistent feature, etc.).
  -EFAULT   if information could not be copied back to userspace.

The supported feature bits are::

  ARCH_SHSTK_SHSTK - Shadow stack
  ARCH_SHSTK_WRSS  - WRSS

Currently, shadow stack and WRSS are supported via this interface. WRSS can
only be enabled together with shadow stack, and it is automatically disabled
if shadow stack is disabled.

Proc Status
===========

To check whether an application is actually running with shadow stack, read
/proc/$PID/status. It reports "shstk" and/or "wrss", depending on what is
enabled. The lines look like this::

  x86_Thread_features: shstk wrss
  x86_Thread_features_locked: shstk wrss

Implementation of the Shadow Stack
==================================

Shadow Stack Size
-----------------

A task's shadow stack is allocated from memory with a fixed size of
MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is sized to the
maximum size of the normal stack, capped at 4 GB. For the clone3 syscall, the
stack size passed in is used instead of the rlimit.

Signal
------

The main program and its signal handlers use the same shadow stack. Because
the shadow stack stores only return addresses, a large shadow stack covers the
case where both the program stack and the signal alternate stack run out.

When a signal is delivered, the old pre-signal state is pushed onto the stack.
When shadow stack is enabled, the shadow-stack-specific state is pushed onto
the shadow stack. Today this is only the old SSP (shadow stack pointer),
pushed in a special format with bit 63 set.
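
The token format just described can be modeled in a few lines of C. This is a
sketch only: the macro and function names are hypothetical, it models nothing
beyond the bit-63 marker, and the kernel's own checks at sigreturn may be
stricter. Setting bit 63 makes the token unambiguous because userspace
addresses on x86-64 sit in the lower canonical half, where bit 63 is always
clear, so no genuine return address can be mistaken for a token.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical name for the marker bit of a shadow stack sigframe token.
     * A genuine userspace return address never has bit 63 set. */
    #define SHSTK_SIGFRAME_TOKEN_BIT (UINT64_C(1) << 63)

    /* Encode the pre-signal SSP as described: the old SSP with bit 63 set. */
    static uint64_t ssp_to_sigframe_token(uint64_t old_ssp)
    {
        return old_ssp | SHSTK_SIGFRAME_TOKEN_BIT;
    }

    /* Decode a value read from the shadow stack at sigreturn. Returns 0 and
     * stores the old SSP on success, or -1 if bit 63 is clear, meaning the
     * value is not a sigframe token. */
    static int sigframe_token_to_ssp(uint64_t token, uint64_t *old_ssp)
    {
        if ((token & SHSTK_SIGFRAME_TOKEN_BIT) == 0)
            return -1;
        *old_ssp = token & ~SHSTK_SIGFRAME_TOKEN_BIT;
        return 0;
    }
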

On sigreturn, the kernel verifies and restores this old SSP token. The kernel
also pushes the normal restorer address onto the shadow stack, so that the
sigreturn path through the restorer does not cause a shadow stack violation.

The shadow stack signal frame format is therefore::

  |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
                  (bit 63 set to 1)
  |        ...| - Other state may be added in the future

32-bit ABI signals are not supported in shadow stack processes. Linux prevents
32-bit execution while shadow stack is enabled by allocating shadow stacks
outside of the 32-bit address space. When execution enters 32-bit mode, either
via a far call or by returning to userspace, the hardware generates a #GP,
which is delivered to the process as a segfault. When transitioning to
userspace, the register state will be as if the userspace IP being returned
to caused the segfault.

Fork
----

The shadow stack's VMA has the VM_SHADOW_STACK flag set; its PTEs are required
to be read-only and dirty. When a shadow stack PTE is not read-only and dirty,
a shadow stack access triggers a page fault with the shadow stack access bit
set in the page fault error code.

When a task forks a child, its shadow stack PTEs are copied, and the dirty bit
is cleared in both the parent's and the child's shadow stack PTEs. Upon the
next shadow stack access, the resulting shadow stack page fault is handled by
page copy/re-use.

When a pthread child is created, the kernel allocates a new shadow stack for
the new thread. New shadow stack creation behaves like mmap() with respect to
ASLR. Similarly, on thread exit, the thread's shadow stack is disabled.

Exec
----

On exec, shadow stack features are disabled by the kernel. At that point,
userspace can choose to re-enable or lock them.
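
Because exec leaves every feature disabled, a runtime that re-enables or locks
them may want to confirm the result. One option, sketched below, is to parse
the x86_Thread_features and x86_Thread_features_locked lines of
/proc/self/status described in the Proc Status section. The helper names are
hypothetical, and the bit values are assumed to mirror ARCH_SHSTK_SHSTK and
ARCH_SHSTK_WRSS from the kernel's uapi <asm/prctl.h>.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    #ifndef ARCH_SHSTK_SHSTK
    #define ARCH_SHSTK_SHSTK (1UL << 0) /* assumed value, see <asm/prctl.h> */
    #endif
    #ifndef ARCH_SHSTK_WRSS
    #define ARCH_SHSTK_WRSS (1UL << 1)  /* assumed value, see <asm/prctl.h> */
    #endif

    /* Hypothetical helper: finds the line starting with `key` (for example
     * "x86_Thread_features:") in the text of a /proc/PID/status file and
     * returns the named features as a mask, or 0 if the line is missing. */
    static unsigned long parse_status_features(const char *status, const char *key)
    {
        size_t klen = strlen(key);
        const char *line = status;

        while (line != NULL && *line != '\0') {
            if (strncmp(line, key, klen) == 0) {
                unsigned long mask = 0;
                const char *p = line + klen;

                /* Walk the whitespace-separated names up to end of line. */
                while (*p != '\0' && *p != '\n') {
                    size_t n;

                    p += strspn(p, " \t");
                    n = strcspn(p, " \t\n");
                    if (n == 5 && strncmp(p, "shstk", 5) == 0)
                        mask |= ARCH_SHSTK_SHSTK;
                    else if (n == 4 && strncmp(p, "wrss", 4) == 0)
                        mask |= ARCH_SHSTK_WRSS;
                    p += n;
                }
                return mask;
            }
            line = strchr(line, '\n');
            if (line != NULL)
                line++;
        }
        return 0;
    }

    /* Hypothetical helper: reads up to len - 1 bytes of /proc/self/status
     * into buf (len must be at least 1). Returns 0 on success, -1 on failure. */
    static int read_self_status(char *buf, size_t len)
    {
        FILE *f = fopen("/proc/self/status", "r");
        size_t n;

        if (f == NULL)
            return -1;
        n = fread(buf, 1, len - 1, f);
        buf[n] = '\0';
        fclose(f);
        return 0;
    }

For example, after locking its features, a runtime could call
read_self_status() and pass the buffer to parse_status_features() with
"x86_Thread_features_locked:" to check that the locked set is the one it
expects.
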