===================
Reliable Stacktrace
===================

This document outlines basic information about reliable stacktracing.

.. Table of Contents:

.. contents::
   :local:

1. Introduction
===============

The kernel livepatch consistency model relies on accurately identifying which
functions may have live state and therefore may not be safe to patch. One way
to identify which functions are live is to use a stacktrace.

Existing stacktrace code may not always give an accurate picture of all
functions with live state, and best-effort approaches which can be helpful for
debugging are unsound for livepatching. Livepatching depends on architectures
to provide a *reliable* stacktrace which ensures it never omits any live
functions from a trace.


2. Requirements
===============

Architectures must implement one of the reliable stacktrace functions.
Architectures using CONFIG_ARCH_STACKWALK must implement
'arch_stack_walk_reliable', and other architectures must implement
'save_stack_trace_tsk_reliable'.
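
Whichever entry point an architecture provides, the essential contract is the
same: report every live function, or fail loudly. The following is a minimal
toy model of that contract; 'toy_frame' and 'toy_stack_walk_reliable' are
illustrative names, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model of the reliable-stacktrace contract: the walker must
 * either report every frame and return 0, or return a non-zero
 * error code so the caller knows the trace cannot be trusted. */
struct toy_frame {
	unsigned long pc;	/* return address for this frame */
	int unwindable;		/* 0 models state the unwinder cannot trust */
};

static int toy_stack_walk_reliable(const struct toy_frame *frames, size_t nr,
				   unsigned long *trace, size_t *nr_traced)
{
	size_t i;

	*nr_traced = 0;
	for (i = 0; i < nr; i++) {
		if (!frames[i].unwindable)
			return -EINVAL;	/* trace must not be trusted */
		trace[(*nr_traced)++] = frames[i].pc;
	}
	return 0;			/* all live functions reported */
}
```

A partial trace with a zero return code would be the worst outcome, as
livepatch would then treat unreported functions as safe to patch.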

Principally, the reliable stacktrace function must ensure that either:

* The trace includes all functions that the task may be returned to, and the
  return code is zero to indicate that the trace is reliable.

* The return code is non-zero to indicate that the trace is not reliable.

.. note::
   In some cases it is legitimate to omit specific functions from the trace,
   but all other functions must be reported. These cases are described in
   further detail below.

Secondly, the reliable stacktrace function must be robust to cases where
the stack or other unwind state is corrupt or otherwise unreliable. The
function should attempt to detect such cases and return a non-zero error
code, and should not get stuck in an infinite loop or access memory in
an unsafe way. Specific cases are described in further detail below.


3. Compile-time analysis
========================

To ensure that kernel code can be correctly unwound in all cases,
architectures may need to verify that code has been compiled in a manner
expected by the unwinder. For example, an unwinder may expect that
functions manipulate the stack pointer in a limited way, or that all
functions use specific prologue and epilogue sequences.
Architectures with such requirements should verify the kernel compilation
using objtool.

In some cases, an unwinder may require metadata to correctly unwind.
Where necessary, this metadata should be generated at build time using
objtool.


4. Considerations
=================

The unwinding process varies across architectures, their respective procedure
call standards, and kernel configurations. This section describes common
details that architectures should consider.

4.1 Identifying successful termination
--------------------------------------

Unwinding may terminate early for a number of reasons, including:

* Stack or frame pointer corruption.

* Missing unwind support for an uncommon scenario, or a bug in the unwinder.

* Dynamically generated code (e.g. eBPF) or foreign code (e.g. EFI runtime
  services) not following the conventions expected by the unwinder.

To ensure that this does not result in functions being omitted from the trace,
even if not caught by other checks, it is strongly recommended that
architectures verify that a stacktrace ends at an expected location, e.g.

* Within a specific function that is an entry point to the kernel.

* At a specific location on a stack expected for a kernel entry point.

* On a specific stack expected for a kernel entry point (e.g. if the
  architecture has separate task and IRQ stacks).

4.2 Identifying unwindable code
-------------------------------

Unwinding typically relies on code following specific conventions (e.g.
manipulating a frame pointer), but there can be code which may not follow these
conventions and may require special handling in the unwinder, e.g.

* Exception vectors and entry assembly.

* Procedure Linkage Table (PLT) entries and veneer functions.

* Trampoline assembly (e.g. ftrace, kprobes).

* Dynamically generated code (e.g. eBPF, optprobe trampolines).

* Foreign code (e.g. EFI runtime services).

To ensure that such cases do not result in functions being omitted from a
trace, it is strongly recommended that architectures positively identify code
which is known to be reliable to unwind from, and reject unwinding from all
other code.

Kernel code including modules and eBPF can be distinguished from foreign code
using '__kernel_text_address()'. Checking for this also helps to detect stack
corruption.
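
The positive-identification approach can be sketched as below. The address
ranges are made up for illustration; a real implementation would consult
'__kernel_text_address()' and the architecture's own section bounds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bounds standing in for the kernel's text sections and
 * for a linker section holding entry/trampoline assembly. */
#define TEXT_START	0x1000UL
#define TEXT_END	0x9000UL
#define ENTRY_START	0x1000UL
#define ENTRY_END	0x2000UL

static bool toy_kernel_text_address(unsigned long pc)
{
	return pc >= TEXT_START && pc < TEXT_END;
}

/* Reject anything not positively known to be reliable: foreign or
 * dynamically generated code fails the text check, and entry assembly
 * is carved out via its (hypothetical) section bounds. */
static bool toy_reliable_text_address(unsigned long pc)
{
	if (!toy_kernel_text_address(pc))
		return false;
	if (pc >= ENTRY_START && pc < ENTRY_END)
		return false;
	return true;
}
```

Note the default is rejection: an address is only unwindable if it passes
every positive check, which is what makes stack corruption detectable.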

There are several ways an architecture may identify kernel code which is deemed
unreliable to unwind from, e.g.

* Placing such code into special linker sections, and rejecting unwinding from
  any code in these sections.

* Identifying specific portions of code using bounds information.

4.3 Unwinding across interrupts and exceptions
----------------------------------------------

At function call boundaries the stack and other unwind state is expected to be
in a consistent state suitable for reliable unwinding, but this may not be the
case part-way through a function. For example, during a function prologue or
epilogue a frame pointer may be transiently invalid, or during the function
body the return address may be held in an arbitrary general purpose register.
For some architectures this may change at runtime as a result of dynamic
instrumentation.

If an interrupt or other exception is taken while the stack or other unwind
state is in an inconsistent state, it may not be possible to reliably unwind,
and it may not be possible to identify whether such unwinding will be reliable.
See below for examples.

Architectures which cannot identify when it is reliable to unwind such cases
(or where it is never reliable) must reject unwinding across exception
boundaries.
Note that it may be reliable to unwind across certain
exceptions (e.g. IRQ) but unreliable to unwind across other exceptions
(e.g. NMI).

Architectures which can identify when it is reliable to unwind such cases (or
have no such cases) should attempt to unwind across exception boundaries, as
doing so can prevent unnecessarily stalling livepatch consistency checks and
permits livepatch transitions to complete more quickly.

4.4 Rewriting of return addresses
---------------------------------

Some trampolines temporarily modify the return address of a function in order
to intercept when that function returns with a return trampoline, e.g.

* An ftrace trampoline may modify the return address so that function graph
  tracing can intercept returns.

* A kprobes (or optprobes) trampoline may modify the return address so that
  kretprobes can intercept returns.

When this happens, the original return address will not be in its usual
location. For trampolines which are not subject to live patching, where an
unwinder can reliably determine the original return address and no unwind state
is altered by the trampoline, the unwinder may report the original return
address in place of the trampoline and report this as reliable. Otherwise, an
unwinder must report these cases as unreliable.
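
The substitution an unwinder performs for rewritten return addresses can be
modeled as follows, in the spirit of ftrace_graph_ret_addr(). The names,
constants, and fixed-size return stack here are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TRAMPOLINE	0xfeedUL	/* hypothetical return trampoline */

struct toy_ret_stack {
	unsigned long orig[8];	/* saved original return addresses */
	size_t depth;		/* entries currently live */
};

/* Return the address the unwinder should report for 'addr', consuming
 * one saved entry per trampoline hit; '*idx' keeps the unwinder in
 * step with the frames as it walks deeper into the stack. */
static unsigned long toy_graph_ret_addr(struct toy_ret_stack *rs,
					size_t *idx, unsigned long addr)
{
	if (addr != TOY_TRAMPOLINE)
		return addr;		/* not rewritten: report as-is */
	if (*idx >= rs->depth)
		return 0;		/* inconsistent: caller must bail out */
	return rs->orig[(*idx)++];
}
```

Consuming exactly one saved entry per trampoline hit matters: consuming too
few or too many desynchronizes the remainder of the trace.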

Special care is required when identifying the original return address, as this
information is not in a consistent location for the duration of the entry
trampoline or return trampoline. For example, considering the x86_64
'return_to_handler' return trampoline:

.. code-block:: none

   SYM_CODE_START(return_to_handler)
           UNWIND_HINT_EMPTY
           subq $24, %rsp

           /* Save the return values */
           movq %rax, (%rsp)
           movq %rdx, 8(%rsp)
           movq %rbp, %rdi

           call ftrace_return_to_handler

           movq %rax, %rdi
           movq 8(%rsp), %rdx
           movq (%rsp), %rax
           addq $24, %rsp
           JMP_NOSPEC rdi
   SYM_CODE_END(return_to_handler)

While the traced function runs, its return address on the stack points to
the start of return_to_handler, and the original return address is stored in
the task's cur_ret_stack. During this time the unwinder can find the return
address using ftrace_graph_ret_addr().

When the traced function returns to return_to_handler, there is no longer a
return address on the stack, though the original return address is still stored
in the task's cur_ret_stack.
Within ftrace_return_to_handler(), the original
return address is removed from cur_ret_stack and is transiently moved
arbitrarily by the compiler before being returned in rax. The return_to_handler
trampoline moves this into rdi before jumping to it.

Architectures might not always be able to unwind such sequences, such as when
ftrace_return_to_handler() has removed the address from cur_ret_stack, and the
location of the return address cannot be reliably determined.

It is recommended that architectures unwind cases where return_to_handler has
not yet been returned to, but architectures are not required to unwind from the
middle of return_to_handler and can report this as unreliable. Architectures
are not required to unwind from other trampolines which modify the return
address.

4.5 Obscuring of return addresses
---------------------------------

Some trampolines do not rewrite the return address in order to intercept
returns, but do transiently clobber the return address or other unwind state.

For example, the x86_64 implementation of optprobes patches the probed function
with a JMP instruction which targets the associated optprobe trampoline. When
the probe is hit, the CPU will branch to the optprobe trampoline, and the
address of the probed function is not held in any register or on the stack.
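
In such a case, the only possible recovery path is the trampoline's own
bookkeeping; if none is available while the PC is inside the trampoline, the
unwind must be reported unreliable. A sketch of that decision, with made-up
names and address ranges:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bounds of an optprobe-style trampoline that clobbers
 * the return state of the code it interposes on. */
#define TRAMP_START	0x5000UL
#define TRAMP_END	0x5100UL

/* Returns 0 and sets *ret_addr when the frame can be trusted, or
 * -EINVAL when the return address is currently unrecoverable. The
 * 'stashed_addr' parameter models a record kept by the trampoline's
 * own bookkeeping, if the architecture maintains one. */
static int toy_unwind_frame(unsigned long pc, unsigned long stashed_addr,
			    bool stash_valid, unsigned long *ret_addr)
{
	if (pc >= TRAMP_START && pc < TRAMP_END) {
		if (!stash_valid)
			return -EINVAL;	/* address held nowhere: give up */
		*ret_addr = stashed_addr;
		return 0;
	}
	*ret_addr = pc;	/* ordinary code: usual conventions apply */
	return 0;
}
```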

Similarly, the arm64 implementation of DYNAMIC_FTRACE_WITH_REGS patches traced
functions with the following:

.. code-block:: none

   MOV X9, X30
   BL <trampoline>

The MOV saves the link register (X30) into X9 to preserve the return address
before the BL clobbers the link register and branches to the trampoline. At the
start of the trampoline, the address of the traced function is in X9 rather
than the link register as would usually be the case.

Architectures must ensure that unwinders either reliably unwind
such cases, or report the unwinding as unreliable.

4.6 Link register unreliability
-------------------------------

On some architectures, 'call' instructions place the return address into a
link register, and 'return' instructions consume the return address from the
link register without modifying the register. On these architectures software
must save the return address to the stack prior to making a function call. Over
the duration of a function call, the return address may be held in the link
register alone, on the stack alone, or in both locations.

Unwinders typically assume the link register is always live, but this
assumption can lead to unreliable stack traces.
For example, consider the
following arm64 assembly for a simple function:

.. code-block:: none

   function:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           BL <other_function>
           LDP X29, X30, [SP], #16
           RET

At entry to the function, the link register (x30) points to the caller, and the
frame pointer (X29) points to the caller's frame including the caller's return
address. The first two instructions create a new stackframe and update the
frame pointer, and at this point the link register and the frame pointer both
describe this function's return address. A trace at this point may describe
this function twice, and if the function return is being traced, the unwinder
may consume two entries from the fgraph return stack rather than one entry.

The BL invokes 'other_function' with the link register pointing to this
function's LDP and the frame pointer pointing to this function's stackframe.
When 'other_function' returns, the link register is left pointing at the LDP,
and so a trace at this point could result in 'function' appearing twice in the
backtrace.

Similarly, a function may deliberately clobber the LR, e.g.

.. code-block:: none

   caller:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           ADR LR, <callee>
           BLR LR
           LDP X29, X30, [SP], #16
           RET

The ADR places the address of 'callee' into the LR, before the BLR branches to
this address. If a trace is made immediately after the ADR, 'callee' will
appear to be the parent of 'caller', rather than the child.

Due to cases such as the above, it may only be possible to reliably consume a
link register value at a function call boundary. Architectures where this is
the case must reject unwinding across exception boundaries unless they can
reliably identify when the LR or stack value should be used (e.g. using
metadata generated by objtool).
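
The double-reporting hazard described above can be made concrete with a small
simulation. The addresses are illustrative: immediately after a callee
returns, the LR points back into the calling function while the stack still
holds that function's real return address.

```c
#include <assert.h>
#include <stddef.h>

#define FUNCTION_PC	0x2010UL	/* PC inside 'function' */
#define FUNCTION_LR	0x2010UL	/* LR also points into 'function' */
#define CALLER_RA	0x1004UL	/* caller's address, saved on the stack */

/* A naive unwinder that always treats the LR as a distinct frame
 * reports 'function' twice before reaching the caller. */
static size_t naive_unwind(unsigned long *trace)
{
	size_t n = 0;

	trace[n++] = FUNCTION_PC;	/* current PC */
	trace[n++] = FUNCTION_LR;	/* blindly trusted LR: duplicate */
	trace[n++] = CALLER_RA;		/* frame record on the stack */
	return n;
}

/* An unwinder that only consumes the LR at a function call boundary
 * skips straight to the stack value here, avoiding the duplicate. */
static size_t careful_unwind(unsigned long *trace)
{
	size_t n = 0;

	trace[n++] = FUNCTION_PC;
	trace[n++] = CALLER_RA;
	return n;
}
```

The objtool-generated metadata mentioned above is one way an unwinder can know,
per program counter, which of these two behaviours is correct.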