===================
Reliable Stacktrace
===================

This document outlines basic information about reliable stacktracing.

.. Table of Contents:

.. contents:: :local:

1. Introduction
===============

The kernel livepatch consistency model relies on accurately identifying which
functions may have live state and therefore may not be safe to patch. One way
to identify which functions are live is to use a stacktrace.

Existing stacktrace code may not always give an accurate picture of all
functions with live state, and best-effort approaches which can be helpful for
debugging are unsound for livepatching. Livepatching depends on architectures
to provide a *reliable* stacktrace which ensures it never omits any live
functions from a trace.


2. Requirements
===============

Architectures must implement one of the reliable stacktrace functions.
Architectures using CONFIG_ARCH_STACKWALK must implement
'arch_stack_walk_reliable', and other architectures must implement
'save_stack_trace_tsk_reliable'.
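
As a rough illustration of how a consumer such as livepatch might act on the
result of one of these functions, the sketch below treats any non-zero return
code as "every function may be live", and otherwise checks whether any reported
address falls within the function of interest. This is a simplified userspace
model: 'struct func_range' and 'func_not_live' are hypothetical names used only
for illustration and are not kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a function that is about to be patched. */
struct func_range {
	unsigned long start;	/* first address of the function */
	unsigned long size;	/* size of the function in bytes */
};

/*
 * Decide whether 'func' can be treated as having no live state on a task.
 * 'ret' is the value returned by the reliable stacktrace function, and
 * 'entries' holds the 'nr' addresses that it reported.
 */
bool func_not_live(int ret, const unsigned long *entries, size_t nr,
		   const struct func_range *func)
{
	size_t i;

	/* A non-zero return means the trace may be incomplete: assume live. */
	if (ret != 0)
		return false;

	for (i = 0; i < nr; i++) {
		/* Unsigned subtraction also rejects entries below 'start'. */
		if (entries[i] - func->start < func->size)
			return false;
	}

	return true;
}
```

The important property is the first check: an unreliable trace is never used
to conclude that a function is safe to patch.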

Principally, the reliable stacktrace function must ensure that either:

* The trace includes all functions that the task may be returned to, and the
  return code is zero to indicate that the trace is reliable.

* The return code is non-zero to indicate that the trace is not reliable.

.. note::
   In some cases it is legitimate to omit specific functions from the trace,
   but all other functions must be reported. These cases are described in
   further detail below.

Secondly, the reliable stacktrace function must be robust to cases where
the stack or other unwind state is corrupt or otherwise unreliable. The
function should attempt to detect such cases and return a non-zero error
code, and should not get stuck in an infinite loop or access memory in
an unsafe way. Specific cases are described in further detail below.


3. Compile-time analysis
========================

To ensure that kernel code can be correctly unwound in all cases,
architectures may need to verify that code has been compiled in a manner
expected by the unwinder. For example, an unwinder may expect that
functions manipulate the stack pointer in a limited way, or that all
functions use specific prologue and epilogue sequences. Architectures
with such requirements should verify the kernel compilation using
objtool.

In some cases, an unwinder may require metadata to correctly unwind.
Where necessary, this metadata should be generated at build time using
objtool.


4. Considerations
=================

The unwinding process varies across architectures, their respective procedure
call standards, and kernel configurations. This section describes common
details that architectures should consider.

4.1 Identifying successful termination
--------------------------------------

Unwinding may terminate early for a number of reasons, including:

* Stack or frame pointer corruption.

* Missing unwind support for an uncommon scenario, or a bug in the unwinder.

* Dynamically generated code (e.g. eBPF) or foreign code (e.g. EFI runtime
  services) not following the conventions expected by the unwinder.

To ensure that this does not result in functions being omitted from the trace,
even if not caught by other checks, it is strongly recommended that
architectures verify that a stacktrace ends at an expected location, e.g.

* Within a specific function that is an entry point to the kernel.

* At a specific location on a stack expected for a kernel entry point.

* On a specific stack expected for a kernel entry point (e.g. if the
  architecture has separate task and IRQ stacks).

4.2 Identifying unwindable code
-------------------------------

Unwinding typically relies on code following specific conventions (e.g.
manipulating a frame pointer), but there can be code which may not follow these
conventions and may require special handling in the unwinder, e.g.

* Exception vectors and entry assembly.

* Procedure Linkage Table (PLT) entries and veneer functions.

* Trampoline assembly (e.g. ftrace, kprobes).

* Dynamically generated code (e.g. eBPF, optprobe trampolines).

* Foreign code (e.g. EFI runtime services).

To ensure that such cases do not result in functions being omitted from a
trace, it is strongly recommended that architectures positively identify code
which is known to be reliable to unwind from, and reject unwinding from all
other code.

Kernel code including modules and eBPF can be distinguished from foreign code
using '__kernel_text_address()'. Checking for this also helps to detect stack
corruption.

There are several ways an architecture may identify kernel code which is deemed
unreliable to unwind from, e.g.

* Placing such code into special linker sections, and rejecting unwinding from
  any code in these sections.

* Identifying specific portions of code using bounds information.

4.3 Unwinding across interrupts and exceptions
----------------------------------------------

At function call boundaries the stack and other unwind state are expected to be
in a consistent state suitable for reliable unwinding, but this may not be the
case part-way through a function. For example, during a function prologue or
epilogue a frame pointer may be transiently invalid, or during the function
body the return address may be held in an arbitrary general purpose register.
For some architectures this may change at runtime as a result of dynamic
instrumentation.

If an interrupt or other exception is taken while the stack or other unwind
state is in an inconsistent state, it may not be possible to reliably unwind,
and it may not be possible to identify whether such unwinding will be reliable.
See below for examples.

Architectures which cannot identify when it is reliable to unwind such cases
(or where it is never reliable) must reject unwinding across exception
boundaries. Note that it may be reliable to unwind across certain
exceptions (e.g. IRQ) but unreliable to unwind across other exceptions
(e.g. NMI).
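
The considerations so far can be combined into a single walk loop. The sketch
below is a toy userspace model rather than kernel code: the 'struct frame'
layout, the 'frame_kind' values and 'walk_reliable' are all hypothetical. It
assumes that records for older frames live at strictly higher indices, that the
expected final record is known in advance, and that the architecture can say
whether unwinding across an IRQ is reliable.

```c
#include <stdbool.h>
#include <stddef.h>

/* How control left the code that a record returns to (hypothetical). */
enum frame_kind { FRAME_CALL, FRAME_IRQ, FRAME_NMI };

/* Toy frame record stored in an array which models the stack. */
struct frame {
	size_t next;		/* index of the next (older) record */
	unsigned long ret;	/* return address or interrupted PC */
	enum frame_kind kind;
};

/*
 * Walk 'stack' (holding 'nr' records) from index 'fp' until the record at
 * index 'final', which models the expected kernel entry point. Returns 0 and
 * sets '*nr_out' if the trace is reliable, or -1 if it is not.
 */
int walk_reliable(const struct frame *stack, size_t nr, size_t fp,
		  size_t final, bool irq_unwindable,
		  unsigned long *out, size_t max, size_t *nr_out)
{
	size_t n = 0;

	for (;;) {
		const struct frame *f;

		/* Never read outside the stack. */
		if (fp >= nr)
			return -1;

		/* Success only when the trace ends where it is expected to. */
		if (fp == final) {
			*nr_out = n;
			return 0;
		}

		f = &stack[fp];

		/* Reject exception boundaries that cannot be unwound. */
		if (f->kind == FRAME_NMI)
			return -1;
		if (f->kind == FRAME_IRQ && !irq_unwindable)
			return -1;

		/* A truncated trace would omit functions, so it is unreliable. */
		if (n == max)
			return -1;
		out[n++] = f->ret;

		/* Require forward progress so that a corrupt chain cannot loop. */
		if (f->next <= fp)
			return -1;
		fp = f->next;
	}
}
```

Because 'fp' strictly increases and is bounded by 'nr', the loop always
terminates, and every failure path reports the trace as unreliable rather than
returning a partial result.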

Architectures which can identify when it is reliable to unwind such cases (or
have no such cases) should attempt to unwind across exception boundaries, as
doing so can prevent unnecessarily stalling livepatch consistency checks and
permits livepatch transitions to complete more quickly.

4.4 Rewriting of return addresses
---------------------------------

Some trampolines temporarily modify the return address of a function in order
to intercept when that function returns with a return trampoline, e.g.

* An ftrace trampoline may modify the return address so that function graph
  tracing can intercept returns.

* A kprobes (or optprobes) trampoline may modify the return address so that
  kretprobes can intercept returns.

When this happens, the original return address will not be in its usual
location. For trampolines which are not subject to live patching, where an
unwinder can reliably determine the original return address and no unwind state
is altered by the trampoline, the unwinder may report the original return
address in place of the trampoline and report this as reliable. Otherwise, an
unwinder must report these cases as unreliable.
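
The substitution described above can be modelled as follows. This is a
userspace sketch under stated assumptions, not the kernel's implementation:
'RETURN_TRAMPOLINE', 'struct ret_stack' and 'recover_ret_addr' are hypothetical
names, and the saved addresses are assumed to be stored newest-first so that
they are consumed in the same order as the unwinder meets rewritten frames.

```c
#include <stddef.h>

/* Hypothetical address of the return trampoline. */
#define RETURN_TRAMPOLINE 0xdead0000UL

/* Model of a per-task stack of saved original return addresses. */
struct ret_stack {
	const unsigned long *orig;	/* newest entry first */
	size_t depth;			/* number of valid entries */
};

/*
 * Translate one return address found during unwinding. '*idx' counts how
 * many saved entries this unwind has consumed so far. Returns 0 and sets
 * '*orig' on success, or -1 if the original address cannot be determined.
 */
int recover_ret_addr(const struct ret_stack *rs, size_t *idx,
		     unsigned long ret, unsigned long *orig)
{
	/* Not rewritten: report the address as found. */
	if (ret != RETURN_TRAMPOLINE) {
		*orig = ret;
		return 0;
	}

	/* Rewritten, but nothing saved: the trace must be unreliable. */
	if (*idx >= rs->depth)
		return -1;

	/* Each rewritten frame consumes exactly one saved entry. */
	*orig = rs->orig[(*idx)++];
	return 0;
}
```

Keeping the consumed count in the caller's unwind state means that a frame
which is accidentally visited twice would consume two entries and desynchronise
the remaining frames, which is one reason double-reporting must be avoided.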

Special care is required when identifying the original return address, as this
information is not in a consistent location for the duration of the entry
trampoline or return trampoline. For example, considering the x86_64
'return_to_handler' return trampoline:

.. code-block:: none

   SYM_CODE_START(return_to_handler)
           UNWIND_HINT_UNDEFINED
           subq $24, %rsp

           /* Save the return values */
           movq %rax, (%rsp)
           movq %rdx, 8(%rsp)
           movq %rbp, %rdi

           call ftrace_return_to_handler

           movq %rax, %rdi
           movq 8(%rsp), %rdx
           movq (%rsp), %rax
           addq $24, %rsp
           JMP_NOSPEC rdi
   SYM_CODE_END(return_to_handler)

While the traced function runs, its return address on the stack points to
the start of return_to_handler, and the original return address is stored in
the task's cur_ret_stack. During this time the unwinder can find the return
address using ftrace_graph_ret_addr().

When the traced function returns to return_to_handler, there is no longer a
return address on the stack, though the original return address is still stored
in the task's cur_ret_stack. Within ftrace_return_to_handler(), the original
return address is removed from cur_ret_stack and is transiently moved
arbitrarily by the compiler before being returned in rax. The return_to_handler
trampoline moves this into rdi before jumping to it.

Architectures might not always be able to unwind such sequences, such as when
ftrace_return_to_handler() has removed the address from cur_ret_stack, and the
location of the return address cannot be reliably determined.

It is recommended that architectures unwind cases where return_to_handler has
not yet been returned to, but architectures are not required to unwind from the
middle of return_to_handler and can report this as unreliable. Architectures
are not required to unwind from other trampolines which modify the return
address.

4.5 Obscuring of return addresses
---------------------------------

Some trampolines do not rewrite the return address in order to intercept
returns, but do transiently clobber the return address or other unwind state.

For example, the x86_64 implementation of optprobes patches the probed function
with a JMP instruction which targets the associated optprobe trampoline. When
the probe is hit, the CPU will branch to the optprobe trampoline, and the
address of the probed function is not held in any register or on the stack.

Similarly, the arm64 implementation of DYNAMIC_FTRACE_WITH_REGS patches traced
functions with the following:

.. code-block:: none

   MOV X9, X30
   BL <trampoline>

The MOV saves the link register (X30) into X9 to preserve the return address
before the BL clobbers the link register and branches to the trampoline. At the
start of the trampoline, the return address of the traced function is in X9
rather than the link register as would usually be the case.

Architectures must ensure that unwinders either reliably unwind such cases, or
report the unwinding as unreliable.

4.6 Link register unreliability
-------------------------------

On some architectures, 'call' instructions place the return address into a
link register, and 'return' instructions consume the return address from the
link register without modifying the register. On these architectures software
must save the return address to the stack prior to making a function call. Over
the duration of a function call, the return address may be held in the link
register alone, on the stack alone, or in both locations.

Unwinders typically assume the link register is always live, but this
assumption can lead to unreliable stack traces. For example, consider the
following arm64 assembly for a simple function:

.. code-block:: none

   function:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           BL <other_function>
           LDP X29, X30, [SP], #16
           RET

At entry to the function, the link register (X30) points to the caller, and the
frame pointer (X29) points to the caller's frame including the caller's return
address. The first two instructions create a new stackframe and update the
frame pointer, and at this point the link register and the frame pointer both
describe this function's return address. A trace at this point may describe
this function twice, and if the function return is being traced, the unwinder
may consume two entries from the fgraph return stack rather than one entry.

The BL invokes 'other_function' with the link register pointing to this
function's LDP and the frame pointer pointing to this function's stackframe.
When 'other_function' returns, the link register is left pointing at the LDP
following the BL, and so a trace at this point could result in 'function'
appearing twice in the backtrace.

Similarly, a function may deliberately clobber the LR, e.g.

.. code-block:: none

   caller:
           STP X29, X30, [SP, -16]!
           MOV X29, SP
           ADR LR, <callee>
           BLR LR
           LDP X29, X30, [SP], #16
           RET

The ADR places the address of 'callee' into the LR, before the BLR branches to
this address. If a trace is made immediately after the ADR, 'callee' will
appear to be the parent of 'caller', rather than the child.

Due to cases such as the above, it may only be possible to reliably consume a
link register value at a function call boundary. Architectures where this is
the case must reject unwinding across exception boundaries unless they can
reliably identify when the LR or stack value should be used (e.g. using
metadata generated by objtool).
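
To make the metadata approach more concrete, the sketch below models a
per-function table which records, for each range of instruction offsets, where
the return address can be found. The table format, 'ra_lookup' and
'interrupted_ret_addr' are invented for illustration and do not describe any
existing objtool output. The example ranges correspond to the simple 'function'
above, assuming 4-byte instructions.

```c
#include <stddef.h>

/* Where the return address of an interrupted function can be found. */
enum ra_loc { RA_UNKNOWN, RA_IN_LR, RA_IN_FRAME_RECORD };

/* Hypothetical build-time metadata for one range of offsets. */
struct ra_range {
	unsigned long start;	/* first offset covered (inclusive) */
	unsigned long end;	/* end of the range (exclusive) */
	enum ra_loc loc;
};

/* Ranges for 'function': STP, MOV | BL, LDP | RET. */
const struct ra_range function_ranges[] = {
	{ 0,  8,  RA_IN_LR },		/* frame record not yet set up */
	{ 8,  16, RA_IN_FRAME_RECORD },	/* LR may have been clobbered */
	{ 16, 20, RA_IN_LR },		/* frame record already popped */
};

/* Find where the return address lives at offset 'off'. */
enum ra_loc ra_lookup(const struct ra_range *tbl, size_t nr,
		      unsigned long off)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (off >= tbl[i].start && off < tbl[i].end)
			return tbl[i].loc;
	}

	/* Offsets without metadata cannot be unwound reliably. */
	return RA_UNKNOWN;
}

/*
 * Choose the return address of a function interrupted at offset 'off', given
 * the LR value and the value in the frame record that the frame pointer
 * refers to. Returns 0 on success, or -1 if the unwind must be rejected.
 */
int interrupted_ret_addr(const struct ra_range *tbl, size_t nr,
			 unsigned long off, unsigned long lr,
			 unsigned long frame_ret, unsigned long *ret)
{
	switch (ra_lookup(tbl, nr, off)) {
	case RA_IN_LR:
		*ret = lr;
		return 0;
	case RA_IN_FRAME_RECORD:
		*ret = frame_ret;
		return 0;
	default:
		return -1;
	}
}
```

In the 'RA_IN_LR' ranges the frame pointer still refers to the caller's frame
record, so a real unwinder would go on to consume that record for the next
frame rather than discarding it.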