.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The sample interval controls a timer that sets up KFENCE allocations. By
default, to keep the real sample interval predictable, the normal timer also
causes CPU wake-ups when the system is completely idle. This may be undesirable
on power-constrained systems.
The boot parameter ``kfence.deferrable=1``
instead switches to a "deferrable" timer which does not force CPU wake-ups on
idle systems, at the risk of unpredictable sample intervals. The default is
configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``.

.. warning::
   The KUnit test suite is very likely to fail when using a deferrable timer
   since it currently causes very unpredictable sample intervals.

By default KFENCE will only sample 1 heap allocation within each sample
interval. *Burst mode* allows sampling successive heap allocations: the
kernel boot parameter ``kfence.burst`` can be set to a non-zero value which
denotes the number of *additional* successive allocations within a sample
interval; setting ``kfence.burst=N`` means that ``1 + N`` successive
allocations are attempted through KFENCE for each sample interval.

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.

Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

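
As a quick sanity check of the pool-size formula above, the following minimal
userspace sketch evaluates it for a given object count and page size. The
helper name ``kfence_pool_bytes`` is hypothetical and exists only for this
illustration; it is not a kernel function.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API: total size in bytes of the KFENCE
 * memory pool, following the formula ( #objects + 1 ) * 2 * PAGE_SIZE.
 */
static size_t kfence_pool_bytes(size_t num_objects, size_t page_size)
{
	return (num_objects + 1) * 2 * page_size;
}
```

With the default of 255 objects and 4 KiB pages this evaluates to 2097152
bytes, i.e. the 2 MiB stated above.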

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

    Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
     test_out_of_bounds_read+0xa6/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

    allocated by task 484 on cpu 0 at 32.919330s:
     test_alloc+0xfe/0x738
     test_out_of_bounds_read+0x9b/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.
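
To relate the numbers in the out-of-bounds report above to each other, the
following sketch recomputes ``size=32`` from the printed object range and the
``1B left of`` distance from the faulting address. Both helper names are
hypothetical and only illustrate the arithmetic; they are not taken from
KFENCE's reporting code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: size of an object given the inclusive address range
 * printed in a report (first byte and last byte of the object).
 */
static uint64_t report_object_size(uint64_t first, uint64_t last)
{
	return last - first + 1;
}

/*
 * Hypothetical helper: number of bytes by which a faulting address lies
 * below the first byte of the object, or 0 if it does not lie below it.
 */
static uint64_t bytes_left_of(uint64_t addr, uint64_t first)
{
	return addr < first ? first - addr : 0;
}
```

For the report above, the object range ``0xffff8c3f2e292000-0xffff8c3f2e29201f``
gives a size of 32 bytes, and the access at ``0xffff8c3f2e291fff`` lies 1 byte
to the left of the object.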

Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

    allocated by task 488 on cpu 2 at 33.871326s:
     test_alloc+0xfe/0x738
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 488 on cpu 2 at 33.871358s:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

    allocated by task 490 on cpu 1 at 34.175321s:
     test_alloc+0xfe/0x738
     test_double_free+0x76/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 490 on cpu 1 at 34.175348s:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

    allocated by task 502 on cpu 7 at 42.159302s:
     test_alloc+0xfe/0x738
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched.
Note that real values are
only shown if the kernel was booted with ``no_hash_pointers``; to avoid
information disclosure otherwise, '!' is used instead to denote invalidly
written bytes.

And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval.

When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
through the main allocator's fast-path by relying on static branches via the
static keys infrastructure.
The static branch is toggled to redirect the
allocation to KFENCE. Depending on sample interval, target workloads, and
system architecture, this may perform better than the simple dynamic branch.
Careful benchmarking is recommended.

KFENCE objects each reside on a dedicated page, at either the left or right
page boundaries selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x |  RED- : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed.
Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.

If pool utilization reaches 75% (default) or above, to reduce the risk of the
pool eventually being fully occupied by allocated objects yet ensure diverse
coverage of allocations, KFENCE limits currently covered allocations of the
same source from further filling up the pool. The "source" of an allocation is
based on its partial allocation stack trace. A side-effect is that this also
limits frequent long-lived allocations (e.g. pagecache) of the same source
filling up the pool permanently, which is the most common risk for the pool
becoming full and the sampled allocation rate dropping to zero. The threshold
at which to start limiting currently covered allocations can be configured via
the boot parameter ``kfence.skip_covered_thresh`` (pool usage%).

Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling.
Another
similar but non-sampling approach, that also inspired the name "KFENCE", can be
found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: due to the lower chance to detect the
error, it would require more effort using KFENCE to debug. Deployments at scale
that cannot afford to enable KASAN, however, would benefit from using KFENCE to
discover bugs due to code paths not exercised by test cases or fuzzers.
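
The guard-page idea shared by KFENCE and the userspace tools mentioned in this
section can be illustrated with a small POSIX sketch. It is not taken from
KFENCE, GWP-ASan, or Electric Fence; the helper name ``guarded_alloc`` is
hypothetical, and the sketch only places one object at the right edge of a
page that is followed by an inaccessible guard page. It ignores alignment
requirements, redzones, sampling, and freeing.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical userspace helper: return a pointer to `size` bytes whose last
 * byte directly precedes a PROT_NONE guard page, so that an out-of-bounds
 * access just past the object faults. Returns NULL on failure or if `size`
 * is 0 or larger than one page.
 */
static void *guarded_alloc(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *base;

	if (page <= 0 || size == 0 || size > (size_t)page)
		return NULL;

	/* Map one object page followed by one page that becomes the guard. */
	base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Revoke all access to the second page. */
	if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
		munmap(base, 2 * (size_t)page);
		return NULL;
	}

	/* Right-align the object so that it ends where the guard page starts. */
	return base + (size_t)page - size;
}
```

A write to ``p[size]`` on a pointer returned by this helper would hit the
guard page and raise ``SIGSEGV``, which is the userspace analogue of the page
fault that KFENCE intercepts and turns into a report.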