.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The sample interval controls a timer that sets up KFENCE allocations. By
default, to keep the real sample interval predictable, the normal timer also
causes CPU wake-ups when the system is completely idle. This may be undesirable
on power-constrained systems. The boot parameter ``kfence.deferrable=1``
instead switches to a "deferrable" timer which does not force CPU wake-ups on
idle systems, at the risk of unpredictable sample intervals. The default is
configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``.

.. warning::
   The KUnit test suite is very likely to fail when using a deferrable timer
   since it currently causes very unpredictable sample intervals.

By default KFENCE will only sample 1 heap allocation within each sample
interval. *Burst mode* allows sampling successive heap allocations: the
kernel boot parameter ``kfence.burst`` can be set to a non-zero value which
denotes the *additional* successive allocations within a sample interval;
setting ``kfence.burst=N`` means that ``1 + N`` successive allocations are
attempted through KFENCE for each sample interval.
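
As a rough sanity check of how the two parameters combine, the following
sketch computes an upper bound on the number of guarded allocations attempted
per second. The function name is hypothetical and not part of the kernel; the
real rate can be lower, for example when the pool is exhausted or when few
allocations happen at all.

```c
#include <stdint.h>

/*
 * Upper bound on KFENCE allocations attempted per second (rounded down).
 * sample_interval_ms mirrors kfence.sample_interval and burst mirrors
 * kfence.burst; the function itself is a hypothetical helper.
 * An interval of 0 means KFENCE is disabled, so the bound is 0.
 */
uint64_t kfence_max_attempts_per_sec(uint64_t sample_interval_ms, uint64_t burst)
{
    if (sample_interval_ms == 0)
        return 0;
    /* Each sample interval attempts 1 + burst successive allocations. */
    return (1 + burst) * 1000 / sample_interval_ms;
}
```

For instance, an interval of 100 ms combined with ``kfence.burst=3`` yields at
most 40 attempted guarded allocations per second.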

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.
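
The formula can be checked with a small stand-alone helper. This is only a
sketch of the arithmetic above; the function name is hypothetical and does not
exist in the kernel.

```c
#include <stddef.h>

/*
 * Total bytes reserved for the KFENCE pool, following the formula
 * ( #objects + 1 ) * 2 * PAGE_SIZE from the text above.
 * num_objects corresponds to CONFIG_KFENCE_NUM_OBJECTS.
 */
size_t kfence_pool_bytes(size_t num_objects, size_t page_size)
{
    return (num_objects + 1) * 2 * page_size;
}
```

With 255 objects and 4 KiB pages this gives 256 * 2 * 4096 bytes, i.e. 2 MiB.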

Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

The boot parameter ``kfence.fault`` can be used to control the behavior when a
KFENCE error is detected:

- ``kfence.fault=report``: Print the error report and continue (default).
- ``kfence.fault=oops``: Print the error report and oops.
- ``kfence.fault=panic``: Print the error report and panic.

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

    Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
     test_out_of_bounds_read+0xa6/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

    allocated by task 484 on cpu 0 at 32.919330s:
     test_alloc+0xfe/0x738
     test_out_of_bounds_read+0x9b/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.

Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

    allocated by task 488 on cpu 2 at 33.871326s:
     test_alloc+0xfe/0x738
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 488 on cpu 2 at 33.871358s:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G    B             5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

    allocated by task 490 on cpu 1 at 34.175321s:
     test_alloc+0xfe/0x738
     test_double_free+0x76/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 490 on cpu 1 at 34.175348s:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G    B             5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the side of the object opposite its
guard page, to detect out-of-bounds writes on that unprotected side.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

    allocated by task 502 on cpu 7 at 42.159302s:
     test_alloc+0xfe/0x738
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G    B             5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched. Note that real values are
only shown if the kernel was booted with ``no_hash_pointers``; to avoid
information disclosure otherwise, '!' is used instead to denote invalidly
written bytes.
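
To make the bracketed notation concrete, the sketch below renders a run of
redzone bytes in the same style as the report above. It is a user-space
illustration only: the function name, its parameters, and the pattern bytes
passed in by the caller are assumptions and do not mirror the kernel's
reporting code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Render n redzone bytes as "[ ... ]": '.' for a byte that still matches the
 * expected pattern, the hex value for a modified byte when show_values is
 * non-zero (modelling no_hash_pointers), and '!' otherwise.
 * Returns the number of characters written, or -1 if out is too small.
 */
int format_redzone_diff(const unsigned char *found, const unsigned char *expected,
                        size_t n, int show_values, char *out, size_t out_len)
{
    size_t pos = 0;
    int w = snprintf(out, out_len, "[");

    if (w < 0 || (size_t)w >= out_len)
        return -1;
    pos = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        if (found[i] == expected[i])
            w = snprintf(out + pos, out_len - pos, " .");
        else if (show_values)
            w = snprintf(out + pos, out_len - pos, " 0x%02x", (unsigned int)found[i]);
        else
            w = snprintf(out + pos, out_len - pos, " !");
        if (w < 0 || (size_t)w >= out_len - pos)
            return -1;
        pos += (size_t)w;
    }

    w = snprintf(out + pos, out_len - pos, " ]");
    if (w < 0 || (size_t)w >= out_len - pos)
        return -1;
    return (int)(pos + (size_t)w);
}
```

Given seven redzone bytes of which only the first was overwritten with
``0xac``, this produces ``[ 0xac . . . . . . ]`` when values are shown and
``[ ! . . . . . . ]`` when they are hidden.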

And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval.
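
The gating behaviour can be modelled in user space as a small budget that the
timer refills and the allocation path consumes. This is a simplified sketch
under stated assumptions: all names are hypothetical, the timer re-arming is
left out, and the kernel's actual mechanism may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Guarded allocations still permitted in the current sample interval. */
static atomic_int sample_budget;

/* Models the timer expiring: allow 1 + burst guarded allocations. */
void sample_timer_fired(int burst)
{
    atomic_store(&sample_budget, 1 + burst);
}

/* Models the allocator's check: true if this allocation should be guarded. */
bool should_guard_allocation(void)
{
    int budget = atomic_load(&sample_budget);

    while (budget > 0) {
        /* Claim one slot; on failure budget is refreshed and we retry. */
        if (atomic_compare_exchange_weak(&sample_budget, &budget, budget - 1))
            return true;
    }
    return false;
}
```

Until the timer fires again, every further call to
``should_guard_allocation()`` returns false and the allocation is served by
the main allocator as usual.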

When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
through the main allocator's fast-path by relying on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. Depending on sample interval, target workloads, and
system architecture, this may perform better than the simple dynamic branch.
Careful benchmarking is recommended.

KFENCE objects each reside on a dedicated page, at either the left or right
page boundaries selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---
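
The effect of alignment on a right-aligned object can be worked through with
the sketch below. The helper names are hypothetical, and the 8-byte alignment
used in the example is an assumption that the report does not state. Under
that assumption the numbers agree with the memory corruption report shown
earlier: a 73-byte object starts at page offset ``0xfb0`` and leaves a 7-byte
gap before the guard page, which is covered by the redzone.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Start address of an object placed at the right page boundary: move it as
 * far right as its size allows, then round down to the requested alignment.
 * align must be a power of two.
 */
uintptr_t right_aligned_object_start(uintptr_t page_start, size_t page_size,
                                     size_t size, size_t align)
{
    uintptr_t start = page_start + page_size - size;

    return start & ~((uintptr_t)align - 1);
}

/* Bytes between the end of the object and the following guard page. */
size_t right_redzone_bytes(uintptr_t page_start, size_t page_size,
                           uintptr_t obj_start, size_t size)
{
    return (size_t)(page_start + page_size - (obj_start + size));
}
```

With a natural alignment of 1 the gap is zero, so any access one byte past
the object lands directly on the guard page and faults immediately.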

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.
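
The reuse order described above is plain first-in, first-out. The following
sketch models it with a fixed-size ring buffer; the type, the function names,
and the tiny pool size are hypothetical choices for illustration and are not
the kernel's data structures.

```c
#include <stddef.h>

#define POOL_OBJECTS 4 /* hypothetical, tiny pool for illustration */

struct fifo_freelist {
    int slots[POOL_OBJECTS]; /* object indices, least recently freed first */
    size_t head;             /* position of the next object to reuse */
    size_t count;            /* number of free objects currently queued */
};

/* Queue a freed object at the tail. Returns 0 on success, -1 if full. */
int freelist_push_tail(struct fifo_freelist *fl, int obj)
{
    if (fl->count == POOL_OBJECTS)
        return -1;
    fl->slots[(fl->head + fl->count) % POOL_OBJECTS] = obj;
    fl->count++;
    return 0;
}

/* Take the least recently freed object from the head, or -1 if none. */
int freelist_pop_head(struct fifo_freelist *fl)
{
    int obj;

    if (fl->count == 0)
        return -1;
    obj = fl->slots[fl->head];
    fl->head = (fl->head + 1) % POOL_OBJECTS;
    fl->count--;
    return obj;
}
```

An object that was just freed is therefore the last candidate for reuse, which
keeps its page protected for as long as possible.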

Once pool utilization reaches 75% (the default) or above, KFENCE limits
allocations from sources that are already covered, so that they do not fill up
the pool further. This reduces the risk of the pool eventually being fully
occupied by allocated objects while still ensuring diverse coverage of
allocations. The "source" of an allocation is based on its partial allocation
stack trace. A side-effect is that this also limits frequent long-lived
allocations (e.g. pagecache) of the same source filling up the pool
permanently, which is the most common risk for the pool becoming full and the
sampled allocation rate dropping to zero. The threshold at which to start
limiting currently covered allocations can be configured via the boot parameter
``kfence.skip_covered_thresh`` (pool usage%).
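
The decision can be summarised as a two-part condition. The sketch below is a
simplified model with hypothetical names; in particular, how the kernel rounds
the threshold and how it tracks covered sources are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a candidate allocation should bypass KFENCE.
 * thresh_percent models kfence.skip_covered_thresh; already_covered says
 * whether an allocation from the same source is currently in the pool.
 */
bool should_skip_covered(size_t objects_in_use, size_t num_objects,
                         unsigned int thresh_percent, bool already_covered)
{
    /* Integer form of: objects_in_use / num_objects >= thresh_percent / 100 */
    bool above_thresh = objects_in_use * 100 >= num_objects * thresh_percent;

    return above_thresh && already_covered;
}
```

Below the threshold every sampled allocation is accepted, so the limit only
takes effect once the pool is under pressure.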

Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault
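
Because the pool is one contiguous region of fixed size, testing whether an
address belongs to it can be reduced to a single unsigned comparison. The
following is a user-space sketch of that idea with a hypothetical function
name and parameters; it is not the kernel's implementation of
``is_kfence_address``.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if addr lies within [pool_start, pool_start + pool_size).
 * An address below pool_start wraps around to a very large unsigned value
 * in the subtraction and therefore fails the comparison.
 * A pool_start of 0 models a pool that has not been set up.
 */
bool addr_in_pool(uintptr_t pool_start, size_t pool_size, uintptr_t addr)
{
    return pool_start != 0 && (addr - pool_start) < pool_size;
}
```

A check of this shape costs one subtraction and one comparison, which matters
on allocator paths that run for every free.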

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, which also inspired the name "KFENCE", can be
found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: due to the lower chance to detect the
error, it would require more effort using KFENCE to debug. Deployments at scale
that cannot afford to enable KASAN, however, would benefit from using KFENCE to
discover bugs due to code paths not exercised by test cases or fuzzers.