===========================================
Fault injection capabilities infrastructure
===========================================

See also drivers/md/md-faulty.c and "every_nth" module option for scsi_debug.


Available fault injection capabilities
--------------------------------------

- failslab

  injects slab allocation failures. (kmalloc(), kmem_cache_alloc(), ...)

- fail_page_alloc

  injects page allocation failures. (alloc_pages(), get_free_pages(), ...)

- fail_usercopy

  injects failures in user memory access functions. (copy_from_user(), get_user(), ...)

- fail_futex

  injects futex deadlock and uaddr fault errors.

- fail_sunrpc

  injects kernel RPC client and server failures.

- fail_make_request

  injects disk IO errors on devices permitted by setting
  /sys/block/<device>/make-it-fail or
  /sys/block/<device>/<partition>/make-it-fail. (submit_bio_noacct())

- fail_mmc_request

  injects MMC data errors on devices permitted by setting
  debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request

- fail_function

  injects error returns into specific functions, which are marked by the
  ALLOW_ERROR_INJECTION() macro, by setting debugfs entries
  under /sys/kernel/debug/fail_function. No boot option supported.

- NVMe fault injection

  injects an NVMe status code and retry flag on devices permitted by setting
  debugfs entries under /sys/kernel/debug/nvme*/fault_inject. The default
  status code is NVME_SC_INVALID_OPCODE with no retry. The status code and
  retry flag can be set via debugfs.

- Null test block driver fault injection

  injects IO timeouts by setting config items under
  /sys/kernel/config/nullb/<disk>/timeout_inject,
  injects requeue requests by setting config items under
  /sys/kernel/config/nullb/<disk>/requeue_inject, and
  injects init_hctx() errors by setting config items under
  /sys/kernel/config/nullb/<disk>/init_hctx_fault_inject.

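Which of these capabilities a running kernel exposes depends on how it was
built. As a minimal sketch, the helper below lists the directories matching
fail* under the debugfs mount point. The function name list_fault_caps and its
optional root argument are assumptions made for this illustration, and
capabilities that live elsewhere (for example under mmc0 or nvme*) are not
covered.

```shell
# List the fail* capability directories found under debugfs.
# Usage: list_fault_caps [debugfs-root]
# The root argument is hypothetical; it defaults to the usual mount point
# and exists so that the helper can be tried against a scratch directory.
list_fault_caps()
{
	local root="${1:-/sys/kernel/debug}"
	local d

	for d in "$root"/fail*; do
		# Skip the unexpanded pattern when nothing matches.
		[ -d "$d" ] && basename "$d"
	done
	return 0
}
```

Calling list_fault_caps with no argument inspects /sys/kernel/debug, which
normally requires root and a mounted debugfs.
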
Configure fault-injection capabilities behavior
-----------------------------------------------

debugfs entries
^^^^^^^^^^^^^^^

The fault-inject-debugfs kernel module provides some debugfs entries for
runtime configuration of fault-injection capabilities.

- /sys/kernel/debug/fail*/probability:

	likelihood of failure injection, in percent.

	Format: <percent>

	Note that one-failure-per-hundred is a very high error rate
	for some testcases.  Consider setting probability=100 and configuring
	/sys/kernel/debug/fail*/interval for such testcases.

- /sys/kernel/debug/fail*/interval:

	specifies the interval between failures, for calls to
	should_fail() that pass all the other tests.

	Note that if you enable this, by setting interval>1, you will
	probably want to set probability=100.

- /sys/kernel/debug/fail*/times:

	specifies how many times failures may happen at most. A value of -1
	means "no limit".

- /sys/kernel/debug/fail*/space:

	specifies an initial resource "budget", decremented by "size"
	on each call to should_fail(,size).  Failure injection is
	suppressed until "space" reaches zero.

- /sys/kernel/debug/fail*/verbose:

	Format: { 0 | 1 | 2 }

	specifies the verbosity of the messages when failure is
	injected.  '0' means no messages; '1' will print only a single
	log line per failure; '2' will print a call trace too -- useful
	to debug the problems revealed by fault injection.

- /sys/kernel/debug/fail*/task-filter:

	Format: { 'Y' | 'N' }

	A value of 'N' disables filtering by process (default).
	A value of 'Y' limits failures to only processes indicated by
	/proc/<pid>/make-it-fail==1.

- /sys/kernel/debug/fail*/require-start,
  /sys/kernel/debug/fail*/require-end,
  /sys/kernel/debug/fail*/reject-start,
  /sys/kernel/debug/fail*/reject-end:

	specifies the range of virtual addresses tested during
	stacktrace walking.  Failure is injected only if some caller
	in the walked stacktrace lies within the required range, and
	none lies within the rejected range.
	Default required range is [0,ULONG_MAX) (whole of virtual address space).
	Default rejected range is [0,0).

- /sys/kernel/debug/fail*/stacktrace-depth:

	specifies the maximum stacktrace depth walked during search
	for a caller within [require-start,require-end) OR
	[reject-start,reject-end).

- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:

	Format: { 'Y' | 'N' }

	default is 'Y', setting it to 'N' will also inject failures into
	highmem/user allocations (__GFP_HIGHMEM allocations).

- /sys/kernel/debug/failslab/ignore-gfp-wait:
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:

	Format: { 'Y' | 'N' }

	default is 'Y', setting it to 'N' will also inject failures
	into allocations that can sleep (__GFP_DIRECT_RECLAIM allocations).

- /sys/kernel/debug/fail_page_alloc/min-order:

	specifies the minimum order of page allocations into which
	failures are injected.

- /sys/kernel/debug/fail_futex/ignore-private:

	Format: { 'Y' | 'N' }

	default is 'N', setting it to 'Y' will disable failure injections
	when dealing with private (address space) futexes.

- /sys/kernel/debug/fail_sunrpc/ignore-client-disconnect:

	Format: { 'Y' | 'N' }

	default is 'N', setting it to 'Y' will disable disconnect
	injection on the RPC client.

- /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect:

	Format: { 'Y' | 'N' }

	default is 'N', setting it to 'Y' will disable disconnect
	injection on the RPC server.

- /sys/kernel/debug/fail_sunrpc/ignore-cache-wait:

	Format: { 'Y' | 'N' }

	default is 'N', setting it to 'Y' will disable cache wait
	injection on the RPC server.

- /sys/kernel/debug/fail_function/inject:

	Format: { 'function-name' | '!function-name' | '' }

	specifies the target function of error injection by name.
	If the function name is prefixed with '!', the given function is
	removed from the injection list. If nothing is specified (''),
	the injection list is cleared.

- /sys/kernel/debug/fail_function/injectable:

	(read only) shows error injectable functions and what type of
	error values can be specified. The error type will be one of
	the following:

	- NULL:	retval must be 0.
	- ERRNO: retval must be -1 to -MAX_ERRNO (-4095).
	- ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4095).

- /sys/kernel/debug/fail_function/<function-name>/retval:

	specifies the "error" return value to inject to the given function.
	This will be created when the user specifies a new injection entry.
	Note that this file only accepts unsigned values. So, if you want to
	use a negative errno, it is better to use 'printf' instead of 'echo',
	e.g.::

	  $ printf %#x -12 > retval

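The generic probability, interval and times entries described in the list
above are usually written together. The following is a minimal sketch of a
helper that does so for one capability. The function name fault_configure and
its optional root argument are assumptions made for this illustration and are
not part of the kernel tree.

```shell
# Write the generic attributes of one fault-injection capability.
# Usage: fault_configure <capability> <probability> <interval> <times> [root]
# The root argument is hypothetical; it defaults to the usual debugfs mount
# point and exists so that the helper can be tried against a scratch directory.
fault_configure()
{
	local cap="$1" probability="$2" interval="$3" times="$4"
	local dir="${5:-/sys/kernel/debug}/$cap"

	# Refuse to create files for a capability that is not present.
	[ -d "$dir" ] || { echo "no such capability: $cap" >&2; return 1; }

	# printf is used instead of echo so that -1 is never taken as an option.
	printf '%s\n' "$probability" > "$dir/probability" &&
	printf '%s\n' "$interval" > "$dir/interval" &&
	printf '%s\n' "$times" > "$dir/times"
}
```

For example, "fault_configure failslab 100 1000 -1" follows the advice given
for the probability entry: it keeps probability at 100 and uses interval to
space the failures out, with no limit on their number.
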
Boot option
^^^^^^^^^^^

In order to inject faults while debugfs is not available (early boot time),
use the boot option::

	failslab=
	fail_page_alloc=
	fail_usercopy=
	fail_make_request=
	fail_futex=
	mmc_core.fail_request=<interval>,<probability>,<space>,<times>

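The order of the four fields is easy to get wrong, since it differs from the
order in which the debugfs entries are listed. As a small sketch, the helper
below assembles one boot option string. The function name fault_bootarg is an
assumption made for this illustration, and the sketch assumes that every
option in the list above takes the format shown on its last line.

```shell
# Build a fault-injection boot option string.
# Usage: fault_bootarg <name> <interval> <probability> <space> <times>
fault_bootarg()
{
	# Field order follows <interval>,<probability>,<space>,<times>.
	printf '%s=%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, "fault_bootarg failslab 100 10 0 -1" prints
"failslab=100,10,0,-1", which can then be added to the kernel command line.
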
proc entries
^^^^^^^^^^^^

- /proc/<pid>/fail-nth,
  /proc/self/task/<tid>/fail-nth:

	Writing an integer N to this file makes the N-th call in the task fail.
	Reading from this file returns an integer value. A value of '0' indicates
	that the fault set up by a previous write to this file was injected.
	A positive integer N indicates that the fault has not yet been injected.
	Note that this file enables all types of faults (slab, futex, etc).
	This setting takes precedence over all other generic debugfs settings
	like probability, interval, times, etc. But per-capability settings
	(e.g. fail_futex/ignore-private) take precedence over it.

	This feature is intended for systematic testing of faults in a single
	system call. See an example below.

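The read side of fail-nth can be checked from a script. The following is a
minimal sketch that interprets the value read back from such a file. The
function name fail_nth_injected is an assumption made for this illustration.

```shell
# Report whether the fault armed through a fail-nth file has been injected.
# Usage: fail_nth_injected <path-to-fail-nth>
# Returns 0 if the file reads 0, 1 if a positive count is still pending,
# and 2 if the file cannot be read. A file that was never written to also
# reads 0, so this is only meaningful after a fault has been armed.
fail_nth_injected()
{
	local value

	value=$(cat "$1" 2>/dev/null) || return 2
	[ "$value" -eq 0 ]
}
```

A caller would arm a fault by writing N to the file, run the operation under
test in that task, and then call fail_nth_injected on the same path.
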

Error Injectable Functions
--------------------------

This part is for kernel developers who are considering marking a function
with the ALLOW_ERROR_INJECTION() macro.

Requirements for the Error Injectable Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Function-level error injection forcibly changes the code path and returns an
error even if the input and conditions are proper. This can cause an
unexpected kernel crash if you allow error injection on a function which is
NOT error injectable. Thus, you (and reviewers) must ensure that:

- The function returns an error code if it fails, and the callers check
  it correctly (they need to recover from it).

- The function does not execute any code which can change any state before
  the first error return. The state includes global, local, and input
  variables. Examples of such changes are clearing output address storage
  (e.g. `*ret = NULL`), incrementing or decrementing a counter, setting a
  flag, disabling preemption or IRQs, or taking a lock. (If those changes
  are undone before returning the error, that is OK.)

The first requirement is important, and it means that release (free objects)
functions are usually harder to inject errors into than allocation
functions. If errors from such release functions are not handled correctly,
they easily cause a memory leak (the caller is left unsure whether the object
has been released or corrupted.)

The second requirement is for callers which expect that the function always
does something. If the error injection skips the whole function, that
expectation is broken and an unexpected error results.

Type of the Error Injectable Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each error injectable function has an error type specified by the
ALLOW_ERROR_INJECTION() macro. You have to choose it carefully if you add
a new error injectable function. If the wrong error type is chosen, the
kernel may crash because it may not be able to handle the error.
There are 4 types of errors defined in include/asm-generic/error-injection.h:

EI_ETYPE_NULL
  This function will return `NULL` if it fails, e.g. a function which
  returns an allocated object address.

EI_ETYPE_ERRNO
  This function will return an `-errno` error code if it fails, e.g.
  -EINVAL if the input is wrong. This includes functions which return an
  address that encodes `-errno` with the ERR_PTR() macro.

EI_ETYPE_ERRNO_NULL
  This function will return an `-errno` or `NULL` if it fails. If the caller
  of this function checks the return value with the IS_ERR_OR_NULL() macro,
  this type is appropriate.

EI_ETYPE_TRUE
  This function will return `true` (non-zero positive value) if it fails.

If you specify a wrong type, for example EI_ETYPE_ERRNO for a function
which returns an allocated object, it may cause a problem because the returned
value is not an object address and the caller cannot access the address.

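From user space, the error type constrains which retval may be written for a
function. The sketch below checks a signed value against the ranges given for
the injectable file earlier in this document. The function name
retval_fits_type is an assumption made for this illustration, and MAX_ERRNO is
assumed to be 4095 as defined in include/linux/err.h. The TRUE type is left
out because no range is listed for it. This is not the check the kernel
itself performs.

```shell
# Assumption: MAX_ERRNO is 4095, as in include/linux/err.h.
MAX_ERRNO=4095

# Check whether a signed retval is in the documented range for a type.
# Usage: retval_fits_type <NULL|ERRNO|ERR_NULL> <signed-retval>
# Returns 0 if it fits, 1 if it does not, 2 for an unknown type.
retval_fits_type()
{
	local type="$1" val="$2"

	case "$type" in
	NULL)
		[ "$val" -eq 0 ] ;;
	ERRNO)
		[ "$val" -lt 0 ] && [ "$val" -ge "-$MAX_ERRNO" ] ;;
	ERR_NULL)
		[ "$val" -eq 0 ] ||
			{ [ "$val" -lt 0 ] && [ "$val" -ge "-$MAX_ERRNO" ]; } ;;
	*)
		return 2 ;;
	esac
}
```

For example, "retval_fits_type ERRNO -12" succeeds, while
"retval_fits_type NULL -12" fails.
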

How to add new fault injection capability
-----------------------------------------

- #include <linux/fault-inject.h>

- define the fault attributes

  DECLARE_FAULT_ATTR(name);

  Please see the definition of struct fault_attr in fault-inject.h
  for details.

- provide a way to configure fault attributes

- boot option

  If you need to enable the fault injection capability from boot time, you can
  provide a boot option to configure it. There is a helper function for it:

	setup_fault_attr(attr, str);

- debugfs entries

  failslab, fail_page_alloc, fail_usercopy, and fail_make_request use this
  method. Helper function:

	fault_create_debugfs_attr(name, parent, attr);

- module parameters

  If the scope of the fault injection capability is limited to a
  single kernel module, it is better to provide module parameters to
  configure the fault attributes.

- add a hook to insert failures

  Upon should_fail() returning true, client code should inject a failure:

	should_fail(attr, size);

Application Examples
--------------------

- Inject slab allocation failures into module init/exit code::

    #!/bin/bash

    FAILTYPE=failslab
    echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
    echo 10 > /sys/kernel/debug/$FAILTYPE/probability
    echo 100 > /sys/kernel/debug/$FAILTYPE/interval
    echo -1 > /sys/kernel/debug/$FAILTYPE/times
    echo 0 > /sys/kernel/debug/$FAILTYPE/space
    echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
    echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait

    faulty_system()
    {
	bash -c "echo 1 > /proc/self/make-it-fail && exec $*"
    }

    if [ $# -eq 0 ]
    then
	echo "Usage: $0 modulename [ modulename ... ]"
	exit 1
    fi

    for m in $*
    do
	echo inserting $m...
	faulty_system modprobe $m

	echo removing $m...
	faulty_system modprobe -r $m
    done

------------------------------------------------------------------------------

- Inject page allocation failures only for a specific module::

    #!/bin/bash

    FAILTYPE=fail_page_alloc
    module=$1

    if [ -z "$module" ]
    then
	echo "Usage: $0 <modulename>"
	exit 1
    fi

    modprobe $module

    if [ ! -d /sys/module/$module/sections ]
    then
	echo Module $module is not loaded
	exit 1
    fi

    cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start
    cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end

    echo N > /sys/kernel/debug/$FAILTYPE/task-filter
    echo 10 > /sys/kernel/debug/$FAILTYPE/probability
    echo 100 > /sys/kernel/debug/$FAILTYPE/interval
    echo -1 > /sys/kernel/debug/$FAILTYPE/times
    echo 0 > /sys/kernel/debug/$FAILTYPE/space
    echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
    echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
    echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
    echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth

    trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT

    echo "Injecting errors into the module $module... (interrupt to stop)"
    sleep 1000000

------------------------------------------------------------------------------

- Inject an open_ctree error during a btrfs mount::

    #!/bin/bash

    rm -f testfile.img
    dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
    DEVICE=$(losetup --show -f testfile.img)
    mkfs.btrfs -f $DEVICE
    mkdir -p tmpmnt

    FAILTYPE=fail_function
    FAILFUNC=open_ctree
    echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject
    printf %#x -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval
    echo N > /sys/kernel/debug/$FAILTYPE/task-filter
    echo 100 > /sys/kernel/debug/$FAILTYPE/probability
    echo 0 > /sys/kernel/debug/$FAILTYPE/interval
    echo -1 > /sys/kernel/debug/$FAILTYPE/times
    echo 0 > /sys/kernel/debug/$FAILTYPE/space
    echo 1 > /sys/kernel/debug/$FAILTYPE/verbose

    mount -t btrfs $DEVICE tmpmnt
    if [ $? -ne 0 ]
    then
	echo "SUCCESS!"
    else
	echo "FAILED!"
	umount tmpmnt
    fi

    echo > /sys/kernel/debug/$FAILTYPE/inject

    rmdir tmpmnt
    losetup -d $DEVICE
    rm testfile.img

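The "printf %#x -12" step in the script above exists because retval only
accepts unsigned values, so the negative errno has to be written as its
unsigned hexadecimal representation. As a small sketch, the conversion can be
wrapped in a helper. The function name errno_to_retval is an assumption made
for this illustration.

```shell
# Print the unsigned hexadecimal form of a negative errno.
# Usage: errno_to_retval <positive-errno-number>
errno_to_retval()
{
	# The shell's printf wraps the negative number to an unsigned value.
	printf '%#x\n' "-$1"
}
```

With a printf that formats 64-bit values, as bash does on common 64-bit
systems, "errno_to_retval 12" prints 0xfffffffffffffff4, which is the value
the script writes for -ENOMEM.
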

Tool to run command with failslab or fail_page_alloc
----------------------------------------------------

To make the tasks mentioned above easier to accomplish, use
tools/testing/fault-injection/failcmd.sh.  Run
"./tools/testing/fault-injection/failcmd.sh --help" for more information and
see the following examples.

Examples:

Run the command "make -C tools/testing/selftests/ run_tests" while injecting
slab allocation failures::

	# ./tools/testing/fault-injection/failcmd.sh \
		-- make -C tools/testing/selftests/ run_tests

Same as above, but allow at most 100 failures instead of the default of at
most one::

	# ./tools/testing/fault-injection/failcmd.sh --times=100 \
		-- make -C tools/testing/selftests/ run_tests

Same as above, but inject page allocation failures instead of slab
allocation failures::

	# env FAILCMD_TYPE=fail_page_alloc \
		./tools/testing/fault-injection/failcmd.sh --times=100 \
		-- make -C tools/testing/selftests/ run_tests

Systematic faults using fail-nth
---------------------------------

The following code systematically injects a fault at the 1st, 2nd, 3rd and
so on injection point reached during the socketpair() system call::

  #include <sys/types.h>
  #include <sys/stat.h>
  #include <sys/socket.h>
  #include <sys/syscall.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <string.h>
  #include <stdlib.h>
  #include <stdio.h>
  #include <errno.h>

  int main()
  {
	int i, err, res, fail_nth, fds[2];
	char buf[128];

	/* Let failslab also hit allocations that are allowed to sleep. */
	system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
	sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
	fail_nth = open(buf, O_RDWR);
	if (fail_nth < 0) {
		perror("open fail-nth");
		return 1;
	}
	for (i = 1;; i++) {
		/* Arm the i-th injection point, then run the system call. */
		sprintf(buf, "%d", i);
		write(fail_nth, buf, strlen(buf));
		res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
		err = errno;
		/* Reads back 0 once the armed fault has been injected. */
		pread(fail_nth, buf, sizeof(buf), 0);
		if (res == 0) {
			close(fds[0]);
			close(fds[1]);
		}
		printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
			res, err);
		/* Non-zero: the call finished before an i-th point was reached. */
		if (atoi(buf))
			break;
	}
	return 0;
  }

An example output::

	1-th fault Y: res=-1/23
	2-th fault Y: res=-1/23
	3-th fault Y: res=-1/12
	4-th fault Y: res=-1/12
	5-th fault Y: res=-1/23
	6-th fault Y: res=-1/23
	7-th fault Y: res=-1/23
	8-th fault Y: res=-1/12
	9-th fault Y: res=-1/12
	10-th fault Y: res=-1/12
	11-th fault Y: res=-1/12
	12-th fault Y: res=-1/12
	13-th fault Y: res=-1/12
	14-th fault Y: res=-1/12
	15-th fault Y: res=-1/12
	16-th fault N: res=0/12