## Fault Management Logic for ZED ##

The integration of Fault Management Daemon (FMD) logic from illumos
is being deployed in three phases. This logic is encapsulated in
several software modules inside ZED.

### ZED+FM Phase 1 ###

All of the phase 1 work is in the current master branch. Phase 1 work includes:

* Add new paths to the persistent VDEV label for device matching.
* Add a disk monitor for generating _disk-add_ and _disk-change_ events.
* Add support for automated VDEV auto-online, auto-replace and auto-expand.
* Expand the statechange event to include all VDEV state transitions.

### ZED+FM Phase 2 (WIP) ###

The phase 2 work primarily entails the _Diagnosis Engine_ and the
_Retire Agent_ modules. It also includes infrastructure to support a
crude FMD environment to host these modules. For additional
information see the **FMD Components in ZED** and **Implementation
Notes** sections below.

### ZED+FM Phase 3 ###

Future work will add additional functionality and will likely include:

* Add FMD module garbage collection (periodically call `fmd_module_gc()`).
* Add real module property retrieval (currently hard-coded in accessors).
* Additional diagnosis telemetry (like latency outliers and SMART data).
* Export FMD module statistics.
* Zedlet parallel execution and resiliency (add watchdog).

### ZFS Fault Management Overview ###

The primary purpose of ZFS fault management is automated diagnosis
and isolation of VDEV faults. A fault is something we can associate
with an impact (e.g. loss of data redundancy) and a corrective action
(e.g. offline or replace a disk). A typical ZFS fault management stack
consists of _error detectors_ (e.g. `zfs_ereport_post()`), a _disk
monitor_, a _diagnosis engine_ and _response agents_.

After detecting a software error, the ZFS kernel module sends error
events to the ZED user daemon, which in turn routes the events to its
internal FMA modules based on their event subscriptions. Likewise, if
a disk is added or changed in the system, the disk monitor sends disk
events which are consumed by a response agent.

### FMD Components in ZED ###

There are three FMD modules (aka agents) that are now built into ZED.

  1. A _Diagnosis Engine_ module (`agents/zfs_diagnosis.c`)
  2. A _Retire Agent_ module (`agents/zfs_retire.c`)
  3. A _Disk Add Agent_ module (`agents/zfs_mod.c`)

To begin with, a **Diagnosis Engine** consumes per-vdev I/O and checksum
ereports and feeds them into a Soft Error Rate Discrimination (SERD)
algorithm, which generates a corresponding fault diagnosis when the
tracked VDEV encounters **N** events in a given **T** time window. The
initial N and T values for the SERD algorithm are estimates inherited
from illumos (10 errors in 10 minutes).
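
As a rough illustration of the N-in-T rule described above, the following
C sketch keeps the timestamps of the most recent N events in a small ring
and reports a trip when the oldest of them is no more than T seconds older
than the newest. It is a simplified stand-alone model and not the code in
`fmd_serd.c`; the `serd_sketch_*` names, the fixed ring capacity and the use
of whole seconds are assumptions made for this example, and timestamps are
assumed to be non-decreasing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SERD_SKETCH_MAX_N 32 /* ring capacity; an assumption of this sketch */

typedef struct serd_sketch {
    uint32_t n;      /* N: number of events needed to trip */
    uint64_t t;      /* T: window length in seconds */
    uint32_t count;  /* timestamps currently held (at most N) */
    uint32_t head;   /* index of the oldest held timestamp */
    uint64_t stamps[SERD_SKETCH_MAX_N];
} serd_sketch_t;

/* Prepare an engine; returns false if N does not fit the ring. */
bool
serd_sketch_init(serd_sketch_t *s, uint32_t n, uint64_t t)
{
    if (n == 0 || n > SERD_SKETCH_MAX_N)
        return (false);
    memset(s, 0, sizeof (*s));
    s->n = n;
    s->t = t;
    return (true);
}

/*
 * Record one event at time `now` (seconds). Returns true when the last
 * N recorded events, including this one, all fall within T seconds.
 */
bool
serd_sketch_record(serd_sketch_t *s, uint64_t now)
{
    if (s->count == s->n) {
        /* Ring already holds N events: forget the oldest one. */
        s->head = (s->head + 1) % s->n;
        s->count--;
    }
    s->stamps[(s->head + s->count) % s->n] = now;
    s->count++;

    if (s->count < s->n)
        return (false);
    return (now - s->stamps[s->head] <= s->t);
}
```

With the inherited estimates, an engine would be set up as
`serd_sketch_init(&s, 10, 600)`, so the call that records the tenth error
within 600 seconds returns true.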

In turn, a **Retire Agent** responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of the new VDEV
state (degraded or faulted). The retire agent is also responsible for
managing hot spares across all pools. When it encounters a device fault
or a device removal it will replace the device with an appropriate
spare if available.

Finally, a **Disk Add Agent** responds to events from a libudev disk
monitor (`EC_DEV_ADD` or `EC_DEV_STATUS`) and will online, replace or
expand the associated VDEV. This agent is also known as the `zfs_mod`
or Sysevent Loadable Module (SLM) on the illumos platform. The added
disk is matched to a specific VDEV using its device id, physical path
or VDEV GUID.
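
The matching step described above can be pictured with the C sketch below,
which compares a newly added disk against a table of known VDEVs by device
id, then physical path, then GUID. This is only an illustration: the
`disk_ident_t` structure, the function names and the order in which the
three keys are tried are assumptions of this example and are not taken from
`agents/zfs_mod.c`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical identity record for a disk or for a configured VDEV. */
typedef struct disk_ident {
    const char *devid;     /* device id string, or NULL if unknown */
    const char *physpath;  /* physical path string, or NULL if unknown */
    uint64_t guid;         /* VDEV GUID, or 0 if none is available */
} disk_ident_t;

typedef enum match_kind {
    MATCH_NONE,
    MATCH_DEVID,
    MATCH_PHYSPATH,
    MATCH_GUID
} match_kind_t;

/* Two strings match only if both are present and equal. */
static bool
same_str(const char *a, const char *b)
{
    return (a != NULL && b != NULL && strcmp(a, b) == 0);
}

/* Compare one disk with one VDEV; the key order is an assumption. */
match_kind_t
disk_match(const disk_ident_t *disk, const disk_ident_t *vdev)
{
    if (same_str(disk->devid, vdev->devid))
        return (MATCH_DEVID);
    if (same_str(disk->physpath, vdev->physpath))
        return (MATCH_PHYSPATH);
    if (disk->guid != 0 && disk->guid == vdev->guid)
        return (MATCH_GUID);
    return (MATCH_NONE);
}

/*
 * Return the index of the first VDEV that matches on any key, or -1.
 * `how` reports which key produced the match.
 */
int
disk_find_vdev(const disk_ident_t *disk, const disk_ident_t *vdevs,
    size_t nvdevs, match_kind_t *how)
{
    for (size_t i = 0; i < nvdevs; i++) {
        match_kind_t m = disk_match(disk, &vdevs[i]);
        if (m != MATCH_NONE) {
            *how = m;
            return ((int)i);
        }
    }
    *how = MATCH_NONE;
    return (-1);
}
```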

Note that the _auto-replace_ feature (aka hot plug) is opt-in and you
must set the pool's `autoreplace` property to enable it. The new disk
will be matched to the corresponding leaf VDEV by physical location
and labeled with a GPT partition before replacing the original VDEV
in the pool.

### Implementation Notes ###

* The FMD module API required for logic modules is emulated and implemented
  in the `fmd_api.c` and `fmd_serd.c` source files. This support includes
  module registration, memory allocation, module property accessors, basic
  case management, one-shot timers and SERD engines.
  For detailed information on the FMD module API, see the
  _"Fault Management Daemon Programmer's Reference Manual"_.

* The event subscriptions for the modules (located in a module-specific
  configuration file on illumos) are currently hard-coded into the ZED
  `zfs_agent_dispatch()` function.

* The FMD modules are called one at a time from a single thread that
  consumes events queued to the modules. These events are sourced from
  the normal ZED events and also include events posted from the diagnosis
  engine and the libudev disk event monitor.

* The FMD code modules have minimal changes and were intentionally left
  as similar as possible to their upstream source files.

* The sysevent namespace in ZED differs from illumos. For example:
    * illumos uses `"resource.sysevent.EC_zfs.ESC_ZFS_vdev_remove"`
    * Linux uses `"sysevent.fs.zfs.vdev_remove"`

* The FMD Modules port was produced by Intel Federal, LLC under award
  number B609815 between the U.S. Department of Energy (DOE) and Intel
  Federal, LLC.
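
The hard-coded subscriptions and the one-at-a-time delivery described in
the notes above can be sketched in C as follows. Events are placed on a
FIFO queue and a single drain loop hands each one, in order, to every agent
whose subscription matches its class. This is a hedged model only: the
subscription table, the prefix-matching rule, the queue layout and all
names are assumptions of this example and are not copied from
`zfs_agent_dispatch()`. Locking and the consumer thread itself are left
out; a real daemon would need to protect the queue while other threads post
to it.

```c
#include <stddef.h>
#include <string.h>

enum {
    AGENT_DIAGNOSIS = 1 << 0,
    AGENT_RETIRE    = 1 << 1,
    AGENT_DISK_ADD  = 1 << 2
};

/* Illustrative subscription table: class prefix -> interested agents. */
typedef struct subscription {
    const char *prefix;
    unsigned agents;
} subscription_t;

static const subscription_t subscriptions[] = {
    { "ereport.fs.zfs.",             AGENT_DIAGNOSIS },
    { "sysevent.fs.zfs.vdev_remove", AGENT_DIAGNOSIS | AGENT_RETIRE },
    { "EC_dev_add",                  AGENT_DISK_ADD },
};

/* Return the bitmask of agents subscribed to this event class. */
unsigned
agents_for_class(const char *event_class)
{
    unsigned mask = 0;
    size_t n = sizeof (subscriptions) / sizeof (subscriptions[0]);

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(subscriptions[i].prefix);
        if (strncmp(event_class, subscriptions[i].prefix, len) == 0)
            mask |= subscriptions[i].agents;
    }
    return (mask);
}

#define QUEUE_CAP 16

/* Fixed-size FIFO of pending event class names. */
typedef struct event_queue {
    const char *classes[QUEUE_CAP];
    size_t head;
    size_t count;
} event_queue_t;

/* Stand-in for the agents: counts how many events each one received. */
typedef struct delivery_counts {
    unsigned diagnosis;
    unsigned retire;
    unsigned disk_add;
} delivery_counts_t;

/* Append an event; returns -1 if the queue is full. */
int
queue_post(event_queue_t *q, const char *event_class)
{
    if (q->count == QUEUE_CAP)
        return (-1);
    q->classes[(q->head + q->count) % QUEUE_CAP] = event_class;
    q->count++;
    return (0);
}

/* Hand one event to one agent (here: just bump that agent's counter). */
static void
deliver(unsigned agent, const char *event_class, delivery_counts_t *c)
{
    (void) event_class;
    if (agent == AGENT_DIAGNOSIS)
        c->diagnosis++;
    else if (agent == AGENT_RETIRE)
        c->retire++;
    else if (agent == AGENT_DISK_ADD)
        c->disk_add++;
}

/* Drain the queue in FIFO order, calling agents one at a time. */
size_t
queue_drain(event_queue_t *q, delivery_counts_t *c)
{
    size_t delivered = 0;

    while (q->count > 0) {
        const char *event_class = q->classes[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;

        unsigned mask = agents_for_class(event_class);
        for (unsigned agent = 1; agent <= AGENT_DISK_ADD; agent <<= 1) {
            if (mask & agent) {
                deliver(agent, event_class, c);
                delivered++;
            }
        }
    }
    return (delivered);
}
```

Because delivery happens from one loop, no two agents in this model ever
run at the same time, which mirrors the single-threaded behavior noted
above.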