## Fault Management Logic for ZED ##

The integration of Fault Management Daemon (FMD) logic from illumos
is being deployed in three phases. This logic is encapsulated in
several software modules inside ZED.

### ZED+FM Phase 1 ###

All the phase 1 work is in the current master branch. Phase 1 work includes:

* Add new paths to the persistent VDEV label for device matching.
* Add a disk monitor for generating _disk-add_ and _disk-change_ events.
* Add support for automated VDEV auto-online, auto-replace and auto-expand.
* Expand the statechange event to include all VDEV state transitions.

### ZED+FM Phase 2 (WIP) ###

The phase 2 work primarily entails the _Diagnosis Engine_ and the
_Retire Agent_ modules. It also includes infrastructure to support a
crude FMD environment to host these modules. For additional
information see the **FMD Components in ZED** and **Implementation
Notes** sections below.

### ZED+FM Phase 3 ###

Future work will add additional functionality and will likely include:

* Add FMD module garbage collection (periodically call `fmd_module_gc()`).
* Add real module property retrieval (currently hard-coded in accessors).
* Additional diagnosis telemetry (like latency outliers and SMART data).
* Export FMD module statistics.
* Zedlet parallel execution and resiliency (add watchdog).

### ZFS Fault Management Overview ###

The primary purpose of ZFS fault management is automated diagnosis
and isolation of VDEV faults. A fault is something we can associate
with an impact (e.g. loss of data redundancy) and a corrective action
(e.g. offline or replace a disk). A typical ZFS fault management stack
consists of _error detectors_ (e.g. `zfs_ereport_post()`), a _disk
monitor_, a _diagnosis engine_ and _response agents_.

After detecting a software error, the ZFS kernel module sends error
events to the ZED user daemon which in turn routes the events to its
internal FMA modules based on their event subscriptions. Likewise, if
a disk is added or changed in the system, the disk monitor sends disk
events which are consumed by a response agent.

### FMD Components in ZED ###

There are three FMD modules (aka agents) that are now built into ZED.

  1. A _Diagnosis Engine_ module (`agents/zfs_diagnosis.c`)
  2. A _Retire Agent_ module (`agents/zfs_retire.c`)
  3. A _Disk Add Agent_ module (`agents/zfs_mod.c`)

To begin with, a **Diagnosis Engine** consumes per-vdev I/O and checksum
ereports and feeds them into a Soft Error Rate Discrimination (SERD)
algorithm which will generate a corresponding fault diagnosis when the
tracked VDEV encounters **N** events in a given **T** time window.
The initial N and T values for the SERD algorithm are estimates inherited
from illumos (10 errors in 10 minutes).

In turn, a **Retire Agent** responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of the new VDEV
state (degraded or faulted). The retire agent is also responsible for
managing hot spares across all pools. When it encounters a device fault
or a device removal it will replace the device with an appropriate
spare if available.

Finally, a **Disk Add Agent** responds to events from a libudev disk
monitor (`EC_DEV_ADD` or `EC_DEV_STATUS`) and will online, replace or
expand the associated VDEV. This agent is also known as the `zfs_mod`
or Sysevent Loadable Module (SLM) on the illumos platform. The added
disk is matched to a specific VDEV using its device id, physical path
or VDEV GUID.

Note that the _auto-replace_ feature (aka hot plug) is opt-in and you
must set the pool's `autoreplace` property to enable it. The new disk
will be matched to the corresponding leaf VDEV by physical location
and labeled with a GPT partition before replacing the original VDEV
in the pool.

### Implementation Notes ###

* The FMD module API required for logic modules is emulated and implemented
  in the `fmd_api.c` and `fmd_serd.c` source files.
  This support includes module registration, memory allocation, module
  property accessors, basic case management, one-shot timers and SERD
  engines. For detailed information on the FMD module API, see the document --
  _"Fault Management Daemon Programmer's Reference Manual"_.

* The event subscriptions for the modules (located in a module specific
  configuration file on illumos) are currently hard-coded into the ZED
  `zfs_agent_dispatch()` function.

* The FMD modules are called one at a time from a single thread that
  consumes events queued to the modules. These events are sourced from
  the normal ZED events and also include events posted from the diagnosis
  engine and the libudev disk event monitor.

* The FMD code modules have minimal changes and were intentionally left
  as similar as possible to their upstream source files.

* The sysevent namespace in ZED differs from illumos. For example:
    * illumos uses `"resource.sysevent.EC_zfs.ESC_ZFS_vdev_remove"`
    * Linux uses `"sysevent.fs.zfs.vdev_remove"`

* The FMD Modules port was produced by Intel Federal, LLC under award
  number B609815 between the U.S. Department of Energy (DOE) and Intel
  Federal, LLC.
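
### Illustrative SERD Sketch ###

The "N events in a T time window" rule used by the Diagnosis Engine can
be illustrated with a small sliding-window counter. This is a minimal
sketch and not the code in `fmd_serd.c`: the names `serd_sketch_t`,
`serd_sketch_init()`, `serd_sketch_record()` and `SERD_MAX_N` are
hypothetical, timestamps are assumed to be non-decreasing seconds, and
the inclusive window comparison is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SERD_MAX_N 64 /* hypothetical cap on N for this sketch */

/* Hypothetical engine: trips when N events occur within T seconds. */
typedef struct serd_sketch {
    uint32_t n;                  /* event threshold (N) */
    uint64_t t_sec;              /* window length in seconds (T) */
    uint64_t stamps[SERD_MAX_N]; /* ring buffer of the newest N event times */
    uint32_t count;              /* valid entries in the ring (<= n) */
    uint32_t head;               /* index of the oldest entry */
} serd_sketch_t;

static void
serd_sketch_init(serd_sketch_t *sp, uint32_t n, uint64_t t_sec)
{
    assert(n > 0 && n <= SERD_MAX_N);
    sp->n = n;
    sp->t_sec = t_sec;
    sp->count = 0;
    sp->head = 0;
}

/*
 * Record one event at time `now` (seconds, assumed non-decreasing).
 * Returns true when the newest N events all fall within T seconds.
 */
static bool
serd_sketch_record(serd_sketch_t *sp, uint64_t now)
{
    if (sp->count == sp->n) {
        /* Ring is full: overwrite the oldest timestamp. */
        sp->stamps[sp->head] = now;
        sp->head = (sp->head + 1) % sp->n;
    } else {
        sp->stamps[(sp->head + sp->count) % sp->n] = now;
        sp->count++;
    }

    if (sp->count < sp->n)
        return (false);

    /* stamps[head] is now the oldest of the newest N events. */
    return (now - sp->stamps[sp->head] <= sp->t_sec);
}
```

With the inherited estimate of 10 errors in 10 minutes, a caller would
initialize the sketch with `serd_sketch_init(&s, 10, 600)` and treat a
`true` return from `serd_sketch_record()` as the point where a fault
diagnosis would be generated. Unlike a full engine, this sketch keeps no
"already fired" state and would need an explicit reset after tripping.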