## Fault Management Logic for ZED ##

The integration of Fault Management Daemon (FMD) logic from illumos
is being deployed in three phases. This logic is encapsulated in
several software modules inside ZED.

### ZED+FM Phase 1 ###

All of the phase 1 work is in the current master branch. Phase 1 work
includes:

* Add new paths to the persistent VDEV label for device matching.
* Add a disk monitor for generating _disk-add_ and _disk-change_ events.
* Add support for automated VDEV auto-online, auto-replace and auto-expand.
* Expand the statechange event to include all VDEV state transitions.

### ZED+FM Phase 2 (WIP) ###

The phase 2 work primarily entails the _Diagnosis Engine_ and the
_Retire Agent_ modules. It also includes infrastructure to support a
crude FMD environment to host these modules. For additional
information see the **FMD Components in ZED** and **Implementation
Notes** sections below.

### ZED+FM Phase 3 ###

Future work will add further functionality and will likely include:

* Add FMD module garbage collection (periodically call `fmd_module_gc()`).
* Add real module property retrieval (currently hard-coded in accessors).
* Additional diagnosis telemetry (like latency outliers and SMART data).
* Export FMD module statistics.
* Zedlet parallel execution and resiliency (add watchdog).

### ZFS Fault Management Overview ###

The primary purpose of ZFS fault management is automated diagnosis
and isolation of VDEV faults. A fault is something we can associate
with an impact (e.g. loss of data redundancy) and a corrective action
(e.g. offline or replace a disk). A typical ZFS fault management stack
consists of _error detectors_ (e.g. `zfs_ereport_post()`), a _disk
monitor_, a _diagnosis engine_ and _response agents_.

After detecting a software error, the ZFS kernel module sends error
events to the ZED user daemon, which in turn routes the events to its
internal FMA modules based on their event subscriptions. Likewise, if
a disk is added or changed in the system, the disk monitor sends disk
events which are consumed by a response agent.

### FMD Components in ZED ###

There are three FMD modules (aka agents) that are now built into ZED:

 1. A _Diagnosis Engine_ module (`agents/zfs_diagnosis.c`)
 2. A _Retire Agent_ module (`agents/zfs_retire.c`)
 3. A _Disk Add Agent_ module (`agents/zfs_mod.c`)

To begin with, a **Diagnosis Engine** consumes per-vdev I/O and checksum
ereports and feeds them into a Soft Error Rate Discrimination (SERD)
algorithm, which generates a corresponding fault diagnosis when the
tracked VDEV encounters **N** events in a given **T** time window. The
initial N and T values for the SERD algorithm are estimates inherited
from illumos (10 errors in 10 minutes).
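As a mental model only, the N-in-T decision at the heart of a SERD
engine can be sketched in a few lines of C. The real engine is ported
from illumos in `fmd_serd.c`; none of the names or structures below
match that implementation.

```
/*
 * Hypothetical sketch of the SERD "N events within time T" test.
 * All names and structures here are illustrative only.
 */
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define	SERD_N	10		/* fire after N events...	*/
#define	SERD_T	(10 * 60)	/* ...within a T second window	*/

typedef struct serd_eng {
	time_t	se_events[SERD_N];	/* timestamps in current window */
	size_t	se_count;		/* events currently tracked	*/
} serd_eng_t;

/* Record one ereport; returns true when the engine fires. */
static bool
serd_record(serd_eng_t *se, time_t now)
{
	size_t i, kept = 0;

	/* Age out events that fell off the back of the T window. */
	for (i = 0; i < se->se_count; i++) {
		if (now - se->se_events[i] < SERD_T)
			se->se_events[kept++] = se->se_events[i];
	}
	se->se_count = kept;

	se->se_events[se->se_count++] = now;
	if (se->se_count < SERD_N)
		return (false);

	/* Threshold reached: diagnose a fault and reset the engine. */
	se->se_count = 0;
	return (true);
}
```

When the real engine fires, the diagnosis engine produces the fault
that the **Retire Agent** (described next) consumes.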
In turn, a **Retire Agent** responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of the new VDEV
state (degraded or faulted). The retire agent is also responsible for
managing hot spares across all pools: when it encounters a device fault
or a device removal, it will replace the device with an appropriate
spare if one is available.

Finally, a **Disk Add Agent** responds to events from a libudev disk
monitor (`EC_DEV_ADD` or `EC_DEV_STATUS`) and will online, replace or
expand the associated VDEV. This agent is also known as the `zfs_mod`
or Sysevent Loadable Module (SLM) on the illumos platform. The added
disk is matched to a specific VDEV using its device id, physical path
or VDEV GUID.

Note that the _auto-replace_ feature (aka hot plug) is opt-in; you
must set the pool's `autoreplace` property to enable it (e.g.
`zpool set autoreplace=on <pool>`). The new disk will be matched
to the corresponding leaf VDEV by physical location and labeled with
a GPT partition before replacing the original VDEV in the pool.

### Implementation Notes ###

* The FMD module API required for logic modules is emulated and implemented
  in the `fmd_api.c` and `fmd_serd.c` source files. This support includes
  module registration, memory allocation, module property accessors, basic
  case management, one-shot timers and SERD engines.
  For detailed information on the FMD module API, see the
  _Fault Management Daemon Programmer's Reference Manual_.

* The event subscriptions for the modules (located in a module-specific
  configuration file on illumos) are currently hard-coded into the ZED
  `zfs_agent_dispatch()` function (see the sketch at the end of this
  document).

* The FMD modules are called one at a time from a single thread that
  consumes events queued to the modules. These events are sourced from
  the normal ZED events and also include events posted from the diagnosis
  engine and the libudev disk event monitor.

* The FMD code modules have minimal changes and were intentionally kept
  as similar as possible to their upstream source files.

* The sysevent namespace in ZED differs from illumos. For example:
  * illumos uses `"resource.sysevent.EC_zfs.ESC_ZFS_vdev_remove"`
  * Linux uses `"sysevent.fs.zfs.vdev_remove"`

* The FMD Modules port was produced by Intel Federal, LLC under award
  number B609815 between the U.S. Department of Energy (DOE) and Intel
  Federal, LLC.
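As a concrete illustration of two of the notes above (the hard-coded
subscriptions and the Linux sysevent namespace), the following
hypothetical sketch routes events to agents by class string. The real
logic lives in `zfs_agent_dispatch()` and the actual subscription sets
and entry points differ; every name below is illustrative only.

```
/*
 * Illustrative only: routing events by class string, in the spirit
 * of zfs_agent_dispatch().  All names below are hypothetical.
 */
#include <stdio.h>
#include <string.h>

static void
dispatch_event(const char *class)
{
	/* Per-vdev I/O and checksum ereports feed the diagnosis engine. */
	if (strncmp(class, "ereport.fs.zfs.", 15) == 0) {
		(void) printf("-> zfs_diagnosis: %s\n", class);
		return;
	}

	/* Device removal interests the retire agent (hot spare logic). */
	if (strcmp(class, "sysevent.fs.zfs.vdev_remove") == 0) {
		(void) printf("-> zfs_retire: %s\n", class);
		return;
	}

	(void) printf("-> no subscriber: %s\n", class);
}

int
main(void)
{
	dispatch_event("ereport.fs.zfs.checksum");
	dispatch_event("sysevent.fs.zfs.vdev_remove");
	return (0);
}
```

Keeping all subscriptions in one dispatch function keeps the
single-threaded event loop easy to reason about, at the cost of
requiring a rebuild to change a subscription.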