# Influxdb Metrics for ZFS Pools
The _zpool_influxdb_ program produces
[influxdb](https://github.com/influxdata/influxdb) line protocol
compatible metrics from zpools. In the UNIX tradition, _zpool_influxdb_
does one thing: read statistics from a pool and print them to
stdout. In many ways, this is a metrics-friendly output of
statistics normally observed via the `zpool` command.

## Usage
When run without arguments, _zpool_influxdb_ runs once, reading data
from all imported pools, and prints to stdout.
```shell
zpool_influxdb [options] [poolname]
```
If no poolname is specified, then all pools are sampled.

| option | short option | description |
|---|---|---|
| --execd | -e | For use with telegraf's `execd` plugin. When [enter] is pressed, the pools are sampled. To exit, use [ctrl+D] |
| --no-histogram | -n | Do not print histogram information |
| --signed-int | -i | Use signed integer data type (default=unsigned) |
| --sum-histogram-buckets | -s | Sum histogram bucket values |
| --tags key=value[,key=value...] | -t | Add tags to data points. No tag sanity checking is performed. |
| --help | -h | Print a short usage message |

#### Histogram Bucket Values
The histogram data collected by ZFS is stored as independent bucket values.
This works well out-of-the-box with an influxdb data source and grafana's
heatmap visualization.
The influxdb query for a grafana heatmap
visualization looks like:
```
field(disk_read) last() non_negative_derivative(1s)
```

Another method for storing histogram data sums the values for lower-value
buckets. For example, a latency bucket tagged "le=10" includes the values
in the bucket "le=1".
This method is often used for prometheus histograms.
The `zpool_influxdb --sum-histogram-buckets` option presents the data from ZFS
as summed values.

## Measurements
The following measurements are collected:

| measurement | description | zpool equivalent |
|---|---|---|
| zpool_stats | general size and data | zpool list |
| zpool_scan_stats | scrub, rebuild, and resilver statistics (omitted if no scan has been requested) | zpool status |
| zpool_vdev_stats | per-vdev statistics | zpool iostat -q |
| zpool_io_size | per-vdev I/O size histogram | zpool iostat -r |
| zpool_latency | per-vdev I/O latency histogram | zpool iostat -w |
| zpool_vdev_queue | per-vdev instantaneous queue depth | zpool iostat -q |

### zpool_stats Description
zpool_stats contains top-level summary statistics for the pool.
Performance counters measure the I/Os to the pool's devices.
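
To give a feel for the output format, here is a sketch of how a
zpool_stats data point is assembled in influxdb line protocol. The pool
name, field values, and timestamp are hypothetical, not real
_zpool_influxdb_ output; only the general shape (measurement, tags,
fields with an unsigned-integer suffix, timestamp) is illustrated.

```python
def line_protocol(measurement, tags, fields, timestamp_ns):
    """Assemble one influxdb line protocol point with unsigned integer fields."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # The 'u' suffix marks an unsigned 64-bit integer; with the
    # --signed-int option the 'i' (signed) suffix would be used instead.
    field_str = ",".join(f"{k}={v}u" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

# Hypothetical sample for a pool named "tank"
point = line_protocol(
    "zpool_stats",
    {"name": "tank", "state": "ONLINE", "vdev": "root"},
    {"alloc": 1234567890, "free": 9876543210, "read_ops": 42},
    1590000000000000000,
)
print(point)
# zpool_stats,name=tank,state=ONLINE,vdev=root alloc=1234567890u,free=9876543210u,read_ops=42u 1590000000000000000
```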

#### zpool_stats Tags

| label | description |
|---|---|
| name | pool name |
| path | for leaf vdevs, the pathname |
| state | pool state, as shown by _zpool status_ |
| vdev | vdev name (root = entire pool) |

#### zpool_stats Fields

| field | units | description |
|---|---|---|
| alloc | bytes | allocated space |
| free | bytes | unallocated space |
| size | bytes | total pool size |
| read_bytes | bytes | bytes read since pool import |
| read_errors | count | number of read errors |
| read_ops | count | number of read operations |
| write_bytes | bytes | bytes written since pool import |
| write_errors | count | number of write errors |
| write_ops | count | number of write operations |

### zpool_scan_stats Description
Once a pool has been scrubbed, resilvered, or rebuilt, the zpool_scan_stats
contain information about the status and performance of the operation.
Otherwise, the zpool_scan_stats do not exist in the kernel, and therefore
cannot be reported by this collector.
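
The scan fields can be combined into derived metrics such as percent
complete and an estimated time to completion. A minimal sketch, using
hypothetical values rather than real pool data:

```python
# Hypothetical zpool_scan_stats field values for an in-progress scrub
examined = 750 * 2**30      # bytes examined so far
to_examine = 1000 * 2**30   # predicted total bytes to scan
rate = 500 * 2**20          # examination rate, bytes/sec

# Derived progress metrics
percent_done = 100.0 * examined / to_examine
remaining_bytes = to_examine - examined
eta_seconds = remaining_bytes / rate if rate else float("inf")

print(f"{percent_done:.1f}% done, ~{eta_seconds / 60:.0f} minutes remaining")
# 75.0% done, ~9 minutes remaining
```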

#### zpool_scan_stats Tags

| label | description |
|---|---|
| name | pool name |
| function | name of the scan function running or recently completed |
| state | scan state, as shown by _zpool status_ |

#### zpool_scan_stats Fields

| field | units | description |
|---|---|---|
| errors | count | number of errors encountered by scan |
| examined | bytes | total data examined during scan |
| to_examine | bytes | prediction of total bytes to be scanned |
| pass_examined | bytes | data examined during current scan pass |
| issued | bytes | size of I/Os issued to disks |
| pass_issued | bytes | size of I/Os issued to disks for current pass |
| processed | bytes | data reconstructed during scan |
| to_process | bytes | total bytes to be repaired |
| rate | bytes/sec | examination rate |
| start_ts | epoch timestamp | start timestamp for scan |
| pause_ts | epoch timestamp | timestamp for a scan pause request |
| end_ts | epoch timestamp | completion timestamp for scan |
| paused_t | seconds | elapsed time while paused |
| remaining_t | seconds | estimate of time remaining for scan |

### zpool_vdev_stats Description
The ZFS I/O (ZIO) scheduler uses five queues to schedule I/Os to each vdev.
These queues are further divided into active and pending states.
An I/O is pending prior to being issued to the vdev. An active
I/O has been issued to the vdev. The scheduler and its tunable
parameters are described at the
[ZFS documentation for ZIO Scheduler](https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/ZIO%20Scheduler.html).
The ZIO scheduler reports the queue depths as gauges where the value
represents an instantaneous snapshot of the queue depth at
the sample time. Therefore, it is not unusual to see all zeroes
for an idle pool.

#### zpool_vdev_stats Tags
| label | description |
|---|---|
| name | pool name |
| vdev | vdev name (root = entire pool) |

#### zpool_vdev_stats Fields
| field | units | description |
|---|---|---|
| sync_r_active_queue | entries | synchronous read active queue depth |
| sync_w_active_queue | entries | synchronous write active queue depth |
| async_r_active_queue | entries | asynchronous read active queue depth |
| async_w_active_queue | entries | asynchronous write active queue depth |
| async_scrub_active_queue | entries | asynchronous scrub active queue depth |
| sync_r_pend_queue | entries | synchronous read pending queue depth |
| sync_w_pend_queue | entries | synchronous write pending queue depth |
| async_r_pend_queue | entries | asynchronous read pending queue depth |
| async_w_pend_queue | entries | asynchronous write pending queue depth |
| async_scrub_pend_queue | entries | asynchronous scrub pending queue depth |

### zpool_latency Histogram
ZFS tracks the latency of each I/O in the ZIO pipeline. This latency can
be useful for observing latency-related issues that are not easily observed
using the averaged latency statistics.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

#### zpool_latency Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, latency is less than or equal to bucket value in seconds |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_latency Histogram Fields
| field | units | description |
|---|---|---|
| total_read | operations | read operations of all types |
| total_write | operations | write operations of all types |
| disk_read | operations | disk read operations |
| disk_write | operations | disk write operations |
| sync_read | operations | ZIO sync reads |
| sync_write | operations | ZIO sync writes |
| async_read | operations | ZIO async reads |
| async_write | operations | ZIO async writes |
| scrub | operations | ZIO scrub/scan reads |
| trim | operations | ZIO trim (aka unmap) writes |

### zpool_io_size Histogram
ZFS tracks I/O throughout the ZIO pipeline. The size of each I/O is used
to create a histogram of the size by I/O type and vdev. For example, a
4KiB write to a mirrored pool will show a 4KiB write to the top-level vdev
(root) and a 4KiB write to each of the mirror leaf vdevs.

The ZIO pipeline can aggregate I/O operations. For example, a contiguous
series of writes can be aggregated into a single, larger I/O to the leaf
vdev. The independent I/O operations reflect the logical operations and
the aggregated I/O operations reflect the physical operations.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

Note: trim I/Os can be larger than 16MiB, but the larger sizes are
accounted in the 16MiB bucket.
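
The relationship between independent bucket values and the cumulative
"le" form described above can be illustrated with a small sketch. The
bucket boundaries and counts here are hypothetical; real buckets are
power-of-two sized.

```python
from itertools import accumulate

# Hypothetical per-bucket I/O counts, keyed by upper bound in bytes
independent = {4096: 10, 8192: 5, 16384: 2}

# Summing each bucket with all lower buckets yields the prometheus-style
# cumulative form that --sum-histogram-buckets emits; the largest bucket
# ("le=+Inf") then equals the total I/O count.
bounds = sorted(independent)
cumulative = dict(zip(bounds, accumulate(independent[b] for b in bounds)))
cumulative[float("inf")] = sum(independent.values())

print(cumulative)   # {4096: 10, 8192: 15, 16384: 17, inf: 17}
```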

#### zpool_io_size Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, I/O size is less than or equal to bucket value in bytes |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_io_size Histogram Fields
| field | units | description |
|---|---|---|
| sync_read_ind | blocks | independent sync reads |
| sync_write_ind | blocks | independent sync writes |
| async_read_ind | blocks | independent async reads |
| async_write_ind | blocks | independent async writes |
| scrub_read_ind | blocks | independent scrub/scan reads |
| trim_write_ind | blocks | independent trim (aka unmap) writes |
| sync_read_agg | blocks | aggregated sync reads |
| sync_write_agg | blocks | aggregated sync writes |
| async_read_agg | blocks | aggregated async reads |
| async_write_agg | blocks | aggregated async writes |
| scrub_read_agg | blocks | aggregated scrub/scan reads |
| trim_write_agg | blocks | aggregated trim (aka unmap) writes |

#### About unsigned integers
Telegraf v1.6.2 and later support unsigned 64-bit integers, which more
closely match the uint64_t values used by ZFS. By default, zpool_influxdb
uses ZFS' uint64_t values and the influxdb line protocol unsigned integer type.
If you are using an older telegraf or influxdb where unsigned integers are not
available, use the `--signed-int` option.

## Using _zpool_influxdb_

The simplest method is to use the execd input agent in telegraf. For older
versions of telegraf which lack execd, the exec input agent can be used.
For convenience, one of the sample config files below can be placed in the
telegraf config-directory (often /etc/telegraf/telegraf.d). Telegraf can
be restarted to read the config-directory files.

### Example telegraf execd configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.execd]]
# ## default installation location for zpool_influxdb command
  command = ["/usr/libexec/zfs/zpool_influxdb", "--execd"]

  ## Define how the process is signaled on each collection interval.
  ## Valid values are:
  ##   "none"    : Do not signal anything. (Recommended for service inputs)
  ##               The process must output metrics by itself.
  ##   "STDIN"   : Send a newline on STDIN. (Recommended for gather inputs)
  ##   "SIGHUP"  : Send a HUP signal. Not available on Windows. (not recommended)
  ##   "SIGUSR1" : Send a USR1 signal. Not available on Windows.
  ##   "SIGUSR2" : Send a USR2 signal. Not available on Windows.
  signal = "STDIN"

  ## Delay before the process is restarted after an unexpected termination
  restart_delay = "10s"

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "influx"
```

### Example telegraf exec configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.exec]]
# ## default installation location for zpool_influxdb command
  commands = ["/usr/libexec/zfs/zpool_influxdb"]
  data_format = "influx"
```

## Caveat Emptor
* Like the _zpool_ command, _zpool_influxdb_ takes a reader
  lock on spa_config for each imported pool. If this lock blocks,
  then the command will also block indefinitely and might be
  unkillable. This is not a normal condition, but can occur if
  there are bugs in the kernel modules.
  For this reason, care should be taken:
  * avoid spawning many of these commands hoping that one might
    finish
  * avoid frequent updates or short sample time
    intervals, because the locks can interfere with the performance
    of other instances of _zpool_ or _zpool_influxdb_

## Other collectors
There are a few other collectors for zpool statistics roaming around
the Internet. Many attempt to screen-scrape `zpool` output in various
ways. The screen-scrape method works poorly for `zpool` output because
of its human-friendly nature. Also, they suffer from the same caveats
as this implementation. This implementation is optimized for directly
collecting the metrics and is much more efficient than the screen-scrapers.

## Feedback Encouraged
Pull requests and issues are greatly appreciated at
https://github.com/openzfs/zfs