# Influxdb Metrics for ZFS Pools
The _zpool_influxdb_ program produces
[influxdb](https://github.com/influxdata/influxdb) line protocol
compatible metrics from zpools. In the UNIX tradition, _zpool_influxdb_
does one thing: read statistics from a pool and print them to
stdout. In many ways, this is a metrics-friendly output of
statistics normally observed via the `zpool` command.

## Usage
When run without arguments, _zpool_influxdb_ runs once, reading data
from all imported pools, and prints to stdout.
```shell
zpool_influxdb [options] [poolname]
```
If no poolname is specified, then all pools are sampled.

| option | short option | description |
|---|---|---|
| --execd | -e | For use with telegraf's `execd` plugin. When [enter] is pressed, the pools are sampled. To exit, use [ctrl+D] |
| --no-histogram | -n | Do not print histogram information |
| --signed-int | -i | Use signed integer data type (default=unsigned) |
| --sum-histogram-buckets | -s | Sum histogram bucket values |
| --tags key=value[,key=value...] | -t | Add tags to data points. No tag sanity checking is performed. |
| --help | -h | Print a short usage message |
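
For example, a single run that samples one pool (here, a hypothetical pool
named `tank`) and adds a custom tag might look like the following. Depending
on your installation, the command may need its full path (e.g.
/usr/libexec/zfs/zpool_influxdb, as in the telegraf examples below). The
output is influxdb line protocol, one data point per line; the values shown
are illustrative only.
```shell
# sample only the pool named "tank" and tag its data points
zpool_influxdb --tags datacenter=lab1 tank

# illustrative output (one line, truncated); actual tags, fields, and values vary by pool
# zpool_stats,name=tank,state=ONLINE,vdev=root,datacenter=lab1 alloc=512000u,free=1024000u,size=1536000u,read_ops=100u,write_ops=250u 1590000000000000000
```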

#### Histogram Bucket Values
The histogram data collected by ZFS is stored as independent bucket values.
This works well out-of-the-box with an influxdb data source and grafana's
heatmap visualization. The influxdb query for a grafana heatmap
visualization looks like:
```
field(disk_read) last() non_negative_derivative(1s)
```

Another method for storing histogram data sums the values for lower-value
buckets. For example, a latency bucket tagged "le=10" includes the values
in the bucket "le=1".
This method is often used for prometheus histograms.
The `zpool_influxdb --sum-histogram-buckets` option presents the data from ZFS
as summed values.
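
As a minimal illustration with made-up counts, suppose 5 I/Os fall into the
"le=1" bucket and 3 more fall into the "le=10" bucket:
```
independent buckets (default)    summed buckets (--sum-histogram-buckets)
le=1   5                         le=1   5
le=10  3                         le=10  8   (5 + 3)
```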

## Measurements
The following measurements are collected:

| measurement | description | zpool equivalent |
|---|---|---|
| zpool_stats | general size and data | zpool list |
| zpool_scan_stats | scrub, rebuild, and resilver statistics (omitted if no scan has been requested) | zpool status |
| zpool_vdev_stats | per-vdev statistics | zpool iostat -q |
| zpool_io_size | per-vdev I/O size histogram | zpool iostat -r |
| zpool_latency | per-vdev I/O latency histogram | zpool iostat -w |
| zpool_vdev_queue | per-vdev instantaneous queue depth | zpool iostat -q |
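
Because each output line begins with the measurement name, the output is easy
to filter. For example, to look only at the latency histograms for a
hypothetical pool named `tank`:
```shell
zpool_influxdb tank | grep '^zpool_latency,'
```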

### zpool_stats Description
zpool_stats contains top-level summary statistics for the pool.
Performance counters measure the I/Os to the pool's devices.

#### zpool_stats Tags

| label | description |
|---|---|
| name | pool name |
| path | for leaf vdevs, the pathname |
| state | pool state, as shown by _zpool status_ |
| vdev | vdev name (root = entire pool) |

#### zpool_stats Fields

| field | units | description |
|---|---|---|
| alloc | bytes | allocated space |
| free | bytes | unallocated space |
| size | bytes | total pool size |
| read_bytes | bytes | bytes read since pool import |
| read_errors | count | number of read errors |
| read_ops | count | number of read operations |
| write_bytes | bytes | bytes written since pool import |
| write_errors | count | number of write errors |
| write_ops | count | number of write operations |

### zpool_scan_stats Description
Once a pool has been scrubbed, resilvered, or rebuilt, zpool_scan_stats
contains information about the status and performance of the operation.
Until a scan has been requested, zpool_scan_stats does not exist in the
kernel and therefore cannot be reported by this collector.

#### zpool_scan_stats Tags

| label | description |
|---|---|
| name | pool name |
| function | name of the scan function running or recently completed |
| state | scan state, as shown by _zpool status_ |

#### zpool_scan_stats Fields

| field | units | description |
|---|---|---|
| errors | count | number of errors encountered by scan |
| examined | bytes | total data examined during scan |
| to_examine | bytes | prediction of total bytes to be scanned |
| pass_examined | bytes | data examined during current scan pass |
| issued | bytes | size of I/Os issued to disks |
| pass_issued | bytes | size of I/Os issued to disks for current pass |
| processed | bytes | data reconstructed during scan |
| to_process | bytes | total bytes to be repaired |
| rate | bytes/sec | examination rate |
| start_ts | epoch timestamp | start timestamp for scan |
| pause_ts | epoch timestamp | timestamp for a scan pause request |
| end_ts | epoch timestamp | completion timestamp for scan |
| paused_t | seconds | elapsed time while paused |
| remaining_t | seconds | estimate of time remaining for scan |

### zpool_vdev_stats Description
The ZFS I/O (ZIO) scheduler uses five queues to schedule I/Os to each vdev.
These queues are further divided into active and pending states.
An I/O is pending prior to being issued to the vdev. An active
I/O has been issued to the vdev. The scheduler and its tunable
parameters are described in the
[ZFS documentation for the ZIO Scheduler](https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/ZIO%20Scheduler.html).
The ZIO scheduler reports the queue depths as gauges where the value
represents an instantaneous snapshot of the queue depth at
the sample time. Therefore, it is not unusual to see all zeroes
for an idle pool.
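
Because these values are gauges rather than counters, they can be graphed
directly, without a derivative. For example, a grafana query in the same
style as the heatmap query shown above might be:
```
field(sync_w_active_queue) mean()
```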

#### zpool_vdev_stats Tags
| label | description |
|---|---|
| name | pool name |
| vdev | vdev name (root = entire pool) |

#### zpool_vdev_stats Fields
| field | units | description |
|---|---|---|
| sync_r_active_queue | entries | synchronous read active queue depth |
| sync_w_active_queue | entries | synchronous write active queue depth |
| async_r_active_queue | entries | asynchronous read active queue depth |
| async_w_active_queue | entries | asynchronous write active queue depth |
| async_scrub_active_queue | entries | asynchronous scrub active queue depth |
| sync_r_pend_queue | entries | synchronous read pending queue depth |
| sync_w_pend_queue | entries | synchronous write pending queue depth |
| async_r_pend_queue | entries | asynchronous read pending queue depth |
| async_w_pend_queue | entries | asynchronous write pending queue depth |
| async_scrub_pend_queue | entries | asynchronous scrub pending queue depth |

### zpool_latency Histogram
ZFS tracks the latency of each I/O in the ZIO pipeline. This latency can
be useful for observing latency-related issues that are not easily observed
using the averaged latency statistics.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

#### zpool_latency Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, latency is less than or equal to bucket value in seconds |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_latency Histogram Fields
| field | units | description |
|---|---|---|
| total_read | operations | read operations of all types |
| total_write | operations | write operations of all types |
| disk_read | operations | disk read operations |
| disk_write | operations | disk write operations |
| sync_read | operations | ZIO sync reads |
| sync_write | operations | ZIO sync writes |
| async_read | operations | ZIO async reads |
| async_write | operations | ZIO async writes |
| scrub | operations | ZIO scrub/scan reads |
| trim | operations | ZIO trim (aka unmap) writes |

### zpool_io_size Histogram
ZFS tracks I/O throughout the ZIO pipeline. The size of each I/O is used
to create a histogram of the size by I/O type and vdev. For example, a
4KiB write to a mirrored pool will show a 4KiB write to the top-level vdev
(root) and a 4KiB write to each of the mirror leaf vdevs.

The ZIO pipeline can aggregate I/O operations. For example, a contiguous
series of writes can be aggregated into a single, larger I/O to the leaf
vdev. The independent I/O operations reflect the logical operations and
the aggregated I/O operations reflect the physical operations.

The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.

Note: trim I/Os can be larger than 16MiB, but the larger sizes are
accounted in the 16MiB bucket.

#### zpool_io_size Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, I/O size is less than or equal to bucket value in bytes |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |

#### zpool_io_size Histogram Fields
| field | units | description |
|---|---|---|
| sync_read_ind | blocks | independent sync reads |
| sync_write_ind | blocks | independent sync writes |
| async_read_ind | blocks | independent async reads |
| async_write_ind | blocks | independent async writes |
| scrub_read_ind | blocks | independent scrub/scan reads |
| trim_write_ind | blocks | independent trim (aka unmap) writes |
| sync_read_agg | blocks | aggregated sync reads |
| sync_write_agg | blocks | aggregated sync writes |
| async_read_agg | blocks | aggregated async reads |
| async_write_agg | blocks | aggregated async writes |
| scrub_read_agg | blocks | aggregated scrub/scan reads |
| trim_write_agg | blocks | aggregated trim (aka unmap) writes |

#### About unsigned integers
Telegraf v1.6.2 and later support unsigned 64-bit integers, which more
closely match the uint64_t values used by ZFS. By default, zpool_influxdb
uses ZFS' uint64_t values and the influxdb line protocol unsigned integer type.
If you are using an older telegraf or influxdb where unsigned integers are not
available, use the `--signed-int` option.
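
For example, to emit signed integers while sampling a hypothetical pool named
`tank`:
```shell
zpool_influxdb --signed-int tank
```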

## Using _zpool_influxdb_

The simplest method is to use the execd input plugin in telegraf. For older
versions of telegraf which lack execd, the exec input plugin can be used.
For convenience, one of the sample config files below can be placed in the
telegraf config-directory (often /etc/telegraf/telegraf.d). Telegraf can
be restarted to read the config-directory files.
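
As a sketch, assuming one of the samples below is saved as
`zpool_influxdb.conf` and telegraf is managed by systemd (adjust the path and
service manager to your system):
```shell
# copy the sample config into telegraf's config-directory and restart telegraf
cp zpool_influxdb.conf /etc/telegraf/telegraf.d/
systemctl restart telegraf
```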

### Example telegraf execd configuration
```toml
# Read metrics from zpool_influxdb
[[inputs.execd]]
  ## default installation location for zpool_influxdb command
  command = ["/usr/libexec/zfs/zpool_influxdb", "--execd"]

  ## Define how the process is signaled on each collection interval.
  ## Valid values are:
  ##   "none"    : Do not signal anything. (Recommended for service inputs)
  ##               The process must output metrics by itself.
  ##   "STDIN"   : Send a newline on STDIN. (Recommended for gather inputs)
  ##   "SIGHUP"  : Send a HUP signal. Not available on Windows. (not recommended)
  ##   "SIGUSR1" : Send a USR1 signal. Not available on Windows.
  ##   "SIGUSR2" : Send a USR2 signal. Not available on Windows.
  signal = "STDIN"

  ## Delay before the process is restarted after an unexpected termination
  restart_delay = "10s"

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "influx"
```

### Example telegraf exec configuration
```toml
# Read metrics from zpool_influxdb
[[inputs.exec]]
  ## default installation location for zpool_influxdb command
  commands = ["/usr/libexec/zfs/zpool_influxdb"]
  data_format = "influx"
```
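
Either configuration can be checked without writing to any output by running
telegraf in test mode against just that file, assuming it was installed as
suggested above:
```shell
telegraf --config /etc/telegraf/telegraf.d/zpool_influxdb.conf --test
```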

## Caveat Emptor
* Like the _zpool_ command, _zpool_influxdb_ takes a reader
  lock on spa_config for each imported pool. If this lock blocks,
  then the command will also block indefinitely and might be
  unkillable. This is not a normal condition, but can occur if
  there are bugs in the kernel modules.
  For this reason, care should be taken:
  * avoid spawning many of these commands hoping that one might
    finish
  * avoid frequent updates or short sample time
    intervals, because the locks can interfere with the performance
    of other instances of _zpool_ or _zpool_influxdb_

## Other collectors
There are a few other collectors for zpool statistics roaming around
the Internet. Many attempt to screen-scrape `zpool` output in various
ways. The screen-scrape method works poorly for `zpool` output because
of its human-friendly nature. Also, they suffer from the same caveats
as this implementation. This implementation is optimized for directly
collecting the metrics and is much more efficient than the screen-scrapers.

## Feedback Encouraged
Pull requests and issues are greatly appreciated at
https://github.com/openzfs/zfs