.\" Copyright (c) 2013  iXsystems.com,
.\"                     author: Alfred Perlstein <alfred@freebsd.org>
.\" Copyright (c) 2004  Poul-Henning Kamp <phk@FreeBSD.org>
.\" Copyright (c) 2003  Sean M. Kelly <smkelly@FreeBSD.org>
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd May 11, 2015
.Dt WATCHDOGD 8
.Os
.Sh NAME
.Nm watchdogd
.Nd watchdog daemon
.Sh SYNOPSIS
.Nm
.Op Fl dnSw
.Op Fl -debug
.Op Fl -softtimeout
.Op Fl -softtimeout-action Ar action
.Op Fl -pretimeout Ar timeout
.Op Fl -pretimeout-action Ar action
.Op Fl e Ar cmd
.Op Fl I Ar file
.Op Fl s Ar sleep
.Op Fl t Ar timeout
.Op Fl T Ar script_timeout
.Op Fl x Ar exit_timeout
.Sh DESCRIPTION
The
.Nm
utility interfaces with the kernel's watchdog facility to ensure
that the system is in a working state.
If
.Nm
is unable to interface with the kernel over a specific timeout,
the kernel will take actions to assist in debugging or restarting the computer.
.Pp
If
.Fl e Ar cmd
is specified,
.Nm
will attempt to execute this command with
.Xr system 3 ,
and only if the command returns with a zero exit code will the
watchdog be reset.
If
.Fl e Ar cmd
is not specified, the daemon will perform a trivial file system
check instead.
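.Pp
As a minimal sketch, assuming a hypothetical health-check script installed at
.Pa /usr/local/sbin/check_health.sh
that exits with status 0 only when the system is considered healthy,
the daemon could be started as follows:
.Bd -literal -offset indent
watchdogd -e '/usr/local/sbin/check_health.sh'
.Ed
.Pp
The script path is illustrative only; no such script is supplied with
the system.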
.Pp
The
.Fl n
(dry-run) argument causes
.Nm
not to arm the system watchdog and
instead only run the watchdog function and report failures.
This is useful for developing new
.Nm
scripts, as the system will not
reboot if there are problems with the script.
.Pp
The
.Fl s Ar sleep
argument can be used to control the sleep period between each execution
of the check and defaults to 10 seconds.
.Pp
The
.Fl t Ar timeout
argument specifies the desired timeout period in seconds.
The default timeout is 128 seconds.
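.Pp
For illustration, the following invocation runs the check every 5 seconds
and requests a 30 second timeout.
Both values are arbitrary and chosen only to show the two options together:
.Bd -literal -offset indent
watchdogd -s 5 -t 30
.Ed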
.Pp
One possible circumstance which will cause a watchdog timeout is an interrupt
storm.
If this occurs,
.Nm
will no longer execute and thus the kernel's watchdog routines will take
action after a configurable timeout.
.Pp
The
.Fl T Ar script_timeout
argument specifies the threshold (in seconds) at which
.Nm
will complain that its script has run for too long.
If unset,
.Ar script_timeout
defaults to the value specified by the
.Fl s Ar sleep
option.
.Pp
The
.Fl x Ar exit_timeout
argument is the timeout period (in seconds) to leave in effect when the
program exits.
Using
.Fl x
with a non-zero value protects against lockup during a reboot by
triggering a hardware reset if the software reboot does not complete
before the given timeout expires.
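.Pp
As a sketch, assuming a 60 second running timeout and a 120 second exit
timeout suit the machine (both values are illustrative, not recommendations):
.Bd -literal -offset indent
watchdogd -t 60 -x 120
.Ed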
.Pp
Upon receiving the
.Dv SIGTERM
or
.Dv SIGINT
signals,
.Nm
will terminate, after first instructing the kernel to either disable the
timeout or reset it to the value given by
.Fl x Ar exit_timeout .
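.Pp
A sketch of stopping the daemon by signal, assuming its process ID has been
written to
.Pa /var/run/watchdogd.pid
(the file listed under
.Sx FILES ;
adjust the path if
.Fl I
was given a different file):
.Bd -literal -offset indent
kill -TERM "$(cat /var/run/watchdogd.pid)"
.Ed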
.Pp
The
.Nm
utility recognizes the following runtime options:
.Bl -tag -width 30m
.It Fl I Ar file
Write the process ID of the
.Nm
utility in the specified file.
.It Fl d Fl -debug
Do not fork.
When this option is specified,
.Nm
will not fork into the background at startup.
.It Fl S
Do not send a message to the system logger when the watchdog command takes
longer than expected to execute.
The default behaviour is to log a warning via the system logger with the
LOG_DAEMON facility, and to output a warning to standard error.
.It Fl w
Complain when the watchdog script takes too long.
This flag will cause
.Nm
to complain when the time taken to
execute the watchdog script exceeds the threshold set by the
.Fl T Ar script_timeout
option, which defaults to the
.Fl s Ar sleep
value.
.It Fl -pretimeout Ar timeout
Set a
.Dq pretimeout
watchdog.
At
.Ar timeout
seconds before the watchdog fires, attempt an action.
The action is set by the
.Fl -pretimeout-action
flag.
The default is just to log a message (WD_SOFT_LOG) via
.Xr log 9 .
.It Fl -pretimeout-action Ar action
Set the timeout action for the pretimeout.
See the section
.Sx Timeout Actions .
.It Fl -softtimeout
Instead of arming the various hardware watchdogs, only use a basic software
watchdog.
The default action is just to
.Xr log 9
a message (WD_SOFT_LOG).
.It Fl -softtimeout-action Ar action
Set the timeout action for the softtimeout.
See the section
.Sx Timeout Actions .
.El
.Sh Timeout Actions
The following timeout actions are available via the
.Fl -pretimeout-action
and
.Fl -softtimeout-action
flags:
.Bl -tag -width ".Ar printf  "
.It Ar panic
Call
.Xr panic 9
when the timeout is reached.
.It Ar ddb
Enter the kernel debugger via
.Xr kdb_enter 9
when the timeout is reached.
.It Ar log
Log a message using
.Xr log 9
when the timeout is reached.
.It Ar printf
Call the kernel
.Xr printf 9
to display a message to the console and
.Xr dmesg 8
buffer.
.El
.Pp
Actions can be combined in a comma-separated list.
For example,
.Ar log,printf
calls both
.Xr printf 9
and
.Xr log 9 ,
sending messages both to
.Xr dmesg 8
and to the kernel
.Xr log 4
device for
.Xr syslogd 8 .
.Sh FILES
.Bl -tag -width ".Pa /var/run/watchdogd.pid" -compact
.It Pa /var/run/watchdogd.pid
.El
.Sh EXAMPLES
.Ss Debugging watchdogd and/or your watchdog script
This is a useful recipe for debugging
.Nm
and your watchdog script.
.Pp
(Note that ^C works oddly because
.Nm
calls
.Xr system 3 ,
so the
first ^C will terminate the "sleep" command.)
.Pp
Explanation of options used:
.Bl -enum -offset indent -compact
.It
Set debug on (--debug)
.It
Set the watchdog to trip at 30 seconds (-t 30)
.It
Use of a softtimeout:
.Bl -enum -offset indent -compact -nested
.It
Use a softtimeout (do not arm the hardware watchdog)
(--softtimeout)
.It
Set the softtimeout action to do both kernel
.Xr printf 9
and
.Xr log 9
when it trips
(--softtimeout-action log,printf)
.El
.It
Use of a pre-timeout:
.Bl -enum -offset indent -compact -nested
.It
Set a pre-timeout of 15 seconds (this will later trigger a panic/dump)
(--pretimeout 15)
.It
Set the pre-timeout action to do both kernel
.Xr printf 9
and
.Xr log 9
when it trips
(--pretimeout-action log,printf)
.El
.It
Use of a script:
.Bl -enum -offset indent -compact -nested
.It
Run "sleep 60" as a shell command that acts as the watchdog (-e 'sleep 60')
.It
Warn us when the script takes longer than 1 second to run (-w)
.El
.El
.Bd -literal
watchdogd --debug -t 30 \\
  --softtimeout --softtimeout-action log,printf \\
  --pretimeout 15 --pretimeout-action log,printf \\
  -e 'sleep 60' -w
.Ed
.Ss Example of production use
.Bl -enum -offset indent -compact
.It
Set hard timeout to 120 seconds (-t 120)
.It
Set a panic to happen at 60 seconds (to trigger a
.Xr crash 8
for dump analysis):
.Bl -enum -offset indent -compact -nested
.It
Use of pre-timeout (--pretimeout 60)
.It
Specify pre-timeout action (--pretimeout-action log,printf,panic)
.El
.It
Use of a script:
.Bl -enum -offset indent -compact -nested
.It
Run your script (-e '/path/to/your/script 60')
.It
Log if your script takes longer than 15 seconds to run (-w -T 15)
.El
.El
.Bd -literal
watchdogd -t 120 \\
  --pretimeout 60 --pretimeout-action log,printf,panic \\
  -e '/path/to/your/script 60' -w -T 15
.Ed
.Sh SEE ALSO
.Xr watchdog 4 ,
.Xr watchdog 8 ,
.Xr watchdog 9
.Sh HISTORY
The
.Nm
utility appeared in
.Fx 5.1 .
.Sh AUTHORS
.An -nosplit
The
.Nm
utility and manual page were written by
.An Sean Kelly Aq Mt smkelly@FreeBSD.org
and
.An Poul-Henning Kamp Aq Mt phk@FreeBSD.org .
.Pp
Some contributions made by
.An Jeff Roberson Aq Mt jeff@FreeBSD.org .
.Pp
The pretimeout and softtimeout action system was added by
.An Alfred Perlstein Aq Mt alfred@freebsd.org .