.\" Copyright (c) 2005 Sun Microsystems, Inc.  All Rights Reserved.
.\" Copyright (c) 2017, Joyent, Inc.
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.Dd March 13, 2022
.Dt MHD 4I
.Os
.Sh NAME
.Nm mhd
.Nd multihost disk control operations
.Sh SYNOPSIS
.In sys/mhd.h
.Sh DESCRIPTION
The
.Nm
.Xr ioctl 2
requests control access rights to a multihost disk, using
disk reservations on the disk device.
.Pp
The stability level of this interface (see
.Xr attributes 7 )
is evolving.
As a result, the interface is subject to change and you should limit your use of
it.
.Pp
The mhd ioctls fall into two major categories: (1) ioctls for non-shared
multihost disks and (2) ioctls for shared multihost disks.
.Pp
One ioctl,
.Dv MHIOCENFAILFAST ,
is applicable to both non-shared and shared multihost disks.
It is described after the first two categories.
.Pp
All the ioctls require root privilege.
.Pp
For all of the ioctls, the caller should obtain the file descriptor for the
device by calling
.Xr open 2
with the
.Dv O_NDELAY
flag; without the
.Dv O_NDELAY
flag, the open may fail because another host already holds a
conflicting reservation on the device.
Some of the ioctls below permit the caller to forcibly clear a conflicting
reservation held by another host; however, to call such an ioctl, the
caller must first obtain an open file descriptor.
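.Pp
The open step described above might look like the following minimal sketch.
The helper name
.Fn mhd_open
is hypothetical, and the choice of
.Dv O_RDWR
as the access mode is an assumption; only the use of
.Dv O_NDELAY
comes from this page.
.Bd -literal -offset 2n
```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper: open a disk device for use with the mhd ioctls.
 * O_NDELAY keeps open(2) from failing when another host already holds a
 * conflicting reservation. O_RDWR is an assumption of this sketch.
 * Returns a file descriptor, or -1 with errno set by open(2).
 */
static int
mhd_open(const char *path)
{
    assert(path != NULL);
    return (open(path, O_RDWR | O_NDELAY));
}
```
.Ed
.Pp
The returned descriptor is then passed as the first argument of each
.Xr ioctl 2
call described below.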
.Ss "Non-shared multihost disks"
The non-shared multihost disk ioctls consist of
.Dv MHIOCTKOWN ,
.Dv MHIOCRELEASE ,
.Dv MHIOCSTATUS ,
and
.Dv MHIOCQRESERVE .
These ioctl requests control the access rights of non-shared multihost disks.
A non-shared multihost disk is one that supports serialized, mutually exclusive
I/O mastery by the connected hosts.
This is in contrast to the shared-disk model, in which
concurrent access is allowed from more than one host (see below).
.Pp
A non-shared multihost disk can be in one of two states:
.Bl -bullet -width indent
.It
Exclusive access state, where only one connected host has I/O access.
.It
Non-exclusive access state, where all connected hosts have I/O access.
An external hardware reset can cause the disk to enter the non-exclusive access
state.
.El
.Pp
Each multihost disk driver views the machine on which it's running as the
.Dq local host ;
each views all other machines as
.Dq remote hosts .
For each I/O or ioctl request, the requesting host is the local host.
.Pp
Note that the non-shared ioctls are designed to work with SCSI-2 disks.
The
SCSI-2 RESERVE/RELEASE command set is the underlying hardware facility in the
device that supports the non-shared ioctls.
.Pp
The function prototypes for the non-shared ioctls are:
.Bd -literal -offset 2n
.Fn ioctl fd MHIOCTKOWN ;
.Fn ioctl fd MHIOCRELEASE ;
.Fn ioctl fd MHIOCSTATUS ;
.Fn ioctl fd MHIOCQRESERVE ;
.Ed
.Bl -tag -width MHIOCQRESERVE
.It Dv MHIOCTKOWN
Forcefully acquires exclusive access rights to the multihost disk for the local
host.
Revokes all access rights to the multihost disk from remote hosts.
Causes the disk to enter the exclusive access state.
.Pp
Implementation Note: Reservations (exclusive access rights) broken via random
resets should be reinstated by the driver upon their detection, for example, in
the automatic probe function described below.
.It Dv MHIOCRELEASE
Relinquishes exclusive access rights to the multihost disk for the local host.
On success, causes the disk to enter the non-exclusive access state.
.It Dv MHIOCSTATUS
Probes a multihost disk to determine whether the local host has access rights
to the disk.
Returns
.Sy 0
if the local host has access to the disk,
.Sy 1
if it doesn't, and
.Sy -1
with
.Va errno
set to
.Er EIO
if the probe failed for some other reason.
.It Dv MHIOCQRESERVE
Issues only a SCSI-2 Reserve command.
If the attempt to reserve
fails due to the SCSI error Reservation Conflict (which implies that some other
host has the device reserved), then the ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
The
.Dv MHIOCQRESERVE
ioctl does NOT issue a bus device
reset or bus reset prior to attempting the SCSI-2 Reserve command.
It also
does not reinstate reservations that disappear due to bus
resets or bus device resets; if that behavior is desired, then the caller can
call
.Dv MHIOCTKOWN
after the
.Dv MHIOCQRESERVE
has returned success.
If
the device does not support the SCSI-2 Reserve command, then the ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
The
.Dv MHIOCQRESERVE
ioctl is intended to be used by high-availability or clustering software for a
.Dq quorum
disk; hence the
.Dq Q
in the name of the ioctl.
.El
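.Pp
Because
.Dv MHIOCSTATUS
reports three distinct outcomes through its return value, a caller may find it
convenient to fold them into one value.
The following is only a sketch: the enumeration and the function
.Fn mhd_classify_status
are hypothetical names and are not part of
.In sys/mhd.h .
.Bd -literal -offset 2n
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for this sketch; not defined by <sys/mhd.h>. */
typedef enum {
    MHD_ACCESS_HELD,    /* ioctl returned 0: local host has access */
    MHD_ACCESS_LOST,    /* ioctl returned 1: local host lacks access */
    MHD_PROBE_FAILED,   /* ioctl returned -1 with errno set to EIO */
    MHD_PROBE_OTHER     /* anything else, for example EPERM */
} mhd_access_t;

/*
 * Classify the result of ioctl(fd, MHIOCSTATUS).
 * rc is the ioctl return value; err is errno saved right after the call.
 */
static mhd_access_t
mhd_classify_status(int rc, int err)
{
    if (rc == 0)
        return (MHD_ACCESS_HELD);
    if (rc == 1)
        return (MHD_ACCESS_LOST);
    if (rc == -1 && err == EIO)
        return (MHD_PROBE_FAILED);
    return (MHD_PROBE_OTHER);
}
```
.Ed
.Pp
Store the ioctl return value in a variable before reading
.Va errno ,
since the order in which function arguments are evaluated is unspecified in C.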
.Ss "Shared Multihost Disks"
The shared multihost disk ioctls control access to shared multihost disks.
The ioctls are merely a veneer on the SCSI-3 Persistent Reservation facility.
Therefore, the underlying semantic model is not described in detail here; see
the SCSI-3 standard instead.
SCSI-3 Persistent Reservations support the
concept of a group of hosts all sharing access to a disk.
.Pp
The function prototypes and descriptions for the shared multihost ioctls are as
follows:
.Bl -tag -width 1n
.It Fn ioctl fd MHIOCGRP_INKEYS "(mhioc_inkeys_t *)k"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Keys to the device.
On input, the field
.Fa k->li
should be initialized by the caller with
.Fa k->li.listsize
reflecting how large an array the caller has allocated for the
.Fa k->li.list
field and with
.Ql k->li.listlen\& ==\& 0 .
On return, the field
.Fa k->li.listlen
is updated to indicate the number of
reservation keys the device currently has: if this value is larger than
.Fa k->li.listsize ,
the caller should have passed a bigger
.Fa k->li.list
array with a bigger
.Fa k->li.listsize .
The number of array elements actually written by the callee into
.Fa k->li.list
is the minimum of
.Fa k->li.listlen
and
.Fa k->li.listsize .
The field
.Fa k->generation
is updated with the generation information returned by the SCSI-3
Read Keys query.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_INRESV "(mhioc_inresvs_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Reservations to the
device.
Remarks similar to
.Dv MHIOCGRP_INKEYS
apply to the array manipulation.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_REGISTER "(mhioc_register_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Register.
The fields of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The field
.Fa r->aptpl
should be set to true to specify that registrations
and reservations should persist across device power failures, or to false to
specify that registrations and reservations should be cleared upon device power
failure; true is the recommended setting.
The field
.Fa r->oldkey
is the key that the caller believes the device may already have for this host
initiator; if the caller believes that this host initiator is not already
registered with this device, it should pass the special key of all zeros.
To achieve the effect of unregistering with the device, the caller should pass
its current key for the
.Fa r->oldkey
field and an
.Fa r->newkey
field containing the special key of all zeros.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_RESERVE "(mhioc_resv_desc_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Reserve.
The fields of
structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
If the device returns the SCSI error code Reservation Conflict, this ioctl
returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPTANDABORT "(mhioc_preemptandabort_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Preempt-And-Abort.
The fields
of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The key of the victim host is specified by the field
.Fa r->victim_key .
The field
.Fa r->resvdesc
supplies the preempter's key and the reservation that it is requesting as part
of the SCSI-3 Preempt-And-Abort command.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPT "(mhioc_preemptandabort_t *)r"
.Pp
Similar to
.Dv MHIOCGRP_PREEMPTANDABORT ,
but instead issues the SCSI-3 command Persistent Reserve Out Preempt.
(Note: This command is not implemented.)
.It Fn ioctl fd MHIOCGRP_CLEAR "(mhioc_resv_key_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Clear.
The input parameter
.Va r
is the reservation key of the caller, which should already have been
registered with the device by an earlier call to
.Dv MHIOCGRP_REGISTER .
.El
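.Pp
The array-size bookkeeping described for
.Dv MHIOCGRP_INKEYS
can be sketched with two small helpers.
Both function names are hypothetical; they take the
.Fa listsize
and
.Fa listlen
values as plain integers rather than using the structures from
.In sys/mhd.h ,
so the sketch makes no assumption about the exact structure layout.
.Bd -literal -offset 2n
```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of keys actually written into the caller's array: the minimum
 * of listlen (keys held by the device) and listsize (array capacity).
 */
static uint32_t
mhd_keys_copied(uint32_t listsize, uint32_t listlen)
{
    return (listlen < listsize ? listlen : listsize);
}

/*
 * Nonzero when the device holds more keys than the array could take,
 * meaning the caller should retry with listsize of at least listlen.
 */
static int
mhd_keys_truncated(uint32_t listsize, uint32_t listlen)
{
    return (listlen > listsize);
}
```
.Ed
.Pp
A caller would typically check for truncation after the ioctl returns and, if
needed, allocate a larger array and issue the ioctl again.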
.Pp
For each device, the non-shared ioctls should not be mixed with the Persistent
Reserve Out shared ioctls, and vice versa; otherwise, the underlying device is
likely to return errors, because SCSI does not permit SCSI-2 reservations to be
mixed with SCSI-3 reservations on a single device.
It is, however, legitimate
to call the Persistent Reserve In ioctls, because these are query only.
Issuing the
.Dv MHIOCGRP_INKEYS
ioctl is the recommended way for a caller to
determine if the device supports SCSI-3 Persistent Reservations (the ioctl
will return
.Sy -1
with
.Va errno
set to
.Er ENOTSUP
if the device does not).
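.Pp
The support check described above reduces to inspecting the result of the
.Dv MHIOCGRP_INKEYS
call.
The sketch below uses a hypothetical function name and treats any failure
other than
.Er ENOTSUP
as inconclusive, which is an assumption of this example rather than documented
behavior.
.Bd -literal -offset 2n
```c
#include <assert.h>
#include <errno.h>

/*
 * Interpret the outcome of an MHIOCGRP_INKEYS call as a support probe.
 * rc is the ioctl return value; err is errno saved right after the call.
 * Returns 1 if the device answered the query, 0 if it reported ENOTSUP
 * (no SCSI-3 Persistent Reservation support), and -1 if the call failed
 * for another reason and nothing can be concluded.
 */
static int
mhd_pr_supported(int rc, int err)
{
    if (rc == 0)
        return (1);
    if (rc == -1 && err == ENOTSUP)
        return (0);
    return (-1);
}
```
.Ed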
.Ss "MHIOCENFAILFAST Ioctl"
The
.Dv MHIOCENFAILFAST
ioctl is applicable for both non-shared and shared
disks, and may be used with either the non-shared or shared ioctls.
.Bl -tag -width 1n
.It Fn ioctl fd MHIOCENFAILFAST "(unsigned int *)millisecs"
.Pp
Enables or disables the failfast option in the multihost disk driver and
enables or disables automatic probing of a multihost disk, described below.
The argument is an unsigned integer specifying the number of milliseconds to
wait between executions of the automatic probe function.
An argument of zero disables the failfast option and disables automatic probing.
If the
.Dv MHIOCENFAILFAST
ioctl is never called, the effect is defined to be that
both the failfast option and automatic probing are disabled.
.El
.Ss "Automatic Probing"
The
.Dv MHIOCENFAILFAST
ioctl sets up a timeout in the driver to periodically
schedule automatic probes of the disk.
The automatic probe function works in this manner: The driver is scheduled to
probe the multihost disk every n milliseconds, where n is the
.Dv MHIOCENFAILFAST
argument rounded up to the next integral
multiple of the system clock's resolution.
If
.Bl -enum -offset indent
.It
the local host no longer has access rights to the multihost disk, and
.It
access rights were expected to be held by the local host,
.El
.Pp
the driver immediately panics the machine to comply with the failfast model.
.Pp
If the driver makes this discovery outside the timeout function, especially
during a read or write operation, it is imperative that it panic the system
then as well.
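.Pp
The rounding of the probe interval can be illustrated with a short
calculation.
This is a sketch only: the function name and the
.Fa tick_ms
parameter (the clock resolution in milliseconds) are hypothetical, and it
assumes that an interval that is already an exact multiple of the resolution
is left unchanged.
.Bd -literal -offset 2n
```c
#include <assert.h>

/*
 * Round a requested probe interval up to a whole number of clock ticks.
 * millisecs is the value passed to MHIOCENFAILFAST; tick_ms is the
 * assumed clock resolution in milliseconds. Zero stays zero, matching
 * the "disabled" meaning of a zero argument. Very large inputs close to
 * UINT_MAX could overflow; this sketch does not guard against that.
 */
static unsigned int
mhd_probe_interval(unsigned int millisecs, unsigned int tick_ms)
{
    unsigned int ticks;

    assert(tick_ms > 0);
    if (millisecs == 0)
        return (0);
    ticks = millisecs / tick_ms + (millisecs % tick_ms != 0);
    return (ticks * tick_ms);
}
```
.Ed
.Pp
For example, with a 10 millisecond clock resolution, a request for 25
milliseconds yields an effective interval of 30 milliseconds.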
.Sh RETURN VALUES
Each request returns
.Sy -1
on failure and sets
.Va errno
to indicate the error.
.Bl -tag -width Er
.It Er EPERM
Caller is not root.
.It Er EACCES
Access rights were denied.
.It Er EIO
The multihost disk or controller was unable to successfully complete the
requested operation.
.It Er ENOTSUP
The multihost disk does not support the operation.
For example, it does not support the SCSI-2 Reserve/Release command set, or the
SCSI-3 Persistent Reservation command set.
.El
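.Pp
A caller that wants to report these errors in the terms used above could map
.Va errno
to a message, as in the following sketch.
The function name
.Fn mhd_strerror
is hypothetical, the message strings paraphrase the list above, and any other
error value falls back to
.Xr strerror 3C .
.Bd -literal -offset 2n
```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: describe an errno value from a failed mhd ioctl
 * using the meanings listed in RETURN VALUES.
 */
static const char *
mhd_strerror(int err)
{
    switch (err) {
    case EPERM:
        return ("caller is not root");
    case EACCES:
        return ("access rights were denied");
    case EIO:
        return ("disk or controller could not complete the operation");
    case ENOTSUP:
        return ("disk does not support the operation");
    default:
        return (strerror(err));
    }
}
```
.Ed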
.Sh STABILITY
Uncommitted
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr open 2 ,
.Xr attributes 7