.\" Copyright (c) 2005 Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2017, Joyent, Inc.
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.Dd March 13, 2022
.Dt MHD 4I
.Os
.Sh NAME
.Nm mhd
.Nd multihost disk control operations
.Sh SYNOPSIS
.In sys/mhd.h
.Sh DESCRIPTION
The
.Nm
.Xr ioctl 2
requests control access rights of a multihost disk, using
disk reservations on the disk device.
.Pp
The stability level of this interface (see
.Xr attributes 7 )
is evolving.
As a result, the interface is subject to change and you should limit your use of
it.
.Pp
The mhd ioctls fall into two major categories: (1) ioctls for non-shared
multihost disks and (2) ioctls for shared multihost disks.
.Pp
One ioctl,
.Dv MHIOCENFAILFAST ,
is applicable to both non-shared and shared multihost disks.
It is described after the first two categories.
.Pp
All the ioctls require root privilege.
.Pp
For all of the ioctls, the caller should obtain the file descriptor for the
device by calling
.Xr open 2
with the
.Dv O_NDELAY
flag; without the
.Dv O_NDELAY
flag, the open may fail due to another host already having a
conflicting reservation on the device.
Some of the ioctls below permit the caller to forcibly clear a conflicting
reservation held by another host; however, to call such an ioctl, the
caller must first obtain the open file descriptor.
.Ss "Non-shared Multihost Disks"
The non-shared multihost disk ioctls are
.Dv MHIOCTKOWN ,
.Dv MHIOCRELEASE ,
.Dv MHIOCSTATUS ,
and
.Dv MHIOCQRESERVE .
These ioctl requests control the access rights of non-shared multihost disks.
A non-shared multihost disk is one that supports serialized, mutually exclusive
I/O mastery by the connected hosts.
This is in contrast to the shared-disk model, in which
concurrent access is allowed from more than one host (see below).
.Pp
A non-shared multihost disk can be in one of two states:
.Bl -bullet -width indent
.It
Exclusive access state, where only one connected host has I/O access.
.It
Non-exclusive access state, where all connected hosts have I/O access.
An external hardware reset can cause the disk to enter the non-exclusive access
state.
.El
.Pp
Each multihost disk driver views the machine on which it is running as the
.Dq local host ;
each views all other machines as
.Dq remote hosts .
For each I/O or ioctl request, the requesting host is the local host.
.Pp
Note that the non-shared ioctls are designed to work with SCSI-2 disks.
The
SCSI-2 RESERVE/RELEASE command set is the underlying hardware facility in the
device that supports the non-shared ioctls.
.Pp
The function prototypes for the non-shared ioctls are:
.Bd -literal -offset 2n
.Fn ioctl fd MHIOCTKOWN ;
.Fn ioctl fd MHIOCRELEASE ;
.Fn ioctl fd MHIOCSTATUS ;
.Fn ioctl fd MHIOCQRESERVE ;
.Ed
.Bl -tag -width MHIOCQRESERVE
.It Dv MHIOCTKOWN
Forcefully acquires exclusive access rights to the multihost disk for the local
host.
Revokes all access rights to the multihost disk from remote hosts.
Causes the disk to enter the exclusive access state.
.Pp
Implementation Note: Reservations (exclusive access rights) broken via random
resets should be reinstated by the driver upon their detection, for example, in
the automatic probe function described below.
.It Dv MHIOCRELEASE
Relinquishes exclusive access rights to the multihost disk for the local host.
On success, causes the disk to enter the non-exclusive access state.
.It Dv MHIOCSTATUS
Probes a multihost disk to determine whether the local host has access rights
to the disk.
Returns
.Sy 0
if the local host has access to the disk,
.Sy 1
if it doesn't, and
.Sy -1
with
.Va errno
set to
.Er EIO
if the probe failed for some other reason.
.It Dv MHIOCQRESERVE
Issues, simply and only, a SCSI-2 Reserve command.
If the attempt to reserve
fails due to the SCSI error Reservation Conflict (which implies that some other
host has the device reserved), then the ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
The
.Dv MHIOCQRESERVE
ioctl does NOT issue a bus device
reset or bus reset prior to attempting the SCSI-2 reserve command.
It also
does not take care of reinstating reservations that disappear due to bus
resets or bus device resets; if that behavior is desired, then the caller can
call
.Dv MHIOCTKOWN
after the
.Dv MHIOCQRESERVE
has returned success.
If
the device does not support the SCSI-2 Reserve command, then the ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
The
.Dv MHIOCQRESERVE
ioctl is intended to be used by high-availability or clustering software for a
.Dq quorum
disk; hence the
.Dq Q
in the name of the ioctl.
.El
.Ss "Shared Multihost Disks"
The shared multihost disk ioctls control access to shared multihost disks.
The ioctls are merely a veneer on the SCSI-3 Persistent Reservation facility.
Therefore, the underlying semantic model is not described in detail here; see
the SCSI-3 standard instead.
The SCSI-3 Persistent Reservations support the
concept of a group of hosts all sharing access to a disk.
.Pp
The function prototypes and descriptions for the shared multihost ioctls are as
follows:
.Bl -tag -width 1n
.It Fn ioctl fd MHIOCGRP_INKEYS "(mhioc_inkeys_t *)k"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Keys to the device.
On input, the field
.Fa k->li
should be initialized by the caller with
.Fa k->li.listsize
reflecting how large an array the caller has allocated for the
.Fa k->li.list
field and with
.Ql k->li.listlen\& ==\& 0 .
On return, the field
.Fa k->li.listlen
is updated to indicate the number of
reservation keys the device currently has: if this value is larger than
.Fa k->li.listsize
then that indicates that the caller should have passed a bigger
.Fa k->li.list
array with a bigger
.Fa k->li.listsize .
The number of array elements actually written by the callee into
.Fa k->li.list
is the minimum of
.Fa k->li.listlen
and
.Fa k->li.listsize .
The field
.Fa k->generation
is updated with the generation information returned by the SCSI-3
Read Keys query.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_INRESV "(mhioc_inresvs_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Reservations to the
device.
Remarks similar to
.Dv MHIOCGRP_INKEYS
apply to the array manipulation.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_REGISTER "(mhioc_register_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Register.
The fields of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The field
.Fa r->aptpl
should be set to true to specify that registrations
and reservations should persist across device power failures, or to false to
specify that registrations and reservations should be cleared upon device power
failure; true is the recommended setting.
The field
.Fa r->oldkey
is the key that the caller believes the device may already have for this host
initiator; if the caller believes that this host initiator is not already
registered with this device, it should pass the special key of all zeros.
To achieve the effect of unregistering with the device, the caller should pass
its current key for the
.Fa r->oldkey
field and an
.Fa r->newkey
field containing the special key of all zeros.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_RESERVE "(mhioc_resv_desc_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Reserve.
The fields of
structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
If the device returns the SCSI error code Reservation Conflict, this ioctl
returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPTANDABORT "(mhioc_preemptandabort_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Preempt-And-Abort.
The fields
of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The key of the victim host is specified by the field
.Fa r->victim_key .
The field
.Fa r->resvdesc
supplies the preempter's key and the reservation that it is requesting as part
of the SCSI-3 Preempt-And-Abort command.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPT "(mhioc_preemptandabort_t *)r"
.Pp
Similar to
.Dv MHIOCGRP_PREEMPTANDABORT ,
but instead issues the SCSI-3 command Persistent Reserve Out Preempt.
(Note: This command is not implemented.)
.It Fn ioctl fd MHIOCGRP_CLEAR "(mhioc_resv_key_t *)r"
Issues the SCSI-3 command Persistent Reserve Out Clear.
The input parameter
.Va r
is the reservation key of the caller, which should already have been
registered with the device by an earlier call to
.Dv MHIOCGRP_REGISTER .
.El
.Pp
For each device, the non-shared ioctls should not be mixed with the Persistent
Reserve Out shared ioctls, and vice versa; otherwise, the underlying device is
likely to return errors, because SCSI does not permit SCSI-2 reservations to be
mixed with SCSI-3 reservations on a single device.
It is, however, legitimate
to call the Persistent Reserve In ioctls, because these are query only.
Issuing the
.Dv MHIOCGRP_INKEYS
ioctl is the recommended way for a caller to
determine if the device supports SCSI-3 Persistent Reservations (the ioctl
will return
.Sy -1
with
.Va errno
set to
.Er ENOTSUP
if the device does not).
.Ss "MHIOCENFAILFAST Ioctl"
The
.Dv MHIOCENFAILFAST
ioctl is applicable to both non-shared and shared
disks, and may be used with either the non-shared or shared ioctls.
.Bl -tag -width 1n
.It Fn ioctl fd MHIOCENFAILFAST "(unsigned int *)millisecs"
.Pp
Enables or disables the failfast option in the multihost disk driver and
enables or disables automatic probing of a multihost disk, described below.
The argument is an unsigned integer specifying the number of milliseconds to
wait between executions of the automatic probe function.
An argument of zero disables the failfast option and disables automatic probing.
If the
.Dv MHIOCENFAILFAST
ioctl is never called, the effect is defined to be that
both the failfast option and automatic probing are disabled.
.El
.Ss "Automatic Probing"
The
.Dv MHIOCENFAILFAST
ioctl sets up a timeout in the driver to periodically
schedule automatic probes of the disk.
The automatic probe function works in this manner: The driver is scheduled to
probe the multihost disk every n milliseconds, rounded up to the next integral
multiple of the system clock's resolution.
If
.Bl -enum -offset indent
.It
the local host no longer has access rights to the multihost disk, and
.It
access rights were expected to be held by the local host,
.El
.Pp
the driver immediately panics the machine to comply with the failfast model.
.Pp
If the driver makes this discovery outside the timeout function, especially
during a read or write operation, it is imperative that it panic the system
then as well.
.Sh RETURN VALUES
Each request returns
.Sy -1
on failure and sets
.Va errno
to indicate the error.
.Bl -tag -width Er
.It Er EPERM
Caller is not root.
.It Er EACCES
Access rights were denied.
.It Er EIO
The multihost disk or controller was unable to successfully complete the
requested operation.
.It Er ENOTSUP
The multihost disk does not support the operation.
For example, it does not support the SCSI-2 Reserve/Release command set, or the
SCSI-3 Persistent Reservation command set.
.El
.Sh STABILITY
Uncommitted
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr open 2 ,
.Xr attributes 7