.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
	Dave Engebretsen <engebret@us.ibm.com>,
	Adam Reznechek <adreznec@linux.vnet.ibm.com>,
	Steven Royer <seroyer@linux.vnet.ibm.com>,
	Bryant G. Ly <bryantly@linux.vnet.ibm.com>,

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference document would be:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides an
interface between the hypervisor and a management partition. This interface
behaves like a message-passing interface. The management partition is
intended to provide an alternative to Hardware Management Console
(HMC)-based system management.

The primary hardware management solution developed by IBM relies
on an appliance server named the Hardware Management Console (HMC),
packaged as an external tower or rack-mounted personal computer. In a
Power Systems environment, a single HMC can manage multiple POWER
processor-based systems.

Management Application
----------------------

In the management partition, a management application exists which enables
a system administrator to configure the system’s partitioning
characteristics via a command line interface (CLI) or Representational
State Transfer (REST) APIs.

The management application runs on a Linux logical partition on a
POWER8 or newer processor-based server that is virtualized by PowerVM.
System configuration, maintenance, and control functions which
traditionally require an HMC can be implemented in the management
application using a combination of HMC to hypervisor interfaces and
existing operating system methods. This tool provides a subset of the
functions implemented by the HMC and enables basic partition configuration.
The set of HMC to hypervisor messages supported by the management
application component is passed to the hypervisor over a VMC interface,
which is defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical Partitioning Configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor. It
creates the pipes that enable virtualization management software. This
device is presented to a designated management partition as a virtual
device.

This communication device uses the Command/Response Queue (CRQ) and
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running prior to
sending/receiving any of the protocol messages.

This driver also utilizes Transport Event CRQs. These CRQ messages are sent
when the hypervisor detects that one of the peer partitions has abnormally
terminated, or that one side has called H_FREE_CRQ to close its CRQ.
Two new classes of CRQ messages are introduced for the VMC device. VMC
Administrative messages are used by each partition using the VMC to
communicate capabilities to its partner. HMC Interface messages are used
for the actual flow of HMC messages between the management partition and
the hypervisor. As most HMC messages far exceed the size of a CRQ buffer,
a virtual DMA (RDMA) of the HMC message data is done prior to each HMC
Interface CRQ message. Only the management partition drives RDMA
operations; the hypervisor never directly causes the movement of message
data.

Terminology
-----------

RDMA
        Remote Direct Memory Access is a DMA transfer from the server to its
        client or from the server to its partner partition. DMA refers
        to both physical I/O to and from memory operations and to memory
        to memory move operations.
CRQ
        Command/Response Queue, a facility which is used to communicate
        between partner partitions. Transport events which are signaled
        from the hypervisor to a partition are also reported in this queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example for the management application
implementation where a device driver is used to interface to the VMC
device. This driver consists of a new device, for example /dev/ibmvmc,
which provides interfaces to open, close, read, write, and perform
ioctls against the VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and number of
resources available in both the management partition and the hypervisor.
Finally, the hypervisor requests that the management partition create an
initial pool of VMC buffers, one buffer for each possible HMC connection,
which will be used for management application session initialization.
Prior to completion of this initialization sequence, the device returns
EBUSY to open() calls. EIO is returned for all other open() failures.

::

        Management Partition            Hypervisor
                        CRQ INIT
        ---------------------------------------->
                   CRQ INIT COMPLETE
        <----------------------------------------
                      CAPABILITIES
        ---------------------------------------->
                 CAPABILITIES RESPONSE
        <----------------------------------------
              ADD BUFFER (HMC IDX=0,1,..)         _
        <----------------------------------------  |
                  ADD BUFFER RESPONSE              | - Perform # HMCs Iterations
        ----------------------------------------> -
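
Because open() reports EBUSY until the sequence above has completed, an
application may want to retry for a short while before giving up. The
following is a minimal user-space sketch of that idea, not part of the
driver. The helper name ``vmc_open_retry``, the retry count, and the delay
are hypothetical choices made for illustration only.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <time.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: open the device, retrying only while open()
     * fails with EBUSY (initialization still in progress). Any other
     * error is reported immediately. Returns a file descriptor, or -1
     * with errno set.
     */
    static int vmc_open_retry(const char *path, int max_tries, long delay_ms)
    {
        struct timespec delay = {
            .tv_sec = delay_ms / 1000,
            .tv_nsec = (delay_ms % 1000) * 1000000L,
        };

        assert(path != NULL && delay_ms >= 0);

        for (int i = 0; i < max_tries; i++) {
            int fd = open(path, O_RDWR);

            if (fd >= 0)
                return fd;
            if (errno != EBUSY)
                return -1;      /* not an initialization race */
            if (i + 1 < max_tries)
                nanosleep(&delay, NULL);
        }

        errno = EBUSY;          /* still busy after all attempts */
        return -1;
    }

A caller might use ``vmc_open_retry("/dev/ibmvmc", 10, 100)``, where the
device path is the example name used in this document and the other two
values are arbitrary.
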

VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session level
connection can be established. The application layer performs an open() to
the VMC device and executes an ioctl() against it, indicating the HMC ID
(32 bytes of data) for this session. If the VMC device is in an invalid
state, EIO will be returned for the ioctl(). The device driver creates a
new HMC session value (ranging from 1 to 255) and HMC index value (starting
at index 0 and ranging to 254) for this HMC ID. The driver then does an
RDMA of the HMC ID to the hypervisor, and then sends an Interface Open
message to the hypervisor to establish the session over the VMC. After the
hypervisor receives this information, it sends Add Buffer messages to the
management partition to seed an initial pool of buffers for the new HMC
connection. Finally, the hypervisor sends an Interface Open Response
message to indicate that it is ready for normal runtime messaging. The
following illustrates this VMC flow:

::

        Management Partition             Hypervisor
                      RDMA HMC ID
        ---------------------------------------->
                    Interface Open
        ---------------------------------------->
                      Add Buffer                  _
        <----------------------------------------  |
                  Add Buffer Response              | - Perform N Iterations
        ----------------------------------------> -
                Interface Open Response
        <----------------------------------------
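
On the application side, the only data supplied for the open flow is the
32-byte HMC ID. The sketch below shows one way an application could build
that buffer before handing it to the ioctl(). The helper name
``hmc_id_fill`` and the zero padding of short identifiers are assumptions
made for illustration. This document states only that the ID is 32 bytes
of data, and the ioctl request number is deliberately left out because it
is not given here.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <string.h>

    #define HMC_ID_LEN 32   /* from the text: the HMC ID is 32 bytes */

    /*
     * Hypothetical helper: copy an identifier into a fixed 32-byte
     * buffer, padding the remainder with zero bytes (an assumption).
     * Returns 0 on success, or -1 with errno set to EINVAL when the
     * identifier is empty or does not fit.
     */
    static int hmc_id_fill(unsigned char out[HMC_ID_LEN], const char *id)
    {
        size_t len;

        assert(out != NULL);

        len = id ? strlen(id) : 0;
        if (len == 0 || len > HMC_ID_LEN) {
            errno = EINVAL;
            return -1;
        }

        memset(out, 0, HMC_ID_LEN);
        memcpy(out, id, len);
        return 0;
    }

The filled buffer would then be passed to the ioctl() that sets the HMC ID
for the session. As described above, that call fails with EIO if the VMC
device is in an invalid state.
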

VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations. When
sending data to the hypervisor, the management application performs a
write() to the VMC device, and the driver transfers the data to the
hypervisor by RDMA and then sends a Signal Message. If a write() is
attempted before VMC device buffers have been made available by the
hypervisor, or no buffers are currently available, EBUSY is returned in
response to the write(). A write() will return EIO for all other errors,
such as an invalid device state. When the hypervisor sends a message to the
management partition, the data is put into a VMC buffer and a Signal
Message is sent to the VMC driver in the management partition. The driver
transfers the buffer into the partition by RDMA and passes the data up to
the appropriate management application via a read() to the VMC device. The
read() request blocks if there is no buffer available to read. The
management application may use select() to wait for the VMC device to
become ready with data to read.

::

        Management Partition             Hypervisor
                        MSG RDMA
        ---------------------------------------->
                        SIGNAL MSG
        ---------------------------------------->
                        SIGNAL MSG
        <----------------------------------------
                        MSG RDMA
        <----------------------------------------
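
Since read() blocks when no buffer is available, an application that must
stay responsive can wait with select() first. The following is a minimal
sketch of such a wait using only standard POSIX calls. The helper name
``vmc_wait_readable`` and the millisecond timeout parameter are
hypothetical and are not part of the driver interface.

.. code-block:: c

    #include <assert.h>
    #include <sys/select.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: wait until fd has data to read or the
     * timeout expires. Returns 1 if readable, 0 on timeout, and -1 on
     * error with errno set by select().
     */
    static int vmc_wait_readable(int fd, int timeout_ms)
    {
        fd_set rfds;
        struct timeval tv = {
            .tv_sec = timeout_ms / 1000,
            .tv_usec = (timeout_ms % 1000) * 1000,
        };
        int rc;

        assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0)
            return -1;
        return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
    }

When the helper returns 1, the application can call read() on the
descriptor without blocking on an empty queue. A return of 0 simply means
that no message arrived within the chosen timeout.
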

VMC Interface Close
-------------------

HMC session level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

        Management Partition             Hypervisor
                     INTERFACE CLOSE
        ---------------------------------------->
                INTERFACE CLOSE RESPONSE
        <----------------------------------------

Additional Information
======================

For more information on CRQ Messages, VMC Messages, HMC Interface Buffers,
and Signal Messages, please refer to Section F of the Linux on Power
Architecture Platform Reference.