.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
    Dave Engebretsen <engebret@us.ibm.com>,
    Adam Reznechek <adreznec@linux.vnet.ibm.com>,
    Steven Royer <seroyer@linux.vnet.ibm.com>,
    Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference document would be:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides an
interface between the hypervisor and a management partition. The interface
is similar to a message-passing interface. The management partition is
intended to provide an alternative to Hardware Management Console (HMC)
based system management.

The primary hardware management solution developed by IBM relies on an
appliance server, the HMC, packaged as an external tower or rack-mounted
personal computer. In a Power Systems environment, a single HMC can manage
multiple POWER processor-based systems.

Management Application
----------------------

In the management partition, a management application exists which enables
a system administrator to configure the system's partitioning
characteristics via a command-line interface (CLI) or Representational
State Transfer (REST) APIs.

The management application runs on a Linux logical partition on a
POWER8 or newer processor-based server that is virtualized by PowerVM.
System configuration, maintenance, and control functions which
traditionally require an HMC can be implemented in the management
application using a combination of HMC-to-hypervisor interfaces and
existing operating system methods. This tool provides a subset of the
functions implemented by the HMC and enables basic partition configuration.
The set of HMC-to-hypervisor messages supported by the management
application component is passed to the hypervisor over a VMC interface,
which is defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical partitioning configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor.
It essentially creates the pipes that enable virtualization management
software. This device is presented to a designated management partition as
a virtual device.

This communication device uses the Command/Response Queue (CRQ) and
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running prior to
sending or receiving any of the protocol messages.

This driver also utilizes Transport Event CRQs. CRQ messages are sent
when the hypervisor detects that one of the peer partitions has abnormally
terminated, or when one side has called H_FREE_CRQ to close its CRQ.
Two new classes of CRQ messages are introduced for the VMC device. VMC
Administrative messages are used by each partition using the VMC to
communicate capabilities to its partner.
HMC Interface messages are used
for the actual flow of HMC messages between the management partition and
the hypervisor. Because most HMC messages far exceed the size of a CRQ
buffer, a virtual DMA (RDMA) of the HMC message data is done prior to each
HMC Interface CRQ message. Only the management partition drives RDMA
operations; the hypervisor never directly causes the movement of message
data.

Terminology
-----------

RDMA
    Remote Direct Memory Access is a DMA transfer from the server to its
    client or from the server to its partner partition. DMA refers both
    to physical I/O to and from memory operations and to memory-to-memory
    move operations.
CRQ
    Command/Response Queue is a facility which is used to communicate
    between partner partitions. Transport events, which are signaled from
    the hypervisor to the partition, are also reported in this queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example of a management application
implementation in which a device driver is used to interface to the VMC
device. This driver consists of a new device, for example /dev/ibmvmc,
which provides interfaces to open, close, read, write, and perform
ioctl()s against the VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and the number
of resources available in both the management partition and the
hypervisor. Finally, the hypervisor requests that the management partition
create an initial pool of VMC buffers, one buffer for each possible HMC
connection, which will be used for management application session
initialization.
Prior to completion of this initialization sequence, the device returns
EBUSY to open() calls. EIO is returned for all open() failures.

::

  Management Partition               Hypervisor
           CRQ INIT
  ---------------------------------------->
           CRQ INIT COMPLETE
  <----------------------------------------
           CAPABILITIES
  ---------------------------------------->
           CAPABILITIES RESPONSE
  <----------------------------------------
           ADD BUFFER (HMC IDX=0,1,..)     _
  <----------------------------------------  |
           ADD BUFFER RESPONSE               | - Perform # HMCs Iterations
  ---------------------------------------->  -

VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session-level
connection can be established. The application layer performs an open() on
the VMC device and executes an ioctl() against it, indicating the HMC ID
(32 bytes of data) for this session. If the VMC device is in an invalid
state, EIO will be returned for the ioctl(). The device driver creates a
new HMC session value (ranging from 1 to 255) and an HMC index value
(starting at index 0 and ranging to 254) for this HMC ID. The driver then
does an RDMA of the HMC ID to the hypervisor and sends an Interface Open
message to the hypervisor to establish the session over the VMC. After the
hypervisor receives this information, it sends Add Buffer messages to the
management partition to seed an initial pool of buffers for the new HMC
connection. Finally, the hypervisor sends an Interface Open Response
message to indicate that it is ready for normal runtime messaging.
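
The open()/ioctl() sequence above could be driven from user space roughly
as follows. This is a minimal sketch: the ioctl request code
(VMC_IOCTL_SETHMCID), the device path, and the helper names are
illustrative assumptions, not the driver's published UAPI.

```c
/* Hypothetical sketch of opening an HMC session on the VMC device.
 * The ioctl request code below is a placeholder, not the real UAPI. */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define HMC_ID_LEN 32                /* "32 bytes of data" per the text */
#define VMC_IOCTL_SETHMCID 0xBEEF01  /* placeholder request code */

/* Copy an ASCII HMC ID into the fixed 32-byte field, zero-padded. */
static void format_hmc_id(const char *id, unsigned char out[HMC_ID_LEN])
{
	size_t n = strlen(id);

	if (n > HMC_ID_LEN)
		n = HMC_ID_LEN;
	memset(out, 0, HMC_ID_LEN);
	memcpy(out, id, n);
}

/* Open the VMC device and bind the file descriptor to an HMC session.
 * Returns the fd on success, or -1 with errno set by the driver
 * (EBUSY while initialization is incomplete, EIO on other failures). */
static int open_hmc_session(const char *id)
{
	unsigned char hmc_id[HMC_ID_LEN];
	int fd;

	format_hmc_id(id, hmc_id);

	fd = open("/dev/ibmvmc", O_RDWR);
	if (fd < 0)
		return -1;

	/* The driver RDMAs the HMC ID to the hypervisor and sends the
	 * Interface Open message to establish the session. */
	if (ioctl(fd, VMC_IOCTL_SETHMCID, hmc_id) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```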
The following illustrates this VMC flow:

::

  Management Partition               Hypervisor
           RDMA HMC ID
  ---------------------------------------->
           Interface Open
  ---------------------------------------->
           Add Buffer                      _
  <----------------------------------------  |
           Add Buffer Response               | - Perform N Iterations
  ---------------------------------------->  -
           Interface Open Response
  <----------------------------------------

VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations. When
sending data to the hypervisor, the management application performs a
write() to the VMC device; the driver RDMAs the data to the hypervisor and
then sends a Signal Message. If a write() is attempted before VMC device
buffers have been made available by the hypervisor, or if no buffers are
currently available, EBUSY is returned in response to the write(). A
write() will return EIO for all other errors, such as an invalid device
state. When the hypervisor sends a message to the management partition,
the data is put into a VMC buffer and a Signal Message is sent to the VMC
driver in the management partition. The driver RDMAs the buffer into the
partition and passes the data up to the appropriate management application
via a read() of the VMC device. The read() request blocks if there is no
buffer available to read. The management application may use select() to
wait for the VMC device to become ready with data to read.
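
The runtime exchange described above might look roughly like this from
user space. This is a sketch under the stated error semantics; the retry
policy and the helper names are illustrative assumptions.

```c
/* Hypothetical helpers for the runtime exchange: write() pushes an HMC
 * message to the hypervisor, select() waits for inbound data, read()
 * pulls the next message. */
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Send one HMC message, retrying while the driver reports EBUSY
 * (no VMC buffer currently available). */
static ssize_t vmc_send(int fd, const void *msg, size_t len)
{
	ssize_t rc;

	do {
		rc = write(fd, msg, len);  /* driver RDMAs data, then signals */
	} while (rc < 0 && errno == EBUSY);

	return rc;  /* -1 with errno == EIO on invalid device state */
}

/* Wait until the device has data ready, then read one message.
 * select() avoids blocking inside read() when no buffer is available. */
static ssize_t vmc_recv(int fd, void *buf, size_t len)
{
	fd_set rfds;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
		return -1;

	return read(fd, buf, len);
}
```

Because the helpers operate on a plain file descriptor, the same pattern
applies to any character device with these blocking semantics.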

::

  Management Partition               Hypervisor
           MSG RDMA
  ---------------------------------------->
           SIGNAL MSG
  ---------------------------------------->
           SIGNAL MSG
  <----------------------------------------
           MSG RDMA
  <----------------------------------------

VMC Interface Close
-------------------

HMC session-level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

  Management Partition               Hypervisor
           INTERFACE CLOSE
  ---------------------------------------->
           INTERFACE CLOSE RESPONSE
  <----------------------------------------

Additional Information
======================

For more information on the documentation for CRQ messages, VMC messages,
HMC interface buffers, and signal messages, please refer to the Linux on
Power Architecture Platform Reference, Section F.