=======================================
Oracle Data Analytics Accelerator (DAX)
=======================================

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats. A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:

    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.


High Level Overview
===================

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses. The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. A status code
returned indicates if the request was submitted successfully or if
there was an error. One of the addresses given in each CCB is a
pointer to a "completion area", which is a 128 byte memory block that
is written by the coprocessor to provide execution status.
No interrupt is generated upon completion; the completion area must be
polled by software to find out when a transaction has finished, but
the M7 and later processors provide a mechanism to pause the virtual
processor until the completion status has been updated by the
coprocessor. This is done using the monitored load and mwait
instructions, which are described in more detail later. The DAX
coprocessor was designed so that after a request is submitted, the
kernel is no longer involved in the processing of it. The polling is
done at the user level, which results in almost zero latency between
completion of a request and resumption of execution of the requesting
thread.


Addressing Memory
=================

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical. The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs. When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. So a CCB may contain
either the virtual or real addresses of the buffers or a combination
of them. An "address type" field is available for each address that
may be given in the CCB. In all cases, the Hypervisor will translate
all the addresses to physical before dispatching to hardware. Address
translations are performed using the context of the process initiating
the request.


The Driver API
==============

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use.
This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests. When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

CCB_DEQUEUE
-----------

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources. No
further status information is returned, so the user should not
subsequently call read().
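
The return-value convention for immediate commands (success means that
write() returns exactly the number of bytes given) could be wrapped in a
small helper. The following is only a sketch: ``struct dax_command_sketch``
and ``dax_immediate_sketch()`` are hypothetical names, and the two-field
layout is an assumption standing in for ``struct dax_command`` from oradax.h
so that the fragment builds without the SPARC header. Real code should
include the header and use its definitions instead.

```c
#include <assert.h>   /* static_assert (C11) */
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for struct dax_command; the layout is an
 * assumption, check it against oradax.h before relying on it.
 */
struct dax_command_sketch {
    uint16_t command;    /* CCB_KILL, CCB_INFO or CCB_DEQUEUE */
    uint16_t ca_offset;  /* offset into the mmapped completion area */
};

static_assert(sizeof(struct dax_command_sketch) == 4,
              "sketch assumes two packed 16-bit fields");

/*
 * Issue an immediate command on an open DAX file descriptor.
 * Returns 0 when write() reports exactly the number of bytes given,
 * otherwise -1 with errno set.
 */
int dax_immediate_sketch(int fd, uint16_t command, uint16_t ca_offset)
{
    struct dax_command_sketch cmd = {
        .command = command,
        .ca_offset = ca_offset,
    };
    ssize_t n = write(fd, &cmd, sizeof(cmd));

    if (n == (ssize_t)sizeof(cmd))
        return 0;
    if (n >= 0)
        errno = EIO;    /* short write: treated as a failure in this sketch */
    return -1;
}
```

With the real header included, a dequeue would then be something like
``dax_immediate_sketch(fd, CCB_DEQUEUE, 0)``, with no read() afterwards.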

CCB_KILL
--------

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

CCB_INFO
--------

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.

Submission of an array of CCBs for execution
---------------------------------------------

A write() whose length is a multiple of the CCB size is treated as a
submit operation. The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor.
If the accepted length is
equal to the requested length, then the submission was completely
successful and there is no further status needed; hence, the user
should not subsequently call read(). Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information. The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.

MMAP
----

The mmap() function provides access to the completion area allocated
in the driver. Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.


Completion of a Request
=======================

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY). Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28).
This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates. This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.


Application Life Cycle of a DAX Submission
==========================================

 - open dax device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for completion area
 - close the dax device


Memory Constraints
==================

The DAX hardware operates only on physical addresses.
Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.

Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even a huge page. A
major caveat is that Linux on Sparc presents 8Mb as one of the huge
page sizes.
Sparc does not actually provide an 8Mb hardware page size,
and this size is synthesized by pasting together two 4Mb pages. The
reasons for this are historical, and it creates an issue because only
half of this 8Mb page can actually be used for any given buffer in a
DAX request, and it must be either the first half or the second half;
it cannot be a 4Mb chunk in the middle, since that crosses a
(hardware) page boundary. Note that this entire issue may be hidden by
higher level libraries.


CCB Structure
-------------
A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs::

    struct ccb {
        u64   control;
        u64   completion;
        u64   input0;
        u64   access;
        u64   input1;
        u64   op_data;
        u64   output;
        u64   table;
    };

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., the Linux kernel).
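
Since a CCB is exactly 8 64-bit words, a user-space program that declares
its own copy of the structure may want to check the layout at compile time.
The following is a minimal sketch, assuming a C11 compiler; the name
``struct ccb_sketch`` and the use of ``uint64_t`` in place of ``u64`` are
illustrative choices and are not taken from the driver or libdax headers.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Illustrative user-space mirror of the CCB layout described above. */
struct ccb_sketch {
    uint64_t control;     /* opcode, flags, address types */
    uint64_t completion;  /* completion area address */
    uint64_t input0;      /* primary input address */
    uint64_t access;      /* data access control */
    uint64_t input1;      /* secondary input address */
    uint64_t op_data;     /* operation-specific data */
    uint64_t output;      /* output buffer address */
    uint64_t table;       /* table address */
};

/* Fail the build if the compiler lays the structure out differently. */
static_assert(sizeof(struct ccb_sketch) == 8 * sizeof(uint64_t),
              "a CCB must be exactly 64 bytes");
static_assert(offsetof(struct ccb_sketch, completion) == 8,
              "completion must be the second word");
static_assert(offsetof(struct ccb_sketch, table) == 56,
              "table must be the last word");
```

The checks cost nothing at run time, and they catch accidental padding or
reordering before a malformed CCB is ever submitted.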

The first word (control) is examined by the driver for the following:

 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns


Example Code
============

The DAX is accessible to both user and kernel code. The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2.
The simplest
procedure is to attempt to open both, as only one will succeed::

    fd = open("/dev/oradax1", O_RDWR);
    if (fd < 0)
        fd = open("/dev/oradax2", O_RDWR);
    if (fd < 0) {
        /* No DAX found, bail out */
    }

Next, the completion area must be mapped::

    completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries. In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.

This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, which produces an output bitmap which
is the input bitmap inverted.
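
When testing, it can be useful to have a software model of what this
particular Scan should produce, together with a check of the output buffer
rules stated above. The following is a sketch only; both function names are
hypothetical. The model is restricted to bit counts that are a multiple of 8
so that it does not need to assume anything about bit order within a byte or
about padding of a partial final byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check the output buffer rules from the text: 64-byte aligned and a
 * non-zero multiple of 64 bytes in size. It does not check that the
 * buffer stays within one page.
 */
int dax_output_buf_ok_sketch(const void *buf, size_t len)
{
    return ((uintptr_t)buf % 64 == 0) && len > 0 && (len % 64 == 0);
}

/*
 * Software model of the example Scan: 1-bit elements, match value 0.
 * Each input bit is inverted, so whole bytes can be complemented.
 * nbits must be a multiple of 8 in this sketch.
 */
void scan_zero_bits_model(const uint8_t *input, uint8_t *expected,
                          size_t nbits)
{
    assert(nbits % 8 == 0);
    for (size_t i = 0; i < nbits / 8; i++)
        expected[i] = (uint8_t)~input[i];
}
```

After the coprocessor reports success, the first ``nbits / 8`` bytes of the
real output could be compared against ``expected`` with memcmp().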

For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail::

    ccb->control =       /* Table 36.1, CCB Header Format */
          (2L << 48)     /* command = Scan Value */
        | (3L << 40)     /* output address type = primary virtual */
        | (3L << 34)     /* primary input address type = primary virtual */
                         /* Section 36.2.1, Query CCB Command Formats */
        | (1 << 28)      /* 36.2.1.1.1 primary input format = fixed width bit packed */
        | (0 << 23)      /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
        | (8 << 10)      /* 36.2.1.1.6 output format = bit vector */
        | (0 << 5)       /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
        | (31 << 0);     /* 36.2.1.3 Disable second scan criteria */

    ccb->completion = 0;    /* Completion area address, to be filled in by driver */

    ccb->input0 = (unsigned long) input; /* primary input address */

    ccb->access =        /* Section 36.2.1.2, Data Access Control */
          (2 << 24)      /* Primary input length format = bits */
        | (nbits - 1);   /* number of bits in primary input stream, minus 1 */

    ccb->input1 = 0;        /* secondary input address, unused */

    ccb->op_data = 0;       /* scan criteria (value to be matched) */

    ccb->output = (unsigned long) output;    /* output address */

    ccb->table = 0;         /* table address, unused */

The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status::

    if (pwrite(fd, ccb, 64, 0) != 64) {
        struct ccb_exec_result status;
        read(fd, &status, sizeof(status));
        /* bail out */
    }

After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document::

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
            : "=r" (status)
            : "r"  (completion_area));

        if (status)    /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2::

    if (completion_area[0] != 1) {    /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation::

    struct dax_command cmd;
    cmd.command = CCB_DEQUEUE;
    if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
        /* bail out */
    }

Finally, normal program cleanup should be done, i.e., unmapping the
completion area, closing the dax device, freeing memory, etc.

Kernel example
--------------

The only difference in using the DAX in kernel code is the treatment
of the completion area.
Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB::

    ccb->control |=      /* Table 36.1, CCB Header Format */
        (3L << 32);      /* completion area address type = primary virtual */

    ccb->completion = (unsigned long) completion_area;   /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1.

::

    #include <asm/hypervisor.h>

    hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
                             HV_CCB_QUERY_CMD |
                             HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
                             HV_CCB_VA_PRIVILEGED,
                             0, &bytes_accepted, &status_data);

    if (hv_rv != HV_EOK) {
        /* hv_rv is an error code, status_data contains */
        /* potential additional status, see 36.3.1.1 */
    }

After the submission, the completion area polling code is identical to
that in user land::

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
            : "=r" (status)
            : "r"  (completion_area));

        if (status)    /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

    if (completion_area[0] != 1) {    /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

The output bitmap is ready for consumption immediately after the
completion status indicates success.

Excerpt from UltraSPARC Virtual Machine Specification
=====================================================

 .. include:: dax-hv-api.txt
    :literal: