=======================================
Oracle Data Analytics Accelerator (DAX)
=======================================

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats.  A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high-level services
and translates these into low-level commands which are then passed
to the driver and subsequently to the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:

    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.


High Level Overview
===================

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses.  The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. The returned
status code indicates whether the request was submitted successfully
or an error occurred.

One of the addresses given in each CCB is a pointer to a "completion
area", which is a 128 byte memory block that is written by the
coprocessor to provide execution status. No interrupt is generated
upon completion; the completion area must be polled by software to
find out when a transaction has finished. The M7 and later processors
provide a mechanism to pause the virtual processor until the
completion status has been updated by the coprocessor. This is done
using the monitored load and mwait instructions, which are described
in more detail later.  The DAX coprocessor was designed so that after
a request is submitted, the kernel is no longer involved in the
processing of it.  The polling is done at the user level, which
results in almost zero latency between completion of a request and
resumption of execution of the requesting thread.


Addressing Memory
=================

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical.  The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs.  When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. So a CCB may contain
either the virtual or real addresses of the buffers or a combination
of them. An "address type" field is available for each address that
may be given in the CCB. In all cases, the Hypervisor will translate
all the addresses to physical before dispatching to hardware. Address
translations are performed using the context of the process initiating
the request.


The Driver API
==============

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use.  This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests.  When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

CCB_DEQUEUE
-----------

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources.  No
further status information is returned, so the user should not
subsequently call read().

CCB_KILL
--------

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

CCB_INFO
--------

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.

Submission of an array of CCBs for execution
---------------------------------------------

A write() whose length is a multiple of the CCB size is treated as a
submit operation. The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor. If the accepted length is
equal to the requested length, then the submission was completely
successful and no further status is needed; hence, the user
should not subsequently call read(). Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information.  The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.
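
The return-value rules above can be captured in a small helper. This is
only a sketch: the enum, the macro, and the function name are hypothetical
and are not part of the driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical names, not part of the oradax driver API. */
enum submit_outcome {
	SUBMIT_ERROR,    /* write() returned -1; errno describes the error */
	SUBMIT_PARTIAL,  /* some CCBs were rejected; read() gives the status */
	SUBMIT_COMPLETE  /* every CCB was accepted; do not call read() */
};

#define CCB_BYTES 64	/* a CCB is 8 64-bit words */

/* Classify the return value of a submit write()/pwrite(). */
enum submit_outcome classify_submit(ssize_t ret, size_t requested)
{
	/* A submit length is a non-zero multiple of the CCB size. */
	assert(requested > 0 && requested % CCB_BYTES == 0);

	if (ret < 0)
		return SUBMIT_ERROR;
	if ((size_t)ret < requested)
		return SUBMIT_PARTIAL;
	return SUBMIT_COMPLETE;
}
```

A caller would invoke read() only for SUBMIT_PARTIAL, and inspect errno
only for SUBMIT_ERROR.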

MMAP
----

The mmap() function provides access to the completion area allocated
in the driver.  Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.
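
As an illustration, the helper below locates one completion area inside
the mapped buffer. It assumes the buffer is a contiguous array of 128-byte
completion areas, indexed by the same value that was used as the file
offset at submission. That layout is inferred from the descriptions in
this document rather than stated as a guarantee, and the helper name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPLETION_AREA_BYTES 128	/* size given in the overview */

/*
 * Hypothetical helper: address of completion area 'index' inside a
 * mapping of 'map_len' bytes. Assumes a contiguous array of entries.
 */
const volatile uint8_t *completion_area_at(const volatile void *base,
					   size_t index, size_t map_len)
{
	assert(base != NULL);
	assert(index < map_len / COMPLETION_AREA_BYTES);

	return (const volatile uint8_t *)base + index * COMPLETION_AREA_BYTES;
}
```

The returned pointer is const because the mapping is read-only for the
application.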


Completion of a Request
=======================

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY).  Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28). This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates. This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.


Application Life Cycle of a DAX Submission
==========================================

 - open the DAX device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit the CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for the completion area
 - close the DAX device


Memory Constraints
==================

The DAX hardware operates only on physical addresses. Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.
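
An application can check this constraint before submitting a CCB. The
function below is a minimal sketch with a hypothetical name. The page
size is supplied by the caller and must be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + len) lies within one page of 'page_size' bytes. */
bool fits_in_one_page(uintptr_t addr, size_t len, size_t page_size)
{
	/* Page sizes are powers of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (len == 0)
		return true;
	if (len > page_size)
		return false;

	/* The first and last byte must share the same page number. */
	return (addr / page_size) == ((addr + len - 1) / page_size);
}
```

With 8k pages, an 8k buffer passes only when it starts exactly on a page
boundary; a 32-byte buffer starting 16 bytes before a boundary fails.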

Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even a huge page.  A
major caveat is that Linux on Sparc presents 8 MB as one of the huge
page sizes. Sparc does not actually provide an 8 MB hardware page size,
and this size is synthesized by pasting together two 4 MB pages. The
reasons for this are historical, and it creates an issue because only
half of this 8 MB page can actually be used for any given buffer in a
DAX request, and it must be either the first half or the second half;
it cannot be a 4 MB chunk in the middle, since that crosses a
(hardware) page boundary. Note that this entire issue may be hidden by
higher level libraries.
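
One way to stay within a hardware page is to clamp each transaction to
the bytes remaining before the next 4 MB boundary. The sketch below
assumes the 8 MB huge page is itself 8 MB aligned, so that its two halves
begin on 4 MB boundaries. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_PAGE_4MB ((size_t)4 << 20)	/* hardware page behind an 8 MB huge page */

/* Bytes from 'addr' up to the next 'page_size' boundary. */
size_t bytes_to_page_end(uintptr_t addr, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	return page_size - (size_t)(addr & (page_size - 1));
}

/* Largest length starting at 'addr' that stays inside one 4 MB page. */
size_t max_dax_len(uintptr_t addr, size_t want)
{
	size_t room = bytes_to_page_end(addr, HW_PAGE_4MB);

	return want < room ? want : room;
}
```

A buffer spanning the whole 8 MB page would therefore be processed as two
requests, one per half.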


CCB Structure
-------------

A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs::

   struct ccb {
       u64   control;
       u64   completion;
       u64   input0;
       u64   access;
       u64   input1;
       u64   op_data;
       u64   output;
       u64   table;
   };

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., the Linux kernel).
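
In user space, where the kernel's u64 type is not available, the same
layout can be declared with fixed-width types and checked at build time.
This is a sketch and not a definition taken from libdax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the layout shown above. */
struct ccb {
	uint64_t control;
	uint64_t completion;
	uint64_t input0;
	uint64_t access;
	uint64_t input1;
	uint64_t op_data;
	uint64_t output;
	uint64_t table;
};

/* A CCB must be exactly 8 words (64 bytes); fail the build otherwise. */
_Static_assert(sizeof(struct ccb) == 64, "struct ccb must be 64 bytes");
_Static_assert(offsetof(struct ccb, output) == 48, "output is word 6");
```

The size check matters because the submit length passed to write() must
be a multiple of the CCB size.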

The first word (control) is examined by the driver for the following:

 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns


Example Code
============

The DAX is accessible to both user and kernel code.  The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2. The simplest
procedure is to attempt to open both, as only one will succeed::

	fd = open("/dev/oradax1", O_RDWR);
	if (fd < 0)
		fd = open("/dev/oradax2", O_RDWR);
	if (fd < 0) {
		/* No DAX found */
	}
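
The same probing logic can be written as a reusable function that also
reports which device was found. The helper below is a sketch with a
hypothetical name; it uses only the standard open() call.

```c
#include <fcntl.h>
#include <stddef.h>

/*
 * Try each path in order. Return the descriptor of the first one that
 * opens and store its index in *which (if non-NULL), or return -1 if
 * none of the paths can be opened.
 */
int open_first(const char *const paths[], size_t n, int flags, size_t *which)
{
	for (size_t i = 0; i < n; i++) {
		int fd = open(paths[i], flags);

		if (fd >= 0) {
			if (which)
				*which = i;
			return fd;
		}
	}
	return -1;
}
```

Called with the paths "/dev/oradax1" and "/dev/oradax2", an index of 0
means a DAX1 system and an index of 1 means a DAX2 system.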

Next, the completion area must be mapped::

	completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);
	if (completion_area == MAP_FAILED) {
		/* bail out */
	}

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries.  In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.
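
A possible way to satisfy the alignment and size rules for a bitmap
output is sketched below using posix_memalign(). The function and macro
names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DAX_OUT_ALIGN 64	/* cache line size stated above */

/* Bytes needed for a bitmap of 'nbits' bits, rounded up to 64 bytes. */
size_t bitmap_out_bytes(size_t nbits)
{
	size_t bytes = (nbits + 7) / 8;

	return (bytes + DAX_OUT_ALIGN - 1) / DAX_OUT_ALIGN * DAX_OUT_ALIGN;
}

/* Allocate a zeroed, 64-byte aligned output buffer; NULL on failure. */
void *alloc_dax_output(size_t nbits)
{
	void *p = NULL;
	size_t len = bitmap_out_bytes(nbits);

	if (len == 0 || posix_memalign(&p, DAX_OUT_ALIGN, len) != 0)
		return NULL;
	memset(p, 0, len);
	return p;
}
```

Note that 64-byte alignment alone does not keep a buffer inside a single
page, so the page-boundary rule still has to be checked separately.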

This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, which produces an output bitmap which
is the input bitmap inverted.
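
For testing, the expected result of this particular scan can be computed
in software and compared with the coprocessor output. The reference
below is only a sketch: it assumes that the first bit of the stream is
the most significant bit of the first byte, and it zeroes the unused bits
of the last byte. Neither point is taken from this document, so a
comparison should cover only the first nbits bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Software reference for the example: output bit i is 1 when input
 * bit i is 0. Bit order is assumed to be most significant bit first.
 * 'out' must hold at least (nbits + 7) / 8 bytes.
 */
void scan_zero_bits_ref(const uint8_t *in, uint8_t *out, size_t nbits)
{
	size_t nbytes = (nbits + 7) / 8;
	size_t tail = nbits % 8;

	for (size_t i = 0; i < nbytes; i++)
		out[i] = (uint8_t)~in[i];

	/* Clear the unused low-order bits of the final partial byte. */
	if (tail)
		out[nbytes - 1] &= (uint8_t)(0xff << (8 - tail));
}
```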

For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail::

	ccb->control =       /* Table 36.1, CCB Header Format */
		  (2L << 48)     /* command = Scan Value */
		| (3L << 40)     /* output address type = primary virtual */
		| (3L << 34)     /* primary input address type = primary virtual */
		                 /* Section 36.2.1, Query CCB Command Formats */
		| (1 << 28)      /* 36.2.1.1.1 primary input format = fixed width bit packed */
		| (0 << 23)      /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
		| (8 << 10)      /* 36.2.1.1.6 output format = bit vector */
		| (0 <<  5)      /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
		| (31 << 0);     /* 36.2.1.3 Disable second scan criteria */

	ccb->completion = 0;    /* Completion area address, to be filled in by driver */

	ccb->input0 = (unsigned long) input; /* primary input address */

	ccb->access =       /* Section 36.2.1.2, Data Access Control */
		  (2 << 24)    /* Primary input length format = bits */
		| (nbits - 1); /* number of bits in primary input stream, minus 1 */

	ccb->input1 = 0;       /* secondary input address, unused */

	ccb->op_data = 0;      /* scan criteria (value to be matched) */

	ccb->output = (unsigned long) output;	/* output address */

	ccb->table = 0;	       /* table address, unused */
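
The magic numbers above can be given names to make the CCB setup easier
to review. The macro and function names below are hypothetical and are
not taken from oradax.h or libdax; only the shift amounts and values come
from the example itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the shifts and values used in the example. */
#define CCB_CMD_SHIFT            48
#define CCB_OUT_ATYPE_SHIFT      40
#define CCB_IN0_ATYPE_SHIFT      34
#define CCB_IN0_FMT_SHIFT        28
#define CCB_IN0_ELEM_SIZE_SHIFT  23
#define CCB_OUT_FMT_SHIFT        10
#define CCB_SCAN1_SIZE_SHIFT      5

#define CCB_CMD_SCAN_VALUE        2ULL
#define CCB_ATYPE_PRIMARY_VIRT    3ULL
#define CCB_IN_FMT_FIXED_BITPACK  1ULL
#define CCB_OUT_FMT_BIT_VECTOR    8ULL
#define CCB_SCAN2_DISABLED       31ULL

/* Control word for a 1-bit-element Scan Value producing a bit vector. */
uint64_t scan_bits_control(void)
{
	return (CCB_CMD_SCAN_VALUE       << CCB_CMD_SHIFT)
	     | (CCB_ATYPE_PRIMARY_VIRT   << CCB_OUT_ATYPE_SHIFT)
	     | (CCB_ATYPE_PRIMARY_VIRT   << CCB_IN0_ATYPE_SHIFT)
	     | (CCB_IN_FMT_FIXED_BITPACK << CCB_IN0_FMT_SHIFT)
	     | (0ULL                     << CCB_IN0_ELEM_SIZE_SHIFT)
	     | (CCB_OUT_FMT_BIT_VECTOR   << CCB_OUT_FMT_SHIFT)
	     | (0ULL                     << CCB_SCAN1_SIZE_SHIFT)
	     | CCB_SCAN2_DISABLED;
}

/* Access word: length format "bits" plus the bit count minus one. */
uint64_t scan_bits_access(uint64_t nbits)
{
	assert(nbits >= 1);

	return (2ULL << 24) | (nbits - 1);
}
```

Using unsigned 64-bit constants throughout also avoids relying on the
width of long, which the original expressions do.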

The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status::

	if (pwrite(fd, ccb, 64, 0) != 64) {
		struct ccb_exec_result status;
		read(fd, &status, sizeof(status));
		/* bail out */
	}

After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document::

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}
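
For development or testing away from SPARC hardware, the same loop can
be written without the monitored load and mwait instructions. The sketch
below is a plain spin with an iteration limit. It is a portable fallback
and not the low-latency mechanism described in this document; the names
are hypothetical.

```c
#include <stdint.h>

#define POLL_TIMED_OUT (-1)

/*
 * Spin on the status byte up to 'max_spins' times. Returns the
 * non-zero status, or POLL_TIMED_OUT if the command is still in
 * progress when the limit is reached.
 */
int poll_completion(const volatile uint8_t *completion_area,
		    unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		uint8_t status = completion_area[0];

		if (status)	/* 0 indicates command in progress */
			return status;
	}
	return POLL_TIMED_OUT;
}
```

The volatile qualifier forces a fresh read of the status byte on every
iteration.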

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2::

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}
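
The three cases (in progress, success, error) can be folded into one
helper, sketched below with hypothetical names. It relies only on the
meaning of the first two bytes as described above; the individual error
codes are left to section 36.2.2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, not taken from oradax.h. */
enum dax_req_state {
	DAX_REQ_IN_PROGRESS,	/* status byte is 0 */
	DAX_REQ_SUCCEEDED,	/* status byte is 1 */
	DAX_REQ_FAILED		/* any other value */
};

/*
 * Decode the first two bytes of a completion area. On failure the
 * error code from byte 1 is stored in *err_code (if non-NULL).
 */
enum dax_req_state dax_req_state_of(const volatile uint8_t *completion_area,
				    uint8_t *err_code)
{
	uint8_t status = completion_area[0];

	if (status == 0)
		return DAX_REQ_IN_PROGRESS;
	if (status == 1)
		return DAX_REQ_SUCCEEDED;
	if (err_code)
		*err_code = completion_area[1];
	return DAX_REQ_FAILED;
}
```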

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation::

	struct dax_command cmd;
	cmd.command = CCB_DEQUEUE;
	if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
		/* bail out */
	}

Finally, normal program cleanup should be done, i.e., unmapping the
completion area, closing the DAX device, freeing memory, etc.

Kernel example
--------------

The only difference in using the DAX in kernel code is the treatment
of the completion area. Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB::

	ccb->control |=      /* Table 36.1, CCB Header Format */
	        (3L << 32);     /* completion area address type = primary virtual */

	ccb->completion = (unsigned long) completion_area;   /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1::

	#include <asm/hypervisor.h>

	hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
				 HV_CCB_QUERY_CMD |
				 HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
				 HV_CCB_VA_PRIVILEGED,
				 0, &bytes_accepted, &status_data);

	if (hv_rv != HV_EOK) {
		/* hv_rv is an error code, status_data contains */
		/* potential additional status, see 36.3.1.1 */
	}

After the submission, the completion area polling code is identical to
that in user land::

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}

The output bitmap is ready for consumption immediately after the
completion status indicates success.

Excerpt from UltraSPARC Virtual Machine Specification
=====================================================

.. include:: dax-hv-api.txt
    :literal: