.. SPDX-License-Identifier: GPL-2.0

==================================
relay interface (formerly relayfs)
==================================

The relay interface provides a means for kernel applications to
efficiently log and transfer large quantities of data from the kernel
to userspace via user-defined 'relay channels'.

A 'relay channel' is a kernel->user data relay mechanism implemented
as a set of per-cpu kernel buffers ('channel buffers'), each
represented as a regular file ('relay file') in user space.  Kernel
clients write into the channel buffers using efficient write
functions; these automatically log into the current cpu's channel
buffer.  User space applications mmap() or read() from the relay files
and retrieve the data as it becomes available.  The relay files
themselves are files created in a host filesystem, e.g. debugfs, and
are associated with the channel buffers using the API described below.

The format of the data logged into the channel buffers is completely
up to the kernel client; the relay interface does however provide
hooks which allow kernel clients to impose some structure on the
buffer data.
The relay interface doesn't implement any form of data
filtering - this also is left to the kernel client.  The purpose is to
keep things as simple as possible.

This document provides an overview of the relay interface API.  The
details of the function parameters are documented along with the
functions in the relay interface code - please see that for details.

Semantics
=========

Each relay channel has one buffer per CPU, and each buffer has one or
more sub-buffers.  Messages are written to the first sub-buffer until
it is too full to contain a new message, in which case the message is
written to the next sub-buffer (if available).  Messages are never
split across sub-buffers.  At this point, userspace can be notified so
it empties the first sub-buffer, while the kernel continues writing to
the next.

When a sub-buffer is reported as full, the kernel knows how many
bytes of it are padding, i.e. unused space occurring because a complete
message couldn't fit into a sub-buffer.  Userspace can use this
knowledge to copy only valid data.

After copying it, userspace can notify the kernel that a sub-buffer
has been consumed.
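
The sub-buffer rules above can be pictured with a small userspace
model.  This is only a sketch, not the kernel implementation:
``struct model_buf``, ``model_write()`` and ``MODEL_N_SUBBUFS`` are
hypothetical names invented for illustration, and the model ignores
wrap-around and consumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_N_SUBBUFS 4   /* hypothetical geometry, chosen for illustration */

/* Hypothetical userspace model of one per-cpu channel buffer. */
struct model_buf {
	size_t subbuf_size;              /* bytes per sub-buffer */
	size_t cur;                      /* index of the sub-buffer being filled */
	size_t offset;                   /* bytes already used in that sub-buffer */
	size_t padding[MODEL_N_SUBBUFS]; /* unused tail bytes of each closed sub-buffer */
};

/*
 * Account for one message of 'len' bytes.  A message is never split: if it
 * does not fit in the space left, the rest of the current sub-buffer is
 * recorded as padding and the message goes to the next sub-buffer.
 * Returns the index of the sub-buffer that received the message, or -1 if
 * the message can never fit or no further sub-buffer is available.
 */
static int model_write(struct model_buf *b, size_t len)
{
	assert(b != NULL);
	if (len > b->subbuf_size)
		return -1;
	if (b->offset + len > b->subbuf_size) {
		if (b->cur + 1 >= MODEL_N_SUBBUFS)
			return -1;
		b->padding[b->cur] = b->subbuf_size - b->offset;
		b->cur++;
		b->offset = 0;
	}
	b->offset += len;
	return (int)b->cur;
}
```

With 16-byte sub-buffers, two 10-byte messages land in sub-buffers 0
and 1, and the first sub-buffer ends up with 6 bytes of padding.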

A relay channel can operate in a mode where it will overwrite data not
yet collected by userspace, and not wait for it to be consumed.

The relay channel itself does not provide for communication of such
data between userspace and kernel, allowing the kernel side to remain
simple and not impose a single interface on userspace.  It does
provide a set of examples and a separate helper though, described
below.

The read() interface both removes padding and internally consumes the
read sub-buffers; thus in cases where read(2) is being used to drain
the channel buffers, special-purpose communication between kernel and
user isn't necessary for basic operation.

One of the major goals of the relay interface is to provide a low
overhead mechanism for conveying kernel data to userspace.  While the
read() interface is easy to use, it's not as efficient as the mmap()
approach; the example code attempts to make the tradeoff between the
two approaches as small as possible.
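
A read()-based consumer can be as simple as a copy loop.  The following
is a minimal userspace sketch; ``drain_fd()`` is a hypothetical helper,
not part of the relay interface, and it assumes the caller has already
opened one relay file and an output file.  A read() that returns 0 is
treated here as "nothing more to copy right now".

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything currently readable from in_fd to out_fd using plain
 * read()/write().  Returns the number of bytes copied, or -1 on error.
 */
static long drain_fd(int in_fd, int out_fd)
{
	char chunk[4096];
	long total = 0;

	assert(in_fd >= 0 && out_fd >= 0);
	for (;;) {
		ssize_t n = read(in_fd, chunk, sizeof(chunk));

		if (n == 0)
			return total;           /* nothing more to read */
		if (n < 0) {
			if (errno == EINTR)
				continue;       /* interrupted, retry the read */
			return -1;
		}
		for (ssize_t done = 0; done < n; ) {
			ssize_t w = write(out_fd, chunk + done, (size_t)(n - done));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			done += w;              /* handle short writes */
		}
		total += n;
	}
}
```

Because read() already strips padding and consumes the sub-buffers it
returns, a loop like this needs no extra kernel/user signalling; a
long-running consumer would typically call it again after poll()
reports more data.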

klog and relay-apps example code
================================

The relay interface itself is ready to use, but to make things easier,
a couple of simple utility functions and a set of examples are provided.

The relay-apps example tarball, available on the relay sourceforge
site, contains a set of self-contained examples, each consisting of a
pair of .c files containing boilerplate code for each of the user and
kernel sides of a relay application.  When combined these two sets of
boilerplate code provide glue to easily stream data to disk, without
having to bother with mundane housekeeping chores.

The 'klog debugging functions' patch (klog.patch in the relay-apps
tarball) provides a couple of high-level logging functions to the
kernel which allow writing formatted text or raw data to a channel,
regardless of whether a channel to write into exists or not, or even
whether the relay interface is compiled into the kernel or not.  These
functions allow you to put unconditional 'trace' statements anywhere
in the kernel or kernel modules; only when there is a 'klog handler'
registered will data actually be logged (see the klog and kleak
examples for details).

It is of course possible to use the relay interface from scratch,
i.e. without using any of the relay-apps example code or klog, but
you'll have to implement communication between userspace and kernel,
allowing both to convey the state of buffers (full, empty, amount of
padding).  The read() interface both removes padding and internally
consumes the read sub-buffers; thus in cases where read(2) is being
used to drain the channel buffers, special-purpose communication
between kernel and user isn't necessary for basic operation.  Things
such as buffer-full conditions would still need to be communicated via
some channel though.

klog and the relay-apps examples can be found in the relay-apps
tarball on http://relayfs.sourceforge.net

The relay interface user space API
==================================

The relay interface implements basic file operations for user space
access to relay channel buffer data.  Here are the file operations
that are available and some comments regarding their behavior:

=========== ============================================================
open()      enables user to open an _existing_ channel buffer.

mmap()      results in channel buffer being mapped into the caller's
            memory space. Note that you can't do a partial mmap - you
            must map the entire file, which is NRBUF * SUBBUFSIZE.

read()      read the contents of a channel buffer.  The bytes read are
            'consumed' by the reader, i.e. they won't be available
            again to subsequent reads.  If the channel is being used
            in no-overwrite mode (the default), it can be read at any
            time even if there's an active kernel writer.  If the
            channel is being used in overwrite mode and there are
            active channel writers, results may be unpredictable -
            users should make sure that all logging to the channel has
            ended before using read() with overwrite mode.  Sub-buffer
            padding is automatically removed and will not be seen by
            the reader.

sendfile()  transfer data from a channel buffer to an output file
            descriptor. Sub-buffer padding is automatically removed
            and will not be seen by the reader.

poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
            notified when sub-buffer boundaries are crossed.

close()     decrements the channel buffer's refcount.
            When the refcount reaches 0, i.e. when no process or kernel
            client has the buffer open, the channel buffer is freed.
=========== ============================================================

In order for a user application to make use of relay files, the
host filesystem must be mounted.  For example::

    mount -t debugfs debugfs /sys/kernel/debug

.. Note::

    the host filesystem doesn't need to be mounted for kernel
    clients to create or use channels - it only needs to be
    mounted when user space applications need access to the buffer
    data.


The relay interface kernel API
==============================

Here's a summary of the API the relay interface provides to in-kernel clients:

channel management functions::

    relay_open(base_filename, parent, subbuf_size, n_subbufs,
               callbacks, private_data)
    relay_close(chan)
    relay_flush(chan)
    relay_reset(chan)

channel management typically called on instigation of userspace::

    relay_subbufs_consumed(chan, cpu, subbufs_consumed)

write functions::

    relay_write(chan, data, length)
    __relay_write(chan, data, length)
    relay_reserve(chan, length)

callbacks::

    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
    buf_mapped(buf, filp)
    buf_unmapped(buf, filp)
    create_buf_file(filename, parent, mode, buf, is_global)
    remove_buf_file(dentry)

helper functions::

    relay_buf_full(buf)
    subbuf_start_reserve(buf, length)


Creating a channel
------------------

relay_open() is used to
create a channel, along with its per-cpu channel buffers.  Each
channel buffer will have an associated file created for it in the host
filesystem, which can be mmapped or read from in user space.  The
files are named basename0...basenameN-1 where N is the number of
online cpus, and by default will be created in the root of the
filesystem (if the parent param is NULL).  If you want a directory
structure to contain your relay files, you should create it using the
host filesystem's directory creation function, e.g.
debugfs_create_dir(), and pass the parent directory to relay_open().
Users are responsible for cleaning up any directory structure they
create when the channel is closed - again the host filesystem's
directory removal functions should be used for that, e.g.
debugfs_remove().

In order for a channel to be created and the host filesystem's files
associated with its channel buffers, the user must provide definitions
for two callback functions, create_buf_file() and remove_buf_file().
create_buf_file() is called once for each per-cpu buffer from
relay_open() and allows the user to create the file which will be used
to represent the corresponding channel buffer.  The callback should
return the dentry of the file created to represent the channel buffer.
remove_buf_file() must also be defined; it's responsible for deleting
the file(s) created in create_buf_file() and is called during
relay_close().

Here are some typical definitions for these callbacks, in this case
using debugfs::

    /*
     * create_buf_file() callback.  Creates relay file in debugfs.
     */
    static struct dentry *create_buf_file_handler(const char *filename,
                                                  struct dentry *parent,
                                                  umode_t mode,
                                                  struct rchan_buf *buf,
                                                  int *is_global)
    {
            return debugfs_create_file(filename, mode, parent, buf,
                                       &relay_file_operations);
    }

    /*
     * remove_buf_file() callback.  Removes relay file from debugfs.
     */
    static int remove_buf_file_handler(struct dentry *dentry)
    {
            debugfs_remove(dentry);

            return 0;
    }

    /*
     * relay interface callbacks
     */
    static struct rchan_callbacks relay_callbacks =
    {
            .create_buf_file = create_buf_file_handler,
            .remove_buf_file = remove_buf_file_handler,
    };

And an example relay_open() invocation using them::

    chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);

If the create_buf_file() callback fails, or isn't defined, channel
creation and thus relay_open() will fail.

The total size of each per-cpu buffer is calculated by multiplying the
number of sub-buffers by the sub-buffer size passed into relay_open().
The idea behind sub-buffers is that they're basically an extension of
double-buffering to N buffers, and they also allow applications to
easily implement random-access-on-buffer-boundary schemes, which can
be important for some high-volume applications.
The number and size
of sub-buffers is completely dependent on the application and even for
the same application, different conditions will warrant different
values for these parameters at different times.  Typically, the right
values to use are best decided after some experimentation; in general,
though, it's safe to assume that having only 1 sub-buffer is a bad
idea - you're guaranteed to either overwrite data or lose events
depending on the channel mode being used.

The create_buf_file() implementation can also be defined in such a way
as to allow the creation of a single 'global' buffer instead of the
default per-cpu set.  This can be useful for applications interested
mainly in seeing the relative ordering of system-wide events without
the need to bother with saving explicit timestamps for the purpose of
merging/sorting per-cpu files in a postprocessing step.

To have relay_open() create a global buffer, the create_buf_file()
implementation should set the value of the is_global outparam to a
non-zero value in addition to creating the file that will be used to
represent the single buffer.  In the case of a global buffer,
create_buf_file() and remove_buf_file() will be called only once.  The
normal channel-writing functions, e.g.
relay_write(), can still be
used - writes from any cpu will transparently end up in the global
buffer - but since it is a global buffer, callers should make sure
they use the proper locking for such a buffer, either by wrapping
writes in a spinlock, or by copying a write function from relay.h and
creating a local version that internally does the proper locking.

The private_data passed into relay_open() allows clients to associate
user-defined data with a channel, and is immediately available
(including in create_buf_file()) via chan->private_data or
buf->chan->private_data.

Buffer-only channels
--------------------

These channels have no files associated and can be created with
relay_open(NULL, NULL, ...).  Such channels are useful in scenarios such
as early tracing in the kernel, before the VFS is up.  In these
cases, one may open a buffer-only channel and then call
relay_late_setup_files() when the kernel is ready to handle files,
to expose the buffered data to userspace.

Channel 'modes'
---------------

Relay channels can be used in either of two modes - 'overwrite' or
'no-overwrite'.  The mode is entirely determined by the implementation
of the subbuf_start() callback, as described below.  The default if no
subbuf_start() callback is defined is 'no-overwrite' mode.  If the
default mode suits your needs, and you plan to use the read()
interface to retrieve channel data, you can ignore the details of this
section, as it pertains mainly to mmap() implementations.

In 'overwrite' mode, also known as 'flight recorder' mode, writes
continuously cycle around the buffer and will never fail, but will
unconditionally overwrite old data regardless of whether it's actually
been consumed.  In no-overwrite mode, writes will fail, i.e. data will
be lost, if the number of unconsumed sub-buffers equals the total
number of sub-buffers in the channel.  It should be clear that if
there is no consumer or if the consumer can't consume sub-buffers fast
enough, data will be lost in either case; the only difference is
whether data is lost from the beginning or the end of a buffer.
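
The difference between the two modes can be seen in a small userspace
model.  This is a sketch under simplifying assumptions, not relay
code: ``struct model_chan``, ``model_fill()`` and the constants are
hypothetical, and each "fill" stands for one complete sub-buffer's
worth of data with no consumer running.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_N_SUBBUFS 3   /* hypothetical channel geometry */

enum model_mode { MODEL_NO_OVERWRITE, MODEL_OVERWRITE };

/* Hypothetical model: each slot remembers which sub-buffer generation it holds. */
struct model_chan {
	enum model_mode mode;
	unsigned int produced;               /* sub-buffers filled so far */
	unsigned int consumed;               /* sub-buffers released by the reader */
	unsigned int slot[MODEL_N_SUBBUFS];  /* sequence number stored in each slot */
	unsigned int dropped;                /* fills rejected in no-overwrite mode */
};

/* Try to fill the next sub-buffer with sequence number 'seq'; returns 1 if stored. */
static int model_fill(struct model_chan *c, unsigned int seq)
{
	assert(c != NULL);
	if (c->mode == MODEL_NO_OVERWRITE &&
	    c->produced - c->consumed >= MODEL_N_SUBBUFS) {
		c->dropped++;   /* every sub-buffer is unconsumed: newest data is lost */
		return 0;
	}
	c->slot[c->produced % MODEL_N_SUBBUFS] = seq;   /* may clobber unread data */
	c->produced++;
	return 1;
}
```

Filling five sub-buffers into a three-slot channel leaves generations
1, 2, 3 in no-overwrite mode (the two newest are dropped) and
generations 4, 5, 3 in overwrite mode (the two oldest are replaced).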

As explained above, a relay channel is made up of one or more
per-cpu channel buffers, each implemented as a circular buffer
subdivided into one or more sub-buffers.  Messages are written into
the current sub-buffer of the channel's current per-cpu buffer via the
write functions described below.  Whenever a message can't fit into
the current sub-buffer, because there's no room left for it, the
client is notified via the subbuf_start() callback that a switch to a
new sub-buffer is about to occur.  The client uses this callback to 1)
initialize the next sub-buffer if appropriate 2) finalize the previous
sub-buffer if appropriate and 3) return a boolean value indicating
whether or not to actually move on to the next sub-buffer.

To implement 'no-overwrite' mode, the kernel client would provide
an implementation of the subbuf_start() callback something like the
following::

    static int subbuf_start(struct rchan_buf *buf,
                            void *subbuf,
                            void *prev_subbuf,
                            size_t prev_padding)
    {
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf) = prev_padding;

            if (relay_buf_full(buf))
                    return 0;

            subbuf_start_reserve(buf, sizeof(unsigned int));

            return 1;
    }

If the current buffer is full, i.e. all sub-buffers remain unconsumed,
the callback returns 0 to indicate that the buffer switch should not
occur yet, i.e. until the consumer has had a chance to read the
current set of ready sub-buffers.  For the relay_buf_full() function
to make sense, the consumer is responsible for notifying the relay
interface when sub-buffers have been consumed via
relay_subbufs_consumed().
Any subsequent attempts to write into the
buffer will again invoke the subbuf_start() callback with the same
parameters; only when the consumer has consumed one or more of the
ready sub-buffers will relay_buf_full() return 0, in which case the
buffer switch can continue.

The implementation of the subbuf_start() callback for 'overwrite' mode
would be very similar::

    static int subbuf_start(struct rchan_buf *buf,
                            void *subbuf,
                            void *prev_subbuf,
                            size_t prev_padding)
    {
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf) = prev_padding;

            subbuf_start_reserve(buf, sizeof(unsigned int));

            return 1;
    }

In this case, the relay_buf_full() check is meaningless and the
callback always returns 1, causing the buffer switch to occur
unconditionally.  It's also meaningless for the client to use the
relay_subbufs_consumed() function in this mode, as it's never
consulted.
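
The handshake between the no-overwrite callback and the consumer can
be sketched with two counters.  The names below
(``struct model_counts``, ``model_buf_full()``,
``model_subbuf_start()``, ``model_subbufs_consumed()``) are
hypothetical userspace stand-ins that only mimic the roles of
relay_buf_full(), subbuf_start() and relay_subbufs_consumed(); the
real bookkeeping lives in the kernel and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters mirroring the produced/consumed bookkeeping described above. */
struct model_counts {
	size_t n_subbufs;   /* sub-buffers per channel buffer */
	size_t produced;    /* sub-buffers finished by the writer */
	size_t consumed;    /* sub-buffers released by the reader */
};

/* Non-zero when every sub-buffer is still waiting for the consumer. */
static int model_buf_full(const struct model_counts *c)
{
	assert(c != NULL);
	return c->produced - c->consumed >= c->n_subbufs;
}

/* No-overwrite decision: 1 means "switch to the next sub-buffer", 0 means "stay". */
static int model_subbuf_start(struct model_counts *c)
{
	if (model_buf_full(c))
		return 0;
	c->produced++;
	return 1;
}

/* Reader side: report 'n' sub-buffers as consumed, never more than were produced. */
static void model_subbufs_consumed(struct model_counts *c, size_t n)
{
	size_t pending = c->produced - c->consumed;

	c->consumed += (n < pending) ? n : pending;
}
```

With two sub-buffers, the third and every later switch attempt is
refused until the reader reports at least one sub-buffer as consumed,
after which exactly one more switch succeeds.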

The default subbuf_start() implementation, used if the client doesn't
define any callbacks, or doesn't define the subbuf_start() callback,
implements the simplest possible 'no-overwrite' mode, i.e. it does
nothing but return 0.

Header information can be reserved at the beginning of each sub-buffer
by calling the subbuf_start_reserve() helper function from within the
subbuf_start() callback.  This reserved area can be used to store
whatever information the client wants.  In the example above, room is
reserved in each sub-buffer to store the padding count for that
sub-buffer.  This is filled in for the previous sub-buffer in the
subbuf_start() implementation; the padding value for the previous
sub-buffer is passed into the subbuf_start() callback along with a
pointer to the previous sub-buffer, since the padding value isn't
known until a sub-buffer is filled.  The subbuf_start() callback is
also called for the first sub-buffer when the channel is opened, to
give the client a chance to reserve space in it.  In this case the
previous sub-buffer pointer passed into the callback will be NULL, so
the client should check the value of the prev_subbuf pointer before
writing into the previous sub-buffer.
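
On the consuming side, an mmap()-based reader could use that header to
skip the padding.  The sketch below assumes the exact layout written
by the example callbacks (an unsigned int padding count at the start
of each finished sub-buffer); ``subbuf_payload()`` is a hypothetical
userspace helper and not part of the relay interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Locate the valid payload of one finished sub-buffer, assuming the layout
 * produced by the example callbacks: an unsigned int padding count in the
 * reserved header, then message data, then 'padding' unused bytes at the
 * end.  Returns a pointer to the payload and stores its length in *len, or
 * returns NULL if the header is inconsistent with the sub-buffer size.
 */
static const unsigned char *subbuf_payload(const void *subbuf,
					    size_t subbuf_size, size_t *len)
{
	unsigned int padding;

	assert(subbuf != NULL && len != NULL);
	if (subbuf_size < sizeof(padding))
		return NULL;
	memcpy(&padding, subbuf, sizeof(padding));   /* avoids alignment issues */
	if (padding > subbuf_size - sizeof(padding))
		return NULL;
	*len = subbuf_size - sizeof(padding) - padding;
	return (const unsigned char *)subbuf + sizeof(padding);
}
```

The header is only meaningful once the writer has moved on from that
sub-buffer, since the padding count is filled in at switch time.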

Writing to a channel
--------------------

Kernel clients write data into the current cpu's channel buffer using
relay_write() or __relay_write().  relay_write() is the main logging
function - it uses local_irq_save() to protect the buffer and should be
used if you might be logging from interrupt context.  If you know
you'll never be logging from interrupt context, you can use
__relay_write(), which only disables preemption.  These functions
don't return a value, so you can't determine whether or not they
failed - the assumption is that you wouldn't want to check a return
value in the fast logging path anyway, and that they'll always succeed
unless the buffer is full and no-overwrite mode is being used, in
which case you can detect a failed write in the subbuf_start()
callback by calling the relay_buf_full() helper function.

relay_reserve() is used to reserve a slot in a channel buffer which
can be written to later.  This would typically be used in applications
that need to write directly into a channel buffer without having to
stage data in a temporary buffer beforehand.
Because the actual write
may not happen immediately after the slot is reserved, applications
using relay_reserve() can keep a count of the number of bytes actually
written, either in space reserved in the sub-buffers themselves or as
a separate array.  See the 'reserve' example in the relay-apps tarball
at http://relayfs.sourceforge.net for an example of how this can be
done.  Because the write is under control of the client and is
separated from the reserve, relay_reserve() doesn't protect the buffer
at all - it's up to the client to provide the appropriate
synchronization when using relay_reserve().

Closing a channel
-----------------

The client calls relay_close() when it's finished using the channel.
The channel and its associated buffers are destroyed when there are no
longer any references to any of the channel buffers.  relay_flush()
forces a sub-buffer switch on all the channel buffers, and can be used
to finalize and process the last sub-buffers before the channel is
closed.

Misc
----

Some applications may want to keep a channel around and re-use it
rather than open and close a new channel for each use.
relay_reset()
can be used for this purpose - it resets a channel to its initial
state without reallocating channel buffer memory or destroying
existing mappings.  It should however only be called when it's safe to
do so, i.e. when the channel isn't currently being written to.

Finally, there are a couple of utility callbacks that can be used for
different purposes.  buf_mapped() is called whenever a channel buffer
is mmapped from user space and buf_unmapped() is called when it's
unmapped.  The client can use this notification to trigger actions
within the kernel application, such as enabling/disabling logging to
the channel.


Resources
=========

For news, example code, mailing list, etc.
see the relay interface homepage:

    http://relayfs.sourceforge.net


Credits
=======

The ideas and specs for the relay interface came about as a result of
discussions on tracing involving the following:

Michel Dagenais		<michel.dagenais@polymtl.ca>
Richard Moore		<richardj_moore@uk.ibm.com>
Bob Wisniewski		<bob@watson.ibm.com>
Karim Yaghmour		<karim@opersys.com>
Tom Zanussi		<zanussi@us.ibm.com>

Also thanks to Hubertus Franke for a lot of useful suggestions and bug
reports.