xdrgen - Linux Kernel XDR code generator

Introduction
------------

SunRPC programs are typically specified using a language defined by
RFC 4506. In fact, all IETF-published NFS specifications provide a
description of the specified protocol using this language.

Since the 1990s, user space consumers of SunRPC have had access to
a tool that could read such XDR specifications and then generate C
code that implements the RPC portions of that protocol. This tool is
called rpcgen.

This RPC-level code handles input directly from the network, and
thus a high degree of memory safety and sanity checking is needed to
help ensure proper levels of security. Bugs in this code can have
significant impact on security and performance.

However, it is code that is repetitive and tedious to write by hand.

The C code generated by rpcgen makes extensive use of the facilities
of the user space TI-RPC library and libc. Furthermore, the dialect
of the generated code is very traditional K&R C.

The Linux kernel's implementations of SunRPC-based protocols
hand-roll their XDR code. There are two main reasons for this:

1. libtirpc (and its predecessors) operates only in user space. The
   kernel's RPC implementation and its API are significantly
   different from libtirpc.

2. rpcgen-generated code is believed to be less efficient than code
   that is hand-written.

These days, gcc and its kin are capable of optimizing code better
than human authors. There are only a few instances where writing
XDR code by hand will make a measurable performance difference.

In addition, the current hand-written code in the Linux kernel is
difficult to audit, and it is hard to prove that it implements
exactly what is in the protocol specification.

In order to accrue the benefits of machine-generated XDR code in the
kernel, a tool is needed that will output C code that works against
the kernel's SunRPC implementation rather than libtirpc.

Enter xdrgen.


Dependencies
------------

These dependencies are typically packaged by Linux distributions:

- python3
- python3-lark
- python3-jinja2

These dependencies are available via PyPI:

- pip install 'lark[interegular]'


XDR Specifications
------------------

When adding a new protocol implementation to the kernel, the XDR
specification can be derived by feeding a .txt copy of the RFC to
the script located at tools/net/sunrpc/extract.sh.

    $ extract.sh < rfc0001.txt > new2.x


Operation
---------

Once a .x file is available, use xdrgen to generate source and
header files containing an implementation of XDR encoding and
decoding functions for the specified protocol.

    $ ./xdrgen definitions new2.x > include/linux/sunrpc/xdrgen/new2.h
    $ ./xdrgen declarations new2.x > new2xdr_gen.h

and

    $ ./xdrgen source new2.x > new2xdr_gen.c

The files are ready to use for a server-side protocol implementation,
or may be used as a guide for implementing these routines by hand.

By default, the only comments added to this code are kdoc comments
that appear directly in front of the public per-procedure APIs. For
deeper introspection, specifying the "--annotate" flag will insert
additional comments in the generated code to help readers match the
generated code to specific parts of the XDR specification.

Because the generated code is targeted at the Linux kernel, it
is tagged with a GPLv2-only license.

The xdrgen tool can also provide lexical and syntax checking of
an XDR specification:

    $ ./xdrgen lint xdr/new.x


How It Works
------------

xdrgen does not use machine learning to generate source code.
The translation is entirely deterministic.

RFC 4506 Section 6 contains a BNF grammar of the XDR specification
language. The grammar has been adapted for use by the Python Lark
module.

The xdr.ebnf file in this directory contains the grammar used to
parse XDR specifications. xdrgen configures Lark using the grammar
in xdr.ebnf. Lark parses the target XDR specification using this
grammar, creating a parse tree.

xdrgen then transforms the parse tree into an abstract syntax tree.
This tree is passed to a series of code generators.

The generators are implemented as Python classes residing in the
generators/ directory. Each generator emits code created from Jinja2
templates stored in the templates/ directory.

Source code for each definition is generated in the same order in
which the definitions appear in the specification, to ensure the
generated code compiles. This conforms with the behavior of rpcgen.

xdrgen assumes that the generated source code is further compiled by
a compiler that can optimize in a number of ways, including:

 - Unused functions are discarded (i.e., not added to the executable)

 - Aggressive function inlining removes unnecessary stack frames

 - Single-arm switch statements are replaced by a single conditional
   branch

And so on.


Pragmas
-------

Pragma directives specify exceptions to the normal generation of
encoding and decoding functions. The directives described below are
"exclude", "header", and "public".

Pragma exclude
------ -------

    pragma exclude <RPC procedure> ;

In some cases, a procedure encoder or decoder function might need
special processing that cannot be automatically generated. The
automatically-generated functions might conflict or interfere with
the hand-rolled function.
To avoid editing the generated source code
by hand, a pragma can specify that the procedure's encoder and
decoder functions are not included in the generated header and
source.

For example:

    pragma exclude NFSPROC3_READDIRPLUS;

excludes the decoder function for the READDIRPLUS argument and the
encoder function for the READDIRPLUS result.

Note that because data item encoder and decoder functions are
defined "static __maybe_unused", subsequent compilation
automatically excludes data item encoder and decoder functions that
are used only by excluded procedures.

Pragma header
------ ------

    pragma header <string> ;

Provide a name to use for the header file. For example:

    pragma header nlm4;

adds

    #include "nlm4xdr_gen.h"

to the generated source file.

Pragma public
------ ------

    pragma public <XDR data item> ;

Normally XDR encoder and decoder functions are "static". If an
implementer wants to call these functions from other source code,
they can add a public pragma in the input .x file to indicate a set
of functions that should get a prototype in the generated header
and whose definitions are not declared static.

For example:

    pragma public nfsstat3;

adds these prototypes to the generated header:

    bool xdrgen_decode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 *ptr);
    bool xdrgen_encode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 value);

And, in the generated source code, both of these functions appear
without the "static __maybe_unused" modifiers.


Future Work
-----------

Finish implementing XDR pointer and list types.

Generate client-side procedure functions.

Expand the README into a user guide similar to rpcgen(1).

Add more pragma directives:

  * @pages -- use xdr_read/write_pages() for the specified opaque
    field
  * @skip -- do not decode, but rather skip, the specified argument
    field

Enable something like a #include to dynamically insert the content
of other specification files.

Properly support line-by-line pass-through via the "%" decorator.

Build a unit test suite for verifying translation of XDR language
into compilable code.

Add a command-line option to insert trace_printk call sites in the
generated source code, for improved (temporary) observability.

Generate kernel Rust code as well as C code.
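
To make the encoder and decoder prototypes shown under "Pragma
public" more concrete, here is a stand-alone user-space sketch of the
kind of round-trip check a unit test might perform on such a pair.
It is not xdrgen output and does not use the kernel's API: struct
demo_stream and the demo_* functions are hypothetical stand-ins
invented for this illustration. The only facts relied on are that
RFC 4506 encodes an enum as a 4-byte big-endian integer and that
RFC 1813 defines NFS3_OK as 0 and NFS3ERR_NOENT as 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct xdr_stream. */
struct demo_stream {
	uint8_t *buf;	/* backing buffer */
	size_t len;	/* size of buf in bytes */
	size_t pos;	/* current read/write offset, always <= len */
};

/* Two values of the NFSv3 nfsstat3 enum, as defined in RFC 1813. */
enum nfsstat3 {
	NFS3_OK = 0,
	NFS3ERR_NOENT = 2,
};

/* Write the enum as a 4-byte big-endian integer (RFC 4506 enum encoding). */
static bool demo_encode_nfsstat3(struct demo_stream *xdr, enum nfsstat3 value)
{
	uint32_t raw = (uint32_t)value;

	if (xdr->len - xdr->pos < 4)
		return false;	/* no room left in the buffer */
	xdr->buf[xdr->pos++] = (uint8_t)(raw >> 24);
	xdr->buf[xdr->pos++] = (uint8_t)(raw >> 16);
	xdr->buf[xdr->pos++] = (uint8_t)(raw >> 8);
	xdr->buf[xdr->pos++] = (uint8_t)raw;
	return true;
}

/* Read 4 big-endian bytes back into the enum; no range validation here. */
static bool demo_decode_nfsstat3(struct demo_stream *xdr, enum nfsstat3 *ptr)
{
	const uint8_t *p = xdr->buf + xdr->pos;

	if (xdr->len - xdr->pos < 4)
		return false;	/* truncated input */
	*ptr = (enum nfsstat3)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
			       ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
	xdr->pos += 4;
	return true;
}

/* Round-trip check: encode a value, rewind, decode, compare. */
static void demo_roundtrip(void)
{
	uint8_t buf[4];
	struct demo_stream xdr = { buf, sizeof(buf), 0 };
	enum nfsstat3 out = NFS3_OK;

	assert(demo_encode_nfsstat3(&xdr, NFS3ERR_NOENT));
	assert(buf[3] == 2);	/* least significant byte is last */
	assert(!demo_encode_nfsstat3(&xdr, NFS3_OK));	/* buffer is full */
	xdr.pos = 0;
	assert(demo_decode_nfsstat3(&xdr, &out));
	assert(out == NFS3ERR_NOENT);
}
```

Calling demo_roundtrip() from a main() function exercises the pair.
The argument shapes mirror the prototypes above (value for encode,
pointer for decode), but the buffer handling is deliberately
simplified compared with the kernel's.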