xdrgen - Linux Kernel XDR code generator

Introduction
------------

SunRPC programs are typically specified using a language defined by
RFC 4506. In fact, all IETF-published NFS specifications provide a
description of the specified protocol using this language.

Since the 1990s, user space consumers of SunRPC have had access to
a tool that could read such XDR specifications and then generate C
code that implements the RPC portions of that protocol. This tool is
called rpcgen.

This RPC-level code handles input directly from the network, and
thus a high degree of memory safety and sanity checking is needed to
help ensure proper levels of security. Bugs in this code can have
significant impact on security and performance.

However, it is code that is repetitive and tedious to write by hand.

The C code generated by rpcgen makes extensive use of the facilities
of the user space TI-RPC library and libc. Furthermore, the dialect
of the generated code is very traditional K&R C.

The Linux kernel's implementations of SunRPC-based protocols hand-roll
their XDR code. There are two main reasons for this:

1. libtirpc (and its predecessors) operate only in user space. The
   kernel's RPC implementation and its API are significantly
   different from libtirpc's.

2. rpcgen-generated code is believed to be less efficient than code
   that is hand-written.

These days, gcc and its kin are capable of optimizing code better
than human authors. There are only a few instances where writing
XDR code by hand will make a measurable performance difference.

In addition, the current hand-written code in the Linux kernel is
difficult to audit, and it is hard to prove that it implements
exactly what is in the protocol specification.
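
As a rough illustration of why such code is both tedious and
safety-critical, the sketch below decodes one variable-length opaque
item by hand. It is user-space Python, not kernel code and not xdrgen
output, and the function names are made up for this example. The wire
layout (big-endian length word, data, zero padding to a four-byte
boundary) follows RFC 4506.

```python
import struct
from typing import Tuple

XDR_UNIT = 4  # RFC 4506: every item occupies a multiple of four bytes


def decode_uint32(buf: bytes, offset: int) -> Tuple[int, int]:
    """Decode a big-endian unsigned int; return (value, next offset)."""
    if offset < 0 or offset + XDR_UNIT > len(buf):
        raise ValueError("truncated XDR unsigned int")
    (value,) = struct.unpack_from(">I", buf, offset)
    return value, offset + XDR_UNIT


def decode_opaque(buf: bytes, offset: int, maxlen: int) -> Tuple[bytes, int]:
    """Decode opaque<maxlen>: length word, data, then padding."""
    length, offset = decode_uint32(buf, offset)
    # The length comes from the network, so check it against the
    # maximum declared in the specification before using it.
    if length > maxlen:
        raise ValueError("opaque exceeds declared maximum")
    padded = (length + XDR_UNIT - 1) // XDR_UNIT * XDR_UNIT
    if offset + padded > len(buf):
        raise ValueError("truncated XDR opaque")
    return buf[offset:offset + length], offset + padded
```

Every field of every argument and result needs checks of this kind,
which is the repetition that a generator can take over.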

In order to accrue the benefits of machine-generated XDR code in the
kernel, a tool is needed that will output C code that works against
the kernel's SunRPC implementation rather than libtirpc.

Enter xdrgen.


Dependencies
------------

These dependencies are typically packaged by Linux distributions:

- python3
- python3-lark
- python3-jinja2

These dependencies are available via PyPI:

- pip install 'lark[interegular]'


XDR Specifications
------------------

When adding a new protocol implementation to the kernel, the XDR
specification can be derived by feeding a .txt copy of the RFC to
the script located in tools/net/sunrpc/extract.sh.

	$ extract.sh < rfc0001.txt > new2.x


Operation
---------

Once a .x file is available, use xdrgen to generate source and
header files containing an implementation of XDR encoding and
decoding functions for the specified protocol.

	$ ./xdrgen definitions new2.x > include/linux/sunrpc/xdrgen/new2.h
	$ ./xdrgen declarations new2.x > new2xdr_gen.h

and

	$ ./xdrgen source new2.x > new2xdr_gen.c

The files are ready to use for a server-side protocol implementation,
or may be used as a guide for implementing these routines by hand.

By default, the only comments added to this code are kdoc comments
that appear directly in front of the public per-procedure APIs. For
deeper introspection, specifying the "--annotate" flag will insert
additional comments in the generated code to help readers match the
generated code to specific parts of the XDR specification.

Because the generated code is targeted for the Linux kernel, it
is tagged with a GPLv2-only license.

The xdrgen tool can also provide lexical and syntax checking of
an XDR specification:

	$ ./xdrgen lint xdr/new.x


How It Works
------------

xdrgen does not use machine learning to generate source code.
The translation is entirely deterministic.

RFC 4506 Section 6 contains a BNF grammar of the XDR specification
language. The grammar has been adapted for use by the Python Lark
module.

The xdr.ebnf file in this directory contains the grammar used to
parse XDR specifications. xdrgen configures Lark using the grammar
in xdr.ebnf. Lark parses the target XDR specification using this
grammar, creating a parse tree.

xdrgen then transforms the parse tree into an abstract syntax tree.
This tree is passed to a series of code generators.

The generators are implemented as Python classes residing in the
generators/ directory. Each generator emits code created from Jinja2
templates stored in the templates/ directory.

The source code for each definition is generated in the same order
in which the definitions appear in the specification to ensure the
generated code compiles. This conforms with the behavior of rpcgen.

xdrgen assumes that the generated source code is further compiled by
a compiler that can optimize in a number of ways, including:

 - Unused functions are discarded (i.e., not added to the executable)

 - Aggressive function inlining removes unnecessary stack frames

 - Single-arm switch statements are replaced by a single conditional
   branch

And so on.


Pragmas
-------

Pragma directives specify exceptions to the normal generation of
encoding and decoding functions. Currently four directives are
implemented: "big_endian", "exclude", "header", and "public".

Pragma big_endian
------ ----------

	pragma big_endian <enum> ;

For variables that might contain only a small number of values, it
is more efficient to avoid the byte-swap when encoding or decoding
on little-endian machines. Such is often the case with error status
codes.
For example:

	pragma big_endian nfsstat3;

In this case, when generating an XDR struct or union containing a
field of type "nfsstat3", xdrgen will make the type of that field
"__be32" instead of "enum nfsstat3". XDR unions then switch on the
non-byte-swapped value of that field.

Pragma exclude
------ -------

	pragma exclude <RPC procedure> ;

In some cases, a procedure encoder or decoder function might need
special processing that cannot be automatically generated. The
automatically-generated functions might conflict or interfere with
the hand-rolled function. To avoid editing the generated source code
by hand, a pragma can specify that the procedure's encoder and
decoder functions are not included in the generated header and
source.

For example:

	pragma exclude NFSPROC3_READDIRPLUS;

Excludes the decoder function for the READDIRPLUS argument and the
encoder function for the READDIRPLUS result.

Note that because data item encoder and decoder functions are
defined "static __maybe_unused", subsequent compilation
automatically excludes data item encoder and decoder functions that
are used only by excluded procedures.

Pragma header
------ ------

	pragma header <string> ;

Provide a name to use for the header file. For example:

	pragma header nlm4;

Adds

	#include "nlm4xdr_gen.h"

to the generated source file.

Pragma public
------ ------

	pragma public <XDR data item> ;

Normally XDR encoder and decoder functions are "static". In case an
implementer wants to call these functions from other source code,
they can add a public pragma in the input .x file to indicate a set
of functions that should get a prototype in the generated header,
and whose definitions will not be declared static.

For example:

	pragma public nfsstat3;

Adds these prototypes in the generated header:

	bool xdrgen_decode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 *ptr);
	bool xdrgen_encode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 value);

And, in the generated source code, both of these functions appear
without the "static __maybe_unused" modifiers.


Future Work
-----------

Finish implementing XDR pointer and list types.

Generate client-side procedure functions

Expand the README into a user guide similar to rpcgen(1)

Add more pragma directives:

  * @pages -- use xdr_read/write_pages() for the specified opaque
    field
  * @skip -- do not decode, but rather skip, the specified argument
    field

Enable something like a #include to dynamically insert the content
of other specification files

Properly support line-by-line pass-through via the "%" decorator

Build a unit test suite for verifying translation of XDR language
into compilable code

Add a command-line option to insert trace_printk call sites in the
generated source code, for improved (temporary) observability

Generate kernel Rust code as well as C code
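
The idea behind the big_endian pragma described earlier can be
sketched in user-space Python. This is only an analogy, not xdrgen
output: the helper names are hypothetical, and the two status values
are taken from RFC 1813 as sample data. The constants are converted
to wire order once, so each incoming status word is compared as
loaded, with no per-message byte swap.

```python
import struct
import sys

# Two nfsstat3 values from RFC 1813, used here only as sample data.
NFS3_OK = 0
NFS3ERR_NOENT = 2


def cpu_to_be32(value):
    """Return the native integer whose in-memory bytes are big-endian."""
    return int.from_bytes(struct.pack(">I", value), sys.byteorder)


# Converted once, in the way a C compiler folds constant expressions.
BE_NFS3_OK = cpu_to_be32(NFS3_OK)
BE_NFS3ERR_NOENT = cpu_to_be32(NFS3ERR_NOENT)


def status_name(wire):
    """Classify a 4-byte status field without converting to host order."""
    if len(wire) != 4:
        raise ValueError("status field must be exactly four bytes")
    raw = int.from_bytes(wire, sys.byteorder)  # plain load, no swap
    if raw == BE_NFS3_OK:
        return "NFS3_OK"
    if raw == BE_NFS3ERR_NOENT:
        return "NFS3ERR_NOENT"
    return "other"
```

The comparison gives the same answer on big-endian and little-endian
hosts, because both sides of each test hold the wire representation.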