xdrgen - Linux Kernel XDR code generator

Introduction
------------

SunRPC programs are typically specified using a language defined by
RFC 4506. In fact, all IETF-published NFS specifications provide a
description of the specified protocol using this language.

Since the 1990s, user space consumers of SunRPC have had access to
a tool that could read such XDR specifications and then generate C
code that implements the RPC portions of that protocol. This tool is
called rpcgen.

This RPC-level code handles input directly from the network, so a
high degree of memory safety and sanity checking is needed to help
ensure proper levels of security. Bugs in this code can have a
significant impact on security and performance.

However, this code is repetitive and tedious to write by hand.

The C code generated by rpcgen makes extensive use of the facilities
of the user space TI-RPC library and libc. Furthermore, the dialect
of the generated code is very traditional K&R C.

The Linux kernel's implementations of SunRPC-based protocols hand-roll
their XDR code. There are two main reasons for this:

1. libtirpc (and its predecessors) operates only in user space. The
   kernel's RPC implementation and its API are significantly
   different from libtirpc.

2. rpcgen-generated code is believed to be less efficient than code
   that is hand-written.

These days, gcc and its kin are capable of optimizing code better
than human authors. There are only a few instances where writing
XDR code by hand makes a measurable performance difference.

In addition, the current hand-written code in the Linux kernel is
difficult to audit, and it is hard to prove that it implements
exactly what is in the protocol specification.

To gain the benefits of machine-generated XDR code in the kernel, a
tool is needed that outputs C code that works against the kernel's
SunRPC implementation rather than libtirpc.

Enter xdrgen.


Dependencies
------------

These dependencies are typically packaged by Linux distributions:

- python3
- python3-lark
- python3-jinja2

These dependencies are available via PyPI:

- pip install 'lark[interegular]'


XDR Specifications
------------------

When adding a new protocol implementation to the kernel, the XDR
specification can be derived by feeding a .txt copy of the RFC to
the script located in tools/net/sunrpc/extract.sh.

	$ extract.sh < rfc0001.txt > new2.x


Operation
---------

Once a .x file is available, use xdrgen to generate source and
header files containing an implementation of XDR encoding and
decoding functions for the specified protocol.

	$ ./xdrgen definitions new2.x > include/linux/sunrpc/xdrgen/new2.h
	$ ./xdrgen declarations new2.x > new2xdr_gen.h

and

	$ ./xdrgen source new2.x > new2xdr_gen.c

The files are ready to use for a server-side protocol implementation,
or may be used as a guide for implementing these routines by hand.

By default, the only comments added to this code are kdoc comments
that appear directly in front of the public per-procedure APIs. For
deeper introspection, specifying the "--annotate" flag inserts
additional comments in the generated code to help readers match the
generated code to specific parts of the XDR specification.

Because the generated code is targeted at the Linux kernel, it
is tagged with a GPLv2-only license.
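
Decoders that consume network input must bounds-check every item
they read. As an illustration only, here is a hand-written user-space
sketch of decoding one XDR variable-length opaque as laid out in
RFC 4506. It is not xdrgen output: struct xdr_cursor and
decode_opaque() are hypothetical names, and real generated code works
against the kernel's struct xdr_stream and its helpers instead.

```c
#include <arpa/inet.h>  /* ntohl() */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>     /* memcpy() */

/* Hypothetical user-space stand-in for the kernel's struct xdr_stream:
 * a read cursor over a received buffer. */
struct xdr_cursor {
	const uint8_t *p;	/* next unread byte */
	const uint8_t *end;	/* one past the last valid byte */
};

/*
 * Decode an XDR variable-length opaque<maxlen>: a 4-byte big-endian
 * length, the data, then zero to three pad bytes so the item ends on
 * a 4-byte boundary. On success, *data points into the buffer (no
 * copy) and the cursor moves past the padding. On failure the cursor
 * is left unchanged.
 */
static bool decode_opaque(struct xdr_cursor *xdr, uint32_t maxlen,
			  const uint8_t **data, uint32_t *len)
{
	uint32_t raw, n;
	uint64_t padded;

	if (xdr->end - xdr->p < 4)
		return false;			/* no room for the length */
	memcpy(&raw, xdr->p, sizeof(raw));
	n = ntohl(raw);
	if (n > maxlen)
		return false;			/* exceeds the spec's bound */

	/* Round up to a multiple of 4; 64-bit math avoids wraparound. */
	padded = ((uint64_t)n + 3) & ~(uint64_t)3;
	if ((uint64_t)(xdr->end - xdr->p - 4) < padded)
		return false;			/* truncated data */

	/* This sketch does not check that the pad bytes are zero. */
	*data = xdr->p + 4;
	*len = n;
	xdr->p += 4 + padded;
	return true;
}
```

A caller would pass the bound from the .x declaration (16 for
opaque<16>, for example) and treat false as malformed input. The
pointer returned in *data aliases the receive buffer, so it stays
valid only as long as that buffer does.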

The xdrgen tool can also provide lexical and syntax checking of
an XDR specification:

	$ ./xdrgen lint xdr/new.x


How It Works
------------

xdrgen does not use machine learning to generate source code. The
translation is entirely deterministic.

RFC 4506 Section 6 contains a BNF grammar of the XDR specification
language. The grammar has been adapted for use by the Python Lark
module.

The xdr.ebnf file in this directory contains the grammar used to
parse XDR specifications. xdrgen configures Lark with this grammar,
and Lark then parses the target XDR specification into a parse tree.

xdrgen then transforms the parse tree into an abstract syntax tree.
This tree is passed to a series of code generators.

The generators are implemented as Python classes residing in the
generators/ directory. Each generator emits code created from Jinja2
templates stored in the templates/ directory.

Code for each data item is generated in the same order in which the
items appear in the specification, to ensure the generated code
compiles. This matches the behavior of rpcgen.
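
To see why specification order matters, consider a hand-written
sketch of ordered output for a two-item specification. The type
names, function names, and cursor-style signatures below are
hypothetical and chosen for a self-contained user-space build; they
are not xdrgen's actual output.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical specification fragment, in source order:
 *
 *   typedef unsigned int counter;
 *   struct range { counter start; counter count; };
 *
 * Emitting C in that same order means decode_counter() is already
 * defined when decode_range() calls it, so no forward prototype is
 * needed. Emitting decode_range() first would call an undeclared
 * function, which C99 and later do not permit.
 */

/* 1. counter is defined first in the specification ... */
typedef uint32_t counter;

static bool decode_counter(const uint8_t **p, const uint8_t *end,
			   counter *ptr)
{
	const uint8_t *b = *p;

	if (end - b < 4)
		return false;
	/* XDR unsigned int: 4 bytes, most significant byte first */
	*ptr = (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | (uint32_t)b[3];
	*p = b + 4;
	return true;
}

/* 2. ... so its decoder precedes the struct that uses it. */
struct range {
	counter start;
	counter count;
};

static bool decode_range(const uint8_t **p, const uint8_t *end,
			 struct range *ptr)
{
	return decode_counter(p, end, &ptr->start) &&
	       decode_counter(p, end, &ptr->count);
}
```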

xdrgen assumes that the generated source code is further compiled by
a compiler that can optimize in a number of ways, including:

 - Unused functions are discarded (i.e., not added to the executable)

 - Aggressive function inlining removes unnecessary stack frames

 - Single-arm switch statements are replaced by a single conditional
   branch

And so on.


Pragmas
-------

Pragma directives specify exceptions to the normal generation of
encoding and decoding functions. Currently the following directives
are implemented: "big_endian", "exclude", "header", and "public".

Pragma big_endian
------ ----------

	pragma big_endian <enum> ;

For variables that can hold only a small number of values, such as
error status codes, it is more efficient to avoid the byte-swap when
encoding or decoding on little-endian machines, because the raw wire
value can instead be compared against constants that are converted
to big-endian at compile time. For example:

	pragma big_endian nfsstat3;

In this case, when generating an XDR struct or union containing a
field of type "nfsstat3", xdrgen makes the type of that field
"__be32" instead of "enum nfsstat3". XDR unions then switch on the
non-byte-swapped value of that field.
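
A user-space sketch of the difference the big_endian pragma makes to
a decoded status field. be32_t stands in for the kernel's __be32,
htonl() and ntohl() stand in for the kernel's byte-order helpers, and
the struct and function names are hypothetical; this is not code that
xdrgen emits.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>     /* memcpy() */

/* User-space stand-in for the kernel's __be32: a 32-bit value held in
 * big-endian (wire) byte order. */
typedef uint32_t be32_t;

/* A few nfsstat3 values from RFC 1813. */
enum nfsstat3 {
	NFS3_OK = 0,
	NFS3ERR_PERM = 1,
	NFS3ERR_NOENT = 2,
};

/* Without the pragma: the field is an enum, so decoding byte-swaps. */
struct res_enum {
	enum nfsstat3 status;
};

/* With "pragma big_endian nfsstat3;": the field keeps wire order. */
struct res_be {
	be32_t status;
};

static void decode_status_enum(const uint8_t wire[4], struct res_enum *res)
{
	uint32_t raw;

	memcpy(&raw, wire, sizeof(raw));
	res->status = (enum nfsstat3)ntohl(raw);	/* swap on LE hosts */
}

static void decode_status_be(const uint8_t wire[4], struct res_be *res)
{
	memcpy(&res->status, wire, sizeof(res->status));	/* no swap */
}

/* A discriminant test on the unswapped value: the constant is
 * converted instead of the field, and for a constant argument the
 * compiler can usually fold that conversion away at build time. */
static bool status_is_ok(const struct res_be *res)
{
	return res->status == htonl(NFS3_OK);
}
```

On a big-endian host htonl() is the identity, so both layouts cost
the same there; the savings apply to little-endian hosts.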

Pragma exclude
------ -------

	pragma exclude <RPC procedure> ;

In some cases, a procedure encoder or decoder function might need
special processing that cannot be automatically generated. The
automatically generated functions might conflict or interfere with
the hand-rolled function. To avoid editing the generated source code
by hand, a pragma can specify that the procedure's encoder and
decoder functions are not included in the generated header and
source.

For example:

	pragma exclude NFSPROC3_READDIRPLUS;

excludes the decoder function for the READDIRPLUS argument and the
encoder function for the READDIRPLUS result.

Note that because data item encoder and decoder functions are
defined "static __maybe_unused", subsequent compilation
automatically discards data item encoder and decoder functions that
are used only by excluded procedures.

Pragma header
------ ------

	pragma header <string> ;

Provides the name used to form the header file name that the
generated source file includes. For example:

	pragma header nlm4;

adds

	#include "nlm4xdr_gen.h"

to the generated source file.
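
The final note under "Pragma exclude" depends on how the compiler
treats unreferenced static functions. A minimal user-space sketch,
assuming GCC or Clang; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The kernel's <linux/compiler_attributes.h> defines __maybe_unused
 * as __attribute__((__unused__)); it is repeated here so the sketch
 * builds in user space with GCC or Clang. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/* Still called by a procedure that is generated normally. */
static __maybe_unused bool decode_flag(uint32_t raw, bool *ptr)
{
	if (raw > 1)
		return false;		/* XDR bool is 0 or 1 only */
	*ptr = raw == 1;
	return true;
}

/* Hypothetical helper whose only caller was a procedure named in
 * "pragma exclude". Nothing references it any more, yet
 * -Wunused-function stays quiet, and the compiler is free to leave
 * it out of the object file. */
static __maybe_unused bool decode_excluded_only(uint32_t raw, uint32_t *ptr)
{
	*ptr = raw;
	return true;
}

/* Hypothetical procedure decoder that remains in the generated file. */
bool decode_proc_args(uint32_t raw, bool *flag)
{
	return decode_flag(raw, flag);
}
```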

Pragma public
------ ------

	pragma public <XDR data item> ;

Normally, XDR encoder and decoder functions are "static". If an
implementer wants to call these functions from other source code,
they can add a public pragma to the input .x file naming the data
items whose functions should get a prototype in the generated
header. The definitions of those functions are then not declared
static.

For example:

	pragma public nfsstat3;

adds these prototypes to the generated header:

	bool xdrgen_decode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 *ptr);
	bool xdrgen_encode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 value);

And, in the generated source code, both of these functions appear
without the "static __maybe_unused" modifiers.


Future Work
-----------

Finish implementing XDR pointer and list types.

Generate client-side procedure functions.

Expand the README into a user guide similar to rpcgen(1).

Add more pragma directives:

 * @pages -- use xdr_read/write_pages() for the specified opaque
   field
 * @skip -- do not decode, but rather skip, the specified argument
   field

Enable something like a #include to dynamically insert the content
of other specification files.

Properly support line-by-line pass-through via the "%" decorator.

Build a unit test suite for verifying translation of XDR language
into compilable code.

Add a command-line option to insert trace_printk call sites in the
generated source code, for improved (temporary) observability.

Generate kernel Rust code as well as C code.