xdrgen - Linux Kernel XDR code generator

Introduction
------------

SunRPC programs are typically specified using a language defined by
RFC 4506. In fact, all IETF-published NFS specifications provide a
description of the specified protocol using this language.

Since the 1990s, user space consumers of SunRPC have had access to
a tool that could read such XDR specifications and then generate C
code that implements the RPC portions of that protocol. This tool is
called rpcgen.

This RPC-level code handles input directly from the network, and
thus a high degree of memory safety and sanity checking is needed to
help ensure proper levels of security. Bugs in this code can have
significant impact on security and performance.

However, it is code that is repetitive and tedious to write by hand.
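
To illustrate the kind of checking involved, here is a minimal
user-space sketch in Python. It is neither kernel code nor xdrgen
output. It decodes two RFC 4506 items, an unsigned integer and a
variable-length opaque, and validates every length before consuming
any bytes. The helper names and the "maxsize" parameter are
hypothetical.

```python
import struct
from typing import Tuple


class XdrDecodeError(Exception):
    """Raised when the input buffer is truncated or malformed."""


def decode_uint32(buf: bytes, offset: int) -> Tuple[int, int]:
    """Decode one big-endian 32-bit unsigned integer.

    Returns the value and the offset of the next undecoded byte.
    """
    if offset < 0 or offset + 4 > len(buf):
        raise XdrDecodeError("buffer too short for unsigned int")
    (value,) = struct.unpack_from(">I", buf, offset)
    return value, offset + 4


def decode_opaque(buf: bytes, offset: int, maxsize: int) -> Tuple[bytes, int]:
    """Decode a variable-length opaque: a length word, then padded data."""
    length, offset = decode_uint32(buf, offset)
    # Never trust a length that arrived from the network.
    if length > maxsize:
        raise XdrDecodeError("opaque exceeds the specified maximum size")
    padded = (length + 3) & ~3  # XDR pads items to a 4-byte boundary
    if offset + padded > len(buf):
        raise XdrDecodeError("buffer too short for opaque payload")
    return buf[offset:offset + length], offset + padded
```

Every data type in a protocol needs the same pattern of checks,
which is what makes the work both error-prone and a good fit for
generation.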

The C code generated by rpcgen makes extensive use of the facilities
of the user space TI-RPC library and libc. Furthermore, the dialect
of the generated code is very traditional K&R C.

The Linux kernel's implementations of SunRPC-based protocols
hand-roll their XDR code. There are two main reasons for this:

1. libtirpc (and its predecessors) operate only in user space. The
   kernel's RPC implementation and its API are significantly
   different from libtirpc's.

2. rpcgen-generated code is believed to be less efficient than code
   that is hand-written.

These days, gcc and its kin are capable of optimizing code better
than human authors. There are only a few instances where writing
XDR code by hand will make a measurable performance difference.

In addition, the current hand-written code in the Linux kernel is
difficult to audit, and it is hard to prove that it implements
exactly what is in the protocol specification.

In order to accrue the benefits of machine-generated XDR code in the
kernel, a tool is needed that will output C code that works against
the kernel's SunRPC implementation rather than libtirpc.

Enter xdrgen.


Dependencies
------------

These dependencies are typically packaged by Linux distributions:

- python3
- python3-lark
- python3-jinja2

These dependencies are available via PyPI:

- pip install 'lark[interegular]'


XDR Specifications
------------------

When adding a new protocol implementation to the kernel, the XDR
specification can be derived by feeding a .txt copy of the RFC to
the script located in tools/net/sunrpc/extract.sh.

   $ extract.sh < rfc0001.txt > new2.x
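
Several NFS-related RFCs mark each line of their embedded XDR
description with a leading "///" sentinel so that the description
can be pulled out mechanically. Assuming the RFC at hand follows
that convention, the extraction step amounts to something like the
Python sketch below. It is an illustration of the idea, not a copy
of extract.sh, and the function name is hypothetical.

```python
import re

# Assumption: XDR lines start with optional spaces followed by "///".
# One space after the sentinel, if present, is part of the marker.
_SENTINEL = re.compile(r"^ */// ?(.*)$")


def extract_xdr(rfc_text: str) -> str:
    """Return only the sentinel-marked lines, with the sentinel removed."""
    out = []
    for line in rfc_text.splitlines():
        match = _SENTINEL.match(line)
        if match:
            out.append(match.group(1))
    return "\n".join(out) + ("\n" if out else "")
```

Prose lines are dropped entirely, and a bare "///" line becomes an
empty line, so the spacing of the embedded description is kept.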


Operation
---------

Once a .x file is available, use xdrgen to generate source and
header files containing an implementation of XDR encoding and
decoding functions for the specified protocol.

   $ ./xdrgen definitions new2.x > include/linux/sunrpc/xdrgen/new2.h
   $ ./xdrgen declarations new2.x > new2xdr_gen.h

and

   $ ./xdrgen source new2.x > new2xdr_gen.c

The files are ready to use for a server-side protocol implementation,
or may be used as a guide for implementing these routines by hand.
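
When the three commands are run repeatedly, for example while a
specification is still being edited, a small wrapper can keep the
file names consistent. The Python sketch below is one way to do
that. The helper names are hypothetical, and for simplicity it
writes all three files into a single directory, whereas the commands
above place the definitions header under include/linux/sunrpc/xdrgen/.

```python
import subprocess
from pathlib import Path

# Subcommand -> output file name pattern, mirroring the commands above.
OUTPUTS = {
    "definitions": "{name}.h",
    "declarations": "{name}xdr_gen.h",
    "source": "{name}xdr_gen.c",
}


def plan(spec: str, xdrgen: str = "./xdrgen"):
    """Return (argv, output file name) pairs for one .x file."""
    name = Path(spec).stem
    return [
        ([xdrgen, subcommand, spec], pattern.format(name=name))
        for subcommand, pattern in OUTPUTS.items()
    ]


def generate(spec: str, outdir: str = ".", xdrgen: str = "./xdrgen") -> None:
    """Run each subcommand, redirecting its stdout to the planned file."""
    for argv, filename in plan(spec, xdrgen):
        with open(Path(outdir) / filename, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Calling generate("new2.x") from the xdrgen directory would then
produce new2.h, new2xdr_gen.h and new2xdr_gen.c in the current
directory.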

By default, the only comments added to this code are kdoc comments
that appear directly in front of the public per-procedure APIs. For
deeper introspection, specifying the "--annotate" flag will insert
additional comments in the generated code to help readers match the
generated code to specific parts of the XDR specification.

Because the generated code is targeted for the Linux kernel, it
is tagged with a GPLv2-only license.

The xdrgen tool can also provide lexical and syntax checking of
an XDR specification:

   $ ./xdrgen lint xdr/new.x


How It Works
------------

xdrgen does not use machine learning to generate source code. The
translation is entirely deterministic.

RFC 4506 Section 6 contains a BNF grammar of the XDR specification
language. The grammar has been adapted for use by the Python Lark
module.

The xdr.ebnf file in this directory contains the grammar used to
parse XDR specifications. xdrgen configures Lark using the grammar
in xdr.ebnf. Lark parses the target XDR specification using this
grammar, creating a parse tree.

xdrgen then transforms the parse tree into an abstract syntax tree.
This tree is passed to a series of code generators.

The generators are implemented as Python classes residing in the
generators/ directory. Each generator emits code created from Jinja2
templates stored in the templates/ directory.
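
The shape of that pipeline (parse, build a tree, render templates)
can be shown with a toy that uses only the Python standard library.
In this sketch a regular expression stands in for the Lark grammar
and string.Template stands in for Jinja2. It handles nothing but
simple enum definitions, and all class and function names are
hypothetical rather than taken from xdrgen.

```python
import re
from dataclasses import dataclass
from string import Template


@dataclass
class XdrEnum:
    """AST node for one XDR enum definition."""
    name: str
    members: list  # (identifier, value) pairs in specification order


# Stand-in for the grammar: matches only "enum NAME { A = 1, ... };"
_ENUM = re.compile(r"enum\s+(\w+)\s*\{([^}]*)\}\s*;")
_MEMBER = re.compile(r"(\w+)\s*=\s*(-?\d+)")

# Stand-in for a template: the two prototypes for one enum type.
_DECL = Template(
    "bool xdrgen_decode_$name(struct xdr_stream *xdr, enum $name *ptr);\n"
    "bool xdrgen_encode_$name(struct xdr_stream *xdr, enum $name value);\n"
)


def parse(spec: str) -> list:
    """Return AST nodes in the order they appear in the specification."""
    return [
        XdrEnum(m.group(1),
                [(ident, int(val)) for ident, val in _MEMBER.findall(m.group(2))])
        for m in _ENUM.finditer(spec)
    ]


def generate(ast: list) -> str:
    """Render one block of declarations per AST node, preserving order."""
    return "".join(_DECL.substitute(name=node.name) for node in ast)
```

Because the output depends only on the input text and the
templates, running the toy twice on the same specification yields
identical code, which is the deterministic property described
above.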

Source code for the definitions is generated in the same order in
which they appear in the specification to ensure the generated code
compiles. This conforms with the behavior of rpcgen.

xdrgen assumes that the generated source code is further compiled by
a compiler that can optimize in a number of ways, including:

 - Unused functions are discarded (i.e., not added to the executable)

 - Aggressive function inlining removes unnecessary stack frames

 - Single-arm switch statements are replaced by a single conditional
   branch

And so on.


Pragmas
-------

Pragma directives specify exceptions to the normal generation of
encoding and decoding functions. Three directives are described
below: "exclude", "header", and "public".

Pragma exclude
------ -------

  pragma exclude <RPC procedure> ;

In some cases, a procedure encoder or decoder function might need
special processing that cannot be automatically generated. The
automatically-generated functions might conflict or interfere with
the hand-rolled function. To avoid editing the generated source code
by hand, a pragma can specify that the procedure's encoder and
decoder functions are not included in the generated header and
source.

For example:

  pragma exclude NFSPROC3_READDIRPLUS;

Excludes the decoder function for the READDIRPLUS argument and the
encoder function for the READDIRPLUS result.

Note that because data item encoder and decoder functions are
defined "static __maybe_unused", subsequent compilation
automatically excludes data item encoder and decoder functions that
are used only by excluded procedures.
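
Conceptually, the directive acts as a filter on the list of
procedures that receive generated functions. The Python sketch below
shows that idea with a regular expression. It is not how xdrgen
parses pragmas (the grammar does that), and the function names are
hypothetical.

```python
import re

# Matches "pragma exclude NAME ;" at the start of a line.
_EXCLUDE = re.compile(r"^\s*pragma\s+exclude\s+(\w+)\s*;", re.MULTILINE)


def excluded_procedures(spec: str) -> set:
    """Collect every procedure named by a 'pragma exclude' directive."""
    return set(_EXCLUDE.findall(spec))


def procedures_to_generate(procedures: list, spec: str) -> list:
    """Drop excluded procedures while keeping specification order."""
    skip = excluded_procedures(spec)
    return [proc for proc in procedures if proc not in skip]
```

Only the per-procedure functions are filtered here. The data item
functions still appear in the output, and the compiler discards the
ones that end up unused.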

Pragma header
------ ------

  pragma header <string> ;

Provide a name to use for the header file. For example:

  pragma header nlm4;

Adds

  #include "nlm4xdr_gen.h"

to the generated source file.

Pragma public
------ ------

  pragma public <XDR data item> ;

Normally XDR encoder and decoder functions are "static". If an
implementer wants to call these functions from other source code,
they can add a public pragma in the input .x file to indicate a set
of functions that should get a prototype in the generated header,
and the function definitions will not be declared static.

For example:

  pragma public nfsstat3;

Adds these prototypes in the generated header:

  bool xdrgen_decode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 *ptr);
  bool xdrgen_encode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 value);

And, in the generated source code, both of these functions appear
without the "static __maybe_unused" modifiers.
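
The effect on the generated source can be pictured as a choice of
linkage made per data item. The Python sketch below renders the
first lines of a decoder definition for an enum type either way. The
function name is hypothetical, and the exact placement of the
modifiers in real xdrgen output is an assumption.

```python
import re

# Matches "pragma public NAME ;" at the start of a line.
_PUBLIC = re.compile(r"^\s*pragma\s+public\s+(\w+)\s*;", re.MULTILINE)


def decoder_signature(type_name: str, spec: str) -> str:
    """Return the opening lines of the decoder definition for an enum."""
    is_public = type_name in _PUBLIC.findall(spec)
    # Assumed layout: modifiers share a line with the return type.
    linkage = "bool" if is_public else "static bool __maybe_unused"
    return (
        f"{linkage}\n"
        f"xdrgen_decode_{type_name}(struct xdr_stream *xdr, "
        f"enum {type_name} *ptr)"
    )
```

With "pragma public nfsstat3;" in the specification, the nfsstat3
decoder is rendered with external linkage, while a type not named by
any public pragma keeps its static definition.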


Future Work
-----------

Finish implementing XDR pointer and list types.

Generate client-side procedure functions

Expand the README into a user guide similar to rpcgen(1)

Add more pragma directives:

  * @pages -- use xdr_read/write_pages() for the specified opaque
    field
  * @skip -- do not decode, but rather skip, the specified argument
    field

Enable something like a #include to dynamically insert the content
of other specification files

Properly support line-by-line pass-through via the "%" decorator

Build a unit test suite for verifying translation of XDR language
into compilable code

Add a command-line option to insert trace_printk call sites in the
generated source code, for improved (temporary) observability

Generate kernel Rust code as well as C code
245