xdrgen - Linux Kernel XDR code generator

Introduction
------------

SunRPC programs are typically specified using a language defined by
RFC 4506. In fact, all IETF-published NFS specifications provide a
description of the specified protocol using this language.

Since the 1990s, user space consumers of SunRPC have had access to
a tool that could read such XDR specifications and then generate C
code that implements the RPC portions of that protocol. This tool is
called rpcgen.

This RPC-level code handles input directly from the network, and
thus a high degree of memory safety and sanity checking is needed
to help ensure proper levels of security. Bugs in this code can
have significant impact on security and performance.
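
As an illustration of the kind of checking involved, the following
Python sketch decodes an RFC 4506 variable-length opaque. It is not
kernel code and not the output of any tool; the names decode_opaque
and XdrDecodeError are made up for this example.

```python
import struct


class XdrDecodeError(Exception):
    """Raised when the input buffer does not hold a valid item."""


def decode_opaque(buf, offset, maxsize):
    """Decode a variable-length opaque<maxsize> that starts at offset.

    Returns a (data, next_offset) tuple.
    """
    # The 4-byte big-endian length word must lie entirely inside the buffer.
    if offset < 0 or offset + 4 > len(buf):
        raise XdrDecodeError("truncated length word")
    (length,) = struct.unpack_from(">I", buf, offset)
    # Reject a length the specification does not allow before using it.
    if length > maxsize:
        raise XdrDecodeError("opaque exceeds declared maximum")
    # XDR pads variable-length data to a multiple of four bytes.
    padded = (length + 3) & ~3
    start = offset + 4
    if start + padded > len(buf):
        raise XdrDecodeError("truncated opaque body")
    return bytes(buf[start:start + length]), start + padded
```

Every length taken from the wire is checked against both the
declared maximum and the bytes actually present before it is used.
A hand-written decoder has to repeat checks of this kind for every
field of every procedure.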

However, such code is repetitive and tedious to write by hand.

The C code generated by rpcgen makes extensive use of the facilities
of the user space TI-RPC library and libc. Furthermore, the dialect
of the generated code is very traditional K&R C.

The Linux kernel's implementations of SunRPC-based protocols hand-roll
their XDR code. There are two main reasons for this:

1. libtirpc (and its predecessors) operate only in user space. The
   kernel's RPC implementation and its API are significantly
   different from libtirpc's.

2. rpcgen-generated code is believed to be less efficient than code
   that is hand-written.

These days, gcc and its kin are capable of optimizing code better
than human authors. There are only a few instances where writing
XDR code by hand will make a measurable performance difference.

In addition, the current hand-written code in the Linux kernel is
difficult to audit, and it is hard to prove that it implements
exactly what is in the protocol specification.

In order to accrue the benefits of machine-generated XDR code in the
kernel, a tool is needed that will output C code that works against
the kernel's SunRPC implementation rather than libtirpc.

Enter xdrgen.


Dependencies
------------

These dependencies are typically packaged by Linux distributions:

- python3
- python3-lark
- python3-jinja2

These dependencies are available via PyPI:

- pip install 'lark[interegular]'


XDR Specifications
------------------

When adding a new protocol implementation to the kernel, the XDR
specification can be derived by feeding a .txt copy of the RFC to
the script located at tools/net/sunrpc/extract.sh.

   $ extract.sh < rfc0001.txt > new2.x


Operation
---------

Once a .x file is available, use xdrgen to generate source and
header files containing an implementation of XDR encoding and
decoding functions for the specified protocol.

   $ ./xdrgen definitions new2.x > include/linux/sunrpc/xdrgen/new2.h
   $ ./xdrgen declarations new2.x > new2xdr_gen.h

and

   $ ./xdrgen source new2.x > new2xdr_gen.c

The files are ready to use for a server-side protocol implementation,
or may be used as a guide for implementing these routines by hand.

By default, the only comments added to this code are kdoc comments
that appear directly in front of the public per-procedure APIs. For
deeper introspection, specifying the "--annotate" flag will insert
additional comments in the generated code to help readers match the
generated code to specific parts of the XDR specification.

Because the generated code is targeted for the Linux kernel, it
is tagged with a GPLv2-only license.

The xdrgen tool can also provide lexical and syntax checking of
an XDR specification:

   $ ./xdrgen lint xdr/new.x


How It Works
------------

xdrgen does not use machine learning to generate source code. The
translation is entirely deterministic.

RFC 4506 Section 6 contains a BNF grammar of the XDR specification
language. The grammar has been adapted for use by the Python Lark
module.

The xdr.ebnf file in this directory contains the grammar used to
parse XDR specifications. xdrgen configures Lark using the grammar
in xdr.ebnf. Lark parses the target XDR specification using this
grammar, creating a parse tree.

xdrgen then transforms the parse tree into an abstract syntax tree.
This tree is passed to a series of code generators.

The generators are implemented as Python classes residing in the
generators/ directory. Each generator emits code created from Jinja2
templates stored in the templates/ directory.
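
The overall shape of that pipeline can be sketched in a few lines of
Python. This is a toy stand-in, not xdrgen's code: so that it runs
with only the standard library, it uses a regular expression where
xdrgen uses Lark and the grammar in xdr.ebnf, and string.Template
where xdrgen uses Jinja2 templates. It handles only enum definitions
with decimal or hexadecimal values. The names parse_enums, emit_enum,
and generate, and the sample specification, are hypothetical.

```python
import re
from string import Template

# Toy specification containing a single XDR enum (made-up names).
SPEC = """
enum color {
    RED = 0,
    GREEN = 1
};
"""

# Stand-in for the grammar: matches "enum <name> { <body> };".
ENUM_RE = re.compile(r"enum\s+(\w+)\s*\{([^}]*)\}\s*;")

# Stand-in for a template file: the C text to emit for one enum.
ENUM_TEMPLATE = Template("enum $name {\n$members\n};\n")


def parse_enums(text):
    """Parse step: return [(name, [(identifier, value), ...]), ...]."""
    enums = []
    for match in ENUM_RE.finditer(text):
        name, body = match.group(1), match.group(2)
        members = []
        for item in body.split(","):
            ident, value = (part.strip() for part in item.split("="))
            members.append((ident, int(value, 0)))
        enums.append((name, members))
    return enums


def emit_enum(name, members):
    """Generate step: fill the template from one parsed definition."""
    lines = ",\n".join(f"\t{ident} = {value}" for ident, value in members)
    return ENUM_TEMPLATE.substitute(name=name, members=lines)


def generate(text):
    """Emit code for each definition, in specification order."""
    return "".join(emit_enum(name, members)
                   for name, members in parse_enums(text))
```

Calling generate(SPEC) returns the C text of the enum. The list
returned by parse_enums plays the role that the abstract syntax tree
plays in xdrgen: the emitting code never looks at the input text
itself.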

Code for each definition is generated in the same order in which the
definitions appear in the specification to ensure the generated code
compiles. This conforms with the behavior of rpcgen.

xdrgen assumes that the generated source code is further compiled by
a compiler that can optimize in a number of ways, including:

 - Unused functions are discarded (i.e., not added to the executable)

 - Aggressive function inlining removes unnecessary stack frames

 - Single-arm switch statements are replaced by a single conditional
   branch

And so on.


Pragmas
-------

Pragma directives specify exceptions to the normal generation of
encoding and decoding functions. The directives currently
implemented are described below.

Pragma big_endian
------ ----------

  pragma big_endian <enum> ;

For variables that might contain only a small number of values, it
is more efficient to avoid the byte-swap when encoding or decoding
on little-endian machines. Such is often the case with error status
codes. For example:

  pragma big_endian nfsstat3;

In this case, when generating an XDR struct or union containing a
field of type "nfsstat3", xdrgen will make the type of that field
"__be32" instead of "enum nfsstat3". XDR unions then switch on the
non-byte-swapped value of that field.
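
The idea can be modeled outside the kernel. In the Python sketch
below, the status word is kept as the four bytes received from the
wire and is compared against constants that were converted to
big-endian once, rather than being converted to host order for each
message. This is an illustration only, not generated code. The
helper name cpu_to_be32 is borrowed from the kernel but is defined
here with the struct module, describe_status is hypothetical, and
the status values are those of nfsstat3 in RFC 1813.

```python
import struct

# Status values from the nfsstat3 enum in RFC 1813, in host order.
NFS3_OK = 0
NFS3ERR_PERM = 1
NFS3ERR_NOENT = 2


def cpu_to_be32(value):
    """Return the 4-byte big-endian (XDR wire) form of a 32-bit value."""
    return struct.pack(">I", value)


# Converted once, when the module is loaded, not once per message.
BE_NFS3_OK = cpu_to_be32(NFS3_OK)
BE_NFS3ERR_NOENT = cpu_to_be32(NFS3ERR_NOENT)


def describe_status(wire_status):
    """Switch on the 4 bytes exactly as received, with no swap."""
    if wire_status == BE_NFS3_OK:
        return "ok"
    if wire_status == BE_NFS3ERR_NOENT:
        return "no such file or directory"
    return "other error"
```

Because the conversion maps each value to exactly one byte pattern,
comparing the unswapped patterns selects the same arm that comparing
host-order values would.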

Pragma exclude
------ -------

  pragma exclude <RPC procedure> ;

In some cases, a procedure encoder or decoder function might need
special processing that cannot be automatically generated. The
automatically-generated functions might conflict or interfere with
the hand-rolled function. To avoid editing the generated source code
by hand, a pragma can specify that the procedure's encoder and
decoder functions are not included in the generated header and
source.

For example:

  pragma exclude NFSPROC3_READDIRPLUS;

This excludes the decoder function for the READDIRPLUS argument and
the encoder function for the READDIRPLUS result.

Note that because data item encoder and decoder functions are
defined "static __maybe_unused", subsequent compilation
automatically excludes data item encoder and decoder functions that
are used only by an excluded procedure.

Pragma header
------ ------

  pragma header <string> ;

Provide a name to use for the header file. For example:

  pragma header nlm4;

This adds

  #include "nlm4xdr_gen.h"

to the generated source file.

Pragma public
------ ------

  pragma public <XDR data item> ;

Normally XDR encoder and decoder functions are "static". If an
implementer wants to call these functions from other source code,
they can add a public pragma to the input .x file. The named
functions then get a prototype in the generated header, and their
definitions are not declared static.

For example:

  pragma public nfsstat3;

This adds these prototypes to the generated header:

  bool xdrgen_decode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 *ptr);
  bool xdrgen_encode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 value);

And, in the generated source code, both of these functions appear
without the "static __maybe_unused" modifiers.


Future Work
-----------

Finish implementing XDR pointer and list types

Generate client-side procedure functions

Expand the README into a user guide similar to rpcgen(1)

Add more pragma directives:

  * @pages -- use xdr_read/write_pages() for the specified opaque
    field
  * @skip -- do not decode, but rather skip, the specified argument
    field

Enable something like a #include to dynamically insert the content
of other specification files

Properly support line-by-line pass-through via the "%" decorator

Build a unit test suite for verifying translation of XDR language
into compilable code

Add a command-line option to insert trace_printk call sites in the
generated source code, for improved (temporary) observability

Generate kernel Rust code as well as C code
262