.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 1993 by Sun Microsystems, Inc.
.\" Copyright 2024 Oxide Computer Company
.\"
.Dd March 17, 2024
.Dt STYLE 7
.Os
.Sh NAME
.Nm STYLE
.Nd C Style and Coding Standards for illumos
.Sh SYNOPSIS
This document describes a set of coding standards for C programs written in
illumos gate, the illumos source repository.
.Pp
This document is based on
.%T C Style and Coding Standards for SunOS
by Bill Shannon.
.Sh INTRODUCTION
The purpose of the document is to establish a consistent style for C program
source files within the illumos gate.
Collectively, this document describes the
.Dq illumos C style ,
and the scope is limited to the application of illumos coding style to the C
language.
.Pp
Source code tends to be read many more times than it is written or modified.
Using a consistent style makes it easier for multiple people to co-operate in
the development and maintenance of programs.
It reduces cognitive complexity by eliminating superficial differences, freeing
the programmer to concentrate on the task at hand.
This in turn aids review and analysis of code, since small stylistic
distractions are eliminated.
Further, eliding such distractions makes it easier for programmers to work on
unfamiliar parts of the code base.
Finally, it facilitates the construction of tools that incorporate the rules in
this standard to help programmers prepare programs.
For example, automated formatters, text editor integration, and so on, can refer
to this document to understand the rules of illumos C.
.Pp
Of necessity, these standards cannot cover all situations.
Experience and informed judgment count for much.
Inexperienced programmers who encounter unusual situations should refer to code
written by experienced C programmers following these rules, or consult with
experienced illumos programmers for help with creating a stylistically
acceptable solution.
.Pp
The illumos code base has a long history, dating back to the original Unix from
AT&T and Bell Labs.
Furthermore, for many years the C style was not formally defined, and there was
much variation and many corner cases as it evolved.
As such, it is possible to find examples of code that do not conform to this
standard in the source tree.
If possible, strongly consider converting to this style before beginning
substantial work on such code.
If that is not practical, then favor consistency with surrounding code over
conformity.
All new code should conform to these rules.
.\"
.Sh Character Set
Source files predominantly use ASCII printing characters.
However, UTF-8 may be used when required for accurate representation of names
and proper nouns in comments and string literals.
Exercise some care here, however: be aware that source files are consumed by
many tools beyond just compilers, and some may not be able to cope with
multi-byte extended characters.
In particular,
.St -isoC-99
is used for most illumos source and technically limits its
.Dq extended
character set to characters that can fit into a single byte.
C11 relaxes this, and most illumos tools are already fairly tolerant here,
but use sound judgment with non-ASCII characters: people should not be forced
to change their names, but do not add emoji or other extraneous content.
UTF-8 may not be used in identifiers.
.Pp
Generally favor ASCII-only in header files.
For new code, avoid non-ASCII characters from non-UTF-8 character encodings
such as ISO-8859-1 or similar.
Pseudo-graphical line printing characters and similar glyphs are not permitted,
though diagrams made with
.Dq ASCII art
using
.Sq + ,
.Sq - ,
.Sq |
and so on are permitted.
Non-printing characters, such as control characters
.Pq including form-feeds, backspace, and similar
should not appear in source files.
Terminal escape sequences to change text color or position are similarly
prohibited.
.Pp
Inside of string constants, prefer C escape sequences instead of literal
characters for tabs, form-feeds, carriage returns, newlines, and so on.
Obviously, use a literal space character when a space is required in a string:
do not use octal or hex escape sequences when a space literal will do.
.Pp
Generally prefer the use of C character constants to numeric code points.
For example, use
.Bd -literal -offset indent
if (*p == '\en')
	return (EOL);		/* end of line */
.Ed
.Pp
instead of,
.Bd -literal -offset indent
#define NL 10
if (*p == NL)
	return (EOL);		/* end of line */
.Ed
.Pp
An exception here may be if reading octet-oriented data where specific values
are known in advance, such as when parsing data read from a socket.
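.Pp
As a rough sketch of both cases, the following uses a character constant for
text and named numeric constants for octet-oriented data.
The function names are hypothetical, and the two byte values are the gzip
magic number, chosen only as a familiar example of a fixed octet sequence.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Octet values fixed by the file format, so numeric constants fit. */
#define	GZIP_MAGIC0	0x1f
#define	GZIP_MAGIC1	0x8b

/* Return true if buf begins with the two-byte gzip magic number. */
bool
is_gzip(const uint8_t *buf, size_t len)
{
	return (len >= 2 && buf[0] == GZIP_MAGIC0 && buf[1] == GZIP_MAGIC1);
}

/* Count the commas in the first len bytes of text. */
size_t
count_commas(const char *text, size_t len)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++) {
		if (text[i] == ',')	/* character constant, not 44 */
			n++;
	}
	return (n);
}
```
.Ed
.Pp
The text routine stays readable without an ASCII table, while the binary
check names each octet once and uses the names thereafter.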
.\"
.Sh Lines in Source Files
Lines in source files are limited to 80 columns.
If a logical line exceeds this, it must be broken and continued on a new line.
.Ss Continuation Lines
Continuation lines are used when a logical statement or expression will not fit
in the available space, such as a procedure call with many arguments, or a
complex boolean or arithmetic expression.
When this happens, the line should be broken as follows:
.Bl -bullet -offset indent
.It
After a comma in the case of a function call or function definition.
Note, never break in the middle of a parameter expression, such as between the
type and argument name.
.It
After the last operator that fits on the line for arithmetic, boolean and
ternary expressions.
.El
.Pp
A continuation line should never start with a logical or binary operator.
The next line should be further indented by four literal space characters
.Pq half a tab stop .
If needed, subsequent continuation lines should be broken in the same manner,
and aligned with each other.
For example,
.Bd -literal -offset indent
if (long_logical_test_1 || long_logical_test_2 ||
    long_logical_test_3) {
	statements;
}
.Ed
.Bd -literal -offset indent
a = (long_identifier_term1 - long_identifier_term2) *
    long_identifier_term3;
.Ed
.Bd -literal -offset indent
function(long_complicated_expression1, long_complicated_expression2,
    long_complicated_expression3, long_complicated_expression4,
    long_complicated_expression5, long_complicated_expression6)
.Ed
.Pp
It is acceptable to break a line earlier than necessary in order to keep
constructs together to aid readability or understanding.
For example,
.Bd -literal -offset indent
if ((flag & FLAG1) != 0 ||
    (flag & FLAG2) != 0 ||
    (flag & FLAG3) != 0) {
	statements;
}
.Ed
.Pp
Continuation lines usually occur when blocks are deeply nested or very long
identifiers are used, or functions have many parameters.
Often, this is a sign that code should be rewritten or broken up, or that the
variable name is not fit for purpose.
A strategically introduced temporary variable may help clarify the code.
Breaking a particularly large function with deeply nested blocks up into
multiple, smaller functions can be an improvement.
Using a structure to group arguments together instead of having many positional
parameters can make function signatures shorter and easier to understand.
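.Pp
To illustrate the last suggestion, here is a minimal sketch that passes one
structure instead of several positional parameters.
The type
.Vt rect_args_t ,
its members, and the function name are invented for this example.
.Bd -literal -offset indent
```c
/* Hypothetical argument bundle; the names are illustrative only. */
typedef struct rect_args {
	int	ra_width;	/* outer width in pixels */
	int	ra_height;	/* outer height in pixels */
	int	ra_border;	/* border thickness in pixels */
} rect_args_t;

/*
 * Return the interior area of the rectangle described by rap, or -1
 * if the border leaves no interior.
 */
int
rect_interior_area(const rect_args_t *rap)
{
	int inner_w = rap->ra_width - 2 * rap->ra_border;
	int inner_h = rap->ra_height - 2 * rap->ra_border;

	if (inner_w <= 0 || inner_h <= 0)
		return (-1);
	return (inner_w * inner_h);
}
```
.Ed
.Pp
The signature stays on one line however many fields are added later, and the
temporaries keep the final expression short.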
.\"
.Ss Indentation and White Space
Initial indentation must use only tab characters, with tabs set to eight spaces.
Continuation lines are indented with tabs to the continued line, and then
further indented by another four spaces, as described above.
If indentation causes the code to be too wide to fit in 80 columns, it may be
too complex and would be clearer if it were rewritten, as described above.
The rules for how to indent particular C constructs such as
.Ic if , for
and
.Ic switch
are described in
.Sx Compound Statements .
.Pp
Tab characters may also be used for alignment beyond indentation within source
files, such as to line up comments, but avoid using spaces for this.
Note that
.Dq ASCII art
diagrams in block comments are explicitly exempt from this rule, and may use
spaces for alignment as needed.
A space followed by a tab outside of a string constant is forbidden.
.Pp
Trailing white space is not permitted, whether at the ends of lines or at the
end of the file.
That is, neither trailing blanks or tabs at the ends of lines nor additional
newlines at the end of a file are allowed.
The last character in each source file should be a newline character.
.\"
.Sh Comments
Comments should be used to give overviews of code and provide additional
information that is not readily apparent from the source itself.
Comments should only be used to describe
.Em what
code does or
.Em why
it is implemented the way that it is, but should not describe
.Em how
code works.
Very rare exceptions are allowed for cases where the implementation is
particularly subtle.
.Pp
Source files should begin with a block comment that includes license information
for that file, as well as a list of copyright holders.
However, source files should not contain comments listing authors or the
modification history for the file: this information belongs in the revision
control system and issue tracker.
Following the copyright material, an explanatory comment that describes the
file's purpose, provides background, references to relevant standards, or similar
information, is helpful.
A suitable template for new files can be found in
.Pa usr/src/prototypes
within the illumos-gate code repository.
.Pp
Front-matter aside, comments should only contain information that is germane to
reading and understanding the program.
External information, such as how the corresponding package is built or what
directory it should reside in should not be in a comment in a source file.
Discussions of non-trivial design decisions are appropriate if they aid in
understanding the code, but again avoid duplicating information that is present
in, and clear from, the code.
In general, avoid including information that is likely to become out-of-date in
comments; for example, specific section numbers of rapidly evolving documents
may change over time.
.Pp
Comments should
.Em not
be enclosed in large boxes drawn with asterisks or other characters.
Comments should never include special characters, such as form-feed or
backspace, and no terminal drawing characters.
.Pp
There are three styles of comments:
block, single-line, and trailing.
.Ss Block Comments
The opening
.Sq /*
of a block comment that appears at the top-level of a file
.Pq that is, outside of a function, structure definition, or similar construct
should be in column one.
There should be a
.Sq \&*
in column 2 before each line of text in the block comment,
and the closing
.Sq */
should be in columns 2-3, so that the
.Sm off
.Sq \&*
s
.Sm on
line up.
This enables
.Ql grep ^.\e*
to match all of the top-level block comments in a file.
There is never any text on the first or last lines of a block comment.
The initial text line is separated from the * by a single space, although
later text lines may be further indented, as appropriate for clarity.
.Bd -literal -offset indent
/*
 * Here is a block comment.
 * The comment text should be spaced or tabbed over
 * and the opening slash-star and closing star-slash
 * should be alone on a line.
 */
.Ed
.Pp
Block comments are used to provide high-level, natural language descriptions of
the content of files, the purpose of functions, and to describe data structures
and algorithms.
Block comments should be used at the beginning of each file and before
functions as necessary.
.Pp
The very first comment in a file should be front-matter containing license and
copyright information, as mentioned above.
.Pp
Following the front-matter, files should have block comments that describe
their contents and any special considerations the reader should take note of
while reading.
.Pp
A block comment preceding a function should document what it does, input
parameters, algorithm, and returned value.
For example,
.Bd -literal -offset indent
/*
 * index(c, str) returns a pointer to the first occurrence of
 * character c in string str, or NULL if c doesn't occur
 * in the string.
 */
.Ed
.Pp
In many cases, block comments inside a function are appropriate, and they
should be indented to the same indentation level as the code that they
describe.
.Pp
Block comments should contain complete, correct sentences and should follow
the English language rules for punctuation, grammar, and capitalization.
Sentences should be separated by either a single space or two space characters,
and such spacing should be consistent within a comment.
That is, either always separate sentences with a single space or with two
spaces, but do not mix styles within a comment
.Pq and ideally do not mix styles within a source file .
Paragraphs within a block comment should be separated by an empty line
containing only a space,
.Sq \&*
and newline.
For example,
.Bd -literal -offset indent
/*
 * This is a block comment.  It consists of several sentences
 * that are separated by two space characters.  It should say
 * something significant about the code.
 *
 * This comment also contains two separate paragraphs, separated
 * by an "empty" line.  Note that the "empty" line still has the
 * leading ' *'.
 */
.Ed
.Pp
Do not indent paragraphs with spaces or tabs.
.Ss Single-Line Comments
A single-line comment is a short comment that may appear on a single line
indented so that it matches the code that follows.
Short phrases or sentence fragments are acceptable in single-line comments.
.Bd -literal -offset indent
if (argc > 1) {
	/* get input file from command line */
	if (freopen(argv[1], "r", stdin) == NULL)
		err(EXIT_FAILURE, "can't open %s", argv[1]);
}
.Ed
.Pp
The comment text should be separated from the opening
.Sq /*
and closing
.Sq */
by a space.
.Pp
The closing
.Sm off
.Sq */
s
.Sm on
of several adjacent single-line comments should
.Em not
be forced to be aligned vertically.
In general, a block comment should be used when a single line is insufficient.
.Ss Trailing Comments
Very short comments may appear on the same line as the code they describe,
but should be tabbed over far enough to separate them from the statements.
If more than one short comment appears in a block of code, they should all
be tabbed to the same indentation level.
Trailing comments are most often sentence fragments or short phrases.
.Bd -literal -offset indent
if (a == 2)
	return (TRUE);		/* special case */
else
	return (isprime(a));	/* works only for odd a */
.Ed
.Pp
Trailing comments are most useful for documenting declarations and non-obvious
cases.
Avoid the assembly language style of commenting every line of executable code
with a trailing comment.
.Pp
Trailing comments are often also used on preprocessor
.Sy #else
and
.Sy #endif
statements if they are far away from the corresponding test.
See
.Sx Preprocessor
for more guidance on this.
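.Pp
One possible way to write such trailing comments is sketched below.
.Dv DEBUG_TRACE
and the function name are hypothetical, and the exact comment form shown is an
assumption rather than a rule stated in this section.
.Bd -literal -offset indent
```c
#include <stdbool.h>

/* DEBUG_TRACE is a hypothetical build-time switch. */
#ifdef DEBUG_TRACE
static const bool trace_enabled = true;
#else	/* DEBUG_TRACE */
static const bool trace_enabled = false;
#endif	/* DEBUG_TRACE */

/* Report whether tracing was compiled in. */
bool
trace_is_enabled(void)
{
	return (trace_enabled);
}
```
.Ed
.Pp
In a block this short the comments add little; they earn their place when the
conditional spans more than a screenful of code.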
.Ss XXX and TODO comments
Do not add
.Dq XXX
or
.Dq TODO
comments in new code.
.\"
.Sh Naming Conventions
It has been said that naming things is the hardest problem in computer science,
and the longevity of illumos means that there is wide variation across the
source base when it comes to identifiers.
Much of this was driven by the demands of early C dialects that restricted
externally visible identifiers to 6 significant characters.
While this ancient restriction no longer applies in modern C, there is still an
aesthetic preference for brevity and some argument about backwards compatibility
with third-party compilers.
Regardless, consistent application of conventions for identifiers can make
programs more understandable and easier to read.
Naming conventions can also give information about the function of the
identifier, whether constants, named types, variables, or similar, that can be
helpful in understanding code.
Programmers should therefore be consistent in using naming conventions within a
project.
Individual projects will undoubtedly have their own naming conventions
incorporating terminology specific to that project.
.Pp
In general, the following guidelines should be followed:
.Bl -bullet -offset indent
.It
The length of a name should be proportional to its scope.
An identifier declared at global scope would generally be longer than one
declared in a small block; an index variable used in a one-line loop might be a
single character.
.It
Names should be short but meaningful.
Favor brevity.
.It
One character names should be avoided except for temporary variables of short
scope.
If one uses a single character name, then use variables
.Va i , j , k , m , n
for integers,
.Va c , d , e
for characters,
.Va p , q
for pointers, and
.Va s , t
for character pointers.
Avoid variable
.Va l
.Pq lower-case L
because it is hard to distinguish from
.Sq 1
.Pq the digit one
and
.Sq I
.Pq capital i
on some printers and displays.
.It
Pointers may have a
.Sq p
appended to their names for each level of indirection.
For example, a pointer to the variable
.Va dbaddr
can be named
.Va dbaddrp
.Po or perhaps simply
.Va dp
.Pc ,
if the scope is small enough.
Similarly,
.Va dbaddrpp
would be a pointer to a pointer to
.Va dbaddr .
.It
Separate
.Dq words
in a long identifier with underscores:
.Pp
.Dl create_panel_item
.Pp
Mixed case names like
.Sq CreatePanelItem ,
or
.Sq javaStyleName ,
are strongly discouraged and should not be used for new code.
.It
Leading underscores are reserved by the C standard and generally should not be
used in identifiers for user-space programs.
They may be used in user-space libraries or in the kernel with some caution,
though be careful to avoid conflicts with constructs from standard C, such
as
.Sq _Bool ,
.Sq _Alignof ,
and so on.
Trailing underscores should be similarly avoided in user-space programs.
.It
Two conventions are used for named types in the form of typedefs.
Within the kernel and in many places in userland, named types are given
a name ending in
.Sq _t ,
for example,
.Bd -literal -offset indent
typedef enum { FALSE, TRUE } bool_t;
typedef struct node node_t;
.Ed
.Pp
Technically such names are reserved by POSIX, but some liberties are taken
here given both the age and provenance of the illumos code base.
Note that typedefs for function pointer types may end in
.Sq _f
to signify that they refer to function types.
.Pp
In some user programs named types have their first letter capitalized, as in,
.Bd -literal -offset indent
typedef enum { FALSE, TRUE } Bool;
typedef struct node Node;
.Ed
This practice is deprecated; all new code must use the
.Sq _f
and
.Sq _t
suffixes for named types.
.It
.Ic #define
names for constants should be in all CAPS.
Separate words with underscores, as for variable names.
.It
Function-like macro names may be all CAPS or all lower case.
Prefer all upper case macro names for new code.
Some macros
.Po such as
.Xr getchar 3C
and
.Xr putchar 3C
.Pc
are in lower case since they may also exist as functions.
Others, such as
.Xr major 3C ,
.Xr minor 3C ,
and
.Xr makedev 3C
are macros for historical reasons.
.It
Variable names, structure tag names, and function names should be lower case.
.Pp
Note: in general, with the exception of named types, it is best to avoid names
that differ only in case, like
.Va foo
and
.Va FOO .
The potential for confusion is considerable.
However, it is acceptable to use a name which differs only in capitalization
from its base type for a typedef, such as,
.Pp
.Dl typedef struct node Node;
.Pp
It is also acceptable to give a variable of this type a name that is the
all lower case version of the type name.
For example,
.Bd -literal -offset indent
Node node;
.Ed
.It
Struct members should be prefixed with an identifier as described in
.Sx Structures and Unions .
.It
The individual items of enums should be made unique names by prefixing them
with a tag identifying the package to which they belong.
For example,
.Bd -literal -offset indent
enum rainbow { RB_red, RB_orange, RB_green, RB_blue };
.Ed
.Pp
The
.Xr mdb 1
debugger supports enums in that it can print out the value of an enum, and can
also perform assignment statements using an item in the range of an enum.
Thus, the use of enums over equivalent
.Ic #define Ns No s
may aid debugging programs.
For example, rather than writing:
.Bd -literal -offset indent
#define	SUNDAY	0
#define	MONDAY	1
.Ed
.Pp
write:
.Bd -literal -offset indent
enum day_of_week { DW_SUNDAY, DW_MONDAY, ... };
.Ed
.Pp
Enums of this sort can be particularly useful for bitfields, as the
.Xr mdb 1
debugger can decode them symbolically.
For example, an instance of:
.Bd -literal -offset indent
enum vmx_caps {
	VMX_CAP_NONE		= 0,
	VMX_CAP_TPR_SHADOW	= (1UL << 0),
	VMX_CAP_APICV		= (1UL << 1),
	VMX_CAP_APICV_X2APIC	= (1UL << 2),
	VMX_CAP_APICV_PIR	= (1UL << 3),
};
.Ed
.Pp
with all bits set is printed by
.Xr mdb 1
as
.Bd -literal -offset indent
0xf (VMX_CAP_{TPR_SHADOW|APICV|APICV_X2APIC|APICV_PIR})
.Ed
.It
Implementors of libraries should take care to hide symbols that are private
to the library.
If a symbol is local to a single module, one may simply declare it as
.Ic static .
For symbols that are shared between several translation units in the same
library, and therefore must be declared
.Ic extern ,
the programmer should use the linker and mapfiles to hide private symbols.
For symbols that are logically private to a group of libraries, one may use
a naming convention, such as prefixing the name with an underscore and a tag
that is unique to the package, such as,
.Ql _panel_caret_mpr ,
but it is not necessary to use stylistic conventions to hide symbols that
will not be exported.
Programmers may optionally use such a naming convention as an additional signal
that symbols are internal to a library, but this is not required.
.It
One should always use care to avoid conflicts with identifiers reserved by C.
.It
Generally use nouns for type names and verbs or verb phrases for functions.
.El
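.Pp
Several of the guidelines above can be seen together in the following sketch.
Every identifier in it is hypothetical and was chosen only to show the
.Sq _t
and
.Sq _f
suffixes, prefixed enum items and struct members, the
.Sq p
pointer suffix, and a short loop index.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* All names below are invented for illustration. */
typedef enum shape_kind {
	SK_CIRCLE,
	SK_SQUARE
} shape_kind_t;

typedef struct shape {
	shape_kind_t	sh_kind;	/* which kind of shape */
	int		sh_size;	/* radius or side length */
} shape_t;

/* Function pointer types may end in _f. */
typedef int (*shape_visit_f)(const shape_t *);

/* Return the number of sides; a circle is treated as having none. */
static int
shape_sides(const shape_t *shapep)
{
	return (shapep->sh_kind == SK_SQUARE ? 4 : 0);
}

/* Apply func to each of the n shapes and return the sum of the results. */
int
shape_sum(const shape_t *shapes, size_t n, shape_visit_f func)
{
	int total = 0;

	for (size_t i = 0; i < n; i++)
		total += func(&shapes[i]);
	return (total);
}
```
.Ed
.Pp
The type names are nouns and the function names are verb phrases or short
descriptions, in keeping with the final guideline.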
.\"
.Sh Declarations
There is considerable variation in the format of declarations within the illumos
gate.
As an example, there are many places that use one declaration per line, and
employ tab characters to line up the variable names:
.Bd -literal -offset indent
int	level;		/* indentation level */
int	size;		/* size of symbol table */
int	lines;		/* lines read from input */
.Ed
.Pp
and it is also common to see declarations combined into a single line, particularly
when the variable names are self-explanatory or temporary:
.Bd -literal -offset indent
int level, size, lines;
.Ed
.Pp
Indentation between type names or qualifiers and identifiers also varies.
Some use no such indentation:
.Bd -literal -offset indent
int level;
volatile uint8_t byte;
char *ptr;
.Ed
.Pp
while many programmers feel that aligning variable declarations makes code more
readable:
.Bd -literal -offset indent
int		x;
extern int	y;
volatile int	count;
char		**pointer_to_string;
.Ed
.Pp
However, note that declarations such as the following probably make code
.Em harder
to read:
.Bd -literal -offset indent
struct very_long_structure_name			*p;
struct another_very_long_structure_name		*q;
char						*s;
int						i;
short						r;
.Ed
.Pp
While these styles vary, there are some rules which should be applied
consistently:
.Bl -bullet -offset indent
.It
Always use function prototypes in preference to old-style function
declarations for new code.
.It
Variables and functions should not be declared on the same line.
.It
Variables which are initialized at the time of declaration should be declared
on separate lines.
That is one should write:
.Bd -literal -offset indent
int size, lines;
int level = 0;
.Ed
.Pp
instead of:
.Bd -literal -offset indent
int level = 0, size, lines;
.Ed
.It
Variable declarations should be scoped to the smallest possible block in which
they are used.
.It
Variable names within inner blocks should not shadow those at higher levels.
.It
For code compiled with flags that enable
.St -isoC-99
features, additionally:
.Bl -bullet -offset indent
.It
A
.Ic for
loop may declare and initialize its counting variable.
Note that the most appropriate type for counting variables is often
.Vt size_t
or
.Vt uint_t
rather than
.Vt int .
In particular, take care when indexing into arrays:
.Vt size_t
is guaranteed to be large enough to index any array, whereas
.Vt uint_t
is not.
.It
Variables do not have to be declared at the start of a block.
However, care should be taken to use this feature only where it makes the code
more readable.
.El
.El
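.Pp
The C99 rules above might be applied as in the following sketch, in which the
function name and parameters are hypothetical.
It declares the loop counter in the
.Ic for
statement, gives it type
.Vt size_t
because it indexes an array, and puts the initialized variable on its own line.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * Return the index of the largest element in vals, or 0 if n is 0.
 * When several elements tie, the first one wins.
 */
size_t
max_index(const int *vals, size_t n)
{
	size_t best = 0;

	/* The counter is scoped to the loop and is a size_t. */
	for (size_t i = 1; i < n; i++) {
		if (vals[i] > vals[best])
			best = i;
	}
	return (best);
}
```
.Ed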
.Ss External Declarations
External declarations should begin in column 1.
Each declaration should be on a separate line.
A comment describing the role of the object being declared should be included,
with the exception that a list of defined constants does not need comments if
the constant names themselves are sufficient documentation.
Constant names and their defined values should be tabbed so that they line up
underneath each other.
For a block of related objects, a single block comment is sufficient.
However, if trailing comments are used, these should also be tabbed to line up
underneath each other.
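.Pp
A minimal sketch of this layout follows; the constants, variables, and function
are hypothetical and stand in for a small symbol table module.
.Bd -literal -offset indent
```c
/* Hypothetical limits for a small symbol table. */
#define	SYM_NAME_MAX	32
#define	SYM_TABLE_SIZE	128

int	sym_count;		/* number of symbols in use */
int	sym_errors;		/* parse errors seen so far */

/* Return the number of free slots left in the table. */
int
sym_free_slots(void)
{
	return (SYM_TABLE_SIZE - sym_count);
}
```
.Ed
.Pp
The constant names are taken to be self-documenting, so one comment covers the
pair, while each variable carries its own aligned trailing comment.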
.Ss Structures and Unions
For structure and union template declarations, each element should be on its
own line with a comment describing it.
The
.Ic struct
keyword and opening brace
.Sq \&{
should be on the same line as the structure tag, and the closing brace should
be alone on a line in column 1.
Each member is indented by one tab:
.Bd -literal -offset indent
struct boat {
	int	b_wllength;	/* water line length in feet */
	int	b_type;		/* see below */
	long	b_sarea;	/* sail area in square feet */
};
.Ed
.Pp
Struct members should be prefixed with an abbreviation of the struct name
followed by an underscore
.Pq Sq _ .
Typically the first character of each word in the struct's name is used for the
prefix.
While not required by the language, this convention disambiguates the members
for tools such as
.Xr cscope 1 .
For example, a structure with an unprefixed member named
.Sq len
could lead to many ambiguous references.
.Ss Use of Sq static
In any file which is part of a larger whole rather than a self-contained
program, maximum use should be made of the
.Sy static
keyword to make functions and variables local to single files.
Variables in particular should be accessible from other files only when there
is a clear need that cannot be filled in another way.
Such usage, and in particular its rationale, should be made clear with comments,
and possibly with a private header file.
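.Pp
As a small sketch, a file might keep its state and helper private and export a
single entry point.
All of the names here are hypothetical.
.Bd -literal -offset indent
```c
/* Next identifier to hand out; private to this file. */
static unsigned int next_id = 1;

/* Private helper; not visible to other translation units. */
static unsigned int
id_advance(void)
{
	return (next_id++);
}

/* The only entry point this file exports. */
unsigned int
id_alloc(void)
{
	return (id_advance());
}
```
.Ed
.Pp
Other files can obtain identifiers only through the exported function, so the
counter cannot be modified behind this file's back.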
.Ss Qualifiers
Qualifiers, like
.Sq const ,
.Sq volatile ,
and
.Sq restrict
are used to communicate information to the compiler about how an object is used.
This can be very useful for facilitating optimizations that can dramatically
improve the runtime performance of code.
Appropriate qualification can prevent bugs.
For example, a
.Sq const
qualified pointer points to an object that cannot be modified; an attempt to do
so will give a compile-time error, rather than runtime data corruption.
Additionally, use of such qualifiers can communicate attributes of an interface
to a programmer who uses that interface; a programmer who passes a pointer to a
function that expects a
.Sq const
qualified parameter knows that the function will not modify the value the
pointer refers to.
Use qualifiers, but beware of some caveats.
.Pp
Do not cast away the
.Sq const
qualifier from a pointer; the compiler may make
optimizations based on the qualification that are invalid if applied in a
non-const context.
Similarly, it is undefined behavior to discard the qualifier for variables that
are
.Sq volatile .
Note that this means that one cannot, for example, pass a volatile-qualified
pointer to many functions, such as
.Xr memcpy 3C
or
.Xr memset 3C .
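.Pp
One way to respect both qualifiers is sketched below, under the assumption that
a byte-at-a-time copy is acceptable for the data in question.
The function names are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes out of a volatile region one byte at a time.  Passing
 * src to memcpy() would discard the volatile qualifier, so an explicit
 * loop is used instead.
 */
void
vol_copy_out(uint8_t *dst, const volatile uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i];
}

/* Sum len bytes; const tells callers that buf is not modified. */
unsigned int
byte_sum(const uint8_t *buf, size_t len)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return (sum);
}
```
.Ed
.Pp
Neither function needs a cast, so the qualifiers remain visible to both the
compiler and the reader.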
811.\"
812.Sh Function Definitions
813A complex function should be preceded by a prologue in a block comment that
814gives the name and a short description of what the function does.
815.Pp
816The type of the value returned should be alone on a line in column 1, including
817any qualifiers, such as
818.Sq const
819or
820.Sq static .
821Functions that return
822.Vt int
823should have that return type explicitly specified: traditional C's default of
824.Vt int
825for the return type of unqualified functions is deprecated.
826If the function does not return a value then it should be given the return type
827.Vt void .
828If the return value requires explanation, it should be given in the block
829comment.
830Functions and variables that are not used outside of the file they are defined
831in should be declared as
832.Sy static .
833This lets the reader know explicitly that they are private, and also eliminates
834the possibility of name conflicts with variables and procedures in other files.
835.Pp
836Functions must be declared using
837.St -ansiC
838syntax rather than K&R.
839There are still places within the illumos gate that use K&R syntax and these
840should be converted as work is done in those areas.
841.Pp
842All local declarations and code within the function body should be tabbed over
843at least one tab, with the level of indentation reflecting the structure of the
844code.
845Labels should appear in column 1.
846If the function uses any external variables or functions that are not
847otherwise declared
848.Sy extern
849at the file level or in a header file,
850these should have their own declarations in the function body using the
851.Sy extern
852keyword.
853If the external variable is an array, the array bounds must be repeated
854in the
855.Sy extern
856declaration.
857.Pp
858If an external variable or value of a parameter passed by pointer is changed by
859the function, that should be noted in the block comment.
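.Pp
As a rough sketch of the preceding rules, the hypothetical helper below is
private to its file, so it is declared
.Sy static ,
and its block comment notes that it modifies its argument.
The names
.Fn clamp_percent ,
.Fn normalize ,
and
.Dv PERCENT_MAX
are invented for illustration and are not part of illumos.
.Bd -literal -offset indent
```c
#include <stdbool.h>

#define	PERCENT_MAX	100

/*
 * clamp_percent(valp)
 *
 * Clamp the value pointed to by valp to the range 0 to PERCENT_MAX.
 * Note that *valp is modified.  Returns true if the value was changed.
 */
static bool
clamp_percent(int *valp)
{
	if (*valp < 0) {
		*valp = 0;
		return (true);
	}
	if (*valp > PERCENT_MAX) {
		*valp = PERCENT_MAX;
		return (true);
	}
	return (false);
}

/*
 * normalize(val)
 *
 * Return val clamped to the range 0 to PERCENT_MAX.
 */
int
normalize(int val)
{
	(void) clamp_percent(&val);
	return (val);
}
```
.Ed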
860.Pp
861All comments about parameters and local variables should be tabbed so that they
862line up vertically.
863The declarations should be separated from the function's statements by a blank
864line.
865.Pp
Note that functions that take no parameters must always be declared with a
void parameter list, as shown in the first example below.
868.Pp
869The following examples illustrate many of the rules for function definitions.
870.Bd -literal -offset indent
871/*
872 * sky_is_blue()
873 *
874 * Return true if the sky is blue, else false.
875 */
876bool
877sky_is_blue(void)
878{
879	extern int hour;
880
881	if (hour < MORNING || hour > EVENING)
882		return (false);	/* black */
883	else
884		return (true);	/* blue */
885}
886.Ed
887.Bd -literal -offset indent
888/*
889 * tail(nodep)
890 *
891 * Find the last element in the linked list
892 * pointed to by nodep and return a pointer to it.
893 */
894Node *
895tail(Node *nodep)
896{
897	Node *np;	/* current pointer advances to NULL */
898	Node *lp;	/* last pointer follows np */
899
900	np = lp = nodep;
901	while ((np = np->next) != NULL)
902		lp = np;
903	return (lp);
904}
905.Ed
906.Bd -literal -offset indent
907/*
908 * ANSI C Form 1.
909 * Use this form when the arguments easily fit on one line,
910 * and no per-argument comments are needed.
911 */
912int
913foo(int alpha, char *beta, struct bar gamma)
914{
915	\&...
916}
917.Ed
918.Bd -literal -offset indent
919/*
920 * ANSI C Form 2.
921 * This is a variation on form 1, using the standard continuation
922 * line technique (indent by 4 spaces). Use this form when no
923 * per-argument comments are needed, but all argument declarations
924 * won't fit on one line.
925 */
926int
927foo(int alpha, char *beta,
928    struct bar gamma)
929{
930	\&...
931}
932.Ed
933.Bd -literal -offset indent
934/*
935 * ANSI C Form 3.
936 * Use this form when per-argument comments are needed.
937 * Note that each line of arguments is indented by a full
938 * tab stop. Note carefully the placement of the left
939 * and right parentheses.
940 */
941int
942foo(
943	int alpha,		/* first arg */
944	char *beta,		/* arg with a long comment needed */
945				/*   to describe its purpose */
946	struct bar gamma)	/* big arg */
947{
948	\&...
949}
950.Ed
951.Pp
952A single blank line should separate function definitions.
953.\"
954.Sh Type Declarations
Many programmers use named types, such as
956.Sy typedef Ns No s ,
957liberally.
958They feel that the use of typedefs simplifies declaration lists and can
959make program modification easier when types must change.
960Other programmers feel that the use of a typedef hides the underlying type when
961they want to know what the type is.
962This is particularly true for programmers who need to be concerned with
963efficiency, like kernel programmers, and therefore need to be aware of the
964implementation details.
965The choice of whether or not to use typedef is left to the implementor.
966.Pp
If one elects to use a typedef in conjunction with a pointer type, the
underlying type should be typedef-ed, rather than typedef-ing a pointer to the
underlying type, because it is often necessary and usually helpful to be able
to tell if a type is a pointer.
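.Pp
For instance, in the sketch below the structure itself is given a typedef name
and the pointer is spelled out at each use.
The
.Vt node_t
type and the
.Fn list_length
function are hypothetical and exist only to illustrate the point.
.Bd -literal -offset indent
```c
#include <stddef.h>

typedef struct node {
	struct node	*n_next;
	int		n_value;
} node_t;

/*
 * Preferred: the parameter is visibly a pointer.  A declaration such as
 * "typedef struct node *nodep_t" would hide that from the reader.
 */
size_t
list_length(const node_t *np)
{
	size_t len = 0;

	for (; np != NULL; np = np->n_next)
		len++;
	return (len);
}
```
.Ed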
971.Pp
972The use of
973.St -isoC-99
974unsigned integer identifiers of the form
975.Vt uintXX_t
976is preferred over the older BSD-style
977.Vt u_intXX_t .
978New code should use the former, and old code should be converted to the new form
979if other work is being done in that area.
980.Sh Boolean Types
981.St -isoC-99
982introduced the
983.Sq _Bool
984keyword and preprocessor macros for the
985.Vt bool ,
986.Vt true ,
987and
988.Vt false
989symbols in the
990.In stdbool.h
991header
992.Po
993.In sys/stdbool.h
994in the kernel
995.Pc .
996Prior to this, C had no standard boolean type, but illumos provided an
997.Sq enum ,
998.Vt boolean_t ,
999with variants
1000.Dv B_FALSE
1001and
1002.Dv B_TRUE
1003that is widely used.
1004.Pp
1005Sadly, these two types differ significantly:
1006.Bl -bullet -offset indent
1007.It
1008.Vt bool
1009tends to be defined by ABIs as being a single byte wide, while
1010enumerations, and thus
1011.Vt boolean_t ,
1012use the same representation as an
1013.Vt int .
1014.It
1015.Vt bool
1016is defined to be unsigned, while the enumerated type
1017.Vt boolean_t
1018is signed.
1019.It
1020The type
1021.Dq rank
1022of
1023.Vt _Bool
1024is defined to be lower than all other integer types.
1025.It
1026The only legal values of variables of type
1027.Vt bool
1028are 0 and 1 (false and true respectively), and while
1029.Vt boolean_t
1030is only defined with two variants, nothing structurally prevents an assignment
1031from a different value.
1032.It
1033Type conversion to
1034.Vt _Bool
1035has different semantics than assignment to other integer types: conversion
1036results in a 0 if and only if the original value compares equal to 0, otherwise
1037the result is a 1.
1038For an
1039.Vt int ,
truncation, rounding, or sign extension is used instead.
1041.El
1042Thus, programmers must exercise significant care when mixing code using the
1043standard type and
1044.Vt boolean_t .
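.Pp
The difference in conversion semantics described above can be seen in a small
sketch.
The function names
.Fn int_to_bool
and
.Fn int_to_u8
are hypothetical, and
.Vt uint8_t
is used here only as a convenient example of a narrow integer type.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>

/* Conversion to bool yields true for any value that is not equal to 0. */
bool
int_to_bool(int val)
{
	return ((bool)val);
}

/* Conversion to a narrower unsigned type keeps only the low-order bits. */
uint8_t
int_to_u8(int val)
{
	return ((uint8_t)val);
}
```
.Ed
.Pp
With these definitions, the value 256 converts to true as a
.Vt bool
but to 0 as a
.Vt uint8_t ,
because only the low-order eight bits survive the second conversion.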
1045.Pp
1046Broadly, new code should prefer the use of
1047.Vt bool
1048when available.
1049However, code that makes extensive use of
1050.Vt boolean_t
1051should generally continue to do so.
1052Do not mix
1053.Vt bool
1054and
1055.Vt boolean_t
1056in the same
1057.Vt struct ,
1058for example.
1059Similarly, if a file makes extensive use of one, then do not use the other.
Furthermore, be aware that using
1061.Vt bool
1062requires at least
1063.St -isoC-99 ,
1064which is not mandated across the system, so exercise care in public interfaces.
1065Be particularly aware that transitive includes of header files could mean
1066that code using constructs such as
1067.Vt bool
1068might leak into code that targets an older version of the language; the
1069programmer must not allow this to happen.
1070For example, should a use of
1071.Vt bool
inadvertently end up in
1073.In stdlib.h ,
1074.In sys/types.h ,
1075or another standard-mandated or traditional Unix header file and be
1076available outside of a
1077.St -isoC-99
1078compilation environment, older programs could fail to compile.
1079.Pp
1080Do not use
1081.Vt int
or another type to represent boolean values in new code.
1083.Ss Guidelines for mixing boolean types
As mentioned above, care must be taken when mixing
1085.Vt bool
1086and
1087.Vt boolean_t
1088types.
1089In particular:
1090.Bl -bullet -offset indent
1091.It
1092Assigning from a variable of type
1093.Vt bool
1094to one of
1095.Vt boolean_t ,
1096or vice versa, is generally safe.
1097This includes assigning the value returned from a function of one type to the
1098other.
1099.It
1100Passing arguments of one type to a function expecting the other is generally
1101safe.
1102.It
1103Simple comparisons between the two types are generally safe.
1104.El
1105.Pp
1106However, taking a pointer to a variable of one type and casting it to the
1107other is not safe and should never be done.
1108Similarly, changing the definition of one type to another in a
1109.Vt struct
1110or
1111.Vt union
1112is not safe unless one can guarantee that the element of such a compound type
1113is never referred to by pointer and that the type is never used as part of a
1114public interface, such as an
1115.Xr ioctl 2 .
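.Pp
One way to stay within these guidelines, sketched here under stated
assumptions, is to convert by value through a temporary rather than casting a
pointer.
The
.Vt boolean_t
definition below is a stand-in so that the fragment is self-contained, and
.Fn query_enabled
and
.Fn is_enabled
are hypothetical functions.
.Bd -literal -offset indent
```c
#include <stdbool.h>

/* Stand-in for the illumos boolean_t, defined here only for illustration. */
typedef enum { B_FALSE, B_TRUE } boolean_t;

/* Hypothetical interface that reports its result through a bool pointer. */
static void
query_enabled(bool *resultp)
{
	*resultp = true;
}

boolean_t
is_enabled(void)
{
	bool tmp = false;

	/*
	 * Pass a real bool and convert by value afterwards.  On common
	 * ABIs, casting the address of a boolean_t to (bool *) would
	 * cause only part of the object to be written.
	 */
	query_enabled(&tmp);
	return (tmp ? B_TRUE : B_FALSE);
}
```
.Ed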
1116.\"
1117.Sh Statements
1118Each line should contain at most one statement.
1119In particular, do not use the comma operator to group multiple statements on
1120one line, or to avoid using braces.
1121For example,
1122.Bd -literal -offset indent
1123argv++; argc--;		/* WRONG */
1124
1125if (err)
1126	fprintf(stderr, "error"), exit(1);	/* VERY WRONG */
1127.Ed
1128.Pp
1129Nesting the ternary conditional operator
1130.Pq ?:
1131can lead to confusing, hard to follow code.
1132For example:
1133.Bd -literal -offset indent
1134num = cnt < tcnt ? (cnt < fcnt ? fcnt : cnt) :
1135    tcnt < bcnt ? tcnt : bcnt > fcnt ? fcnt : bcnt;	/* WRONG */
1136.Ed
1137.Pp
1138Avoid expressions like these, and in general do not nest the ternary operator
1139unless doing so is unavoidable.
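.Pp
When such selection logic is genuinely needed, one alternative is to spell it
out with
.Ic if
statements.
The following sketch is intended to compute the same value as the nested
expression above; the function name
.Fn pick
is hypothetical.
.Bd -literal -offset indent
```c
int
pick(int cnt, int tcnt, int fcnt, int bcnt)
{
	if (cnt < tcnt) {
		if (cnt < fcnt)
			return (fcnt);
		return (cnt);
	}
	if (tcnt < bcnt)
		return (tcnt);
	if (bcnt > fcnt)
		return (fcnt);
	return (bcnt);
}
```
.Ed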
1140.Pp
1141If the
1142.Ic return
1143statement is used to return a value, the expression should always be enclosed
1144in parentheses.
1145.Pp
1146Functions that return no value should
1147.Em not
1148include a return statement as the last statement in the function, though early
1149return via a bare
1150.Ic return ;
1151on a line by itself is permitted.
1152.Ss Compound Statements
1153Compound statements are statements that contain lists of statements
1154enclosed in braces
1155.Pq Sq {} .
1156The enclosed list should be indented one more level than the compound statement
1157itself.
1158The opening left brace should be at the end of the line beginning the compound
1159statement, and the closing right brace should be alone on a line, positioned
1160under the beginning of the compound statement
1161.Pq see examples below .
1162Note that the left brace that begins a function body is the only occurrence
1163of a left brace which should be alone on a line.
1164.Pp
1165Braces are also used around a single statement when it is part of a control
1166structure, such as an
1167.Ic if-else
1168or
1169.Ic for
1170statement, as in:
1171.Bd -literal -offset indent
1172if (condition) {
1173	if (other_condition)
1174		statement;
1175}
1176.Ed
1177.Pp
1178Some programmers feel that braces should be used to surround
1179.Em all
1180statements that are part of control structures, even singletons, because this
1181makes it easier to add or delete statements without thinking about whether
1182braces should be added or removed.
1183Some programmers reason that, since some apparent function calls might actually
1184be macros that expand into multiple statements, always using braces allows such
1185macros to always work safely.
1186Thus, they would write:
1187.Bd -literal -offset indent
1188if (condition) {
1189	return (0);
1190}
1191.Ed
1192.Pp
1193Here, the braces are optional and may be omitted to save vertical space.
1194However:
1195.Bl -bullet -offset indent
1196.It
1197if one arm of an
1198.Ic if-else
1199statement contains braces, all arms should contain braces;
1200.It
1201if the condition or singleton occupies more than one line, braces should always
1202be used;
1203.Bd -literal -offset indent
1204if (condition) {
1205	fprintf(stderr, "wrapped singleton: %d\en",
1206	    errno);
1207}
1208.Ed
1209.Bd -literal -offset indent
1210if (strncmp(str, "long condition",
1211    sizeof ("long condition") - 1) == 0) {
1212	fprintf(stderr, "singleton: %d\en", errno);
1213}
1214.Ed
1215.It
1216if the body of a
1217.Ic for
1218or
1219.Ic while
1220loop is empty, no braces are needed:
1221.Bd -literal -offset indent
1222while (*p++ != c)
1223	;
1224.Ed
1225.El
1226.Ss Examples
1227.Sy if, if-else, if-else if-else statements
1228.Bd -literal -offset indent
1229if (condition) {
1230	statements;
1231}
1232.Ed
1233.Bd -literal -offset indent
1234if (condition) {
1235	statements;
1236} else {
1237	statements;
1238}
1239.Ed
1240.Bd -literal -offset indent
1241if (condition) {
1242	statements;
1243} else if (condition) {
1244	statements;
1245}
1246.Ed
1247.Pp
1248Note that the right brace before the
1249.Ic else
1250and the right brace before the
1251.Ic while
1252of a
1253.Ic do-while
1254statement
1255.Pq see below
1256are the only places where a right brace appears that is not alone on a line.
1257.Pp
1258.Sy for statements
1259.Bd -literal -offset indent
1260for (initialization; condition; update) {
1261	statements;
1262}
1263.Ed
1264.Pp
1265When using the comma operator in the initialization or update clauses
1266of a
1267.Ic for
1268statement, it is suggested that no more than three variables should be updated.
1269More than this tends to make the expression too complex.
1270In this case it is generally better to use separate statements outside
1271the
1272.Ic for
1273loop
1274.Pq for the initialization clause ,
1275or at the end of the loop
1276.Pq for the update clause .
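.Pp
As an illustration of a modest use of the comma operator, the following sketch
updates two index variables in the initialization and update clauses.
The function
.Fn reverse
is a hypothetical example and not an existing interface.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * reverse(buf, len)
 *
 * Reverse the first len bytes of buf in place.  Note that buf is modified.
 */
void
reverse(char *buf, size_t len)
{
	size_t i, j;

	if (len == 0)
		return;
	for (i = 0, j = len - 1; i < j; i++, j--) {
		char tmp = buf[i];

		buf[i] = buf[j];
		buf[j] = tmp;
	}
}
```
.Ed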
1277.Pp
1278The initialization, condition, and update portions of a
1279.Ic for
1280loop may be omitted.
1281.Pp
1282The infinite loop is written using a
1283.Ic for
1284loop.
1285.Bd -literal -offset indent
1286for (;;) {
1287	statements;
1288}
1289.Ed
1290.Pp
1291.Sy while statements
1292.Bd -literal -offset indent
1293while (condition) {
1294	statements;
1295}
1296.Ed
1297.Pp
1298When writing
1299.Ic while
1300loops, prefer nested assignment inside of comparison.
1301That is, prefer:
1302.Bd -literal -offset indent
while ((c = getchar()) != EOF) {
	statements;
}
.Ed
.Pp
over,
.Bd -literal -offset indent
c = getchar();
while (c != EOF) {
	statements;
	c = getchar();
}
1314}
1315.Ed
1316.Pp
1317.Sy do-while statements
1318.Bd -literal -offset indent
1319do {
1320	statements;
1321} while (condition);
1322.Ed
1323.Pp
1324.Sy switch statements
1325.Bd -literal -offset indent
1326switch (condition) {
1327case ABC:
1328case DEF:
1329	statements;
1330	break;
1331case GHI:
1332	statements;
1333	/* FALLTHROUGH */
1334case JKL: {
1335	int local;
1336
	statements;
	break;
}
1339case XYZ:
1340	statements;
1341	break;
1342default:
1343	statements;
1344	break;
1345}
1346.Ed
1347.Pp
1348The last
1349.Ic break
1350is, strictly speaking, redundant, but it is recommended form nonetheless
1351because it prevents a fall-through error if another
1352.Ic case
1353is added later after the last one.
1354.Pp
1355When using the fall-through feature of
1356.Ic switch ,
1357a comment of the style shown above should be used.
1358In addition to being a useful note for future maintenance, it serves as a
1359hint to the compiler that this is intentional and should not therefore
1360generate a warning.
1361.Pp
1362All
1363.Ic switch
1364statements should include a default case with the possible exception of a
1365switch on an
1366.Vt enum
1367variable for which all possible values of the
1368.Vt enum
1369are listed.
1370.Pp
1371Don't assume that the list of cases covers all possible cases.
1372New, unanticipated, cases may be added later, or bugs elsewhere in the program
1373may cause variables to take on unexpected values.
1374.Pp
1375Each
1376.Ic case
1377statement should be indented to the same level as the
1378.Ic switch
1379statement.
1380Each
1381.Ic case
1382statement should be on a line separate from the statements within the case.
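.Pp
A small sketch of a
.Ic switch
on an
.Vt enum
that still carries a default case follows.
The
.Vt color_t
type and the
.Fn color_name
function are hypothetical.
.Bd -literal -offset indent
```c
typedef enum {
	COLOR_RED,
	COLOR_GREEN,
	COLOR_BLUE
} color_t;

const char *
color_name(color_t color)
{
	switch (color) {
	case COLOR_RED:
		return ("red");
	case COLOR_GREEN:
		return ("green");
	case COLOR_BLUE:
		return ("blue");
	default:
		/* Reached if a new variant is added or a value is corrupt. */
		return ("unknown");
	}
}
```
.Ed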
1383.\"
1384.Sh White Space
1385.Ss Vertical White Space
Judicious use of blank lines can improve readability by setting off sections of code
1387that are logically related.
1388Use vertical white space to make it clear that stanzas are logically separated.
1389.Pp
1390A blank line should always be used in the following circumstances:
1391.Bl -bullet -offset indent
1392.It
1393After the
1394.Ic #include
1395section at the top of a source file.
1396.It
1397After blocks of
1398.Ic #define Ns No s
1399of constants, and before and after
1400.Ic #define Ns No s
1401of macros.
1402.It
1403Between structure declarations.
1404.It
1405Between functions.
1406.It
1407After local variable declarations.
1408.El
1409.Pp
1410Form-feeds should never be used to separate functions.
1411.\"
.Ss Horizontal White Space
1413Here are the guidelines for blank spaces:
1414.Bl -bullet -offset indent
1415.It
1416A blank should follow a keyword whenever a parenthesis follows the keyword.
1417Note that both
1418.Ic sizeof
1419and
1420.Ic return
1421are keywords, whereas things like
1422.Xr strlen 3C
1423and
1424.Xr exit 3C
1425are not.
1426.Pp
1427Blanks should not be used between procedure names
1428.Pq or macro calls
1429and their argument list.
1430This helps to distinguish keywords from procedure calls.
1431.Bd -literal -offset indent
1432/*
1433 * No space between strncmp and '(' but
1434 * there is one between sizeof and '('
1435 */
1436if (strncmp(x, "done", sizeof ("done") - 1) == 0)
1437	...
1438.Ed
1439.It
1440Blanks should appear after commas in argument lists.
1441.It
1442Blanks should
1443.Em not
1444appear immediately after a left parenthesis or immediately before a right
1445parenthesis.
1446.It
1447All binary operators except
1448.Sq \&.
1449and
1450.Sq ->
1451should be separated from their operands by blanks.
1452In other words, blanks should appear around assignment, arithmetic, relational,
1453and logical operators.
1454.Pp
1455Blanks should never separate unary operators such as unary minus,
1456address
1457.Pq Sq \&& ,
1458indirection
1459.Pq Sq \&* ,
1460increment
1461.Pq Sq ++ ,
1462and decrement
1463.Pq Sq --
1464from their operands.
1465Note that this includes the unary
1466.Sq \&*
1467that is a part of pointer declarations.
1468.Pp
1469Examples:
1470.Bd -literal -offset indent
1471char *d, *s;
1472a += c + d;
1473a = (a + b) / (c * d);
1474strp->field = str.fl - ((x & MASK) >> DISP);
while ((*d++ = *s++) != '\e0')
1476	n++;
1477.Ed
1478.It
1479The expressions in a
1480.Ic for
1481statement should be separated by blanks:
1482.Bd -literal -offset indent
1483for (expr1; expr2; expr3)
1484.Ed
1485.Pp
1486If an expression is omitted, no space should be left in its place:
1487.Bd -literal -offset indent
1488for (expr1; expr2;)
1489.Ed
1490.It
1491Casts should not be followed by a blank, with the exception of function
1492calls whose return values are ignored:
1493.Bd -literal -offset indent
1494(void) myfunc((uintptr_t)ptr, (char *)x);
1495.Ed
1496.El
1497.Ss Hidden White Space
1498There are many uses of blanks that will not be visible when viewed
1499on a terminal, and it is often difficult to distinguish blanks from tabs.
1500However, inconsistent use of blanks and tabs may produce unexpected results
1501when the code is printed with a pretty-printer, and may make simple regular
1502expression searches fail unexpectedly.
1503The following guidelines are helpful:
1504.Bl -bullet -offset indent
1505.It
1506Spaces and tabs at the end of a line are not permitted.
1507.It
1508Spaces between tabs, and tabs between spaces, are not permitted.
1509.It
1510Use tabs to line things up in columns
1511.Po
1512such as for indenting code, and to line up elements within a series of
1513declarations
1514.Pc
1515and spaces to separate items within a line.
1516.It
1517Use tabs to separate single line comments from the corresponding code.
1518.El
1519.\"
1520.Sh Parentheses
1521Since C has complex precedence rules, parentheses can clarify the programmer's
1522intent in complex expressions that mix operators.
1523Programmers should feel free to use parentheses if they feel that they make
1524the code clearer and easier to understand.
1525However, bear in mind that this can be taken too far, so some judgment must
1526be applied to prevent making things less readable.
1527For example, compare:
1528.Bd -literal -offset indent
1529x = ((x * 2) * 3) + (((y / 2) * 3) + 1);
1530.Ed
1531.Pp
1532to,
1533.Bd -literal -offset indent
1534x = x * 2 * 3 + y / 2 * 3 + 1;
1535.Ed
1536.Pp
1537It is also important to remember that complex expressions can be used as
1538parameters to macros, and operator-precedence problems can arise unless
1539.Em all
1540occurrences of parameters in the body of a macro definition have parentheses
1541around them.
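.Pp
The precedence problem can be sketched with a pair of hypothetical macros,
.Dv SQUARE_BAD
and
.Dv SQUARE ,
which differ only in whether the parameter is parenthesized.
.Bd -literal -offset indent
```c
#define	SQUARE_BAD(x)	(x * x)		/* WRONG */
#define	SQUARE(x)	((x) * (x))

/* Expands to ((a + b) * (a + b)), as intended. */
int
square_of_sum(int a, int b)
{
	return (SQUARE(a + b));
}

/* Expands to (a + b * a + b), which is not the square of the sum. */
int
square_of_sum_bad(int a, int b)
{
	return (SQUARE_BAD(a + b));
}
```
.Ed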
1542.\"
1543.Sh Constants
1544Numeric constants should not generally be written directly.
1545Instead, give the constant a meaningful name using a
1546.Ic const
1547variable, an
1548.Ic enum
1549or the
1550.Ic #define
1551feature of the C preprocessor.
1552This makes it easier to maintain large programs since the constant value can be
1553changed uniformly by changing only the constant's definition.
1554.Pp
1555The enum data type is the preferred way to handle situations where
1556a variable takes on only a discrete set of values, since additional type
1557checking is available through the compiler and, as mentioned above,
1558tools such as the
1559.Xr mdb 1
1560debugger also support enums.
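.Pp
The following sketch names a limit with
.Ic #define
and models a discrete state with an
.Ic enum .
All of the names in it, including
.Dv MAX_RETRIES ,
.Vt link_state_t ,
.Fn probe ,
and
.Fn attempts_needed ,
are hypothetical.
.Bd -literal -offset indent
```c
#define	MAX_RETRIES	3

typedef enum {
	LINK_DOWN,
	LINK_UP
} link_state_t;

/* Hypothetical probe that reports the link as up on the final attempt. */
static link_state_t
probe(int attempt)
{
	return (attempt + 1 >= MAX_RETRIES ? LINK_UP : LINK_DOWN);
}

/* Return the number of attempts made, or -1 if the link never came up. */
int
attempts_needed(void)
{
	int i;

	for (i = 0; i < MAX_RETRIES; i++) {
		if (probe(i) == LINK_UP)
			return (i + 1);
	}
	return (-1);
}
```
.Ed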
1561.Pp
1562There are some cases where the constants 0 and 1 may appear as themselves
1563instead of as
1564.Ic #define Ns No s .
1565For example if a
1566.Ic for
1567loop indexes through an array, then
1568.Bd -literal -offset indent
1569for (i = 0; i < ARYBOUND; i++)
1570.Ed
1571.Pp
1572is reasonable.
1573.Pp
1574In rare cases, other constants may appear as themselves.
1575Some judgment is required to determine whether the semantic meaning of the
1576constant is obvious from its value, or whether the code would be easier
1577to understand if a symbolic name were used for the value.
1578.\"
1579.Sh Goto
1580While not completely avoidable, use of
1581.Ic goto
1582is generally discouraged.
1583In many cases, breaking a procedure into smaller pieces, or using a different
1584language construct can eliminate the need for
1585.Ic goto Ns No s .
1586For example, instead of:
1587.Bd -literal -offset indent
1588again:
1589	if (s = proc(args))
1590		if (s == -1 && errno == EINTR)
1591			goto again;
1592.Ed
1593.Pp
1594write:
1595.Bd -literal -offset indent
1596	do {
1597		s = proc(args);
1598	} while (s == -1 && errno == EINTR);
1599.Ed
1600.Pp
1601The main place where
1602.Ic goto Ns No s
1603can be usefully employed is to break out of several levels of
1604.Ic switch
1605or loop nesting, or to centralize error path cleanup code in a function.
1606For example:
1607.Bd -literal -offset indent
1608	for (...)
1609		for (...) {
1610			...
1611			if (disaster)
1612				goto error;
1613		}
1614	...
1615error:
1616	clean up the mess;
1617.Ed
1618.Pp
However, the need to do such things may indicate that the inner constructs
1620should be broken out into a separate function.
1621Never use a
1622.Ic goto
1623outside of a given block to branch to a label within a block:
1624.Bd -literal -offset indent
1625goto label;	/* WRONG */
1626\&...
1627for (...) {
1628	...
1629label:
1630	statement;
1631	...
1632}
1633.Ed
1634.Pp
1635When a
1636.Ic goto
1637is necessary, the accompanying label should be alone on a line.
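.Pp
A minimal sketch of centralized error path cleanup follows.
The
.Vt pair_t
type and the
.Fn pair_alloc
and
.Fn pair_free
functions are hypothetical; the point is only that every failure branches to a
single label that releases whatever was acquired.
.Bd -literal -offset indent
```c
#include <stdlib.h>

typedef struct pair {
	int	*p_first;
	int	*p_second;
} pair_t;

void
pair_free(pair_t *pp)
{
	if (pp == NULL)
		return;
	free(pp->p_first);
	free(pp->p_second);
	free(pp);
}

pair_t *
pair_alloc(void)
{
	pair_t *pp;

	if ((pp = malloc(sizeof (*pp))) == NULL)
		return (NULL);
	pp->p_first = NULL;
	pp->p_second = NULL;

	if ((pp->p_first = malloc(sizeof (int))) == NULL)
		goto fail;
	if ((pp->p_second = malloc(sizeof (int))) == NULL)
		goto fail;
	return (pp);

fail:
	pair_free(pp);
	return (NULL);
}
```
.Ed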
1638.Sh Variable Initialization
1639C permits initializing a variable where it is declared.
1640Programmers are equally divided about whether or not this is a good idea:
1641.Qo
1642I like to think of declarations and executable code as separate units.
1643Intermixing them only confuses the issue.
1644If only a scattered few declarations are initialized, it is easy not to see
1645them.
1646.Qc
1647.Qo
1648The major purpose of code style is clarity.
1649I think the less hunting around for the connections between different places in
1650the code, the better.
1651I don't think variables should be initialized for no reason, however.
1652If the variable doesn't need to be initialized, don't waste the reader's time
1653by making him/her think that it does.
1654.Qc
1655.Pp
1656A convention used by some programmers is to only initialize automatic variables
1657in declarations if the value of the variable is constant throughout the block;
1658such variables should be declared
1659.Ic const .
1660Note that as a matter of correctness, all automatic variables must be
1661initialized before use, either in the declaration or elsewhere.
1662.Pp
1663The decision about whether or not to initialize a variable in a declaration is
1664therefore left to the implementor.
1665Use good taste.
1666For example, don't bury a variable initialization in the middle of a long
1667declaration:
1668.Bd -literal -offset indent
1669int	a, b, c, d = 4, e, f;		/* This is NOT good style */
1670.Ed
1671.Sh Multiple Assignments
C also permits assigning the same value to several variables in a single
statement, as in:
1674.Bd -literal -offset indent
1675x = y = z = 0;
1676.Ed
1677Good taste is required here also.
1678For example, assigning several variables that are used the same way in the
1679program in a single statement clarifies the relationship between the variables
1680by making it more explicit:
1681.Bd -literal -offset indent
1682x = y = z = 0;
1683vx = vy = vz = 1;
1684count = 0;
1685scale = 1;
1686.Ed
1687.Pp
1688is good, whereas:
1689.Bd -literal -offset indent
1690x = y = z = count = 0;
1691vx = vy = vz = scale = 1;
1692.Ed
1693.Pp
1694sacrifices clarity for brevity.
1695In any case, the variables that are so assigned should all be of the same type
1696.Po
1697or all pointers being initialized to
1698.Dv NULL
1699.Pc .
1700It is not a good idea to use multiple assignments for complex expressions,
1701as this can be significantly harder to read.
1702E.g.,
1703.Bd -literal -offset indent
1704foo_bar.fb_name.firstch = bar_foo.fb_name.lastch = 'c'; /* Yecch */
1705.Ed
1706.\"
1707.Sh Preprocessor
1708The C preprocessor provides support for textual inclusion of files
1709.Pq most often header files ,
1710conditional compilation, and macro definitions and substitutions.
1711.Pp
It should be noted that the preprocessor works at the lexical, not the
syntactic, level of the language.
1714It is possible to define macros that are not syntactically valid when expanded,
1715and the programmer should take care when using the preprocessor.
1716Some general advice follows.
1717.Pp
1718Do not rename members of a structure using
1719.Ic #define
1720within a subsystem; instead, use a
1721.Ic union .
1722The legacy practice of using
1723.Ic #define
1724to define shorthand notations for referencing members of a union should
1725not be used in new code.
1726.Pp
1727Be
1728.Em extremely
1729careful when choosing names for
1730.Ic #define Ns No s .
1731For example, never use something like
1732.Bd -literal -offset indent
1733#define	size	10
1734.Ed
1735.Pp
1736especially in a header file, since it is not unlikely that the user
1737might want to declare a variable named
1738.Va size .
1739.Pp
1740Remember that names used in
1741.Ic #define
1742statements come out of a global preprocessor name space and can conflict with
1743names in any other namespace.
1744For this reason, this use of
1745.Ic #define
1746is discouraged.
1747.Pp
1748Note that
1749.Ic #define
1750follows indentation rules similar to other declarations; see the section on
1751.Sx Indentation
1752for details.
1753.Pp
1754Care is needed when defining macros that replace functions since functions
1755pass their parameters by value whereas macros pass their arguments by
1756name substitution.
1757.Pp
1758At the end of an
1759.Ic #ifdef
1760construct used to select among a required set of options
1761.Pq such as machine types ,
1762include a final
1763.Ic #else
1764clause containing a useful but illegal statement so that the compiler will
1765generate an error message if none of the options has been defined:
1766.Bd -literal -offset indent
1767#ifdef vax
1768	...
1769#elif sun
1770	...
1771#elif u3b2
1772	...
1773#else
1774#error unknown machine type;
1775#endif /* machine type */
1776.Ed
1777.Pp
1778Header files should make use of
1779.Dq include guards
1780to prevent their contents from being evaluated multiple times.
1781For example,
1782.Bd -literal -offset indent
1783#ifndef	_FOOBAR_H
1784#define	_FOOBAR_H
1785
/* Header contents.... */
1787
1788#endif	/* !_FOOBAR_H */
1789.Ed
1790.Pp
1791The symbol defined for the include guard should be uniquely derived from the
1792header file's name.
1793Note that this is one area where library authors often use a leading underscore
1794in an identifier.
1795While this is technically in violation of the ISO C standard, the practice is
1796common.
1797.Pp
1798Don't change C syntax via macro substitution.
1799For example,
1800.Bd -literal -offset indent
1801#define	BEGIN	{
1802.Ed
1803.Pp
1804It makes the program unintelligible to all but the perpetrator.
1805.Pp
1806Be especially aware that function-like macros are textually substituted, and
1807side-effects in their arguments may be multiply-evaluated if the arguments are
1808referred to more than once in the body of the macro.
1809Similarly, variables defined inside of a macro's body may conflict with
1810variables in the outer scope.
1811Finally, macros are not generally type safe.
1812For most macros and most programs, these are non-issues, but programmers who
1813run into problems here may consider judicious use of
1814.Sq inline
1815functions as an alternative.
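.Pp
The multiple evaluation hazard can be sketched by comparing a hypothetical
.Dv MAX
macro with a hypothetical
.Sq inline
function,
.Fn max_int ,
when the argument has a side effect.
.Bd -literal -offset indent
```c
#define	MAX(a, b)	((a) > (b) ? (a) : (b))

static inline int
max_int(int a, int b)
{
	return (a > b ? a : b);
}

int
macro_increments(void)
{
	int i = 5;

	(void) MAX(i++, 0);	/* i++ is expanded, and evaluated, twice */
	return (i);
}

int
inline_increments(void)
{
	int i = 5;

	(void) max_int(i++, 0);	/* i++ is evaluated exactly once */
	return (i);
}
```
.Ed
.Pp
Starting from 5, the macro version leaves the variable at 7, while the inline
function version leaves it at 6.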
1816.Ss Whitespace and the Preprocessor
1817Use the following conventions with respect to whitespace and the preprocessor:
1818.Bl -bullet -offset indent
1819.It
1820.Sq #include
1821should be followed by a single space character.
1822.It
1823.Sq #define
1824should be followed by a single tab character.
1825.It
1826.Sq #if ,
1827.Sq #ifdef ,
1828and other preprocessor statements may be followed by either a tab or space, but
1829be consistent with the surrounding code.
1830.El
1831.\"
1832.Sh Miscellaneous Comments on Good Taste
1833Avoid undefined behavior wherever possible.
1834Note that the rules of C are very subtle, and many things that at first
1835appear well-defined can actually conceal undefined behavior.
1836When in doubt, consult the C standard.
1837.Pp
1838Traditional Unix style favors guard clauses, which check a precondition and fail
1839.Pq possibly via an early return
1840over deeply nested control structures.
1841For example, prefer:
1842.Bd -literal -offset indent
1843void
1844foo(void)
1845{
1846	struct foo *foo;
1847	struct bar *bar;
1848	struct baz *baz;
1849
1850	foo = some_foo();
1851	if (!is_valid_foo(foo))
1852		return;
1853	bar = some_bar(foo);
	if (!is_valid_bar(bar))
		return;
	baz = some_baz(bar);
	if (!is_valid_baz(baz))
1858		return;
1859
1860	/* All of the preconditions are met */
1861	do_something(baz);
1862}
1863.Ed
1864.Pp
1865over,
1866.Bd -literal -offset indent
1867void
1868foo(void)
1869{
	struct foo *foo;
	struct bar *bar;
	struct baz *baz;
1871
1872	foo = some_foo();
1873	if (is_valid_foo(foo)) {
1874		bar = some_bar(foo);
1875		if (is_valid_bar(bar)) {
1876			baz = some_baz(bar);
1877			if (is_valid_baz(baz)) {
1878				/* Preconditions met */
1879				do_something(baz);
1880			}
1881		}
1882	}
1883}
1884.Ed
1885.Pp
1886Try to make the structure of your program match the intent.
1887For example, replace:
1888.Bd -literal -offset indent
1889if (boolean_expression)
1890	return (TRUE);
1891else
1892	return (FALSE);
1893.Ed
1894.Pp
1895with:
1896.Bd -literal -offset indent
1897return (boolean_expression);
1898.Ed
1899.Pp
1900Similarly,
1901.Bd -literal -offset indent
1902if (condition)
1903	return (x);
1904return (y);
1905.Ed
1906.Pp
1907is usually clearer than:
1908.Bd -literal -offset indent
1909if (condition)
1910	return (x);
1911else
1912	return (y);
1913.Ed
1914.Pp
or even better, if the condition and return expressions are short:
1916.Bd -literal -offset indent
1917return (condition ? x : y);
1918.Ed
1919.Pp
1920Do not default the boolean test for nonzero.
1921Prefer
1922.Bd -literal -offset indent
1923if (f() != 0)
1924.Ed
1925.Pp
1926rather than
1927.Bd -literal -offset indent
1928if (f())
1929.Ed
1930.Pp
even though 0 is considered to be
1932.Dq false
1933in boolean contexts in C.
1934An exception is commonly made for predicate functions, which encapsulate
1935.Pq possibly complex
1936boolean expressions.
A predicate must meet all of the following restrictions:
1938.Bl -bullet -offset indent
1939.It
1940Has no other purpose than to return true or false.
1941.It
1942Returns 0 for false, non-zero for true.
1943.It
1944Is named so that the meaning of the return value is obvious.
1945.El
1946.Pp
1947Call a predicate
1948.Fn is_valid
1949or
1950.Fn valid ,
1951not
1952.Fn check_valid .
1953Note that
1954.Fn isvalid
1955and similar names with the
1956.Sq is
1957prefix followed by a letter or number
1958.Pq but not underscore
1959are reserved by the ISO C standard.
1960.Pp
1961The set of POSIX ctype functions including
1962.Xr isalpha 3C ,
1963.Xr isalnum 3C ,
1964and
1965.Xr isdigit 3C
1966are examples of predicates.
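.Pp
A predicate that meets these restrictions might look like the following
sketch.
Both
.Fn is_valid_port
and
.Fn port_or_default
are hypothetical names chosen for illustration, and the port range is an
assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: its only purpose is to answer a yes/no
 * question, and its name makes the meaning of the result obvious.
 */
bool
is_valid_port(long port)
{
	return (port > 0 && port <= 65535);
}

/*
 * Because is_valid_port() is a predicate, its result may be tested
 * directly without an explicit comparison against zero.
 */
int
port_or_default(long port, int dflt)
{
	if (is_valid_port(port))
		return ((int)port);
	return (dflt);
}
```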
1967.Pp
1968A particularly notorious case of not obeying the rules around predicates is
1969using
1970.Xr strcmp 3C
1971to test for string equality, where the result should never be defaulted
1972.Pq and indeed, a return value of 0 denotes equality .
1973.Pp
1974Never use the boolean negation operator
1975.Pq Sq \&!
1976with non-boolean expressions.
1977In particular, never use it to test for a NULL pointer or to test for
1978success of comparison functions like
1979.Xr strcmp 3C
1980or
1981.Xr memcmp 3C .
For example:
1983.Bd -literal -offset indent
1984char *p;
1985\&...
1986if (!p)			/* WRONG */
1987	return;
1988
1989if (!strcmp(*argv, "-a"))	/* WRONG */
1990	aflag++;
1991.Ed
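.Pp
The preferred forms of these tests compare explicitly against NULL and 0.
The following is a minimal sketch in which
.Fn is_a_flag
is a hypothetical helper, not an existing interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if arg is the "-a" option, else 0.
 * Both tests are written as explicit comparisons instead of using
 * the negation operator on a pointer or on the result of strcmp().
 */
int
is_a_flag(const char *arg)
{
	if (arg == NULL)
		return (0);

	if (strcmp(arg, "-a") == 0)
		return (1);

	return (0);
}
```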
1992.Pp
1993When testing whether a bit is set in a value, it is good style to explicitly
1994test the result of a bitwise operation against 0, rather than defaulting the
1995boolean condition.
1996Prefer
1997.Bd -literal -offset indent
1998if ((flags & FLAG_VERBOSE) != 0)
1999if ((flags & FLAG_VERBOSE) == 0)
2000.Ed
2001.Pp
2002rather than the following:
2003.Bd -literal -offset indent
2004if (flags & FLAG_VERBOSE)
2005if (!(flags & FLAG_VERBOSE))
2006.Ed
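.Pp
Placed in complete functions, the preferred form might look like the sketch
below.
The flag values and the function names
.Fn is_verbose
and
.Fn is_quiet
are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are for illustration only. */
#define	FLAG_VERBOSE	0x01U
#define	FLAG_FORCE	0x02U

/* Test that the bit is set by comparing the masked value against 0. */
bool
is_verbose(uint32_t flags)
{
	return ((flags & FLAG_VERBOSE) != 0);
}

/* Test that the bit is clear, again with an explicit comparison. */
bool
is_quiet(uint32_t flags)
{
	return ((flags & FLAG_VERBOSE) == 0);
}
```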
2007.Pp
2008Do not use the assignment operator in a place where it could be easily
2009confused with the equality operator.
2010For instance, in the simple expression
2011.Bd -literal -offset indent
2012if (x = y)
2013	statement;
2014.Ed
2015.Pp
2016it is hard to tell whether the programmer really meant assignment or
2017mistyped an equality test.
2018Instead, use
2019.Bd -literal -offset indent
2020if ((x = y) != 0)
2021	statement;
2022.Ed
2023.Pp
2024or something similar, if the assignment is actually needed within the
2025.Ic if
2026statement.
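.Pp
A common case where the assignment is wanted inside the condition is saving
the result of a library call while testing it.
The sketch below uses
.Xr strchr 3C ;
the helper
.Fn value_of
is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the text following the first '=' in s,
 * or NULL if s contains no '='.  The extra parentheses and the
 * explicit comparison show that the assignment is intentional.
 */
const char *
value_of(const char *s)
{
	const char *eq;

	if ((eq = strchr(s, '=')) != NULL)
		return (eq + 1);

	return (NULL);
}
```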
2027.Pp
2028There is a time and a place for embedded assignments.
2029The
2030.Ic ++
2031and
2032.Ic --
2033operators count as assignments; so, for many purposes, do functions with side
2034effects.
2035.Pp
2036In some constructs there is no better way to accomplish the results without
2037making the code bulkier and less readable.
2038To repeat the earlier loop example:
2039.Bd -literal -offset indent
2040while ((c = getchar()) != EOF) {
2041	process the character
2042}
2043.Ed
2044.Pp
2045Embedded assignments used to provide modest improvement in run-time
2046performance, but this is no longer the case with modern optimizing compilers.
Do not write, for example,
2048.Bd -literal -offset indent
2049d = (a = b + c) + 4;		/* WRONG */
2050.Ed
2051.Pp
2052believing that it will somehow be
2053.Dq faster
2054than
2055.Bd -literal -offset indent
2056a = b + c;
2057d = a + 4;
2058.Ed
2059.Pp
2060In general, avoid such premature micro-optimization unless performance
2061is clearly a bottleneck, and a profile shows that the optimization provides
2062a significant performance boost.
Be aware that in the long run hand-optimized code often turns into a
2064pessimization, and maintenance difficulty will increase as the human
2065memory of what's going on in a given piece of code fades.
Note also that side effects within expressions can result in code
whose semantics are compiler-dependent, since the order in which C
evaluates operands is unspecified in most places.
2069Compilers do differ.
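.Pp
One way to avoid depending on evaluation order is to give each side effect
its own statement.
In the sketch below,
.Fn next_id
and
.Fn make_pair_code
are hypothetical functions invented to show the technique.

```c
/* Hypothetical counter used to give next_id() a side effect. */
static int counter;

int
next_id(void)
{
	return (++counter);
}

/*
 * Writing next_id() * 10 + next_id() in a single expression would
 * leave it to the compiler which call runs first.  Separate
 * statements fix the order.
 */
int
make_pair_code(void)
{
	int first, second;

	first = next_id();
	second = next_id();
	return (first * 10 + second);
}
```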
2070.Pp
2071There is also a time and place for the ternary
2072.Pq Sq \&? \&:
2073operator and the binary comma operator.
2074If an expression containing a binary operator appears before the
2075.Sq \&? ,
2076it should be parenthesized:
2077.Bd -literal -offset indent
2078(x >= 0) ? x : -x
2079.Ed
2080.Pp
2081Nested ternary operators can be confusing and should be avoided if possible.
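.Pp
As one possible alternative to nesting, a three-way choice can be written as
a sequence of simple tests.
The function
.Fn sign_of
below is a hypothetical example.

```c
/*
 * Hypothetical helper: return -1, 0, or 1 according to the sign of x.
 * A nested form such as (x > 0) ? 1 : ((x < 0) ? -1 : 0) is legal,
 * but the explicit tests below are easier to read.
 */
int
sign_of(int x)
{
	if (x > 0)
		return (1);
	if (x < 0)
		return (-1);
	return (0);
}
```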
2082.Pp
2083The comma operator can be useful in
2084.Ic for
2085statements to provide multiple initializations or incrementations.
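.Pp
For instance, a loop that walks inward from both ends of a string needs two
index variables that are initialized and updated together.
The following is a minimal sketch;
.Fn reverse
is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: reverse the string s in place.  The comma
 * operator initializes both indices and advances both on each
 * iteration.
 */
void
reverse(char *s)
{
	size_t i, j;
	char tmp;

	/* An empty string has no last index to start from. */
	if (s[0] == '\0')
		return;

	for (i = 0, j = strlen(s) - 1; i < j; i++, j--) {
		tmp = s[i];
		s[i] = s[j];
		s[j] = tmp;
	}
}
```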
2086.Sh SEE ALSO
2087.Rs
2088.%T C Style and Coding Standards for SunOS
2089.%A Bill Shannon
2090.%D 1996
2091.Re
2092.Pp
2093.Xr mdb 1