		Linux kernel coding style

This is a short document describing the preferred coding style for the
linux kernel. Coding style is very personal, and I won't _force_ my
views on anybody, but this is what goes for anything that I have to be
able to maintain, and I'd prefer it for most other things too. Please
at least consider the points made here.

First off, I'd suggest printing out a copy of the GNU coding standards,
and NOT read it. Burn them, it's a great symbolic gesture.

Anyway, here goes:


		Chapter 1: Indentation

Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.

Rationale: The whole idea behind indentation is to clearly define where
a block of control starts and ends. Especially when you've been looking
at your screen for 20 straight hours, you'll find it a lot easier to see
how the indentation works if you have large indentations.

Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.

In short, 8-char indents make things easier to read, and have the added
benefit of warning you when you're nesting your functions too deep.
Heed that warning.

The preferred way to ease multiple indentation levels in a switch statement is
to align the "switch" and its subordinate "case" labels in the same column
instead of "double-indenting" the "case" labels.
E.g.:

	switch (suffix) {
	case 'G':
	case 'g':
		mem <<= 30;
		break;
	case 'M':
	case 'm':
		mem <<= 20;
		break;
	case 'K':
	case 'k':
		mem <<= 10;
		/* fall through */
	default:
		break;
	}


Don't put multiple statements on a single line unless you have
something to hide:

	if (condition) do_this;
	  do_something_everytime;

Don't put multiple assignments on a single line either. Kernel coding style
is super simple. Avoid tricky expressions.

Outside of comments, documentation and except in Kconfig, spaces are never
used for indentation, and the above example is deliberately broken.

Get a decent editor and don't leave whitespace at the end of lines.


		Chapter 2: Breaking long lines and strings

Coding style is all about readability and maintainability using commonly
available tools.

The limit on the length of lines is 80 columns and this is a hard limit.

Statements longer than 80 columns will be broken into sensible chunks.
Descendants are always substantially shorter than the parent and are placed
substantially to the right. The same applies to function headers with a long
argument list. Long strings are likewise broken into shorter strings.

void fun(int a, int b, int c)
{
	if (condition)
		printk(KERN_WARNING "Warning this is a long printk with "
						"3 parameters a: %d b: %d "
						"c: %d\n", a, b, c);
	else
		next_statement;
}

		Chapter 3: Placing Braces and Spaces

The other issue that always comes up in C styling is the placement of
braces. Unlike the indent size, there are few technical reasons to
choose one placement strategy over the other, but the preferred way, as
shown to us by the prophets Kernighan and Ritchie, is to put the opening
brace last on the line, and put the closing brace first, thusly:

	if (x is true) {
		we do y
	}

This applies to all non-function statement blocks (if, switch, for,
while, do).
E.g.:

	switch (action) {
	case KOBJ_ADD:
		return "add";
	case KOBJ_REMOVE:
		return "remove";
	case KOBJ_CHANGE:
		return "change";
	default:
		return NULL;
	}

However, there is one special case, namely functions: they have the
opening brace at the beginning of the next line, thus:

	int function(int x)
	{
		body of function
	}

Heretic people all over the world have claimed that this inconsistency
is ... well ... inconsistent, but all right-thinking people know that
(a) K&R are _right_ and (b) K&R are right. Besides, functions are
special anyway (you can't nest them in C).

Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a "while" in a do-statement or an "else" in an if-statement, like
this:

	do {
		body of do-loop
	} while (condition);

and

	if (x == y) {
		..
	} else if (x > y) {
		...
	} else {
		....
	}

Rationale: K&R.

Also, note that this brace-placement also minimizes the number of empty
(or almost empty) lines, without any loss of readability. Thus, as the
supply of new-lines on your screen is not a renewable resource (think
25-line terminal screens here), you have more empty lines to put
comments on.

		3.1: Spaces

Linux kernel style for use of spaces depends (mostly) on
function-versus-keyword usage. Use a space after (most) keywords. The
notable exceptions are sizeof, typeof, alignof, and __attribute__, which look
somewhat like functions (and are usually used with parentheses in Linux,
although they are not required in the language, as in: "sizeof info" after
"struct fileinfo info;" is declared).

So use a space after these keywords:
	if, switch, case, for, do, while
but not with sizeof, typeof, alignof, or __attribute__.
E.g.,
	s = sizeof(struct file);

Do not add spaces around (inside) parenthesized expressions. This example is
*bad*:

	s = sizeof( struct file );

When declaring pointer data or a function that returns a pointer type, the
preferred use of '*' is adjacent to the data name or function name and not
adjacent to the type name. Examples:

	char *linux_banner;
	unsigned long long memparse(char *ptr, char **retptr);
	char *match_strdup(substring_t *s);

Use one space around (on each side of) most binary and ternary operators,
such as any of these:

	=  +  -  <  >  *  /  %  |  &  ^  <=  >=  ==  !=  ?  :

but no space after unary operators:
	&  *  +  -  ~  !  sizeof  typeof  alignof  __attribute__  defined

no space before the postfix increment & decrement unary operators:
	++  --

no space after the prefix increment & decrement unary operators:
	++  --

and no space around the '.' and "->" structure member operators.


		Chapter 4: Naming

C is a Spartan language, and so should your naming be. Unlike Modula-2
and Pascal programmers, C programmers do not use cute names like
ThisVariableIsATemporaryCounter. A C programmer would call that
variable "tmp", which is much easier to write, and not the least more
difficult to understand.

HOWEVER, while mixed-case names are frowned upon, descriptive names for
global variables are a must. To call a global function "foo" is a
shooting offense.

GLOBAL variables (to be used only if you _really_ need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should _not_ call it "cntusr()".

Encoding the type of a function into the name (so-called Hungarian
notation) is brain damaged - the compiler knows the types anyway and can
check those, and it only confuses the programmer.
No wonder MicroSoft makes buggy programs.

LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood. Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.

If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).


		Chapter 5: Typedefs

Please don't use things like "vps_t".

It's a _mistake_ to use typedef for structures and pointers. When you see a

	vps_t a;

in the source, what does it mean?

In contrast, if it says

	struct virtual_container *a;

you can actually tell what "a" is.

Lots of people think that typedefs "help readability". Not so. They are
useful only for:

 (a) totally opaque objects (where the typedef is actively used to _hide_
     what the object is).

     Example: "pte_t" etc. opaque objects that you can only access using
     the proper accessor functions.

     NOTE! Opaqueness and "accessor functions" are not good in themselves.
     The reason we have them for things like pte_t etc. is that there
     really is absolutely _zero_ portably accessible information there.

 (b) Clear integer types, where the abstraction _helps_ avoid confusion
     whether it is "int" or "long".

     u8/u16/u32 are perfectly fine typedefs, although they fit into
     category (d) better than here.

     NOTE! Again - there needs to be a _reason_ for this.
     If something is "unsigned long", then there's no reason to do

	typedef unsigned long myflags_t;

     but if there is a clear reason for why it under certain circumstances
     might be an "unsigned int" and under other configurations might be
     "unsigned long", then by all means go ahead and use a typedef.

 (c) when you use sparse to literally create a _new_ type for
     type-checking.

 (d) New types which are identical to standard C99 types, in certain
     exceptional circumstances.

     Although it would only take a short amount of time for the eyes and
     brain to become accustomed to the standard types like 'uint32_t',
     some people object to their use anyway.

     Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
     signed equivalents which are identical to standard types are
     permitted -- although they are not mandatory in new code of your
     own.

     When editing existing code which already uses one or the other set
     of types, you should conform to the existing choices in that code.

 (e) Types safe for use in userspace.

     In certain structures which are visible to userspace, we cannot
     require C99 types and cannot use the 'u32' form above. Thus, we
     use __u32 and similar types in all structures which are shared
     with userspace.

Maybe there are other cases too, but the rule should basically be to NEVER
EVER use a typedef unless you can clearly match one of those rules.

In general, a pointer, or a struct that has elements that can reasonably
be directly accessed should _never_ be a typedef.


		Chapter 6: Functions

Functions should be short and sweet, and do just one thing. They should
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
as we all know), and do one thing and do that well.

The maximum length of a function is inversely proportional to the
complexity and indentation level of that function.
So, if you have a conceptually simple function that is just one long
(but simple) case-statement, where you have to do lots of small things
for a lot of different cases, it's OK to have a longer function.

However, if you have a complex function, and you suspect that a
less-than-gifted first-year high-school student might not even
understand what the function is all about, you should adhere to the
maximum limits all the more closely. Use helper functions with
descriptive names (you can ask the compiler to in-line them if you think
it's performance-critical, and it will probably do a better job of it
than you would have done).

Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.

In source files, separate functions with one blank line. If the function is
exported, the EXPORT* macro for it should follow immediately after the closing
function brace line. E.g.:

int system_is_up(void)
{
	return system_state == SYSTEM_RUNNING;
}
EXPORT_SYMBOL(system_is_up);

In function prototypes, include parameter names with their data types.
Although this is not required by the C language, it is preferred in Linux
because it is a simple way to add valuable information for the reader.


		Chapter 7: Centralized exiting of functions

Albeit deprecated by some people, the equivalent of the goto statement is
used frequently by compilers in form of the unconditional jump instruction.

The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done.

The rationale is:

- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making
    modifications are prevented
- saves the compiler work to optimize redundant code away ;)

int fun(int a)
{
	int result = 0;
	char *buffer = kmalloc(SIZE, GFP_KERNEL);

	if (buffer == NULL)
		return -ENOMEM;

	if (condition1) {
		while (loop1) {
			...
		}
		result = 1;
		goto out;
	}
	...
out:
	kfree(buffer);
	return result;
}

		Chapter 8: Commenting

Comments are good, but there is also a danger of over-commenting. NEVER
try to explain HOW your code works in a comment: it's much better to
write the code so that the _working_ is obvious, and it's a waste of
time to explain badly written code.

Generally, you want your comments to tell WHAT your code does, not HOW.
Also, try to avoid putting comments inside a function body: if the
function is so complex that you need to separately comment parts of it,
you should probably go back to chapter 6 for a while. You can make
small comments to note or warn about something particularly clever (or
ugly), but try to avoid excess. Instead, put the comments at the head
of the function, telling people what it does, and possibly WHY it does
it.

When commenting the kernel API functions, please use the kernel-doc format.
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
for details.

Linux style for comments is the C89 "/* ... */" style.
Don't use C99-style "// ..." comments.

The preferred style for long (multi-line) comments is:

	/*
	 * This is the preferred style for multi-line
	 * comments in the Linux kernel source code.
	 * Please use it consistently.
	 *
	 * Description: A column of asterisks on the left side,
	 * with beginning and ending almost-blank lines.
	 */

It's also important to comment data, whether they are basic types or derived
types. To this end, use just one data declaration per line (no commas for
multiple data declarations). This leaves you room for a small comment on each
item, explaining its use.


		Chapter 9: You've made a mess of it

That's OK, we all do. You've probably been told by your long-time Unix
user helper that "GNU emacs" automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values. To do the latter, you can stick the following in your .emacs file:

(defun linux-c-mode ()
  "C mode with adjusted defaults for use with the Linux kernel."
  (interactive)
  (c-mode)
  (c-set-style "K&R")
  (setq tab-width 8)
  (setq indent-tabs-mode t)
  (setq c-basic-offset 8))

This will define the M-x linux-c-mode command. When hacking on a
module, if you put the string -*- linux-c -*- somewhere on the first
two lines, this mode will be automatically invoked. Also, you may want
to add

(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . linux-c-mode)
			auto-mode-alist))

to your .emacs file if you want to have linux-c-mode switched on
automagically when you edit source files under /usr/src/linux.

But even if you fail in getting emacs to do sane formatting, not
everything is lost: use "indent".

Now, again, GNU indent has the same brain-dead settings that GNU emacs
has, which is why you need to give it a few command line options.
However, that's not too bad, because even the makers of GNU indent
recognize the authority of K&R (the GNU people aren't evil, they are
just severely misguided in this matter), so you just give indent the
options "-kr -i8" (stands for "K&R, 8 character indents"), or use
"scripts/Lindent", which indents in the latest style.

"indent" has a lot of options, and especially when it comes to comment
re-formatting you may want to take a look at the man page. But
remember: "indent" is not a fix for bad programming.


		Chapter 10: Configuration-files

For configuration options (arch/xxx/Kconfig, and all the Kconfig files),
somewhat different indentation is used.

Help text is indented with 2 spaces.

if CONFIG_EXPERIMENTAL
	tristate CONFIG_BOOM
	default n
	help
	  Apply nitroglycerine inside the keyboard (DANGEROUS)
	bool CONFIG_CHEER
	depends on CONFIG_BOOM
	default y
	help
	  Output nice messages when you explode
endif

Generally, CONFIG_EXPERIMENTAL should surround all options not considered
stable. All options that are known to trash data (experimental write-
support for file-systems, for instance) should be denoted (DANGEROUS), other
experimental options should be denoted (EXPERIMENTAL).


		Chapter 11: Data structures

Data structures that have visibility outside the single-threaded
environment they are created and destroyed in should always have
reference counts. In the kernel, garbage collection doesn't exist (and
outside the kernel garbage collection is slow and inefficient), which
means that you absolutely _have_ to reference count all your uses.

Reference counting means that you can avoid locking, and allows multiple
users to have access to the data structure in parallel - and not having
to worry about the structure suddenly going away from under them just
because they slept or did something else for a while.
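
As a rough illustration of the get/put pattern described above, here is
a minimal user-space sketch. It is not kernel code: C11 atomics and
malloc()/free() stand in for the kernel's own primitives, and
"struct session", session_create(), session_get() and session_put() are
hypothetical names invented for this example.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object shared between several users. */
struct session {
	atomic_int refcount;	/* number of live references */
	int id;
};

/* Allocate a session; the caller owns the initial reference. */
struct session *session_create(int id)
{
	struct session *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	atomic_init(&s->refcount, 1);
	s->id = id;
	return s;
}

/* Take an additional reference; the caller must already hold one. */
struct session *session_get(struct session *s)
{
	atomic_fetch_add(&s->refcount, 1);
	return s;
}

/*
 * Drop a reference. Returns 1 if this was the last reference and
 * the session was freed, 0 otherwise.
 */
int session_put(struct session *s)
{
	/* atomic_fetch_sub() returns the value before the decrement */
	if (atomic_fetch_sub(&s->refcount, 1) == 1) {
		free(s);
		return 1;
	}
	return 0;
}
```

Every user that keeps a pointer around calls session_get() first and
session_put() when done, so the object cannot go away while somebody
still holds a reference to it.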

Note that locking is _not_ a replacement for reference counting.
Locking is used to keep data structures coherent, while reference
counting is a memory management technique. Usually both are needed, and
they are not to be confused with each other.

Many data structures can indeed have two levels of reference counting,
when there are users of different "classes". The subclass count counts
the number of subclass users, and decrements the global count just once
when the subclass count goes to zero.

Examples of this kind of "multi-level-reference-counting" can be found in
memory management ("struct mm_struct": mm_users and mm_count), and in
filesystem code ("struct super_block": s_count and s_active).

Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.


		Chapter 12: Macros, Enums and RTL

Names of macros defining constants and labels in enums are capitalized.

#define CONSTANT 0x12345

Enums are preferred when defining several related constants.

CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.

Generally, inline functions are preferable to macros resembling functions.

Macros with multiple statements should be enclosed in a do - while block:

#define macrofun(a, b, c)			\
	do {					\
		if (a == 5)			\
			do_this(b, c);		\
	} while (0)

Things to avoid when using macros:

1) macros that affect control flow:

#define FOO(x)					\
	do {					\
		if (blah(x) < 0)		\
			return -EBUGGERED;	\
	} while (0)

is a _very_ bad idea. It looks like a function call but exits the "calling"
function; don't break the internal parsers of those who will read the code.

2) macros that depend on having a local variable with a magic name:

#define FOO(val) bar(index, val)

might look like a good thing, but it's confusing as hell when one reads the
code and it's prone to breakage from seemingly innocent changes.

3) macros with arguments that are used as l-values: FOO(x) = y; will
bite you if somebody e.g. turns FOO into an inline function.

4) forgetting about precedence: macros defining constants using expressions
must enclose the expression in parentheses. Beware of similar issues with
macros using parameters.

#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)

The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.


		Chapter 13: Printing kernel messages

Kernel developers like to be seen as literate. Do mind the spelling
of kernel messages to make a good impression. Do not use crippled
words like "dont" and use "do not" or "don't" instead.

Kernel messages do not have to be terminated with a period.

Printing numbers in parentheses (%d) adds no value and should be avoided.


		Chapter 14: Allocating memory

The kernel provides the following general purpose memory allocators:
kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API
documentation for further information about them.

The preferred form for passing a size of a struct is the following:

	p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts readability and
introduces an opportunity for a bug when the pointer variable type is changed
but the corresponding sizeof that is passed to a memory allocator is not.

Casting the return value which is a void pointer is redundant. The conversion
from void pointer to any other pointer type is guaranteed by the C programming
language.
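
The two points above (sizeof(*p) and no cast) can be sketched in
user-space C as follows. This is only an illustration under stated
assumptions: calloc() stands in for a zeroing kernel allocator, and
"struct foo_dev" and foo_dev_alloc() are hypothetical names that do not
exist in the kernel.

```c
#include <stdlib.h>

/* Hypothetical structure, used only for illustration. */
struct foo_dev {
	int id;
	unsigned long flags;
};

/*
 * Allocate a zeroed foo_dev. calloc() is a user-space stand-in for
 * a zeroing kernel allocator. sizeof(*dev) follows the pointer's
 * type, so changing the type of "dev" cannot leave a stale size
 * behind, and the returned void pointer is assigned without a cast.
 */
struct foo_dev *foo_dev_alloc(int id)
{
	struct foo_dev *dev;

	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;
	dev->id = id;
	return dev;
}
```

Had the size been spelled "sizeof(struct foo_dev)", a later change of
the pointer's type would compile silently with the wrong size.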


		Chapter 15: The inline disease

There appears to be a common misperception that gcc has a magic "make me
faster" speedup option called "inline". While the use of inlines can be
appropriate (for example as a means of replacing macros, see Chapter 12), it
very often is not. Abundant use of the inline keyword leads to a much bigger
kernel, which in turn slows the system as a whole down, due to a bigger
icache footprint for the CPU and simply because there is less memory
available for the pagecache. Just think about it; a pagecache miss causes a
disk seek, which easily takes 5 milliseconds. There are a LOT of CPU cycles
that can go into these 5 milliseconds.

A reasonable rule of thumb is to not put inline at functions that have more
than 3 lines of code in them. An exception to this rule is the case where
a parameter is known to be a compile-time constant, and as a result of this
constantness you *know* the compiler will be able to optimize most of your
function away at compile time. For a good example of this latter case, see
the kmalloc() inline function.

Often people argue that adding inline to functions that are static and used
only once is always a win since there is no space tradeoff. While this is
technically correct, gcc is capable of inlining these automatically without
help, and the maintenance issue of removing the inline when a second user
appears outweighs the potential value of the hint that tells gcc to do
something it would have done anyway.


		Chapter 16: Function return values and names

Functions can return values of many different kinds, and one of the
most common is a value indicating whether the function succeeded or
failed. Such a value can be represented as an error-code integer
(-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure,
non-zero = success).
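
To make the two representations concrete, here is a small user-space
sketch. "struct queue", QUEUE_MAX, queue_is_full() and
queue_add_item() are hypothetical names made up for this example; only
the errno value -ENOSPC is a real error code.

```c
#include <errno.h>

#define QUEUE_MAX 2	/* hypothetical capacity */

/* Hypothetical fixed-size queue, used only for illustration. */
struct queue {
	int items[QUEUE_MAX];
	int count;
};

/* "Succeeded" boolean: non-zero means yes, 0 means no. */
int queue_is_full(const struct queue *q)
{
	return q->count == QUEUE_MAX;
}

/* Error-code integer: 0 on success, -ENOSPC on failure. */
int queue_add_item(struct queue *q, int item)
{
	if (queue_is_full(q))
		return -ENOSPC;
	q->items[q->count++] = item;
	return 0;
}
```

A caller tests the first kind with "if (queue_is_full(q))" and the
second with "if (queue_add_item(q, item) < 0)"; swapping the two tests
silently inverts the logic.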

Mixing up these two sorts of representations is a fertile source of
difficult-to-find bugs. If the C language included a strong distinction
between integers and booleans then the compiler would find these mistakes
for us... but it doesn't. To help prevent such bugs, always follow this
convention:

	If the name of a function is an action or an imperative command,
	the function should return an error-code integer. If the name
	is a predicate, the function should return a "succeeded" boolean.

For example, "add work" is a command, and the add_work() function returns 0
for success or -EBUSY for failure. In the same way, "PCI device present" is
a predicate, and the pci_dev_present() function returns 1 if it succeeds in
finding a matching device or 0 if it doesn't.

All EXPORTed functions must respect this convention, and so should all
public functions. Private (static) functions need not, but it is
recommended that they do.

Functions whose return value is the actual result of a computation, rather
than an indication of whether the computation succeeded, are not subject to
this rule. Generally they indicate failure by returning some out-of-range
result. Typical examples would be functions that return pointers; they use
NULL or the ERR_PTR mechanism to report failure.



		Appendix I: References

The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
Prentice Hall, Inc., 1988.
ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
URL: http://cm.bell-labs.com/cm/cs/cbook/

The Practice of Programming
by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Inc., 1999.
ISBN 0-201-61586-X.
URL: http://cm.bell-labs.com/cm/cs/tpop/

GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
gcc internals and indent, all available from http://www.gnu.org/manual/

WG14 is the international standardization working group for the programming
language C, URL: http://www.open-std.org/JTC1/SC22/WG14/

Kernel CodingStyle, by greg@kroah.com at OLS 2002:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/

--
Last updated on 2006-December-06.