123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696 |
- STYLE(9)
- ========
- :doctype: manpage
- :man source: X15
- :man manual: X15 Kernel Developer{rsquo}s Manual
- NAME
- ----
- style - kernel coding style and rules
- DESCRIPTION
- -----------
- This document describes the preferred coding style for the X15 kernel,
- which is a close variant of the modern K&R style. It probably does not
- encompass all the rules, which are quite numerous despite efforts to
- keep them simple and compact. As a result, developers should compare
- their work against existing recent modules and infer the rules that
- may not be mentioned here. If you identify multiple rules that seem
- to conflict, assume any of them is valid. If in doubt, make a pragmatic
- decision and move forward.
- EXPLICIT PROGRAMMING
- --------------------
- As a general rule, the coding style for the X15 kernel emphasizes what
- could be called "explicit programming", which means that it encourages
- developers to clearly state their intent in the code itself. There are
- various ways to attempt that, and it can never be completely achieved,
- but it remains a strong guiding principle. For example, it is the main
- reason behind the adoption of the C programming language, despite its
- limitations, and object-oriented programming, despite the lack of support
- in the language.
- The price of explicit programming is verbosity. Correlations between
- code quantity and bugs do exist, and they form part of the motivation
- for more modern tools and languages, but such results merely hide the
- confounding variable behind the correlations : inadequate programming
- skills. Programming is a demanding activity, and kernel programming
- even more so. Reducing the number of bugs by writing less code is
- therefore not a robust strategy. Instead, developers should learn
- a set of good practices, one of them (hopefully) being described in
- this document. Since it is expected from developers that they spend
- most of their time and efforts designing their code, the ratio of the
- time spent writing code compared to the total development time should
- be low, from 5 to 30% most of the time. The increase in verbosity should
- therefore be close to negligible. On the other hand, the code should be
- very easy to understand and follow, which matters a lot more because
- there are a lot more people likely to read a piece of code compared
- to developers who have written that same piece of code. Besides, authors
- often find themselves in situations where they have forgotten the details
- of their own code and must understand it again.
- The environment around the X15 kernel closely adheres to the Unix culture.
- As a result, for developers with little or no experience with Unix, it is
- suggested to read {the-art-of-unix-programming} by Eric S. Raymond. Despite
- its age, it is still very relevant. The section "Basics of the Unix Philosophy"
- is of particular interest.
- LANGUAGES
- ---------
- The kernel is mostly written in the C programming language, conforming
- to ISO/IEC 9899:2011 (C11), with GNU extensions. While many GNU extensions
- are used, nested functions are forbidden. They are considered too specific
- to the GCC compiler. Note that X15 can currently be built with both GCC
- and Clang, and the latter has no support for nested functions. Also note
- that the kernel only expects the freestanding C environment from the compiler.
- Since X15 is low level system software, processor-specific assembly is
- inevitable. Developers should always refrain from writing assembly code
- when possible. Obviously, that code must only be used in
- architecture-specific modules.
- OBJECT-ORIENTED PROGRAMMING
- ---------------------------
- The X15 kernel is a set of functional modules which often implement
- objects. As a result, it naturally borrows some properties of
- object-oriented programming. Structures are used to define classes,
- and functions are used to implement methods, including constructors
- and destructors.
- Note however that only a subset of the object-oriented programming
- paradigm is applied. In particular, by choice more so than by technical
- lack, composition rather than inheritance is how code reuse is achieved,
- because inheritance is intrinsically implicit, which is against the
- general idea of explicit programming. In addition, polymorphic behaviour
- can also be achieved through abstract interfaces implemented with
- function pointer structures, an alternative which retains the property
- of being explicit to readers.
- The main purpose of OOP in X15 is to guide the structure of the code
- and data, so that developers have somewhat easy ways to determine how
- to group data together, assemble operations around them, and name them.
- Encapsulation naturally emerges as a side-effect of keeping interfaces
- as compact as possible. By restricting the selected practices of OOP,
- we get the banana, and the gorilla happily remains in the jungle.
- LINES
- -----
- Lines are normally limited to 80 columns, unless there is a compelling
- reason not to. This makes it easy to identify code with too many levels
- of indentation, and also allows comfortable viewing on small screens,
- including smartphones and tablets, as well as using multiple buffers
- side-by-side, which happens to also be convenient on larger screens.
- When defining a function, always break the line right before its name,
- so that qualifiers and the return type are on their own line. As a
- result, there is enough space for arguments even when the function
- name is long. It also makes functions easier to find with grep.
- Use at most one empty line as a separator. Use empty lines around
- functions and blocks, and also when you consider it appropriate
- inside blocks, e.g. to isolate critical sections, and/or locking
- functions.
- COMMENTS
- --------
- Each source file must start with a copyright statement comment,
- followed by a description of the file, unless the content is simple
- and obvious enough not to need one. The copyright statement and the
- description must be separated by two empty comment lines :
- [source,c]
- --------------------------------------------------------------------------------
- /*
- * Copyright (c) 2017 John Smith.
- *
- * This program is free software: you can redistribute it and/or modify
- * etc...
- *
- *
- * Description of the module.
- */
- --------------------------------------------------------------------------------
- Comments are written in C style, not C++. Here are a few examples :
- [source,c]
- --------------------------------------------------------------------------------
- /* Most single-line comments look like this */
- /*
- * Multi-line comments look like this. They should also be used for
- * "reference" descriptions in public headers.
- */
- struct dummy {
- int useless; /* Here is a way to provide short member descriptions */
- char moot; /* But don't hesitate to move descriptions above members
- when they spawn too many lines on the right side. */
- };
- /*
- * Instead of using C comments to disable code blocks, use the #if 0
- * preprocessor directive. Remember that C comments do not nest.
- */
- #if 0
- ...
- #endif
- --------------------------------------------------------------------------------
- Comments should not be abused. A large number of comments creates noise
- that readers must filter. Instead of writing a comment, ask yourself
- whether you can improve the meaning of the code itself, e.g. by changing
- names or breaking the code into smaller functions with names that are
- good enough to convey the message of the comment. Prefer to describe data
- over code. Comments in the code must point at details that are important
- and not obvious, and quality code should have few of those.
- Note that the project does not use an annotation format for documentation
- generation such as Doxygen, because, despite the undeniable improvement over
- other types of tools, the benefit still seems too small considering the
- additional rules that developers must keep in mind, the effort spent on
- checking the annotations and the duplication that they may cause, and the
- actual use of the generated documentation compared to direct source code
- browsing.
- INDENTATION
- -----------
- An indentation level is 4 spaces. It is strongly recommended, although
- not compulsory, to avoid using more than 3 indentation levels. More
- levels are tolerated for very short blocks. Combining 4 spaces per level
- with 80 columns allows the use of somewhat long, significant names, and
- naturally warns that a function should be simplified when the code becomes
- overly crammed to the right.
- Tabulation characters are strictly forbidden in source files, and should
- only be used in Makefiles where absolutely required. This allows everyone
- to get the same view, whatever the tools, including editors, comparison
- tools, web browsers, etc... It also removes the question of how to align
- lines in the code, e.g. when a function call spans multiple lines.
- In a switch statement, the case labels are aligned on the same column as
- the switch statement :
- [source,c]
- --------------------------------------------------------------------------------
- switch (var) {
- case VAL1:
- do_one_thing();
- break;
- case VAL2:
- case VAL3:
- do_another();
- break;
- }
- --------------------------------------------------------------------------------
- When conditions don't fit on a single line, break before operators,
- and indent the new line on the same column as the associated condition.
- Use parentheses to make precedence explicit, even when not strictly needed.
- Here is an example :
- [source,c]
- --------------------------------------------------------------------------------
- static void
- do_something(void)
- {
- if (overly_long_stupid_condition_name_that_you_shouldn_t_use
- && another_overly_long_stupid_condition_name
- && (yet_another_variable_with_an_overly_long_name
- || (a_first_long_variable_name != a_second_long_variable_name))) {
- return;
- }
- }
- --------------------------------------------------------------------------------
- One reason for those rules is to make reviews easier, by pushing code as much
- to the left as possible. The code should allow superficial reviews by just
- taking a quick look at the left side of the code, e.g. control statements,
- most conditions, operators, function names, their first parameter, etc...
- A line which reaches the right side marks a spot that may need more careful
- review.
- Don't write multiple statements on a single line.
- BRACES
- ------
- Opening braces are written at the end of lines containing control statements
- or struct/union/enum definitions. Closing braces are written on the same
- column as their associated control statement. Opening braces for functions
- are written on their own line after the arguments. Braces must also be used
- for single-line blocks, as this reduces both efforts when adding debugging
- code in such blocks and risks when merging conflicting code. In the case of
- multi-part statements, closing braces are written one the same line as the
- continuation. Here is an example :
- [source,c]
- --------------------------------------------------------------------------------
- void
- do_something(void *arg)
- {
- if (arg == NULL) {
- return;
- } else {
- print_what_it_is(arg);
- }
- do {
- play_with(arg);
- } while (can_play_with(arg));
- }
- --------------------------------------------------------------------------------
- The main reason for this style is that it allows line compression with
- almost no loss of readability, since it is easy to identify where blocks
- start.
- SPACES
- ------
- Spaces are used around most keywords. The exceptions are sizeof, typeof,
- alignof, and pass:[__attribute__], because of their similarity with
- functions. These exceptions must be used with parentheses. Spaces are also
- used around operators, except unary operators. For example :
- [source,c]
- --------------------------------------------------------------------------------
- if (!skip_copy && (memcmp(&a, &b, sizeof(a)) != 0)) {
- memcpy(&a, &b, sizeof(a));
- }
- --------------------------------------------------------------------------------
- When declaring pointer variables, use a space before "`*`", but not after,
- so that the "`*`" character and the variable name are adjacent. The main
- reason is that, while pointers are variables, with a type of their own,
- they are a special kind of variables, where the pointer nature is part
- of the variable, rather than just the type. Another is to reduce mistakes
- when declaring several variables on the same line. On the other hand, when
- writing functions returning pointers, use spaces both before and after
- "`*`". The reason is to always clearly separate returned types from the
- function name. Here is an example :
- [source,c]
- --------------------------------------------------------------------------------
- static struct my_type * my_function(struct my_type *var);
- /* ... */
- static struct my_type *
- my_function(struct my_type *var)
- {
- struct my_type *a, tmp;
- a = do_something(var, &tmp);
- return a;
- }
- --------------------------------------------------------------------------------
- Don't leave trailing spaces at the end of lines.
- NAMES
- -----
- First of all, use underscores instead of camel case to separate words
- in compound names. Function and variable names are in lower case. Macro
- names generally are in upper case, except for macros which behave
- strictly like functions and could be replaced with inline functions
- with no API change.
- There are two ways to name symbols, depending on whether they're global
- or not. Global symbols, either private (declared static) or public
- (part of the publically exposed interface), must always be prefixed with
- the namespace of the module they belong to. This makes it easy to know
- that a symbol is global and where to find it in the project. It also
- makes it possible to move definitions between the opaque implementation
- and header files with few diffs. Names for local variables should be
- short, with no prefixes, since prefixes are used to add context, and
- local variables are confined to the current context.
- When a module defines an object type, the type name immediately follows
- the module name. Functions applying to objects (methods) are named by
- adding the method name to the object type name. The name of helper
- functions and methods is simply added at the end of the main function
- that calls them.
- Here are some examples :
- [source,c]
- --------------------------------------------------------------------------------
- /*
- * Object type "cpu_pool" of module "kmem".
- */
- struct kmem_cpu_pool;
- /*
- * Method "init" of object type "cpu_pool" of module "kmem".
- */
- static void kmem_cpu_pool_init(struct kmem_cpu_pool *cpu_pool,
- struct kmem_cache *cache);
- /*
- * Object type "cache" of module "kmem".
- */
- struct kmem_cache;
- /*
- * Helper function "verify" of method "alloc" of object type "cache" of
- * module "kmem".
- */
- static void kmem_cache_alloc_verify(struct kmem_cache *cache, void *buf,
- int construct);
- /*
- * Method "alloc" of object type "cache" of module "kmem".
- */
- void *
- kmem_cache_alloc(struct kmem_cache *cache)
- {
- struct kmem_cpu_pool *cpu_pool;
- int filled, verify;
- void *buf;
- /* ... */;
- }
- /*
- * Function "info" of module "kmem".
- */
- void kmem_info(void);
- --------------------------------------------------------------------------------
- FUNCTIONS
- ---------
- Functions should do one thing, and do it well. Their name should be
- carefully chosen to best reflect their operation. Ideally, functions
- should be short enough to fit in a "page", i.e. not much more than
- 25 lines. The real metric developers should use to decide whether to
- break a function or not is the number of variables, including parameters.
- Assuming most people can remember around 5 things at the same time, you
- should break functions when the number of variables gets higher than
- this value. Variables such as a buffer pointer and its size can be
- considered a single variable.
- Do not hesitate to break functions down when things get even slightly
- complicated. If that helps, remember that static functions can easily
- be inlined by the compiler. See <<inline_functions,inline functions>>.
- Always name arguments in function prototypes.
- Common call protocols
- ~~~~~~~~~~~~~~~~~~~~~
- Functions should match one of the following call protocols :
- type get_value(...)::
- Accessor with no side effect.
- void do_something(...)::
- Function that cannot fail.
- struct my_struct * my_struct_lookup(...)::
- Function that returns an object, or NULL if an error occurred.
- int do_something(...)::
- Function that returns an error code, unless stated otherwise.
- It's easy to know when to return an object, or an error and the
- object through a pointer argument : if there is a single possible
- error code, let a NULL value encode that error, otherwise return
- it explicitely :
- struct my_struct * my_struct_create(...)::
- Return NULL if an error occurs, which may only be ENOMEM.
- int my_struct_create(struct my_struct **my_structp, ...)::
- Return an error (ENOMEM or something else) if creation fails, otherwise
- pass the new object to the caller through the double pointer argument
- and return 0.
- Methods, which are functions invoked on an object instance, must be
- defined so that their first argument is always the object instance
- on which the method is invoked.
- Unconditional branch statements
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Developers are encouraged to use unconditional branches in cases where
- they contribute to maintaining complexity low. Such cases include return
- on invalid argument, breaking or continuing simple loops, and centralized
- error handling. Note that this includes the *goto* statement. The rationale,
- beyond making error handling code common, is to keep the indentation level
- low. The main code flow, with the lowest indentation level, should match
- the most likely case. The compiler often uses heuristics based on keywords
- such as *return* or *goto* which are applied to the conditions of the
- containing block. You may also use the *likely* and *unlikely* macros
- to give hints about branches to the compiler in performance critical
- paths.
- Here is an example for object creation :
- [source,c]
- --------------------------------------------------------------------------------
- static int
- obj_create(struct obj **objp, int var)
- {
- struct subobj *subobj;
- struct obj *obj;
- int error;
- obj = kmem_cache_alloc(&obj_cache);
- if (obj == NULL) {
- return ENOMEM;
- }
- error = subobj_create(&subobj);
- if (error) {
- goto error_subobj;
- }
- error = obj_check_var(var);
- if (error) {
- goto error_var;
- }
- obj_init(obj, subobj, var);
- *objp = obj;
- return 0;
- error_var:
- subobj_destroy(subobj);
- error_subobj:
- kmem_cache_free(&obj_cache, obj);
- return error;
- }
- --------------------------------------------------------------------------------
- Local variables
- ~~~~~~~~~~~~~~~
- Local variables must be declared at the beginning of their scope. An empty
- line separates them from the function body. A notable exception is iterators,
- which may be declared inside loop statements.
- Common functions and methods
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The following list can be used to easily find appropriate standard names
- for the most common operations.
- *bootstrap*::
- This function is used for early module initialization. It should only
- exist if initialization must be broken down into multiple steps. If
- there is a single initialization step, run it in the *setup* function.
- *setup*::
- This function is used for module initialization.
- *init*::
- Object initialization that may not fail (e.g. without memory allocation).
- *build*::
- Object construction, i.e. initialization that may fail.
- *cleanup*::
- Object clean-up, releasing internal resources but not the object itself.
- *create*::
- Object creation, including both allocation and construction.
- *destroy*::
- Object destruction, releasing all resources including the object itself.
- *ref*::
- Add a reference to an object.
- *unref*::
- Remove a reference from an object, destroying it if there are no more
- references.
- *acquire* or *lock*::
- Acquire exclusive access to an object.
- *release* or *unlock*::
- Release exclusive access to an object.
- *add*::
- Add an object to a container.
- *remove*::
- Remove an object from a container.
- *lookup*::
- Look up an object in a container.
- [[inline_functions]]
- Inline functions
- ~~~~~~~~~~~~~~~~
- Do not abuse the inline keyword. First of all, this keyword is only a
- hint for the compiler, with no guarantee that a function will actually
- be inlined in the generated code. In addition, decent compilers can
- already inline functions without the presence of the keyword. Private
- functions (declared static) are easily inlined because the compiler
- knows it doesn't need to produce an externally visible symbol. Besides,
- with link-time optimizations (LTO), public functions can also be inlined
- or even removed if unused. As a result, use inline for a few performance
- critical short functions only.
- Passing arrays as arguments
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The C language automatically converts arrays to pointers when passed as
- function arguments. As a result, declaring an argument as an array is
- misleading. In particular, it may give the impression that the sizeof
- operator may be used safely to get the size of the array, when it
- actually always returns the size of a pointer. Therefore, declaring
- arguments as arrays is strictly forbidden.
- GIT COMMITS
- -----------
- Commits should be atomic, which means a single commit should focus on
- one functional modification. The size of the diff doesn't matter.
- The log format complies with the standard Git format, i.e. the log
- must start with a single line describing the changes, followed by
- an empty line and more details if necessary. Since any commit may be
- sent as email, try to limit lines to around 70 characters, so that
- there is room for replies before reaching/exceeding 80 characters.
- In addition to the standard Git format, the first line should start
- with the logical path of a module, e.g. *kern/kmem* or *x86/pmap*
- for architecture-specific modules, when changes are local to one
- or a few modules, which should be the case most of the time. For
- example :
- --------------------------------------------------------------------------------
- kern/kmem: undefine KMEM_VERIFY
- That macro was left over during the slab allocator rework.
- --------------------------------------------------------------------------------
- The first line being a short description, usable as an email subject,
- there is no need for a final dot. This also keeps it slightly shorter.
- --------------------------------------------------------------------------------
- kern/{sleepq,turnstile}: handle spurious wakeups
- --------------------------------------------------------------------------------
- This format is based on brace expansion in the bash shell.
- MISCELLANEOUS
- -------------
- *typedef* usage
- ~~~~~~~~~~~~~~~
- Usage of the *typedef* keyword is generally avoided. It is used in tricky
- situations, such as architecture-specific integer types, function pointers,
- and truely opaque data (none of which exist at the time of this writing).
- This rule is driven by the need to know as much as possible the nature of
- the data being processed. Readers should quickly be able to know whether
- a variable is an integer or a structure (or a union, which can cause bad
- surprises), and of course if it's a pointer. This could be done with a
- naming scheme, but that would only add unnecessary rules.
- Developers are allowed to use *typedef* for function pointers because,
- unlike a structure, without *typedef* a function pointer type has no name.
- It is strongly advised to end the name of the type with the suffix _fn_t.
- *sizeof* usage
- ~~~~~~~~~~~~~~
- One of the pillars of robust C code is correct usage of the *sizeof*
- operator. The main rule here is to always use *sizeof* on variables,
- not types. In addition, when iterating on arrays, use the *ARRAY_SIZE*
- macro, which can be thought of as *sizeof* in terms of items rather
- than bytes (technically characters). It is important that sizes are
- directly related to the data being processed, instead of e.g. reusing
- the macro used to declare an array again in the iteration code.
- Boolean Coercion
- ~~~~~~~~~~~~~~~~
- Boolean coercion should only be used when the resulting text is semantically
- valid, i.e. it can be understood that the expression is either true or false.
- For example :
- [source,c]
- --------------------------------------------------------------------------------
- int error;
- error = do_something();
- /* Valid */
- if (error) {
- if (error != EAGAIN) {
- printf("unexpected error\n");
- }
- return error;
- }
- int color;
- color = get_color();
- /* Invalid */
- if (!color) {
- print_black();
- } else if (color == 1) {
- print_red();
- }
- --------------------------------------------------------------------------------
- There are historical uses of the *int* type for boolean values, but all of
- them should be converted to the C99 *bool* type.
- Side effects and conditions
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Conditions must never be made of expressions with side effects. This
- completely removes the need to think about short-circuit evaluations.
- Increment/decrement operators
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Increment and decrement operators may never be used inside larger expressions.
- Their usage as expression-3 of a for loop is considered as an entire expression
- and doesn't violate this rule. The rationale is to avoid tricky errors related
- to sequence points. Since such operators may not be used inside larger
- expressions, there is no difference between prefix and postfix operators.
- The latter are preferred because they look more object-oriented, somewhat
- matching the object.method() syntax.
- Function-like macros with local variables
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Variable shadowing occurs in function-like macros if one of their local
- variables has the same name as one of the arguments. To avoid such situations,
- local variables in function-like macros must be named with an underscore
- suffix. This naming scheme is reserved for function-like macro variables.
- SEE
- ---
- manpage:intro
- {x15-operating-system}
- {the-art-of-unix-programming}
|