Adopting C programming conventions

by Jean Labrosse, TechOnline India - April 28, 2011

This paper discusses some of the problems found in a lot of code today and suggests how they can be avoided. There are many ways you and your embedded development team can improve code quality and, in the process, become significantly more productive. Techniques are presented for organizing project directories, naming files, laying out code, naming variables and functions, and more. Examples are presented in C, but most of the concepts apply to other languages.

Today's competitive world forces us to introduce products at an increasingly faster rate due to one simple fact of business life: having a product out first may mean acquiring a major share of the market. One way to help make this possible is to assure that the mechanics of writing code become second nature. All project members should clearly understand where each file resides on the company's file server, what each file should be named, what style to use, and how to name variables and functions.

The topic of coding conventions is controversial because we all have our own ways of doing things. One way is not necessarily better than the other. However, it's important that all team members adopt a single set of rules and that these rules are followed religiously and consistently by all participants. The worst thing that you can do is to leave each programmer to do his or her own thing. Such an undisciplined activity will certainly lead to chaos. When you consider that close to half of the development effort of a software-based system comes after its release, why not make the sometimes unpleasant task of supporting code less painful?

In this paper, I'll share some of the conventions I've been using for years and I hope that you'll find some of them useful for your own organization. I urge you to document your own conventions because it makes life easier for everyone, especially when it comes to supporting someone else's code.

Directory structures

One of the first rules to establish is how files are organized. Do you place all the source files in a single directory or do you create different directories for different pieces? I like to use a structure similar to that shown in Table 1.

Each product (such as ProdName) has its own directory under PRODUCTS\. If a product requires more than one microprocessor, then each has its own directory under ProdName\. All products that contain a microprocessor have a SOFTWARE\ directory. The SOURCE\ directory contains all the source files that are specific to the product. If you have highly modular code and strive to reuse as much code as possible from product to product, the SOURCE\ directory should generally contain only the 10% to 20% of the software that makes the product unique. The remaining 80% to 90% of the code should be located in the \SOFTWARE\ directory (discussed later). The DOC\ directory contains documentation files specific to the software aspects of the product (such as specifications, state diagrams, flow diagrams, and software descriptions). The TEST\ directory contains product build files (such as batch files, make files, and IDE projects) for creating a test version of the product. A test version builds the product using the source files located in the SOURCE\ directory of the product, any reusable code (building blocks), and any test-specific source code you may want to include to verify the proper operation of the application. The latter files generally reside in the TEST\ directory because they don't belong in the final product. The PROD\ directory contains build files for retrieving any released versions of your product. The other directories under ProdName\ are provided to show you where other disciplines within your organization can store their product-related files. In fact, this scheme makes it easy to back up or archive all the files related to a given product whether they're related to software or not.

The \SOFTWARE\ directory is where you could store all reusable, non-product specific files. I call these building blocks, and each contains its own documentation. 

Source files

It's well known that coding is the only aspect of programming that must be done to have a product. All the documentation in the world is useless if it doesn't reflect what the code does. C has been called a write-only language because once you have written the code, it's difficult to read and understand what it does. I believe that any language can be made write-only because this depends more on your attitude than the language you use. I always say that if you hate writing (or typing), then you're in the wrong business. Many programmers write for themselves and are not concerned with the life of a product. Another sign that you're in the wrong business is if you believe that your job is done once the code works. A major portion of a product's life consists of maintenance. In fact, in most cases, maintenance accounts for the majority of a product's development cost and thus, you should write source code to facilitate maintenance.

One of the easiest ways I found to accomplish this goal is to adopt a clean and consistent coding style. Which style you decide to use doesn't matter, as long as you and others in your organization follow a common guide. To that end, somebody should document the style used in your organization and have everyone follow it religiously. A few years ago, I read an article in the Hewlett-Packard Journal that stated that the product development manager decided to have the team adopt a common coding style (a potentially dangerous management decision!).[1] The team members were initially reluctant about having to conform to the style guide, but at the completion of the project everybody was impressed by the productivity gains. Team members were able to help each other because they didn't have to adjust themselves to other members' coding styles to find bugs.

Source files should be grouped by modules with each module having its own directory. Taking C as an example, a directory may contain as few as two files (a .C and a .H file) or as many files as needed to perform the functions of the module. In some cases, lookup tables are placed in the .C file along with the module's code while in other cases, large tables are located in their own files. All files within a module share a common file name prefix. For example, a display module may consist of three files: DISP.C contains code and variables, DISP.H contains function prototypes and DISP_TC.C contains tables (the _TC means Tables written in C). If you have multiple types of display modules (such as LCD and LED), each can be located in its own directory. For example, the LCD module can be placed in \SOFTWARE\LCD while the LED module in \SOFTWARE\LED. Also, the name of the files can be the same in both cases. In fact, you should try to provide the same functionality (such as function names and variable names) whether you have an LCD-based display or an LED-based display. Where the application code is concerned, it only knows that it has a display, and the product requirements dictate which type is used.

I don't like to limit the width of source code to 80 characters. In fact, I still don't know why some programmers limit themselves to 80 columns. Deciding on the width of a source file can become controversial. I believe that you should not limit your source code to 80 columns even if you never print your source code. The reason is simple. I like to put executable statements on the left and comments on the right. This simple technique makes it easier to follow your code: your mind doesn't continuously have to "filter" the code from the comments, as it does when the comments are interleaved with your code (more on this later). The 80-column rule comes from old text-based monitors that could display only 80 characters per line, and it was useful for code reviews when code was printed. Today's large monitors can easily display 200 columns and, quite frankly, I don't recall the last time I actually printed code.

A source file should contain the following sections and these sections should always be in the same order. Header files should never contain code except for macros.

1. File heading.
2. Revision history.
3. #defines and macros.
4. #includes.
5. Variables.
6. Function prototypes.
7. Functions.

The file heading section is a comment block that contains your company name, address, copyright notice, file name, author's name, and a description of the module.

The revision history section is also a comment block that describes what changes were made to the module over time. This section can automatically be filled in by your version control software when the module is released (assuming you're using version control software—hopefully you do). I have seen cases where the revision history section is actually at the end of the file so that the line numbers are not affected as much each time revision comments are inserted. Either way is acceptable.

#defines and macros are located in three different places. First, if #defines and macros are only applicable to the module, they're placed in the module's .C file (for example, equating the different states of a state machine). There is no point in extending the scope of something that is used locally. If used by multiple .C files in a given module, #defines and macros can be placed in the module's .H file to make them globally visible to the module. Finally, if the #defines are meant to be product specific, they're placed in a file called APP_CFG.H (you can certainly use a different name) in the product's directory. For example, the size of a module's buffer can be specified in APP_CFG.H (application configuration) instead of in the module itself. Also, you can #define a constant to enable/disable compilation of a feature in a module. This allows your code/data size to be reduced in case you don't need all the functionality of a module.

In other cases, conditional compilation is used to enable a version of an algorithm. For example, a CRC (cyclic redundancy check) module can contain two versions. The first version can be slow but require very little ROM while the other can be very fast but require the use of a 512-byte table and thus consume more ROM. In this case, setting the #define CRC_CFG_FAST_EN to 1 selects the faster version. Where the application is concerned, it doesn't know the difference. #defines and macros are always written using uppercase characters with an underscore character separating words. This agrees with the conventions established by Kernighan and Ritchie.[2]
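
The CRC case can be sketched as follows. This is only an illustration under stated assumptions: the function names CrcCalc16() and Crc_Byte(), the polynomial (0x1021 with an initial value of 0), and the table being filled at run time are choices made for this sketch. In a real product the table would more likely be a const array placed in ROM. Only the name CRC_CFG_FAST_EN comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef CRC_CFG_FAST_EN
#define CRC_CFG_FAST_EN  1u                 /* Normally set in APP_CFG.H: 1 = fast, 0 = small       */
#endif

#define CRC_POLY         0x1021u            /* Assumed polynomial, chosen for this sketch           */

static  uint16_t  Crc_Byte (uint16_t crc, uint8_t data)     /* Bit-by-bit update for one byte       */
{
    int  i;


    crc ^= (uint16_t)((uint16_t)data << 8);
    for (i = 0; i < 8; i++) {
        if ((crc & 0x8000u) != 0u) {
            crc = (uint16_t)((crc << 1) ^ CRC_POLY);
        } else {
            crc = (uint16_t)(crc << 1);
        }
    }
    return (crc);
}

#if CRC_CFG_FAST_EN > 0u
static  uint16_t  Crc_Tbl[256];             /* 256 entries x 16 bits = 512 bytes                    */
static  int       Crc_TblRdy = 0;           /* Non-zero once the table has been filled              */
#endif

uint16_t  CrcCalc16 (const uint8_t *p_data, size_t len)
{
    uint16_t  crc;
    size_t    i;


    crc = 0u;
#if CRC_CFG_FAST_EN > 0u
    if (Crc_TblRdy == 0) {                  /* Fill the lookup table on first use                   */
        for (i = 0u; i < 256u; i++) {
            Crc_Tbl[i] = Crc_Byte(0u, (uint8_t)i);
        }
        Crc_TblRdy = 1;
    }
    for (i = 0u; i < len; i++) {            /* Fast version: one table lookup per byte              */
        crc = (uint16_t)((crc << 8) ^ Crc_Tbl[((crc >> 8) ^ p_data[i]) & 0xFFu]);
    }
#else
    for (i = 0u; i < len; i++) {            /* Small version: eight shifts per byte, no table       */
        crc = Crc_Byte(crc, p_data[i]);
    }
#endif
    return (crc);
}
```

The caller uses CrcCalc16() the same way in both builds; only the value of CRC_CFG_FAST_EN decides which loop is compiled in.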

Every .C file whether product specific or part of a reusable module contains a #include section that always consists of a single statement as follows:

#include "INCLUDES.H"

I like to use a single "master" include file called INCLUDES.H whose contents are defined in the product's directory. I use a single master header file because it prevents you from having to remember which header file goes with which source file, especially when new modules are added. When you design reusable modules, you will never have to remember which header files are needed with which module: they are always all included. If you add a new module, you simply include its header file in INCLUDES.H. The only inconvenience that I found is that it takes a little bit longer to compile each file, but this is barely perceptible with today's fast computers. Some people hate the idea of exposing all the header files to all the .C files because they believe programmers will then start accessing everything that the headers are exposing. I believe that can easily be controlled. Using a single include file is a matter of preference.

Variables

Most of today's C compilers conform to the ANSI C standard (developed by the X3J11 committee), which guarantees at least 31 significant characters for internal identifiers. Descriptive variable names can be formulated within this limit through the use of acronyms, abbreviations, and mnemonics (see the section on acronyms, abbreviations, and mnemonics). Variable names should reflect what the variable is used for. I like to use a hierarchical method when creating a variable name. For instance, the array KeyBufIn[] indicates that it is part of the keyboard module (Key), that it is a buffer (Buf), and specifically that it is the input buffer (In). Uppercase characters are used to separate words in a variable name, but this rule only applies to global variables. This is called camel back.

In C, you can have two types of global variables: global to a file (also called local globals) and global to the rest of the world (such as the product). I personally don't think that globals are bad or that they should be avoided. Having globals doesn't mean that you should not encapsulate their access through interface functions. Having variables globally accessible may be beneficial when debugging and at run time to visualize what your product is doing.

All global variables (variables seen by other modules) are placed in the .H file of the module and not in the .C file. In C, however, the .H file generally contains an extern statement, so the question is: "How do you extern a variable and allocate storage for it at the same time?" The answer is that you use conditional compilation through the C preprocessor as shown in the following statements, which are placed at the beginning of the .H file:

#ifdef xxx_GLOBALS
#define xxx_EXT
#else
#define xxx_EXT extern
#endif

where xxx is the name of the module. Each variable that needs to be declared global will be prefixed with xxx_EXT in the .H file. The module's .C file will contain the following declarations:

#define xxx_GLOBALS
#include "INCLUDES.H"

When the compiler processes the .C file, it forces xxx_EXT (found in the corresponding .H file) to "nothing" (because xxx_GLOBALS is defined) and thus each global variable will be allocated storage space. When the compiler processes the other .C files, xxx_GLOBALS will not be defined for that module and thus xxx_EXT will be set to extern allowing you to reference the global variables. To illustrate this concept, let's suppose you need to create the following global variables:

CPU_INT08U CtrlState;
CPU_FP32 CtrlLevel;
CPU_INT16U CtrlCtr;

CTRL.H would look like this:

#ifdef CTRL_GLOBALS
#define CTRL_EXT
#else
#define CTRL_EXT extern
#endif

CTRL_EXT CPU_INT08U CtrlState;
CTRL_EXT CPU_FP32 CtrlLevel;
CTRL_EXT CPU_INT16U CtrlCtr;

CTRL.C would look like this:

#define CTRL_GLOBALS
#include "INCLUDES.H"

The nice thing about this technique is that you don't have to declare global variables in the .C file and duplicate the statements with the addition of the extern attribute in the .H file. You not only save a lot of time but you also reduce the chances of introducing an error in the process. Once you use this technique, you'll never want to declare globals any other way.

Variable names should be declared on separate lines rather than combining them on a single line. Separate lines make it easy to provide a descriptive comment for each variable. You should also explicitly declare the data type of every variable instead of relying on the default, int.

By convention, all variables are prefixed with the module's name. This convention makes it quite easy to know where variables are declared when you're dealing with large applications. Furthermore, a file scope global should have the underscore character ('_') after the module name. For example, a local global variable in the file CTRL.C would be prefixed by Ctrl_. Because CTRL.C will most likely manipulate Ctrl variables, you will be able to know whether the variables are declared at the top of CTRL.C (Ctrl_ prefix) or in CTRL.H (Ctrl prefix).

Formal arguments to a function and local variables within a function are declared in lowercase. The lowercase convention makes it obvious that such variables are local to a function because, also by convention, global variables will contain a mixture of upper- and lowercase characters. To make local variables or function arguments readable, you can use the underscore character (in other words, _ ). Within functions, certain variable names can be reserved to always have the same meaning. Some examples are given below but others can be used as long as consistency is maintained.

i, j, and k    for loop counters.
p1, p2 ... pn    for pointers.
c, c1  ... cn    for characters.
s, s1  ... sn    for strings.
ix, iy, and iz    for intermediate integer variables
fx, fy, and fz    for intermediate floating-point variables

Structures are declared with typedef since this allows a single name to represent the structure. The structure type is declared using all uppercase characters with underscore characters used to separate words, as shown in Listing 1.

I find it very useful to include the name of the structure in the suffix of a pointer as shown below. This allows the reader to know what structure the element being referenced belongs to.

p_line->Color;
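
A minimal sketch of these two conventions together, assuming a hypothetical LINE structure. The field names other than Color, the LINE_COLOR_xxx constants, and the LineInit() function are illustrative names only and are not taken from the original listing.

```c
#include <stdint.h>

typedef uint8_t   CPU_INT08U;               /* Stand-ins for the CPU.H data types                   */
typedef uint16_t  CPU_INT16U;

#define  LINE_COLOR_BLACK   0u              /* Hypothetical color codes                             */
#define  LINE_COLOR_RED     1u

typedef struct line {                       /* Structure type name in uppercase: LINE               */
    CPU_INT16U  StartX;
    CPU_INT16U  StartY;
    CPU_INT16U  EndX;
    CPU_INT16U  EndY;
    CPU_INT08U  Color;
} LINE;

void  LineInit (LINE *p_line)               /* 'p_line' tells the reader it points to a LINE        */
{
    p_line->StartX = 0u;
    p_line->StartY = 0u;
    p_line->EndX   = 0u;
    p_line->EndY   = 0u;
    p_line->Color  = LINE_COLOR_BLACK;
}
```

Anywhere p_line->Color appears, the pointer name alone indicates which structure the Color field belongs to.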

To summarize, global variables should use the file/module name (or a portion of it) as a prefix and should make use of upper-/lowercase characters. File scope globals should have an underscore character following the module's prefix. Function arguments and local variables should use only lowercase characters. #define constants and macros are always written in uppercase with underscore characters separating words for sake of legibility.
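
The summary above can be sketched in a few lines. The names below (the CTRL_xxx constants, Ctrl_Ctr, CtrlUpdate(), nbr_ticks) are hypothetical and only illustrate the naming rules; in a real project CtrlState would be declared in CTRL.H with the CTRL_EXT technique described earlier.

```c
#include <stdint.h>

typedef uint8_t   CPU_INT08U;               /* Stand-ins for the CPU.H data types                   */
typedef uint16_t  CPU_INT16U;

#define  CTRL_STATE_IDLE      0u            /* #define constants: uppercase with underscores        */
#define  CTRL_STATE_RUN       1u
#define  CTRL_CTR_MAX       100u

CPU_INT08U          CtrlState;              /* Global variable:     'Ctrl'  prefix, mixed case      */
static  CPU_INT16U  Ctrl_Ctr;               /* File scope global:   'Ctrl_' prefix                  */

void  CtrlUpdate (CPU_INT16U nbr_ticks)     /* Arguments and locals: lowercase with underscores     */
{
    CPU_INT16U  new_ctr;                    /* Counter value after adding the elapsed ticks         */


    new_ctr = (CPU_INT16U)(Ctrl_Ctr + nbr_ticks);
    if (new_ctr >= CTRL_CTR_MAX) {
        new_ctr   = 0u;
        CtrlState = CTRL_STATE_RUN;
    }
    Ctrl_Ctr = new_ctr;
}
```

Reading CtrlUpdate(), the prefix and case of each identifier show where it is declared without having to search for it.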

Acronyms, abbreviations, and mnemonics

When creating names for variables and functions, it's often useful to use acronyms (such as OS, ISR, TCB), abbreviations (such as buf and doc), and mnemonics (such as clr and cmp). Their use allows an identifier to be descriptive while requiring fewer characters. Unfortunately, if the terms are not used consistently, they may add confusion. To ensure consistency, you should create a list of acronyms, abbreviations, and mnemonics that you will use in all your projects. I call this list the Acronyms, Abbreviations, and Mnemonics Dictionary. Once it is assigned, an acronym, abbreviation, or mnemonic is used throughout. As you need more terms, you simply add them to the list. Once everyone has agreed that Buf means buffer, all project members should use that instead of having some individuals use Buffer and others use Bfr. To further this concept, you should always use Buf even if your identifier can accommodate the full name. In other words, stick to Buf even if you can fully write the word Buffer.

There might be instances where one list for all products doesn't make sense. For instance, if you are an engineering firm working on a project for different clients and the products that you develop are totally unrelated, a different list for each project would be more appropriate; the vocabulary for the farming industry is not the same as the vocabulary for the defense industry. My rule is that if all products are similar, they use the same dictionary.

Data types

While we're on the subject of variables, you may have noticed that I don't use the standard C types in variable declarations. In fact, unless you have to use the C standard library, you should avoid using C's data types because they're inherently not portable. An int can either be 8, 16, 32, or even 64 bits. Similarly, a float is either a 32-bit, a 64-bit or 80-bit value depending on the target processor and compiler. To resolve the portability issue, we create a header file (I call it CPU.H) that defines the following data types:

typedef unsigned char  CPU_INT08U;
typedef signed   char  CPU_INT08S;
typedef unsigned int   CPU_INT16U;
typedef signed   int   CPU_INT16S;
typedef unsigned long  CPU_INT32U;
typedef signed   long  CPU_INT32S;
typedef float          CPU_FP32;
typedef double         CPU_FP64;

Since C99, the C standard has "resolved" the issue of data type sizes by introducing data types that specify the width of each type (declared in <stdint.h>). However, I still contend that an all UPPERCASE data type makes code much more readable. That being said, you could declare the new data types while still using the C99 types as the base types, as shown below:

#include <stdint.h>

typedef uint8_t   CPU_INT08U;
typedef int8_t    CPU_INT08S;
typedef uint16_t  CPU_INT16U;
typedef int16_t   CPU_INT16S;
typedef uint32_t  CPU_INT32U;
typedef int32_t   CPU_INT32S;
typedef float     CPU_FP32;
typedef double    CPU_FP64;

By convention, CPU_INT08S is always used to declare 8-bit signed variables, similarly, CPU_INT16U declares 16-bit unsigned variables, and so forth. Your application code as well as the reusable modules can now assume the appropriate range for each variable in a portable fashion. If you then decide to port your code to a different target, you'll only need to look up the definition of various data-type sizes in your compiler literature and change the above definitions.
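
One possible safeguard after a port, offered here as a suggestion rather than as part of the conventions above, is to let the compiler confirm that each definition has the expected width. The sketch below uses the classic negative-array-size trick so that it does not depend on newer language features. The CPU_SIZE_CHK() macro name is hypothetical, and the <stdint.h> base types are used only so that the sketch builds on a desktop compiler.

```c
#include <limits.h>
#include <stdint.h>

typedef uint8_t   CPU_INT08U;
typedef int8_t    CPU_INT08S;
typedef uint16_t  CPU_INT16U;
typedef int16_t   CPU_INT16S;
typedef uint32_t  CPU_INT32U;
typedef int32_t   CPU_INT32S;

/* Declares an array type of size -1, which is a compile-time error, when 'type' is not            */
/* exactly 'nbr_bits' wide. When the size is correct the typedef is harmless and generates no code. */
#define  CPU_SIZE_CHK(type, nbr_bits) \
    typedef char  type##_SIZE_CHK[((sizeof(type) * CHAR_BIT) == (nbr_bits)) ? 1 : -1]

CPU_SIZE_CHK(CPU_INT08U,  8);
CPU_SIZE_CHK(CPU_INT08S,  8);
CPU_SIZE_CHK(CPU_INT16U, 16);
CPU_SIZE_CHK(CPU_INT16S, 16);
CPU_SIZE_CHK(CPU_INT32U, 32);
CPU_SIZE_CHK(CPU_INT32S, 32);
```

If CPU_INT16U were mistakenly left as unsigned int on a target with 32-bit integers, the corresponding check would stop the build instead of letting the error surface at run time.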

Functions

Function naming follows the same convention as global variable naming. Every function is prefixed with the module name; again, I use acronyms, abbreviations, and mnemonics. The first letter of each word is capitalized and local functions (file scope) have an underscore after the module name. I found that indenting four spaces works out well, but you should use whatever you are comfortable with.

Whatever you do, use spaces instead of tabs to indent your code. Tabs are interpreted differently on different editors and printers. Avoiding tab characters doesn't mean that you can't use the tab key on your keyboard. A good editor will give you the option to replace tabs with spaces (in this case, four spaces). If you have a lot of legacy code that contains tab characters, you can write a simple utility that scans your source code for tab characters and replaces each one with the number of spaces you decided to adopt.
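
The core of such a utility could look like the sketch below. The function name TabReplace() and its interface are assumptions made for illustration. It works on one string at a time and replaces each tab with a fixed number of spaces, as described above, rather than advancing to the next tab stop. A complete tool would read each source file line by line and call it.

```c
#include <stddef.h>

/* Copies 'p_src' to 'p_dst', replacing every tab character with 'nbr_spaces' spaces.              */
/* Returns 1 on success or 0 if 'p_dst' (which holds 'dst_size' bytes) is too small.               */
int  TabReplace (const char *p_src, char *p_dst, size_t dst_size, size_t nbr_spaces)
{
    size_t  len;                            /* Number of characters written so far                  */
    size_t  i;


    if (dst_size == 0u) {
        return (0);
    }
    len = 0u;
    while (*p_src != '\0') {
        if (*p_src == '\t') {
            if ((len + nbr_spaces) >= dst_size) {       /* Keep room for the terminating NUL        */
                return (0);
            }
            for (i = 0u; i < nbr_spaces; i++) {
                p_dst[len] = ' ';
                len++;
            }
        } else {
            if ((len + 1u) >= dst_size) {
                return (0);
            }
            p_dst[len] = *p_src;
            len++;
        }
        p_src++;
    }
    p_dst[len] = '\0';
    return (1);
}
```

Keeping the number of spaces as a parameter lets the same routine serve teams that settle on a different indentation width.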

Functions that are only used within the file should be declared static to hide them from other functions in different files. Each local variable name should be declared on its own line, an action that allows the programmer to comment each one as needed. Actual code statements should start after adding two blank lines after local variable declarations. This makes the delineation between variables and executable statements clear. A function should be declared as follows:
                            

void  CommRx (CPU_INT08U ch, CPU_INT08U c)
{
}

I even like the following style, which allows you to isolate each argument on its own line. When you have many arguments, it's a lot easier to see how many arguments there are, what their types are, and so on:

void
CommRx (CPU_INT08U  ch,
        CPU_INT08U  c)
{
}

You should note that I included a single space between the function name and the open parenthesis. This convention allows you to quickly locate where a function is actually declared when using your editor's search capability or even a grep utility. When you actually invoke the function, you should not include a space between the function name and the open parenthesis. This "feature" may not be necessary anymore if you use advanced editors that can jump to the function declaration when you "hover" over the name of the function or variable.
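
The function conventions described so far can be pulled together in one sketch. The names CommPktIsValid() and Comm_ChkSum() and the simple additive checksum are hypothetical and serve only to show the layout: a static file scope function with an underscore after the module name, one local variable per line, two blank lines before the first statement, a space before the parenthesis in a definition, and none in a call.

```c
#include <stdint.h>

typedef uint8_t   CPU_INT08U;               /* Stand-ins for the CPU.H data types                   */
typedef uint16_t  CPU_INT16U;

static  CPU_INT08U  Comm_ChkSum (const CPU_INT08U *p_buf, CPU_INT16U len);

CPU_INT08U  CommPktIsValid (const CPU_INT08U *p_pkt, CPU_INT16U len)
{
    CPU_INT08U  chk_sum;                    /* Checksum computed over the payload                   */
    CPU_INT08U  valid;                      /* 1 if the last byte matches the checksum              */


    valid = 0u;
    if (len > 1u) {
        chk_sum = Comm_ChkSum(p_pkt, (CPU_INT16U)(len - 1u));
        if (chk_sum == p_pkt[len - 1u]) {
            valid = 1u;
        }
    }
    return (valid);
}

static  CPU_INT08U  Comm_ChkSum (const CPU_INT08U *p_buf, CPU_INT16U len)
{
    CPU_INT08U  sum;                        /* Running 8-bit sum                                    */
    CPU_INT16U  i;


    sum = 0u;
    for (i = 0u; i < len; i++) {
        sum = (CPU_INT08U)(sum + p_buf[i]);
    }
    return (sum);
}
```

Searching for "Comm_ChkSum (" with the space finds the prototype and the definition, while the call site, written without the space, is skipped.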

Your style guide should also specify how every C construct should be written. A space follows the keywords if, for, while, and do. The keyword else has the privilege of having one before and one after it if curly braces are used. We write if (condition) on its own line and the statement(s) to execute on the following line(s):

if (y > 2) {
    z =  10;
    x = 100;
    p++;
} else {
    z =  5;
}

I always fully enclose statements within the if (condition) with curly braces even though the condition executes a single statement. This makes it convenient to add additional statements and prevents you from forgetting to add the curly braces when you add these extra statements. Also, the placement of curly braces follows the K&R style, but obviously you should adopt the style your organization is comfortable with.

Treat switch statements as you would any other conditional statement. Note that the statements within each case are lined up with the case label. The important point here is that switch statements must be easy to follow. Cases should also be separated from one another by a blank line.

switch (key) {
  case KEY_BS:
       if (cnt > 0) {
           p--;
           cnt--;
       }
       break;

  case KEY_CR:
       *p = NUL;
       break;

  case KEY_LINE_FEED:
       p++;
       break;

  default:
       *p++ = key;
       cnt++;
       break;
}

By convention, I use for loops when I know ahead of time the number of iterations the loop will perform. On the other hand, I use while and do-while loops when the number of iterations is only known at run time as shown below.

for (i = 0; i < MAX_ITER; i++) {
    *p2++ = *p1++;
    xx[i] = 0;
}

while (*p1 != 0) {
    *p2++ = *p1++;
    cnt++;
}

do {
    cnt--;
    *p2++ = *p1++;
} while (cnt > 0);

You should avoid multiple assignments on the same line, such as the following:

x = y = z = 1;

I also like to break up a long statement into multiple lines (as long as it's one statement). Note how the '+' signs line up with the equal signs. This makes the code much more readable.

x2   = x    * x;
x3   = x2   * x;
x4   = x3   * x;
x5   = x4   * x;
x6   = x5   * x;
temp = b0   * x
     + b1   * x2
     + b2   * x3
     + b3   * x4
     + b4   * x5
     + b5   * x6;

The following operators are written with no space around them:

->    Structure pointer operator    p->m
.     Structure member operator     s.m
[]    Array subscripting            a[i]

As previously mentioned, you declare a function with one space following its name and the open parenthesis while you invoke the function with no space after the function name. A space should be introduced after each comma to separate each actual argument in a function. Expressions within parentheses are written with no space after the opening parenthesis and no space before the closing parenthesis. Commas and semicolons should have one space after them.

 

strncat(t, s, n);
for (i = 0; i < n; i++)

The unary operators are written with no space between them and their operands:

!p    ~b    ++i    --j    (long)m    *p    &x    sizeof(k)

The binary operators are preceded and followed by one or more spaces, as is the ternary operator:

c1 = c2    x + y    i += 2    n > 0 ? n : -n;

The keywords if, while, for, switch and return are followed by one space.

For assignments, numbers are lined up in columns as if you were to add them. This allows you to quickly spot errors. The equal signs are also lined up. "Magic numbers" are shown here only for sake of illustration to show how the "weight" of the numbers should line up. Magic numbers must be avoided and, in fact, replaced with #define constants so they are more legible.

x        = 100.567;
temp     =  12.700;
var5     =   0.768;
variable =  12;
storage  = &array[0];


Comments

Comments should be meaningful and help you and others understand how the code works. Don't just state the obvious or what a reasonable programmer would conclude by simply looking at the code.

Each function should be preceded with a comment block to describe what the function does, what arguments are passed, what the function returns, and any other notes about the function.

I find it very difficult to mentally separate code from comments when code and comments are interleaved. Because of this, I avoid using this practice. Comments should go to the right of the actual C code. When large comments are necessary, they're written in the function description header or in a comment block before the actual code. Comments are lined up as shown in Listing 2. The comment terminators (*/) do not need to be lined up, but for neatness I prefer to do so. It is not necessary to have one comment per line since a comment could apply to a few lines.
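
A minimal sketch of this layout, with a function description header and the remaining comments kept to the right of the code. The function KeyBufClr(), the buffer size, and the KeyBufInCnt variable are hypothetical names chosen for illustration; KeyBufIn[] echoes the earlier naming example.

```c
#include <stdint.h>

typedef uint8_t   CPU_INT08U;                           /* Stand-ins for the CPU.H data types      */
typedef uint16_t  CPU_INT16U;

#define  KEY_BUF_SIZE  16u

CPU_INT08U  KeyBufIn[KEY_BUF_SIZE];                     /* Keyboard input buffer                   */
CPU_INT16U  KeyBufInCnt;                                /* Number of keys waiting in the buffer    */

/*
****************************************************************************************************
*                                     CLEAR THE KEYBOARD INPUT BUFFER
*
* Description : Fills the input buffer with a known value and marks it as empty.
* Arguments   : fill     is the value written to every entry of the buffer.
* Returns     : The number of entries that were overwritten.
* Notes       : The whole buffer is overwritten, not just the entries in use, so that stale keys
*               cannot be seen when the buffer is inspected with a debugger.
****************************************************************************************************
*/

CPU_INT16U  KeyBufClr (CPU_INT08U fill)
{
    CPU_INT16U  i;                                      /* Index into the input buffer             */


    for (i = 0u; i < KEY_BUF_SIZE; i++) {               /* Overwrite every entry (see Notes)       */
        KeyBufIn[i] = fill;
    }
    KeyBufInCnt = 0u;                                   /* Buffer is now empty                     */
    return (KEY_BUF_SIZE);
}
```

The longer explanation lives in the header block, so the comments on the right stay short enough to scan in a single pass down the column.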

Final words

You may not agree with some of the conventions that I adopt, but what's important is that you recognize that you'll increase productivity and the quality of your code by having your organization work from a common set of rules. Programmers will certainly resist this kind of change, but the long-term benefits will be worth the struggle. I have concluded over the years that I would much rather fight to have things done correctly the first time than spend double the effort when a programmer moves on to greener pastures and leaves the rest of us holding the proverbial bag!


About the author:

Jean Labrosse is founder, CEO, and president of Micrium. He is a regular speaker at the Embedded Systems Conferences and is the author of three books: MicroC/OS-II, The Real-Time Kernel; Embedded Systems Building Blocks, Complete and Ready-to-Use Modules in C; and MicroC/OS-III, The Real-Time Kernel. Jean has also written numerous articles for magazines. He has an MSEE and has been designing embedded systems for many years.

Endnotes:

[1] Long, David W. and Christopher P. Duff. "A Survey of Processes Used in the Development of Firmware for a Multiprocessor Embedded System." Hewlett-Packard Journal, October 1993, p. 59-65.
[2] Kernighan, Brian W. and Dennis M. Ritchie. The C Programming Language. Prentice Hall, Englewood Cliffs, NJ, 1988, ISBN 0-13-110362-8.

Further reading:

Labrosse, Jean J. µC/OS-III, The Real-Time Kernel. Micrium Press, Weston, FL, 2009, ISBN 978-0-9823375-3-0.
Maguire, Steve. Writing Solid Code. Microsoft Press, Redmond, WA, 1993.
McConnell, Steve. Code Complete. Microsoft Press, Redmond, WA, 1993, ISBN 1-55615-484-4.
Straker, David. C-Style, Standards and Guidelines. Prentice Hall, 1992, ISBN 0-13-116898-3.
Barr, Michael. Embedded C Coding Standard. Netrino Institute, 2008, ISBN 978-1442164826.
