The best coding standards eliminate bugs

by Fergus Bolger, TechOnline India - July 14, 2009

Fergus Bolger provides a brief tutorial of the Motor Industry Software Reliability Association (MISRA) C and C++ guidelines for bug detection, avoidance, and prevention.

The topic of coding standards is an emotive one among software developers, whose divergent opinions raise questions that range from "Why do we need such restrictions?" to "How could we possibly operate without them?"

Software engineering has always wrestled with standards, and the development of the C and C++ languages brought the issue into even sharper focus. These flexible and powerful languages are now deeply rooted in industrial and embedded environments. In the past decade, developers have accepted the need to control and restrict these languages for industrial, commercial, or other safety-conscious purposes.

Many of the early attempts to define coding standards focused on style rather than safety and reliability. However, recent collective efforts such as the Motor Industry Software Reliability Association (MISRA) C and C++ guidelines target bug detection, avoidance, and prevention.

The primary intent behind these modern coding standards is to prevent software misbehavior. Programming languages generally contain more features than most software practitioners need. Developers are not expected to be experts in the full language feature set, and coding rules help protect them from language dangers and misuse.

Undefined or bug?
Language dangers cover a broad range of issues involving language specification and misuse. All languages have undefined outcomes from unplanned usage.

Unfortunately, despite its flexibility and suitability to embedded applications, the C language contains many of these issues. C++ has inherited most of them through its close (though not complete) compatibility with C.

Well-recognized instances of undefined behavior include dereferencing a null pointer, dividing by zero in an expression, and using a handle that a function has returned to its own nonstatic local data. In C++, this particular danger extends to returning function parameters by reference:

class A { /* ... */ };

const A& Bad(const A& a)
{
    return a;
    // returns ref to alias:
    // undefined if it exceeds
    // aliased object lifetime
}
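
One way to sidestep this lifetime problem, sketched here under the assumption that the caller needs only the value and not the identity of the object, is to return by value. The member `value` and the names `Good` and `UseGood` are hypothetical and serve only as illustration.

```cpp
// Hypothetical stand-in for the elided class A.
class A {
public:
    explicit A(int v) : value(v) {}
    int value;
};

// Returns a copy instead of a reference to the parameter, so the
// result stays valid even when the argument was a temporary.
A Good(const A& a)
{
    return a;
}

int UseGood()
{
    // The returned copy is bound to 'r' and lives as long as 'r' does.
    const A& r = Good(A(42));
    return r.value;
}
```

The cost is a copy, which may or may not be acceptable for a large class; that trade-off is outside what the rule itself decides.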

It is not uncommon for a C/C++ coding standard to include a blanket rule referencing adherence to the language standard and avoidance of any undefined behavior. More specific guidelines, such as testing a pointer's non-null property before each attempted dereference or programmatically ensuring that a divisor cannot be zero, provide more focused approaches.
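
The two specific guidelines mentioned above might look like the following sketch. `CheckedDivide` is a hypothetical helper, not part of any standard or library. It also rejects `INT_MIN / -1`, a third case (signed overflow) that is undefined in the same division and that the text above does not discuss.

```cpp
#include <climits>

// Hypothetical helper: performs *numerator / divisor only when that
// is well defined. Returns true and stores the quotient in *result
// on success; returns false and leaves *result untouched otherwise.
bool CheckedDivide(const int* numerator, int divisor, int* result)
{
    if (numerator == nullptr || result == nullptr) {
        return false;               // would dereference a null pointer
    }
    if (divisor == 0) {
        return false;               // would divide by zero
    }
    if (*numerator == INT_MIN && divisor == -1) {
        return false;               // quotient would overflow int
    }
    *result = *numerator / divisor;
    return true;
}
```

Callers then have to check the returned flag, which is itself something a coding rule can require.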

A special subclass of language definition involves implementation-defined behavior, where compiler vendors choose between several permitted behaviors. Does a right-shift of a negative value preserve the leftmost sign bit? How will a larger integer value be represented when cast to an 8-bit character type? Are plain char and bit field types signed or unsigned?

While these behaviors can be established from compiler documentation or configuration settings, a safe-practice coding rule would prescribe that such outcomes be documented and that code rely only on those that are deterministic.

The richest seam of buggy code, however, is within the developer's control. This includes a wide variety of coding errors, often brought about by suspect coding assumptions or a lack of foresight.
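
For the implementation-defined choices described above, one possible approach is to record the assumption in the code and to make narrowing explicit. The sketch below is only an illustration. `LowByte` and `PlainCharIsSigned` are hypothetical names, and the 8-bit assumption is stated in a `static_assert` rather than left implicit.

```cpp
#include <climits>
#include <cstdint>

// Document the assumption instead of relying on it silently:
// compilation fails on a platform where a char is not 8 bits.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");

// Hypothetical helper: narrows to 8 bits by explicit masking, so the
// result does not depend on whether plain char is signed.
std::uint8_t LowByte(int value)
{
    return static_cast<std::uint8_t>(static_cast<unsigned int>(value) & 0xFFu);
}

// Reports which choice the compiler made for plain char.
bool PlainCharIsSigned()
{
    return CHAR_MIN < 0;
}
```

With this, `LowByte(300)` yields 44 and `LowByte(-1)` yields 255 on any conforming compiler, because conversion to an unsigned type is defined as modular arithmetic.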

Data initialization. Data initialization is a sensible and safe practice that is especially important if developers don't take full advantage of C++'s member initialization semantics:

class A
{
public:
    A() {}              // 'm_i' not initialized
    int getI() const
        { return m_i; }
private:
    int m_i;
};

int j = A().getI();
// 'm_i' and subsequently j
// have indeterminate value

To avoid these initialization issues, the coding rule states that constructors shall initialize (either through initial value or constructor call) each base class and all nonstatic data members.
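
A version of the class that follows this rule could look like the sketch below. The choice of 0 as the default and the second constructor are assumptions made for illustration, as is the helper `DefaultValue`.

```cpp
class A
{
public:
    A() : m_i(0) {}                 // member initializer list sets m_i
    explicit A(int i) : m_i(i) {}   // hypothetical extra constructor
    int getI() const { return m_i; }
private:
    int m_i;
};

int DefaultValue()
{
    return A().getI();              // well defined: m_i was initialized
}
```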

Name hiding. Reusing a variable name in a different scope can cause bugs that are particularly difficult to find. The identifier at the innermost scope hides any matching name in an outer scope, whether intended or not. A coding rule stating that "an identifier in an inner scope shall not hide an outer scope identifier" guards against this. Consider this for-loop example:

void foo(void)
{
    int i = 15;
    int MyArray[10];

    for (int i = 0; i < 10; ++i)   // inner 'i' hides outer 'i'
    {
        MyArray[i] = 0;
    }
    ...
    // whatever intended ..
    MyArray[i-1] = 1;              // outer 'i' is 15: index 14
} // ..out-of-bounds results

C++ encourages developers to declare the control variable (i) in the loop statement, thus restricting its scope to that block. The outer declaration in this example could be a remnant of legacy C or of an early version of C++.
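
A version that satisfies the no-hiding rule might look like the sketch below. It assumes that the intent of the original was to mark the last element, which the example does not actually state. `kSize` and `FillAndMarkLast` are hypothetical names.

```cpp
#include <cstddef>

const std::size_t kSize = 10;   // hypothetical named array length

// Clears the array, then marks its last element.
void FillAndMarkLast(int (&values)[kSize])
{
    for (std::size_t index = 0; index < kSize; ++index) {
        values[index] = 0;
    }
    // No outer variable shares the loop variable's name, so nothing
    // can be picked up by mistake here; the index is stated directly.
    values[kSize - 1] = 1;
}
```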

Boolean expressions. With no Boolean type in the most popular version of C (ISO 1990), developers have to work with quasi-Boolean concepts. The resulting lack of type safety can lead to some subtle and pernicious bugs:

x = ((a > b) & (c > d));
/* logical rather than bitwise AND intended? */

y = ((a + b) || (c - d));
/* odd: logical OR of two arithmetic expressions */

These can be neatly avoided with a coding rule to "prohibit the mixing of arithmetic and logical (effectively Boolean) expressions."
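
Written to that rule, the two expressions might become the following sketch. It assumes a logical AND was intended in the first case and a test against zero in the second. The function names are hypothetical.

```cpp
// Logical AND of two relational (effectively Boolean) operands.
bool BothGreater(int a, int b, int c, int d)
{
    return (a > b) && (c > d);
}

// Each arithmetic result is compared explicitly, so both operands
// of the logical OR are effectively Boolean.
bool EitherNonZero(int a, int b, int c, int d)
{
    return ((a + b) != 0) || ((c - d) != 0);
}
```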

Assign in conditional. An assignment in a conditional expression, while legal, can expose a typing error or a more complex logic issue:

// assign or test equality?
if (y=x) {...}

// conditional side-effect
if ((a == b) || (c = d)) {...}

All such behaviors can be elegantly avoided through a coding rule "prohibiting assignment operators in effectively Boolean expressions."
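
Where the assignment really is intended, it can be moved out of the condition. The sketch below shows one possible rewrite of each example. Both function names are hypothetical, and the second keeps the short-circuit behavior of the original (c is assigned only when a differs from b).

```cpp
// Assignment first, then a condition that only tests.
bool AssignThenTest(int x, int& y)
{
    y = x;                  // the assignment is clearly deliberate
    return y != 0;          // the test has no side effect
}

// Equivalent of: (a == b) || (c = d), without the hidden assignment.
bool MatchOrCopy(int a, int b, int& c, int d)
{
    if (a == b) {
        return true;        // c is left unchanged, as in the original
    }
    c = d;
    return c != 0;
}
```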

Type conversions. The type system in C has great flexibility in handling conversions. While this enables powerful data manipulation in expressions, it often betrays poor understanding of the underlying compiler actions and occasionally reveals difficult, value-sensitive bugs:

// unsigned 16 and 32 bits
uint16_t u16a = 40000;
uint16_t u16b = 30000;
uint32_t u32a;

// result: 70000 or 4464?
u32a = u16a + u16b;

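C's balancing and promotion rules might result in either of these values, depending on the size of int in the implementation. With a 32-bit int, both operands are promoted to int and the sum is 70000. With a 16-bit int, the addition is performed in 16-bit unsigned arithmetic and wraps to 4464 (70000 - 65536) before the assignment, so the result is erroneous even though the destination is a 32-bit type.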
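
A common remedy, sketched below, is to widen one operand explicitly before the addition so that the result no longer depends on the size of int. `WideSum` and `NarrowSum` are hypothetical names. The second function only reproduces the wrapped 16-bit value for comparison.

```cpp
#include <cstdint>

// Widening one operand first forces the addition into 32 bits.
std::uint32_t WideSum(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint32_t>(a) + b;
}

// Explicit narrowing shows the value a 16-bit addition would give:
// the sum wraps modulo 65536.
std::uint16_t NarrowSum(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint16_t>(a + b);
}
```

For the values in the example, `WideSum(40000, 30000)` gives 70000 and `NarrowSum(40000, 30000)` gives 4464.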

Switch statements. Conversion issues can lurk in places developers don't expect to be a problem. For example, switch statements are an elegant control-flow mechanism. However, they are not without danger. The switch and case expressions must be of the same type and have the same sign; otherwise, developers might suffer unwanted implicit conversions:

unsigned char c;
...
switch ( c ) {
    case -1:
    ... /* unreachable */
    case 256:
    ... /* unreachable */
}

A general coding guideline that "there shall be no unreachable code" provides a degree of protection. A rule guarding against implicit conversions is a more targeted means of avoiding this issue.
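
A switch that stays within the range and signedness of its controlling expression might look like this sketch. The enumeration `Kind` and the function `Classify` are hypothetical.

```cpp
#include <climits>

enum Kind { kZero, kMax, kOther };  // hypothetical result codes

// Every case label is unsigned and representable in unsigned char,
// so each branch can actually be reached.
Kind Classify(unsigned char c)
{
    switch (c) {
        case 0u:
            return kZero;
        case UCHAR_MAX:
            return kMax;
        default:
            return kOther;
    }
}
```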

Casting away const. According to ISO C, a pointer can only be assigned to another pointer if "both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all of the qualifiers of the type pointed to by the right."

In the following example, the plain pointer assignment violates this constraint. With an appropriate cast, such an assignment will succeed, although it will be highly dangerous:

// pointer to int:
int *pi;
// pointer to const int:
const int *pci;
...
// constraint error
pi = pci;
// dangerous but permitted
pi = (int *)pci;

Suitable protection is encapsulated in the rule expressing that "no cast shall be allowed that removes const or volatile qualification from the type addressed by a pointer."
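
When a modifiable value is needed, one alternative to the cast is to copy the pointed-to object and modify the copy. This is a sketch of that idea only, and `IncrementedCopy` is a hypothetical name.

```cpp
// Works on a copy, so the const object is never written through
// a pointer that has had its qualifier removed.
int IncrementedCopy(const int* pci)
{
    int local = *pci;       // copy of the const value
    ++local;                // modifies the copy only
    return local;
}
```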

Reliable code protection
As can be seen from these code examples, the C and C++ languages benefit from defense against misuse of their features. These languages' well-publicized and documented undefined behaviors, including null pointer dereference, divide by zero, and out-of-bounds array access, are central to language protection.

However, many other types of vulnerabilities require deep understanding of language syntax and semantics. Examining source code for a wide range of code defects and implementing coding best practices, preferably with an automated tool, are equally important to achieving a high-quality and robust code base.

Fergus Bolger is CTO at PRQA, based in Hersham, Surrey, the United Kingdom. With nearly 30 years of experience in the hardware and software computing industry, Fergus has filled management and engineering roles in development and advanced testing at PRQA and Amdahl Corporation. He has extensive experience with mainframe, client-server, UNIX, and Windows platforms with special interest in software process, automated tooling, and maintainable software systems. Fergus earned his Master's in Engineering from University College Dublin.
