The topic of coding standards is an emotive one among software
developers, whose divergent opinions raise questions that range from
"Why do we need such restrictions?" to "How could we possibly operate
without them?"
Software engineering has always wrestled with standards, and the
development of the C and C++ languages brought the issue into even
sharper focus. These flexible and powerful languages are now deeply
rooted in industrial and embedded environments. In the past decade,
developers have accepted the need to control and restrict these
languages for industrial, commercial, or other safety-conscious
purposes.
Many of the early attempts to define coding standards focused on
style rather than safety and reliability. However, recent collective
efforts such as the Motor Industry Software Reliability Association
(MISRA) C and C++ guidelines target bug detection, avoidance, and
prevention.
The primary intent behind these modern coding standards is to
prevent software misbehavior. Programming languages generally offer
features far richer than most software practitioners need.
Developers are not expected to be experts in the full language feature
set, and coding rules help protect them from language dangers and misuse.
Undefined or bug?
Language dangers cover a broad range of issues involving language
specification and misuse. Most languages leave some outcomes undefined
when features are used in ways their designers did not plan for.
Unfortunately, despite its flexibility and suitability to embedded
applications, the C language contains many of these issues. C++ has
inherited most of them, thanks to its close (though not complete)
compatibility with C.
Well-recognized instances of undefined behavior include
dereferencing a null pointer, dividing by zero, and using a pointer
that a function returned to its own nonstatic local data. In C++, this
particular danger extends to returning references to function
parameters:
    class A { /* ... */ };

    const A& Bad(const A& a)
    {
        return a;   // returns a reference to the aliased argument:
                    // undefined if used after the aliased object's
                    // lifetime has ended
    }
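One way to sidestep this lifetime problem is to return by value. The following is a minimal sketch, not a prescribed fix: the member `value` and the names `Good` and `read_copy` are assumptions introduced only for illustration.

```cpp
#include <cassert>

struct A {
    int value;   // hypothetical member, added so the example has data
};

// Returning by value hands the caller its own copy, so the result stays
// valid even when the argument was a temporary.
A Good(const A& a)
{
    return a;
}

int read_copy()
{
    // With a reference-returning function, the temporary A{42} would be
    // destroyed at the end of this statement and the reference would dangle.
    const A copy = Good(A{42});
    assert(copy.value == 42);
    return copy.value;   // safe: 'copy' owns its data
}
```

The copy costs a little at run time, which is often an acceptable price for removing the dependency on the caller's object lifetime.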
It is not uncommon for a C/C++ coding standard to include a blanket
rule referencing adherence to the language standard and avoidance of
any undefined behavior. More specific guidelines, such as testing that
a pointer is non-null before each attempted dereference or
programmatically ensuring that a divisor cannot be zero, provide more
focused approaches.
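The two specific guidelines can be sketched as small guard functions. The helper names `read_or` and `safe_divide` are hypothetical, and the extra check for INT_MIN / -1 is an addition not mentioned above: that division overflows and is undefined as well.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: dereferences only after a null check and
// returns 'fallback' when there is nothing to read.
int read_or(const int* p, int fallback)
{
    return (p != nullptr) ? *p : fallback;
}

// Hypothetical helper: divides only when the result is well defined.
// Returns false, leaving 'result' untouched, for a zero divisor and
// for INT_MIN / -1.
bool safe_divide(int numerator, int divisor, int& result)
{
    if (divisor == 0) {
        return false;
    }
    if (numerator == INT_MIN && divisor == -1) {
        return false;
    }
    result = numerator / divisor;
    return true;
}

// Illustrative self-check of both guards.
void check_guards()
{
    int value = 7;
    int quotient = 0;
    assert(read_or(&value, -1) == 7);
    assert(read_or(nullptr, -1) == -1);
    assert(safe_divide(10, 2, quotient) && quotient == 5);
    assert(!safe_divide(10, 0, quotient));
}
```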
A special subclass of language definition, implementation-defined
behavior, involves situations where compiler vendors choose between
several behaviors. Does a right shift of a negative value preserve the
sign bit? How will a larger integer value be represented when cast to
an 8-bit character type? Are plain char and bit field types signed or
unsigned?
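Choices of this kind can be recorded in the code itself, so that a change of compiler fails at build time rather than at run time. The sketch below assumes a project that relies on 8-bit bytes, an int of at least 32 bits, and an arithmetic right shift; those assumptions, and the helper name `to_u8`, are illustrative only.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Assumptions of this hypothetical code base, checked at compile time.
static_assert(CHAR_BIT == 8, "code assumes 8-bit bytes");
static_assert(sizeof(int) >= 4, "code assumes int is at least 32 bits");
static_assert((-1 >> 1) == -1, "code assumes an arithmetic right shift");

// Plain char may be signed or unsigned: query it instead of guessing.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// Hypothetical helper: narrows to 8 bits only when the value fits,
// so the outcome never depends on the compiler's conversion choice.
bool to_u8(int value, unsigned char& out)
{
    if (value < 0 || value > UCHAR_MAX) {
        return false;
    }
    out = static_cast<unsigned char>(value);
    return true;
}

// Illustrative self-check: the queried signedness matches <climits>.
void check_assumptions()
{
    assert(plain_char_is_signed == (CHAR_MIN < 0));
}
```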
While these behaviors can be established from compiler documentation
or configuration settings, a safe practice coding rule would prescribe
that such outcomes be documented and restricted to only those that are
deterministic.
The richest seam of buggy code, however, lies within the developer's
control. This includes a wide variety of coding errors often brought
about by suspect coding assumptions or lack of foresight.
Data initialization. Initializing data is a sensible and safe
practice that is especially important if developers don't take full
advantage of C++'s member initialization semantics:
    class A
    {
    public:
        A() {}                          // 'm_i' is not initialized
        int getI() const { return m_i; }
    private:
        int m_i;
    };

    int j = A().getI();   // 'm_i', and subsequently 'j',
                          // have indeterminate values
To avoid these initialization issues, the coding rule states that
constructors shall initialize (either through initial value or
constructor call) each base class and all nonstatic data members.
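Applied to the class above, the rule might look like the following sketch. The second constructor and the function `read_default` are assumptions added for illustration.

```cpp
#include <cassert>

class A
{
public:
    A() : m_i(0) {}                  // every nonstatic member is initialized
    explicit A(int i) : m_i(i) {}    // hypothetical second constructor
    int getI() const { return m_i; }
private:
    int m_i;
};

int read_default()
{
    const A a;
    assert(a.getI() == 0);   // well defined: the constructor set m_i
    return a.getI();
}
```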
Name hiding. Reusing a variable name in a different scope can cause
bugs that are particularly difficult to find. The identifier at the
innermost scope hides any matching name in an outer scope, whether
intended or not. A coding rule stating that "an identifier in an inner
scope shall not hide an outer scope identifier" guards against this.
Consider this for-loop example:
    void foo(void)
    {
        int i = 15;
        int MyArray[10];

        for (int i = 0; i < 10; ++i)    // inner 'i' hides outer 'i'
        {
            MyArray[i] = 0;
        }
        ...
        // whatever was intended, the outer 'i' (15) is used here,
        // so the write is out of bounds
        MyArray[i-1] = 1;
    }
C++ encourages developers to declare the control variable (i) in the
loop statement, thus restricting its scope to that block. This example
could be legacy C or an early version of C++.
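A version that respects the no-hiding rule could look like the sketch below, assuming the intent was to mark the last element. The function name `fill_and_mark` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Clears 'size' elements, then marks the last one.
void fill_and_mark(int* my_array, std::size_t size)
{
    assert(my_array != nullptr && size > 0);

    for (std::size_t index = 0; index < size; ++index) {
        my_array[index] = 0;
    }
    // 'index' is out of scope here and no outer variable shares its name,
    // so the last element has to be addressed explicitly.
    my_array[size - 1] = 1;
}
```

Compilers can also help here: GCC and Clang, for example, report hidden identifiers when the `-Wshadow` warning is enabled.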
Boolean expressions. With no Boolean type in the most widely used
version of C (ISO C 1990), developers have to work with quasi-Boolean
concepts. The resulting lack of type safety can lead to some subtle and
pernicious bugs:
    x = ((a > b) & (c > d));    /* logical rather than bitwise AND intended? */
    y = ((a + b) || (c - d));   /* odd: logical OR of two arithmetic */
                                /* expressions */
These can be neatly avoided with a coding rule to "prohibit the
mixing of arithmetic and logical (effectively Boolean) expressions."
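Under such a rule, the two statements might be rewritten as in this sketch, which assumes that a logical AND was intended in the first case and a test against zero in the second. The function names are hypothetical.

```cpp
#include <cassert>

// Both operands are comparisons, so logical AND states the intent.
bool both_greater(int a, int b, int c, int d)
{
    return (a > b) && (c > d);
}

// Arithmetic results are compared explicitly instead of being used
// directly as truth values.
bool either_nonzero(int a, int b, int c, int d)
{
    return ((a + b) != 0) || ((c - d) != 0);
}

// Illustrative self-check.
void check_boolean_rewrites()
{
    assert(both_greater(2, 1, 4, 3));
    assert(!both_greater(2, 1, 3, 4));
    assert(!either_nonzero(1, -1, 5, 5));
    assert(either_nonzero(1, 0, 5, 5));
}
```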
Assign in conditional. An assignment in a conditional expression,
while legal, can indicate a typing error or a more complex logic issue:
    // assign or test equality?
    if (y=x) {...}

    // conditional side-effect
    if ((a == b) || (c = d)) {...}
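If the assignments in both statements are in fact intended, one way to make that visible is to separate each assignment from its test. This is a sketch under that assumption; the function names are hypothetical.

```cpp
#include <cassert>

// Rewrite of 'if (y = x)': assign first, then test explicitly.
bool assign_then_test(int x, int& y)
{
    y = x;
    return y != 0;
}

// Rewrite of 'if ((a == b) || (c = d))': the assignment that
// short-circuit evaluation performs only when a != b becomes an
// ordinary, visible statement.
bool equal_or_copy(int a, int b, int& c, int d)
{
    if (a == b) {
        return true;     // 'c' is left untouched, as in the original
    }
    c = d;
    return c != 0;
}

// Illustrative self-check.
void check_assignment_rewrites()
{
    int y = 0;
    int c = 9;
    assert(assign_then_test(3, y) && y == 3);
    assert(equal_or_copy(1, 1, c, 5) && c == 9);
    assert(equal_or_copy(1, 2, c, 5) && c == 5);
}
```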
All such behaviors can be elegantly avoided through a coding rule
"prohibiting assignment operators in effectively Boolean expressions."
Type conversions. The type system in C has great flexibility in
handling conversions. While this enables powerful data manipulation in
expressions, relying on it often betrays poor understanding of the
underlying compiler actions and occasionally hides difficult,
value-sensitive bugs:
    #include <stdint.h>

    // unsigned 16 and 32 bits
    uint16_t u16a = 40000;
    uint16_t u16b = 30000;
    uint32_t u32a;

    // result: 70000 or 4464?
    u32a = u16a + u16b;
C's balancing and promotion rules might produce either of these
values, depending on the size of int. With a 32-bit int, both operands
are promoted to int and the sum is 70000. With a 16-bit int, the
addition is performed in 16 bits and wraps to 4464 before the
assignment, even though the result is stored in a 32-bit type.
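A common remedy is to widen one operand explicitly before the addition, so the result no longer depends on the size of int. The sketch below shows this alongside a function that reproduces the wrapped value; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Widening one operand first forces the addition to be carried out in
// at least 32 bits on any platform.
std::uint32_t add_widened(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint32_t>(a) + b;
}

// Reproduces the 16-bit outcome: the sum is reduced modulo 65536.
std::uint16_t add_wrapped(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint16_t>(a + b);
}

// Illustrative self-check using the values from the example above.
void check_conversions()
{
    assert(add_widened(40000, 30000) == 70000u);
    assert(add_wrapped(40000, 30000) == 4464u);   // 70000 - 65536
}
```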
Switch statements. Conversion issues can lurk in places developers
don't expect to be a problem. For example, switch statements are an
elegant control-flow mechanism. However, they are not without danger.
The switch and case expressions should be of the same type and have the
same sign; otherwise, developers might suffer unwanted implicit
conversions:
    unsigned char c;
    ...
    switch ( c ) {
    case -1:
        ...   /* unreachable */
    case 256:
        ...   /* unreachable */
    }
A general coding guideline that "there shall be no unreachable code"
provides a degree of protection. A rule guarding against implicit
conversions is a more targeted means of avoiding this issue.
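One way to keep switch and case expressions aligned is to declare the labels as named constants of the operand's own type. In this sketch every label is then representable in the switch operand. The enumeration, constants, and function name are all assumptions made for illustration.

```cpp
#include <cassert>

enum ByteClass { kNewlineByte, kMaxByte, kOtherByte };

// Case labels share the type of the switch operand, so none of them
// can fall outside its range.
const unsigned char kNewline = 10;
const unsigned char kMax = 255;

ByteClass classify(unsigned char c)
{
    switch (c) {
    case kNewline:
        return kNewlineByte;
    case kMax:
        return kMaxByte;
    default:
        return kOtherByte;
    }
}

// Illustrative self-check: every case is reachable.
void check_classify()
{
    assert(classify(10) == kNewlineByte);
    assert(classify(255) == kMaxByte);
    assert(classify('A') == kOtherByte);
}
```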
Casting away const. According to ISO C, a pointer can only be assigned
to another pointer if "both operands are pointers to qualified or
unqualified versions of compatible types, and the type pointed to by
the left has all of the qualifiers of the type pointed to by the
right."
In the following example, the plain pointer assignment violates this
constraint. With an appropriate cast, the assignment will compile,
although it is highly dangerous:
    // pointer to int:
    int *pi;
    // pointer to const int:
    const int *pci;
    ...
    // constraint error
    pi = pci;
    // dangerous but permitted
    pi = (int *)pci;
Suitable protection is encapsulated in the rule expressing that "no
cast shall be allowed that removes const or volatile qualification from
the type addressed by a pointer."
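Where code needs a modifiable value that is reached through a pointer to const, copying it is usually an alternative to the cast. A minimal sketch, with the hypothetical function `doubled`:

```cpp
#include <cassert>

// Works on a local copy, so the const object is never written to and
// no cast is required.
int doubled(const int* pci)
{
    assert(pci != nullptr);
    int local = *pci;    // modifiable copy of the const value
    local *= 2;
    return local;
}

// Illustrative self-check: the original object is unchanged.
void check_const_preserved()
{
    int value = 21;
    int* pi = &value;
    const int* pci = pi;     // adding const needs no cast
    assert(doubled(pci) == 42);
    assert(value == 21);
}
```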
Reliable code protection
As can be seen from these code examples, the C and C++ languages
benefit from defense against misuse of their features. Guarding against
these languages' well-publicized and documented undefined behaviors,
including null pointer dereference, divide by zero, and out-of-bounds
array access, is central to language protection.
However, many other types of vulnerabilities require deep
understanding of language syntax and semantics. Examining source code
for a wide range of code defects and implementing coding best
practices, preferably with automated tools, are equally important to
achieving a high-quality and robust code base.
Fergus Bolger is CTO at PRQA, based in
Hersham, Surrey, the United Kingdom. With nearly 30 years of experience
in the hardware and software computing industry, Fergus has filled
management and engineering roles in development and advanced testing at
PRQA and Amdahl Corporation. He has extensive experience with
mainframe, client-server, UNIX, and Windows platforms with special
interest in software process, automated tooling, and maintainable
software systems. Fergus earned his Master's in Engineering from
University College Dublin.