Making late-night debugging an exception rather than the norm

by Rutul Dave, TechOnline India - November 03, 2011

Even after diligently testing and verifying every code change and every release, embedded systems software will ship with bugs that require manual debugging effort. However, by taking advantage of modern static analysis and of techniques that provide value beyond a simple list of defects, you can make late-night debugging the exception rather than the norm.

It's close to midnight and after hours of debugging you've finally identified the root cause of a defect. It's a nasty null pointer dereference that gets triggered after various conditional checks, and it's buried deep inside a code component that has not been touched in a while. The challenge of debugging pales in comparison with the long road still ahead: checking whether the bug exists in three other branches, merging the fix, and then unit testing the changes in all four branches to make sure you didn't break anything else, especially since you changed something in a legacy code component. How many times have you been in a similar situation right before code freeze for a major release, or the night before a hot-fix is scheduled to go out?

Static analysis can help you avoid some of the late nights. In this article, I discuss the advantages of static analysis for finding and fixing the most common coding defects, the Agile programming techniques used in modern static analysis to identify precise defects that lead to actual crashes, and the technologies that enhance the analysis results, beyond just a list of defects, by providing valuable information such as where the defect exists in the different branches of code.

Along with other methods of testing and verification, many companies have taken advantage of the benefits of code testing with modern static analysis to identify defects early in development. During the past few years, various reports by embedded systems market research firm VDC Research indicate strong growth in companies adopting static analysis as a critical test automation tool. The immense growth in the size of code bases is one of the strongest reasons to use static analysis as a cost-effective and automated method to evaluate the quality of the code and eradicate common coding defects. In a survey done by VDC ("Automated Test & Verification Tools, Volume 2," January 2011, www.vdcresearch.com/market_research/embedded_software/product_detail.aspx?productid=2639), software engineers who use static analysis indicated that these tools reduced the number of defects and increased the overall quality of code. In addition, VDC cited efficiency as a major benefit (and return on investment).

Dataflow analysis

One powerful static-analysis technique is dataflow analysis. To find the defect in Listing 1, modern static-analysis tools use dataflow analysis to identify the execution path during compile time.

 
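The original Listing 1 did not survive in this copy of the article. The following is a minimal sketch consistent with the description below; the function and variable names are assumptions, not the author's original code.

```c
#include <assert.h>

/* Reconstruction of Listing 1: two checks of the same condition,
   with a null assignment on the first true branch. */
char listing1(int x)
{
    char buf = 'a';
    char *p = &buf;

    if (x != 0)
        p = 0;        /* event: p is assigned a null pointer */

    if (x != 0)       /* true branch taken whenever x != 0 ... */
        return *p;    /* ...so p is dereferenced while null here */

    return *p;        /* safe: p still points at buf */
}
```

Calling `listing1` with a nonzero argument walks the true/true path the article traces and dereferences the null pointer; with zero it returns safely.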

First, a control flow graph is generated from the source code. In this case, the two if statements yield four possible execution paths through the code. Let's follow one of those paths. When the value of x passed into the function is not zero, p is assigned a null pointer with p=0. Then, the next conditional check (x!=0) takes a true branch, and in the next line p is dereferenced, leading to a null pointer dereference.

Interprocedural analysis

In addition to dataflow analysis, another useful technique that good static analysis employs is interprocedural analysis for finding defects across function and method boundaries, as in Listing 2.

 

 
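Listing 2 is also missing from this copy. A sketch consistent with the three function names given below might look like the following; the struct layout and everything beyond those three names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct S {
    size_t len;
    char  *str;
};

/* Allocates zero-initialized memory; the allocation originates here. */
void *zero_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        memset(p, 0, size);
    return p;
}

/* Builds an S around a copy of str; the memory is initialized here. */
struct S *create_S(const char *str)
{
    struct S *s = zero_alloc(sizeof *s);
    if (s != NULL) {
        s->len = strlen(str);
        s->str = zero_alloc(s->len + 1);
        if (s->str != NULL)
            memcpy(s->str, str, s->len);
    }
    return s;
}

void example_leak(void)
{
    struct S *tmp = create_S("hello");
    (void)tmp;
    /* tmp goes out of scope without free(tmp->str) or free(tmp);
       tracing the allocation back through create_S() and
       zero_alloc() is what requires interprocedural analysis */
}
```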

In Listing 2, we have three functions: example_leak(), create_S(), and zero_alloc(). To analyze the code and identify the memory leak, the analysis engine has to trace the execution to understand that memory is allocated in zero_alloc(), initialized in create_S(), and leaked when variable tmp goes out of scope when we return from function example_leak(). This is known as interprocedural analysis.

False-path pruning

The third technique is called false-path pruning. One of the key quality-assurance requirements that developers are held to is that the bugs they report are real; in other words, the bugs are true problems. The same expectation applies to static analysis: it should report critical defects, not false positives. One way to ensure that the reported defects are real is to analyze only the executable paths. Naïve static analysis will usually find defects on paths that can never be executed because of data dependencies. We can understand this with the code sample illustrated in Listing 3.

 
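Listing 3 likewise did not survive extraction. Based on the discussion below, it was a variant of Listing 1 in which the null assignment happens only on the x == 0 path; this sketch follows that description, with names assumed.

```c
#include <assert.h>

/* Reconstruction of Listing 3: the null assignment occurs only on
   the false (x == 0) branch of the first check. */
char listing3(int x)
{
    char buf = 'a';
    char *p;

    if (x != 0)
        p = &buf;
    else
        p = 0;        /* p is null only when x == 0 */

    if (x != 0)
        return *p;    /* reaching this requires x != 0, so p == &buf;
                         reporting a null dereference here would be a
                         false positive on an infeasible path */

    return 'b';
}
```

The path a naïve tool would flag (first check false, second check true) requires x to be both zero and nonzero, so false-path pruning discards it.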

This example is slightly modified from Listing 1 discussed previously. In this case, we will look at an execution path that simply cannot be executed. Consider the case where the first conditional check (if (x != 0)) results in the false case being evaluated. This assigns variable p the value of 0. At the next conditional check, if the analysis engine looks at the true path, it will report a null pointer dereference defect. But that would be a false positive, because the execution logic will never traverse this path: x does not change between the two checks, so the same conditional check (if (x != 0)) cannot evaluate two different ways in a single execution. By pruning a path that can never be executed (a false path), good analysis can report up to 50% fewer incorrect defects. This results in higher trust in the analysis reports and allows the development team to focus on the true positives instead of having to muddle through a long list of false positives.

Using a combination of techniques such as dataflow analysis, interprocedural analysis, and false-path pruning, effective static analysis has made a case for being an extremely valuable tool for developers. It's automated, achieves 100% path coverage, and does not require time-intensive test cases to be written. We saw examples of a null pointer dereference and a memory leak. In addition, the analysis can identify other critical defects such as memory corruption caused by incorrect integer operations, misused pointers, resource leaks besides memory, invalid memory accesses, undefined behavior due to uninitialized values, and many more.

Why, what, and where?

In addition to analysis results, one of the major benefits of static analysis is that it provides the developer with answers to questions important in effective Agile development, such as:

• Why does the defect exist?
• What impact will it have?
• Where does it need to be fixed?

To understand the context for a defect and validate it as a true defect, the developer needs to understand why the defect exists. A defect in code exists because the execution path went through a series of events and conditional statements that led to the error. In Listing 1 discussed earlier, defining the character pointer p is an event. The two if statements that checked the value of the x variable are conditional statements. By identifying the true or false path taken through those conditional checks, we could trace the execution path, which shows us that we dereferenced a null pointer with *p, which is the defect.

Similarly, experienced software engineers can associate the impact that a null pointer dereference or a memory leak will have on the system running the software. However, identifying the impact a defect has on the different branches forked from the same code base is not always a straightforward task. Therefore, the answer to "what impact will a defect have?" can sometimes be more complex. Consider a team developing a new operating system for smartphones. Because multiple phone vendors (OEMs) need to be supported for this new operating system, each vendor is assigned a development branch in the source control management (SCM) system, forked from the same code base. Add each vendor's need for multiple branches for different releases and product generations, and this picture starts to get complex very quickly.

Static analysis performed on every branch produces a list of the critical defects. The development team can go over every identified defect and verify the reason why it exists. However, depending on when a defect is introduced, it could exist in all versions and branches or only a subset. When looking at a single defect in isolation in a single branch, it's tough to gauge the severity of the defect without knowing where else it is present. A defect that is not limited to a single version or one OEM client might be considered more severe, and fixing it would need to be prioritized over others. Figure 1 shows a duplicate defect caused by code branching and merging.

 

 

Finally, to answer the question of "where does a defect need to be fixed?", a developer writing the fix needs to know exactly which branches need to be checked. Analysis results that identify the various branches where a defect exists are highly valuable and can save hours of manual verification.

Another common case that embedded software engineers encounter is when code is designed to run on multiple platforms. Device drivers are typical examples of such software components. Listing 4 is a simple example based on code required to be compiled for 32- and 64-bit platforms.
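Listing 4 is not reproduced in this copy either. A hypothetical foo.c along the lines the text describes might contain a defect like the following; the function name and scenario are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* foo.c: built once for a 32-bit target and once for a 64-bit target.
   The defect below triggers in both binaries; a build-aware analysis
   reports it as one defect in the shared source, not two. */
size_t copy_name(char *dest, const char *src)
{
    dest[0] = '\0';            /* defect: dest dereferenced before the
                                  null check below, so the check comes
                                  too late in either build */
    if (dest == NULL || src == NULL)
        return 0;
    strcpy(dest, src);
    return strlen(dest);
}
```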

The advantage of a good static-analysis solution in such cases is that it can identify a defect in foo.c that gets triggered in both the 32- and 64-bit binaries and report it as a single defect. In this case, the code is not duplicated, but instead built in multiple ways, so the developer needs to evaluate the severity by understanding whether the defect will be triggered in both the 32-bit and the 64-bit binaries.

Shared code

Another interesting case is when common code components are shared and used in more than one product. Take the example of a team developing the platform software for a family of networking switches. Since the functionality provided by the platform software must be implemented in all products in the product family, this code component will be shared as shown in Figure 2.

 

For developers working on this team, the best assessment of the severity of a defect reported by static analysis considers not only the impact it will have on one switch product, but also all the products that use this platform software component. A product is usually created by combining many such shared components. Each component is not only a project in itself, but also a part of the various other projects using it. Thus the analysis results need to identify that a defect in a shared component has an impact on every project using it. Such cases are especially common when using open-source components shared among various projects and products. For example, a library that parses a specific type of network packet might be used in all the different networking products that the group is designing and developing.

Code branching is a critical aspect of developing software for embedded systems. So is compiling the codebase for multiple platforms and reusing a component in multiple projects and products. With static analysis being valued for its ability to find hard-to-detect critical defects due to common programming errors and being trusted for its ability to do so without a large number of false positives, the trend in adoption of such solutions is going to continue. And with the value in terms of efficiency and productivity that the analysis results provide, it might not be long before static-analysis implementations will be as common as an SCM system or a bug tracking system in the development workflow.

Unfortunately, heroic late-night debugging marathons might still be necessary at times. Even after diligently testing and verifying every code change and every release, embedded systems software will have bugs that require manual debugging effort. However, by taking advantage of modern static analysis and techniques that provide value beyond a simple list of defects, you can make late-night debugging the exception rather than the norm.

About the author:

Rutul Dave is a senior product manager at Coverity, where he creates tools and technology to enhance the software development process. He received his master's degree in computer science, with a focus on networking and communications systems, from the University of Southern California. He has worked at Cisco Systems and at various Bay Area-Silicon Valley startups, such as Procket Networks and Topspin Communications.
