When good compilers go bad, or What you see is not what you execute

by Paul Anderson and Thomas W. Reps , TechOnline India - February 03, 2010

Getting rid of the mismatch between source code and compiled machine code may mean having to debug the machine code. Here's some of the latest research on finding a tool to combat the problem.

The source-code representation of computer programs is often thought of as the supreme authoritative, precise, and unambiguous specification of what a software program does when it executes. Of course, plenty of errors can occur in source code, and static-analysis tools that find such errors can be effective in pinpointing where the problems are.

However, tools for analyzing source code have a key weakness: computers don't execute source code; they execute machine-code programs that may be generated from source code. The WYSINWYX phenomenon (What You See Is Not What You eXecute) refers to the mismatch between what the source-code description seems to indicate and what is actually executed by the processor.1 The consequence of WYSINWYX is that source-code-analysis tools are fundamentally blind to some kinds of code weaknesses--weaknesses that can only be detected by directly analyzing the machine code.

Compilers introduce WYSINWYX effects for several reasons. Sometimes they're created by machine-code optimization. Another reason is that the compiler author may have interpreted the source-language specification in an unexpected way. Some effects can even be maliciously introduced. Finally, compilers are themselves fairly complex programs and as such may have their own bugs.

In this article, we'll give describe some of these effects and their consequences with a few real-world examples. The cure for WYSINWYX is to use tools that analyze machine code directly, an approach we've taken in our research. We'll discuss some of the challenges we faced.

Incurable optimizations
A classic example of the WYSINWYX effect was found during a security review at Microsoft and reported by Michael Howard.2 When writing secure code, a good guiding principle is to limit the lifetime of sensitive data. This reduces the risk that the data can be retrieved by an attacker or leaked by accident, such as in a crash dump. A common technique is to overwrite the sensitive data with zeroes. In this example, the requirements were that the code should read a password into a buffer, use it to authorize some kind of secret operation, and then scrub it from memory. The code (slightly simplified) looked something like the code in Listing 1.

Click on image to enlarge.

The programmer's intent was sound, and few reviewers would say this code had a problem. The problem is not in the source code. When compiled with optimization, the call to memset is removed, so the function returns without scrubbing the password, and this sensitive information is left sitting on the stack. The compiler removes that function call because it uses a standard optimization technique called dead-store removal. It observes that the buffer goes out of scope at the end of the function, and the zeroes written by the call are never read, so it concludes that the entire call to memset is redundant and removes it.

Howard suggests three solutions to prevent the optimizer from removing the call:

• Touch the password after the call to memset so the data appears used,

• Replace the call to memset with something that will not get optimized away, or

• Turn off optimizations for that code.

For the first solution, he suggests adding the source code in Listing 2.

Click on image to enlarge.

Unfortunately, this fix only helps a little! Some compilers are sophisticated enough to recognize that this statement only touches a single byte of the buffer and therefore conclude that the call to memset can be transformed into code that zeros out only the first byte.3

Other functions could be called to zero out the buffer, but some suffer the same disadvantage as memset. As a consequence of finding this problem, Microsoft introduced the SecureZero Memory() function, whose implementation guarantees that the buffer will be cleared.

A second example where the optimizer unwittingly introduced a vulnerability into code that was otherwise reasonable came to light recently when it was used to demonstrate an exploitable vulnerability in the Linux kernel.4 In this case, the offending code is shown in Listing 3.

Click on image to enlarge.

In this case, the source code is not entirely innocent. It contains contradictory assumptions: if tun could in fact be NULL, then the first line will (under normal circumstances) cause a null-pointer dereference and the kernel will crash. If tun could never be null, the entire if statement is redundant because the condition will never be true. This is a kind of weakness that can be found by some advanced static-analysis tools for source code (including CodeSonar, the tool we work on).

The WYSINWYX effect arises because the compiler notices this redundancy and eliminates the if statement entirely. Although usually reasonable, removing the if statement in this case introduces the vulnerability because an attacker can memory map the NULL address to user-space and from there take control of the kernel.


Language shortcomings
The C and C++ language specifications are riddled with inconsistencies and ambiguities. Properties as seemingly fundamental as the order in which expressions should be evaluated are left unspecified. This was once thought of as a good design approach--decisions on how to resolve ambiguities would be delegated to the compiler, which would choose a resolution based on optimizing the code for speed or space. The problem is that these choices may introduce or exacerbate WYSINWYX effects.

Many programmers will already be familiar with this problem. It's fairly common for a program to work fine when compiled without optimization but for it to fail when optimization is used. The usual cause of this failure is that the bug was always there but had no harmful effect when the code was compiled without optimization. Then, with optimization turned on, the compiler made different layout or order choices and as a consequence, exposed the latent bug.

When a source-code-analysis tool is used to analyze a program, it must make its own decisions about how to resolve the ambiguities in the source. One option is to perform the analysis for all possible resolutions of the ambiguity. However, this is generally infeasible to do in practice, and many source-code-analysis tools will choose one resolution and give results based on that. A good tool will try to interpret the code in a manner as close as possible to how the compiler interprets it.

With machine code, however, all of these ambiguities disappear because the compiler has already committed to a choice.

Threats of malicious code
The previous sections described how compilers can inadvertently undermine the intent of the programmer. There are also ways in which compilers can be exploited to generate machine code that has malicious intent. The potential for this kind of attack was first discussed 25 years ago by Ken Thompson in his Turing Award lecture, Reflections on Trusting Trust.5 Thompson described how to embed a Trojan horse in the Unix login program by changing the C compiler to recognize when it was compiling the login function and generate code that would accept either the real password or a special backdoor password instead. The backdoor is thus completely missing from the source code and can only be detected by analyzing the machine code. The attack requires that the compiler itself be modified, and code in the compiler that miscompiles the login function would stick out like a sore thumb and would undoubtedly be spotted and removed. However, Thompson pointed out that the malicious code in the compiler can itself be hidden. To do so, the compiler can be modified again so that it recognizes when it is compiling itself, and then it can generate the code that subverts the compilation of the login function. Once that is done, and the compiler is used to compile itself, the malicious code is entirely embedded in the machine code for the compiler and so the source code of the compiler can be restored to its original form.

The method Thompson describes is often thought of as a clever academic trick of little risk in the real world. However, only last August a new virus was discovered in the wild that demonstrated vividly that those ideas can be used to propagate malicious code.6 This virus, named W32/Induc-A, operates by attacking the Delphi compiler. When an infected program executes, it looks for installations of the Delphi compiler, and if found, it changes the installation so that all programs subsequently compiled by the compiler contain a copy of the virus in the generated machine code. W32/Induc-A has no payload, so it does no harm to infected computers other than propagate itself; it may have been written simply to demonstrate proof-of-concept.

Compilers are programs, too
The final source of WYSINWYX issues is that compilers, assemblers, linkers, and other tools in the development toolchain are themselves complex programs, and it's common for them to have their own bugs. Often these bugs are obvious because they cause the compiler to crash, but the obvious bugs are the ones that get fixed. A bug in code generation is much less likely to be obvious, especially if it occurs only in unusual circumstances. Unfortunately for embedded systems developers, the treatment of the volatile qualifier by many compilers seems to be especially prone to code-generation bugs. A recent paper by researchers from the University of Utah reported on a study of this issue and found problems in all 13 production compilers that they studied.7 In one compiler, they reported a problem with the compilation of the function shown in Listing 4.

Click on image to enlarge.

Correctly generated machine code should read from the watchdog register, write the same value back, and then return, as follows:

    movl WATCHDOG, %eax
    movl %eax, WATCHDOG

Unfortunately, one compiler generated the following code:

reset_watchdog:     ret

That is, the optimizer essentially ignored the volatile qualifier, concluded that the assignment was redundant, and removed it. Although this particular example is so bad that it's likely to have been spotted during testing, the other examples are much more subtle and could lead to failures that are difficult to diagnose. The statistics are alarming--the percentage of test cases where the code got miscompiled ranged from 0.037% up to 18.72%!


Analysis of machine code
The only way to find WYSINWYX problems like these is to either find them through testing or by looking at the machine code. Some sophisticated source-code–analysis tools may be able to find locations in source code that might be vulnerable to a subset of WYSINWYX problems, but real problems can only be identified for sure by tools that analyze machine code directly.

One popular tool for looking at machine code is the IDA Pro interactive disassembler. This is great for those who have a good understanding of how to read assembly language programs and have time to inspect their code in detail. A different approach to finding WYSINWYX problems is to adapt advanced static-analysis tools that are available for source (such as our own CodeSonar), so that they can also analyze machine code. This poses some significant challenges as we'll describe, but these are being overcome.

Static-analysis tools typically work in two phases. The first phase creates a model of the program to be analyzed, and the second phase performs queries and traversals on that model to compute properties of interest to the user. Static-analysis tools for machine code operate the same way, but whereas the second phase can be similar to that for source, the model-extraction phase is much more difficult than for source. Some of the challenges involved are the following:

• Source code has a clear separation between the parts that are executable code and the parts that are variables. In machine code, instructions can be treated as data and vice versa. It's fundamentally undecidable to distinguish them precisely in general.

• Procedures in source code have clear boundaries, with a single entry point and easily identifiable exit points. In machine code, no such boundaries exist in general, and control can flow between parts of the code in a much less orderly manner: a procedure can have multiple entry points; procedures can overlap; control can pass from the middle of one procedure into the middle of another.

• Procedure calls are easy to identify in source code. In machine code, a small set of instructions is usually used to implement procedure calls (for example, call in the IA32 instruction set), but there are many ways to perform a procedure call that do not involve those instructions.

• High-level languages have types. In machine code, all words may be treated as integers, strings, or pointers.

• With source code, many program elements are named. In machine code, symbolic information, such as variable or procedure names, may be completely stripped from the code.

• Other high-level constructs that are straightforward to analyze in source code may be compiled down into forms that are difficult to analyze. For example, large switch statements are usually compiled into indirect jumps.

• In machine code, the executable code can be self-modifying, which is impossible in source code.

One might think that machine code that has been compiled from a high-level language by a well-behaved compiler might not exhibit some of these issues. After all, compilers use standard procedure headers, and there are standard ways to call procedures and pass in parameters. Data is stored in a special segment or on the stack or in registers, and so for a given address, isn't it easy to distinguish code from data? Also, self-modifying code is rare. However, experience demonstrates that in practice, many of these problems do occur. For example, we found that a state-of-the-art disassembler made hundreds of errors in attempting to distinguish between code and data. In any case, assumptions about what the compiler might reasonably do go out the window when malicious code is involved, as authors of such code go to great lengths to avoid detection.

Any tool that is good at analyzing machine code must be capable of dealing with such problems and generating reasonable results. In our research at GrammaTech and the University of Wisconsin, we've been working to find ways to address these issues and to make the analysis of stripped, optimized, and potentially hostile machine code feasible.

We have adopted a two-pronged approach to the problem. At Wisconsin, we've been working on developing new static-analysis techniques for analyzing machine code. These techniques are designed to analyze the code very accurately and are prepared to sacrifice performance to achieve high-quality results. This approach is best for a highly detailed analysis of small quantities of machine code, such as might be found in a critical subcomponent of a system. The work has been incorporated into a tool called CodeSurfer/x86.8 At GrammaTech, we've been working to adapt our source-code static-analysis tool (CodeSonar) to make it capable of finding bugs in machine-code programs. This approach is designed to be highly scalable, albeit at the risk of failing to identify all problems, and is better suited for analyzing large quantities of code.

The essence of the approach being taken at Wisconsin is to do the following:

Given a machine-code program E, identify the procedures, data objects, types, and libraries that it uses, and:

• for each instruction I in E and its libraries,

• for each interprocedural calling context if I, and

• for each machine register and variable V in scope at I,

statically compute an accurate over-approximation to the set of values that V may contain when I executes.

Once this is done, the results yield lots of information about how the program works, including the targets of indirect calls and jumps, values of strings, and stack heights. Queries can then be formulated to find problems, including WYSINWYX effects.

There is a chicken-and-egg problem with the approach. For several of the reasons described earlier, it may not be possible to immediately identify all of the variables within the program. The solution is to bootstrap the analysis by starting with a first approximation and to iterate rounds of analysis, obtaining better and better results with each round.

CodeSurfer/x86 is still a research project, but the results so far are promising. We used the tool to analyze a virus in which the use of telltale system functions was obfuscated by calling them indirectly. The tool was able to correctly resolve 51 out of 54 of these indirect calls.

We also extended the analysis algorithms developed for CodeSurfer/x86 to create a prototype tool, called Device-Driver Analyzer for x86 (DDA/x86), for checking properties of stripped Windows device-driver executables.9 With DDA/x86 neither source code nor symbol-table/debugging information need be available (although DDA/x86 can use information such as Windows .pdb files if it's available). DDA/x86 was able to identify some known bugs (previously discovered by source-code-based analysis tools) along with useful error traces, while having a reasonably low false-positive rate: on a corpus of 17 device-driver executables, 10 were found to pass a certain kernel-API-usage rule described by a finite-state machine (in other words, those 10 definitely had no bug), five false positives were reported, and two were found to have real bugs--for which DDA/x86 supplied feasible error traces.

Our goal has been to adapt our advanced static-analysis tool, designed originally for source code, to work on machine code, too. This will be useful not just in its own right for finding WYSINWYX problems, but because it can help make analysis of source code more accurate. Programmers rarely have all the source code for their programs. They usually link against third-party or operating-system libraries for which source code is either nonexistent or entirely unavailable. Source-code-analysis tools typically deal with this in two ways: either they treat the nonsource components as having no effect, or they have models for those components. The former option means that the analysis will be less accurate--it will miss real bugs and report more false positives.

As a consequence, models are preferred, and most tools ship with models for standard libraries (for example, the C library). If models are unavailable, users can write their own, but this can be tricky and time consuming. However, if the tool can analyze machine code as well as source code, the need for explicit models is reduced, and the result is that the tool can find more problems more effectively.

As a demonstration of the effectiveness of our approach on machine code, see the screenshots in Figures 1 and 2. Figure 1 shows a buffer overrun detected by analysis of the source code. Figure2 shows that the same error can be detected by an analysis of the machine code.

Figure 1: An error in the source code of a program caused by a misplaced parenthesis.
Click on image to enlarge.

Figure 2: The same error as shown in Figure 1 can be found by an analysis of the machine code.
Click on image to enlarge.

WYSINWYX is a real effect with potentially damaging consequences. The risk to embedded systems developers is potentially more serious, due to the wide variety of compilers used in embedded systems development, and because embedded systems programmers tend to use some of the more esoteric language features. At present, the best defense is for developers to be aware of the potential for WYSINWYX effects in their code, especially for safety or security-critical applications. Tools for manually disassembling and manually inspecting machine code are available and these should be used where appropriate. More sophisticated tools are poised to emerge that will make automated detection of these problems easier.

1. Balakrishnan, G., Reps, T., Melski, D., and Teitelbaum, T. "WYSINWYX: What You See Is Not What You eXecute," in IFIP Working Conference on Verified Software: Theories, Tools, Experiments (VSTTE). 2005. Zurich, Switzerland: Springer. www.cs.wisc.edu/wpis/papers/wysinwyx05.pdf

2. Howard, Michael. "Some Bad News and Some Good News," Microsoft Developer Network, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure10102002.asp.

3. CERT Secure Coding Standards. "MSC06-CPP. Be aware of compiler optimization when dealing with sensitive data." http://tinyurl.com/ngymuh.

4. Zdrnja, Bojan. "A new fascinating Linux kernel vulnerability," Internet Storm Center, http://isc.sans.org/diary.html?storyid=6820.

5. Thompson, Ken. "Reflections on Trusting Trust," Communications of the ACM, 1984. 27(8): pp. 761-763. Available at http://cm.bell-labs.com/who/ken/trust.html.

6. Cohen, Richard. "Compile-a-virus--W32/Induc-A" posted on the SophosLabs blog, SophosLabs Canada, www.sophos.com/blogs/sophoslabs/v/post/6117.

7. Eide, Eric and John Regehr. "Volatiles Are Miscompiled, and What to Do about It," Proceedings of the Eighth ACM and IEEE International Conference on Embedded Software (EMSOFT). 2008. Atlanta, GA.

8. Reps, T., G. Balakrishnan, T. Teitelbaum, and J. Lim. "A next-generation platform for analyzing executables," Proc.3rd Asian Symposium on Programming Languages and Systems. 2009. Tsukuba, Japan.

9. Balakrishnan, G. and T. Reps. "Analyzing Stripped Device-Driver Executables," in International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). 2008. Budapest, Hungary: Springer. pp. 124-140.

Thomas Reps is president of GrammaTech and a faculty member in the Computer Sciences Department at the University of Wisconsin. He is well known for his work on static-analysis algorithms, including the invention of the attribute-grammar paradigm for incremental static-semantic analysis. Reps has received numerous awards for his work, including the ACM Doctoral Dissertation Award, a Packard Fellowship, a Humboldt Research Award, and a Guggenheim Fellowship. He is also an ACM Fellow.

Paul Anderson is vice president of engineering at GrammaTech, a spin-off of Cornell University that specializes in static analysis. He received his B.Sc. from Kings College, University of London, and a Ph.D. in computer science from City University London. He manages GrammaTech's engineering team and is the architect of the company's static-analysis tools. His research on static analysis tools and techniques has been reported in numerous articles, journal publications, book chapters, and international conferences.


blog comments powered by Disqus