In the hierarchy of systems, the processor has always played a central role. When making architectural decisions on a system, one of the first considerations is which processor to use. Once it is chosen and configured, a designer can go about implementing the agreed-upon structure. The key question is, how do you know when you're done?

That's where things have gotten tough. The basic expectations of what it means to have a working chip have changed dramatically with the advent of the system on a chip (SoC). And, to the chagrin of designers, those expectations have gone way up. The verification required to meet this higher standard has evolved to the point where software must be executed before silicon is delivered. This burns through so many clock cycles that the only way to prove with any confidence – and in a reasonable timeframe – that the software works is through emulation.
Is it done yet?
With the old way of doing things, there was a giant chasm between the hardware world and the software world. Typically, a processor would be a separate chip. That chip could be breadboarded along with other components; once ready, a software engineer could start writing code.
The processor and software design realms were so far apart that the chip designer might work in a semiconductor corporation in one of the world's usual technology hubs, while anyone with access to an electronics store, a soldering iron, a terminal, and an oscilloscope, anywhere in the world, could design software for an embedded system.
The processor designer’s mandate was straightforward: the chip was deemed working when it could execute the entire instruction set in the manner prescribed by the spec and at the designated performance. Mostly. If the designer was lucky, a few errors could be documented and tolerated (only to become part of the processor legacy, with future editions having to replicate those errors).
Life isn't so easy anymore. For the most advanced, challenging embedded systems, the processor no longer has the luxury of its own chip. Now the chip is an entire system: not only does the processor have to work, but several processors may have to work. And they may not all be the same. And the interconnect structure has to deliver, as does the memory subsystem.

How does the designer know when it's all running properly? When it can run the kinds of fundamental software that will be executed under the targeted operating conditions, at the required performance and power levels. Exactly what that software is might vary dramatically between systems. But typical tests might include:
* Booting Linux
* Demonstrating that Linux provides all of the necessary services for the applications that will be written on top of it
* Showing that critical algorithms execute efficiently. This is especially true of “bare metal” systems, which don’t use an operating system at all. The target code must be shown to operate well.
What could possibly go wrong?
A processor alone is a complicated beast. In order for everything to work right, the pipeline must execute flawlessly. It has to decode and execute each of many (or very many, in the case of CISC) instructions. It has to access memory efficiently and do a good job managing its cache.
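To make that verification burden concrete, the fetch-decode-execute loop that must work for every instruction can be sketched in miniature. The three-instruction ISA below is invented purely for illustration – a real core has dozens or hundreds of instructions, each of which must be proven out:

```python
# Toy fetch-decode-execute loop over a hypothetical three-instruction ISA
# (LOAD, ADD, STORE -- invented for illustration, not any real core).
def run(program, memory):
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]              # fetch and decode
        if op == "LOAD":                     # LOAD rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "ADD":                    # ADD rd, rs1, rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":                  # STORE rs, addr
            memory[args[1]] = regs[args[0]]
        else:
            raise ValueError(f"illegal opcode: {op}")
        pc += 1                              # this toy ISA has no branches
    return regs

mem = {0x10: 2, 0x14: 3, 0x18: 0}
run([("LOAD", "r0", 0x10),
     ("LOAD", "r1", 0x14),
     ("ADD", "r2", "r0", "r1"),
     ("STORE", "r2", 0x18)], mem)
print(hex(mem[0x18]))  # 0x5: the sum landed in memory
```

Even this trivial model hints at why coverage explodes: every opcode, operand combination, and memory access path is a distinct behavior to verify – before pipelining, caching, or interrupts enter the picture.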
For a modern “hyperthreaded” architecture, it has to be able to swap contexts correctly and smoothly, never missing a beat or a byte. And it has to deal with unpredictable situations, handling interrupts correctly and with the proper priorities, especially where real-time requirements are in play.
For an SoC designer, that's only the beginning (although using well-proven IP can add some confidence and reduce some of the core-level testing). In addition to all of the above, the system bus must gracefully handle all its traffic, causing as little waiting as possible. The entire memory subsystem, including memory managers, controllers, and on-chip memory, must work. Any on-chip peripherals have to be proven out. And all of this is required even for a single-core system.
For a multicore system, each processor must be shown to work – and for the many heterogeneous systems, that means something different for each core. The designer has to be able to demonstrate that there is no congestion on the bus – or network-on-chip (NoC) – or in the memory. He or she has to demonstrate that the cache coherency schemes work. The inter-processor communication must be solid, and any hardware synchronization mechanisms have to be proven out.
And, unlike single-processor systems, some multicore systems may not act in a simple deterministic fashion: the vagaries of scheduling, priorities, affinity, interrupts, and what’s running where at any given time can make a particular operation run differently each time it’s executed. That means running tests many different times many different ways.
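The combinatorics behind "many different times, many different ways" can be sketched directly. Even two cores running just three steps each admit a surprising number of distinct global orderings (the core names and steps below are illustrative):

```python
# Enumerate every way a scheduler could interleave two cores' step
# sequences while preserving each core's own program order.
def interleavings(a, b):
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)] +
            [[b[0]] + rest for rest in interleavings(a, b[1:])])

core_a = ["A:lock", "A:update", "A:unlock"]
core_b = ["B:lock", "B:update", "B:unlock"]

orders = interleavings(core_a, core_b)
print(len(orders))  # 20 distinct global orderings from just 3 steps per core
```

The count grows as C(m+n, m), so a handful of steps per core quickly yields thousands of orderings – which is why a test that passes once proves very little on a multicore part.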
Figure 1: All of these elements must be proven to work before the chip can be declared ready for tape-out
While key critical directed tests are an important part of the verification plan, the only real way to prove out such a complex set of interacting system pieces is to run real software that’s computing real data.
This puts hardware designers into new, and possibly uncomfortable, territory. Hardware folks are used to testing RTL code. That might mean simulation; it might mean assertions and checkers. But fundamentally, they work in the world of hardware, not software, languages.
Executing software can mean many different things, but for systems using an operating system, simply getting something like Linux to boot is a start. Fortunately, Linux is already written, so it’s not code that the hardware designer has to write. But it does have to be tailored to the system, with memory maps and drivers that take a simple system call and cause it to do real work on real hardware.
Someone has to write those drivers and make those kernel modifications. That’s typically not the hardware designer, but it has to happen in order for the hardware designer to declare mission accomplished.
There may be other critical software functions that, if not handled smoothly, will cripple the system. These may layer on top of the operating system, they may be reflected in the operating system drivers, or they may exist with no operating system at all. Examples include:
* Video processing
* Networking and packet processing
* I/O protocols
* Data capture (from sensors or antennae, for example)
* Dedicated processing algorithms – codecs, DSP functionality, and other low-level compute-intensive functions that either have to be efficient in software or might even be accelerated in hardware.
In addition, the designer has to show that multi-threaded programs work. For some systems, the multiple “threads” are actually turned into multiple processes running independently on different cores, with or without an OS.
Packet processing is probably the best example of this: it typically consists of a pipeline of processors, each of which performs a specific piece of the packet processing functionality in turn. All of the threads or processes have to work together seamlessly, passing data around and swapping contexts as necessary.
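A software model of that structure might look like the sketch below: each stage runs on its own thread (standing in for a core) and hands packets to the next stage through a queue (standing in for the interconnect). The stage functions and field names are invented for illustration:

```python
import queue
import threading

# Placeholder pipeline stages, invented for illustration: each one
# transforms a packet dict and passes it downstream.
def parse(pkt):
    return {**pkt, "parsed": True}

def classify(pkt):
    return {**pkt, "out_queue": pkt["port"] % 4}

def forward(pkt):
    return {**pkt, "sent": True}

STOP = object()  # sentinel that flushes the pipeline on shutdown

def stage(fn, inbox, outbox):
    """One 'core': pull a packet, do this stage's work, pass it on."""
    while True:
        pkt = inbox.get()
        if pkt is STOP:
            outbox.put(STOP)
            return
        outbox.put(fn(pkt))

def run_pipeline(packets, stages):
    qs = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [threading.Thread(target=stage, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for w in workers:
        w.start()
    for pkt in packets:
        qs[0].put(pkt)
    qs[0].put(STOP)
    done, item = [], qs[-1].get()
    while item is not STOP:
        done.append(item)
        item = qs[-1].get()
    for w in workers:
        w.join()
    return done

out = run_pipeline([{"port": p} for p in range(8)], [parse, classify, forward])
print(len(out), all(p["parsed"] and p["sent"] for p in out))  # 8 True
```

Because each queue is FIFO and each stage is single-threaded, packet order is preserved end to end – exactly the kind of property that has to be demonstrated on the real silicon, where stages can stall, queues can fill, and contexts swap underneath the code.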
How hard can that be?
We’ve established that proving that a chip is “good to go” requires the execution of real software running real data. This needs to be done before the chip tapes out in order to prove that the chip won’t require millions of dollars in mask rework. By definition, at the time the testing has to be done, there is no hardware on which the software can be run. What tools are available for doing this?
At the highest level, virtual platforms are finding increasing favor as a way of testing software in advance of hardware. And, in fact, for more general-purpose applications, those not requiring intricate interaction with the hardware, virtual platforms are an effective way of getting application software going as soon as possible.
However, the strength of a virtual platform is also its weakness: it executes software quickly precisely because it abstracts away much of the execution detail.
Such abstraction is fine for higher-level code, but for code that touches the hardware, designers can't afford that shortcut. The very problems they are testing for have been hand-waved away: a virtual platform is completely inadequate for proving that the chip works because it eliminates the chip's details.
That leaves two other options: simulation and emulation. And, unlike the virtual platform, both of these fit nicely into the familiar arsenal of hardware weapons. Simulation is the most widely used of these: it already forms the backbone of the chip verification plan, augmented by other tools like formal verification that dwell in the realm of the server farm. It’s natural to think that simulation can be extended to prove out just the critical software.
But some back-of-the-envelope calculations show that this is not a viable approach. It simply takes far too long. The only way to run software in any reasonable timeframe is to use an emulator.
As an example, we can look at what it takes to boot Linux on an emulator
and then extrapolate those results to simulation – which will demonstrate conclusively why a designer would not want to wait around for a simulated version of Linux to boot (and why we’re not discussing actual simulation results).
Bear in mind that a normal OS boot just gets the designer to the starting line: on most systems, the real software that is supposed to do the real work can only start once the OS is up. Verifying those applications simply adds more run time on top of the OS boot time.
When Linux was first released some 20 years ago, it was a modest package
comprising around 10,000 lines of code. Today, Linux has grown to around 14 million lines of code (demonstrating that open source is no cure for bloat). It requires somewhere in the range of 900 million to 1 billion CPU clock cycles to boot. With an emulator running around 1 to 1.5 MHz, that translates to about 15 minutes of run time.
A simulator, by contrast, can only simulate at the rate of about a 1 Hz CPU clock. Extrapolating the 15-minute boot time to this clock rate means that it could take on the order of 250,000 hours, or 10,400 days, or 28.5 years to boot Linux on a simulator.
Simulation bottlenecks are often overcome by bringing additional servers to the rescue. But that only works for test suites in which different tests can be sent to different servers; a Linux boot is a single sequential process that can't be distributed. Even if a team with 500 servers at its disposal could somehow split the boot across them, it would still take a hefty 21 days.
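The back-of-the-envelope numbers are easy to reproduce. The sketch below assumes roughly 900 million cycles to boot, a ~1 MHz emulator, and a ~1 Hz effective full-chip simulation rate:

```python
# Reproduce the boot-time estimates: ~900 million cycles to boot Linux,
# run at emulator speed (~1 MHz) versus full-chip simulation speed (~1 Hz).
BOOT_CYCLES = 900e6
EMULATOR_HZ = 1.0e6
SIMULATOR_HZ = 1.0

emu_minutes = BOOT_CYCLES / EMULATOR_HZ / 60
sim_hours = BOOT_CYCLES / SIMULATOR_HZ / 3600
sim_years = sim_hours / (24 * 365)
farm_days = sim_hours / 500 / 24   # if the boot could be split 500 ways

print(f"emulator:  {emu_minutes:.0f} minutes")                  # 15 minutes
print(f"simulator: {sim_hours:,.0f} hours (~{sim_years:.1f} years)")
print(f"simulator on 500 servers: still {farm_days:.0f} days")  # 21 days
```

The six-orders-of-magnitude gap between the two clock rates is the entire argument: no amount of patience or server capacity closes it.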
Other software, particularly that used in bare-metal configurations that don't have to deal with an operating system, may involve far fewer than all those millions of lines of code. But the gap between simulation and emulation is still so enormous that it's really impractical to validate any but the most trivial software through simulation.
A hardware team must bring emulation into practice in order to be able to meet today’s higher standard of completeness. That standard means the team has to prove that the critical software can run. An emulator is the only way that a team can demonstrate that they’ve done a good and thorough job in a timeframe consistent with today’s demanding product cycles.
About the author:
Donald Cramb is director of the Consulting Services Division of EVE-USA
in San Jose, Calif., and is responsible for customer services, applications and design solutions to support specific customer requests. Previously, he was a partner at ArchSilc Design Automation, a company focused on system-level verification solutions. He graduated with a bachelor’s degree in Electrical Engineering from the University of Edinburgh.