Using simulation and emulation together to create complex SoCs

by Laurent Ducousso, TechOnline India - March 24, 2011

Two trends are conspiring to make complex modern chip development more and more difficult: the increasing amount of software content required for what are essentially monolithic embedded systems, and ever-shortening market windows, especially for consumer-oriented products. Writing more software in less time is a tall order; the most straightforward way to get things out earlier is to start writing sooner. But until you can test an overall system, including the software, there’s only so much you can do.

At STMicroelectronics, we find ourselves increasingly using emulation as an important tool for testing complex SoCs with lots of software content. But this only really works as part of an integrated process that starts at the architectural modeling stage and proceeds all the way through the final implementation of hardware.

We make chips that address various wall-plugged consumer applications, notably set-top boxes and digital TV. As such, we work in a world of rapidly changing technology and consumer expectations – the actual product life can easily be shorter than the time it takes to develop the chip. Mistakes and delays can kill an entire project – and with the cost of such projects these days, that kind of waste is something even a large company like ours can ill afford.

So we have invested in a process that attempts to minimize rework, focusing instead on gradual refinement of early models into working silicon. The process provides an example of how TLM technology, virtual platforms, and a good emulator make this possible.

The architecture

Our chips share a general architecture over a range of applications. The specific functionality will vary, as will performance and capacity, but, by using a common structure, we can leverage the work and the tool flow from project to project.

One way of looking at our architecture is to think of it as a template for application engines of various sorts. Depending on the end equipment being targeted, an engine might be a 2D or 3D graphics engine, a video encoder or decoder, or any of a number of other functions critical to consumer products. In general, control is handled by software on one or more CPUs, with the data being manipulated by hardware that’s typically on the order of a million gates.

In addition to the one or more CPUs, the computing side of things has private memory for each CPU, shared memory for inter-process communication, and, if performance requires it, a cache. This structure provides a starting point from which we can implement any number of consumer-oriented functions without having to start from architectural scratch each time.
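
As a rough illustration only, the sketch below captures the kind of parameters such a template carries – the number of control CPUs, private and shared memory, an optional cache, and the dedicated datapath. The structure and field names are hypothetical, not our actual configuration format.

    // Hypothetical sketch of a reusable application-engine template;
    // the structure and field names are illustrative only.
    #include <cstddef>
    #include <string>
    #include <vector>

    struct CpuConfig {
        std::string core_name;       // control CPU running the software
        std::size_t private_mem_kb;  // private memory for this CPU
        bool        has_cache;       // cache only if performance requires it
    };

    struct EngineTemplate {
        std::string            name;           // e.g. "video_decoder", "3d_graphics"
        std::vector<CpuConfig> cpus;           // one or more control CPUs
        std::size_t            shared_mem_kb;  // shared memory for inter-process communication
        std::size_t            datapath_gates; // dedicated hardware, ~1M gates is typical
    };

    // A new consumer function starts from the template rather than from scratch.
    EngineTemplate make_video_decoder() {
        return EngineTemplate{
            "video_decoder",
            { {"control_cpu", 256, true} },
            1024,
            1000000
        };
    }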

Requirements for success

The process we use to get from concept to silicon has two critical requirements: we need to be able to develop software as early as possible, and we require smooth transitions between development phases as our ideas are realized ultimately on a chip.

The first requirement arises simply because modern chips are more or less the embedded systems of yesterday implemented in a single piece of silicon, and software is picking up more and more of the operational burden. Older chips did everything through dedicated logic or analog circuitry; systems-on-chip (SoCs) instead include one or more processors for executing software.

One huge benefit of using software instead of hardware is flexibility. As bugs need to be fixed or as additional features are desired, swapping old code for new can bring these changes about without a single mask change. In addition, a processor uses much less silicon than would be required for the sum total of the software functions if they were rendered in hardware. So, increasingly, the silicon is acting simply as a platform for executing (and maybe accelerating) software, with additional pieces for getting the necessary signals to and from the processors.

The second requirement stems from the fact that our ideas start out broad and general and ultimately end up as transistors in silicon and bits in firmware. In between are transaction-level models (TLM), register-transfer-level (RTL) code, and then the nitty-gritty steps of implementing polygons with dimensions far below the wavelength of the light used to expose the masks. At each step of the way, a relatively abstract representation of the design becomes more concrete, and each of those transitions must be managed smoothly.

Putting the process in place

We manage all of this starting with our original architectural ideas, which can be modeled at the TLM level in order to experiment with various structures and configurations. An emulator system like EVE’s ZeBu platform has a role here for pieces of hardware IP that don’t come with a TLM model. Rather than suffering the slower speed of simulating hardware, such IP can be synthesized into the emulator, where, in this case, the emulator acts as a simulation accelerator. The high-level high-speed simulation afforded by TLM can now proceed apace without being overly burdened by the blocks having no TLM model, while still including those blocks in the simulations.
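
A minimal sketch of what such a bridge can look like on the host side is shown below, assuming a SystemC/TLM-2.0 environment; the emu_link_transfer() call is a hypothetical placeholder for the co-emulation transport, not a specific vendor API.

    // Minimal sketch, assuming SystemC/TLM-2.0. The IP itself has been
    // synthesized into the emulator; this host-side proxy only forwards
    // transactions to it. emu_link_transfer() is a hypothetical placeholder
    // for the co-emulation transport.
    #include <systemc>
    #include <tlm>
    #include <tlm_utils/simple_target_socket.h>

    struct EmulatedIpProxy : sc_core::sc_module {
        tlm_utils::simple_target_socket<EmulatedIpProxy> socket;

        SC_CTOR(EmulatedIpProxy) : socket("socket") {
            socket.register_b_transport(this, &EmulatedIpProxy::b_transport);
        }

        // Each transaction from the host-side model is pushed across the link
        // and executed by the RTL running in the emulator.
        void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
            emu_link_transfer(trans.get_address(),
                              trans.get_data_ptr(),
                              trans.get_data_length(),
                              trans.is_write());
            trans.set_response_status(tlm::TLM_OK_RESPONSE);
        }

    private:
        // Placeholder: a real flow would call the emulator's co-emulation API here.
        void emu_link_transfer(sc_dt::uint64 addr, unsigned char* data,
                               unsigned int len, bool is_write) {
            (void)addr; (void)data; (void)len; (void)is_write;
        }
    };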

At this point, software is also simulated at the TLM level, although emulation becomes important when we need to get a more detailed picture of software execution at a cycle-accurate level. This gives a better picture of both the performance and power implications of the software as it executes.

The industry-standard SCE-MI interface provides the transaction-level handshake between the simulator and the emulator. This interface can become the critical bottleneck for verification: if it takes too long for data to go back and forth between the host side and the emulator side, everyone ultimately sits around waiting for data instead of finishing the test. We’ve had good experience with the performance of the interfaces we use, having achieved throughput of up to 200 Mbit/s, corresponding to an application displaying three SD video pictures per second.
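
The usual way to keep this link from dominating run time is to move whole transactions, or bursts of them, per round trip rather than individual signal changes. The sketch below is only a hypothetical illustration of that batching pattern; it is not the SCE-MI API itself, and the names are invented.

    // Hypothetical illustration of amortizing host/emulator link latency by
    // batching transactions; this is a pattern sketch, not the SCE-MI API.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Txn {
        std::uint64_t address;
        std::uint32_t data;
        bool          is_write;
    };

    class BatchedChannel {
    public:
        explicit BatchedChannel(std::size_t batch_size) : batch_size_(batch_size) {}

        // Queue a transaction and only cross the link when a full batch is
        // ready, so the fixed round-trip cost is paid once per burst.
        void send(const Txn& t) {
            pending_.push_back(t);
            if (pending_.size() >= batch_size_) flush();
        }

        void flush() {
            if (pending_.empty()) return;
            link_round_trip(pending_);   // one handshake for the whole burst
            pending_.clear();
        }

    private:
        // Placeholder for the real transport (e.g. a SCE-MI message port).
        void link_round_trip(const std::vector<Txn>& burst) { (void)burst; }

        std::size_t      batch_size_;
        std::vector<Txn> pending_;
    };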

Once the architect is happy with the result, the design is handed off to a modeling team that partitions the problem and hands the pieces to three development teams: the hardware team, the virtual platform team, and the software team. Depending on the specific design work being done, two such flows may run in parallel. The first is used for new application engine development, taking the engine from architecture through design. The second takes available application engines, along with other supporting IP, and integrates them into an SoC. When a new engine is destined for a new SoC, the first flow feeds into the second.

Figure 1: The architect pushes a system model down to the development teams; the hardware team continually refreshes the model. The virtual platform team has to work closely with the software team, but, unusually, the software and hardware teams also work closely together.

The hardware team takes the TLM models of the various modules that need to be designed and starts working on concrete implementations. Block by block, they turn abstract TLM definitions into RTL representations. They then return the RTL blocks to the modeling team, who do weekly integrations that replace the original TLM models in the overall system architecture with the resulting RTL blocks at the IP or application engine level.

We actually use two levels of TLM. The first is behavioral: when doing architectural modeling, it simply acts as a black box with a well-defined hardware register map or firmware API. The second is structural, and is developed in parallel with the actual RTL, using IP-XACT to manage the choice of TLM or RTL for each block as well as high-level synthesis of RTL from a more abstract form.
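
On the host side, the effect of that weekly swap can be pictured roughly as below; the BlockView switch and class names are illustrative only, and in practice the choice per block is driven by the IP-XACT metadata rather than hard-coded.

    // Illustrative sketch only: per block, the platform instantiates either
    // the behavioral TLM model or a proxy to the RTL in the emulator; in the
    // real flow the choice comes from IP-XACT metadata, not hard-coded flags.
    #include <memory>
    #include <string>

    enum class BlockView { BehavioralTlm, RtlInEmulator };

    struct Block {                     // interface seen by the rest of the platform
        virtual ~Block() = default;
        virtual void reset() = 0;
    };

    struct BehavioralModel : Block {   // black box behind the register map / firmware API
        void reset() override {}
    };

    struct EmulatorProxy : Block {     // forwards accesses to the synthesized RTL
        void reset() override {}
    };

    // Weekly integration: as each RTL block is delivered, its entry flips
    // from BehavioralTlm to RtlInEmulator and the platform is rebuilt.
    std::unique_ptr<Block> make_block(const std::string& name, BlockView view) {
        (void)name;
        if (view == BlockView::RtlInEmulator)
            return std::make_unique<EmulatorProxy>();
        return std::make_unique<BehavioralModel>();
    }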

As RTL replaces TLM at the IP or application engine level, the emulator becomes critical for verifying the overall system faster than can be done with simulation. Instead of slowing down overall system verification by simulating the hardware blocks on the host, the functions once exercised as TLM models on the host side can be pushed into the emulator as an actual hardware implementation of the logic. As the hardware developers submit each RTL representation of a piece of IP or an engine to the modeling team, the modeling team can synthesize that RTL into the emulator, replacing the corresponding host-side TLM model.

With the size of hardware circuitry rapidly increasing, the capacity of the emulator – and, in particular, the ability of the emulator to perform at high speeds even when highly loaded – becomes critical. We have loaded up an emulator with as many as 10M gates, and, even with that, have achieved system clock speeds of 25 MHz. For more typical designs, where we use on the order of a million gates, we are able to maintain clock speeds as high as 5-10 MHz.

While the drivers are still maturing, we maintain higher performance by including only one application engine at a time in the emulator, swapping in whichever one is to be tested at the moment. Only once all of the drivers have settled down do we try to include all of the engines at the same time.

Meanwhile, the virtual platform team starts to implement the computing structures needed for testing software. That part of the process has to be completed quickly so that the software team can start testing their wares as soon as possible.

Software development generally remains on a virtual platform. By keeping things at an abstract level, the software can be verified very quickly, interacting via TLM with the hardware in the emulator. Other benefits of a host-side virtual platform are its availability early in the process, its low cost, and a familiar PC-based environment for coding.
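
The payoff is that driver code written against the virtual platform looks exactly like the final firmware. The sketch below shows the idea; the register offsets and the bus_read/bus_write hooks are hypothetical stand-ins for whatever the virtual platform provides, and the same routine works whether the engine behind those addresses is a TLM model on the host or RTL in the emulator.

    // Sketch of driver-level code on the virtual platform; register offsets
    // and the bus_read/bus_write hooks are hypothetical. The stubs stand in
    // for the transport the virtual platform would supply.
    #include <cstdint>

    namespace engine_regs {                               // hypothetical register map
        constexpr std::uint64_t BASE   = 0x40000000ULL;
        constexpr std::uint64_t CTRL   = BASE + 0x00;
        constexpr std::uint64_t STATUS = BASE + 0x04;
        constexpr std::uint32_t START  = 0x1;
        constexpr std::uint32_t DONE   = 0x1;
    }

    // Stubs for illustration; the platform routes these to the TLM model or,
    // through the co-emulation link, to the RTL in the emulator.
    static std::uint32_t bus_read(std::uint64_t /*addr*/)        { return engine_regs::DONE; }
    static void bus_write(std::uint64_t /*addr*/, std::uint32_t) {}

    // Kick the engine and poll for completion, exactly as the final firmware would.
    bool run_engine_job() {
        bus_write(engine_regs::CTRL, engine_regs::START);
        for (int i = 0; i < 1000000; ++i) {
            if (bus_read(engine_regs::STATUS) & engine_regs::DONE)
                return true;
        }
        return false;   // timed out
    }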

Figure 2: Development starts with most models in the host, and gradually pushes hardware and the system down to the emulator, where the specific implementation can be validated at high speed and while executing real software.

At this point, the software and hardware teams take on the bulk of the burden for the remainder of the project, turning their modules over to the modeling team for system integration and continued testing.

At the very end, if the full system is to be tested before committing to silicon, even the virtual platform can be synthesized into the emulator. Now the same software can be run against a full hardware model of the IC, with I/O piped in from real interfaces if desired. The only thing remaining in TLM is the host CPU, since the simulation host can run as fast as 3 GHz, executing that model far faster than would be possible in an emulator clocking in the tens of MHz.

A bonus benefit

One of the interesting side effects of our process has to do with the way our hardware and software engineers interface. The traditional model is that a system architect will create a partition between hardware and software based largely on experience, increasingly aided by high-level modeling tools, and ultimately assisted by some best guesses.

The hardware and software jobs are traditionally parceled out to hardware and software teams, respectively, which go off and duly execute their tasks. Some months later the results are integrated by a system integrator, and hopefully they work. All too often they don’t, and iterating at this point is a long and painful task.

One of the main problems with this traditional flow is the fact that hardware and software teams don’t talk to each other. They operate within their silos, often in different parts of the world, and if it turns out that functionality could have been partitioned more efficiently, they don’t learn that until final integration.

What we have managed to achieve through our process is that the hardware and software teams do in fact interact throughout development, dramatically reducing the likelihood of a late-game reset.

The net effect

At STMicroelectronics, we have put together a general architecture that can be replicated from project to project, and combined it with a process that evolves design concepts into a concrete implementation, handing each stage of the design off to the next with a minimum of thrash and rework. From the very beginning, TLM forms a critical part of the backbone holding this all in place. Good simulators and a virtual platform on the host side, coupled with a good emulator that handles TLM well, provide the critical infrastructure for realizing a robust, well-verified RTL representation of the design.

Running on top of this hardware is software, which is increasingly taking center stage. This software can be developed early, and verified repeatedly both as the software evolves and as the abstract models for hardware are replaced by the finished hardware design. Hardware and software engineers who communicate with each other provide the finishing flourish that’s critical for ensuring that complex designs can be put in the hands of demanding consumers as quickly as possible.

About the author:

Laurent Ducousso is IP Verification Manager, Home Entertainment Group (HEG) R&D, at STMicroelectronics, Grenoble, France.

Ducousso has managed the Home Entertainment Group Verification Team since 2000. He has 20 years of experience in Digital Design and Verification. He joined STMicroelectronics in 1994 as a verification expert, and has since worked on CPU, microcontroller, and DSP projects.

Prior to STMicroelectronics, he contributed to CPU mainframe development at Bull S.A. for eight years. Laurent Ducousso holds a Ph.D. in computer science from Paris, France.

