Combining prototyping solutions to solve hardware/software integration challenges

by Frank Schirrmeister, Cadence Design Systems, TechOnline India - December 15, 2011

System development without prototyping is too risky, and most companies demand the use of prototyping prior to silicon tapeout. There are different types of prototypes that allow hardware/software integration and debugging. In addition, different users within a design team have potentially different needs for prototype capabilities.

Recognizing the importance of software, the electronics industry has accepted the fact that system development without prototyping is too risky, and today most companies demand the use of prototyping prior to silicon tapeout. There are, however, several different types of prototypes that allow hardware/software integration and debugging. In addition, different users within a design team have potentially different needs for prototype capabilities. Exactly which type of prototype to choose is not always clear, and the multitude of choices can make it hard for design teams to find the right combination of prototypes to support their needs.

While there are certainly differences in other application domains, let’s consider prototyping in applications for wireless communications. When comparing the block diagrams of some of the enabling systems on chip (SoCs) like the Qualcomm MSM8960 Snapdragon S4 processor, the TI OMAP4 platform, the NVIDIA Tegra 2, and ST Ericsson’s DB9500, a couple of common characteristics can be derived: All of these devices have multiple processing cores, 2D/3D graphics engines, video and audio acceleration engines, complex interconnect, and peripherals to connect to the environment. All these different components have different characteristics as they relate to processing and throughput needs, which significantly impacts the ways they can be prototyped.

Users, use models, and enabling prototyping characteristics

There are five main types of users who can benefit from prototypes, and they can be directly derived from the example design flow shown in Figure 1.



Figure 1: A chip design flow and its efforts

Figure 1 shows a classic design flow. The height of the blocks along the Y axis indicates the percentage of effort each of the tasks takes on average, as measured by IBS over several example projects. For example, of the combined hardware/software (HW/SW) development effort, application software development consumes about 30%. The width of the blocks along the X axis indicates the time each of the tasks took as part of the overall project, measured as a percentage of the time from RTL development to silicon tapeout; this normalization was necessary to compare projects of different lengths. For instance, application software development took on average 72% of the time taken to get from RTL to tapeout.

The classic waterfall flow shown starts with specification development executed by architects, followed by RTL development and verification performed by hardware verification engineers. Software development happens in two main categories: hardware-aware software development for OS porting and utility development, and application software development. While not clearly called out as an independent step, the integration of hardware and software needs to be validated by HW/SW validation engineers prior to tapeout and then on silicon once actual chip samples are available. As Figure 1 makes clear, starting software development as early as possible greatly contributes to schedule improvements.

Let’s compare the main use models and care-abouts that can be satisfied by the different characteristics of prototypes.

• Application software developers need a representation of the hardware as early as possible. It needs to execute as fast as possible and needs to be functionally accurate. This type of software developer would like to be as independent from the hardware as possible, and specifically does not need full timing detail. For example, detailed memory latency and bus delays are generally not of concern.

• Similarly, hardware-aware software developers also would like representations of the hardware to be available as early as possible. However, they need to see the details of the register interfaces and they expect the prototype to look exactly like the target hardware will look. Depending on their task, timing information may be required. In exchange, this type of developer is likely to compromise on speed to gain the appropriate accuracy.

• System architects care about early availability of the prototype, as they have to make decisions even before all the characteristics of the hardware are defined. They need to be able to trade off hardware versus software and make decisions about resource usage. For them, the actual functionality counts less than some of the details. For example, functionality can be abstracted into representations of the traffic it creates, but for items like the interconnect fabric and the memory architecture, very accurate models are desirable. In exchange, this type of user is willing to compromise on speed and does not typically require complete functionality as the decisions are often made at a sub-system level.

• Hardware verification engineers typically do need precise timing accuracy of the hardware, at least on a clock cycle basis for the digital domain. Depending on the scope of their verification assignment, they need to be able to model the impact of software as it interacts with the hardware. Accuracy definitely trumps speed, but the faster the prototype executes, the better the verification efficiency will be. This type of user also cares about being able to reuse testbenches once they have been developed.

• HW/SW validation engineers make sure the integration of hardware and software works as specified, and they need a balance of speed and accuracy to execute tests of significant length to pinpoint defects if they occur. This type of user especially needs to be able to connect to the environment of the chip and system to verify functionality in the system context.


Some characteristics are important to all users, but certain users are especially sensitive to some of them. Cost is one of those characteristics. While all users are cost-sensitive, software developers may find that a prototype is not feasible in light of cheaper alternatives, even though the prototype may have the desired accuracy or early availability in the project flow. In addition, the extra development effort that prototypes require beyond standard development flows needs to be considered carefully and weighed against prototype benefits.

A variety of prototypes

The types of prototypes can be categorized easily by when they become available during a project. Prior to RTL development, users can choose from the following prototypes:

• Software development kits (SDKs) typically do not run the actual software binary but require re-compilation of the software. The main target users are application software developers who do not need to look into hardware details. SDKs offer the best speed but lack accuracy. The software executing on the processors, as in the SoC examples given earlier, runs natively on the host first or executes on abstraction layers like Java. Complex computation, as used in graphics and video engines, is abstracted using high-level APIs that map those functions to the capabilities of the development workstation.
• Architectural virtual platforms are mixed-accuracy models that enable architecture decision making. The items in question (bus latency and contention, memory delays, etc.) are described in detail, maybe even as small portions of RTL. The rest of the system is abstracted as it may not exist yet. The main target users are system architects. Architectural virtual platforms are typically not functionally complete and they abstract environment functionality into the traffic it creates. Specifically, the interconnect fabric of the examples given earlier will be modeled in full detail, but the analysis will be done per sub-system. Execution speed may vary greatly depending on the amount of timing accuracy, but normally will be limited to 10s to low-100s of kHz.
• Software virtual platforms run the actual binary without re-compilation at speeds close to real time—50s of MHz to 100s of MHz. Target users are software developers, both apps developers and “hardware-aware software developers.” Depending on the need of the developer, some timing of the hardware may be more accurately represented. This prototype can be also used by HW/SW validation engineers who need to see both hardware and software details. Due to the nature of “just-in-time binary translation,” the code stream of a given processor can be executed very fast natively on the host. This makes virtual prototypes great for software development, but modeling other components of the example systems—like the 3D engines—would result in significant speed degradation.


Once RTL has been developed, RTL-based prototypes offer more accuracy:

• RTL simulation: This is the standard vehicle for hardware verification engineers. Given its execution in software, it executes equally slowly—in the range of 100s of Hz—for all components in the system to be prototyped.

• Acceleration: When RTL simulation becomes too slow, acceleration allows users to raise performance by orders of magnitude, to between 200 kHz and 500 kHz. Acceleration is a mix of software-based and hardware-based execution. Interfaces to the real world are added, but selectively.

• In-circuit emulation: Now everything moves into the emulator; testbenches are synthesizable, and speed increases further to 1 to 2 MHz. Debug is very capable here, especially for hardware. More interfaces to the real world are added. For both in-circuit emulation and acceleration, the speed is much faster than basic RTL simulation. However, when it comes to pure software execution on a processor, transaction-level models (TLMs) of a processor on a PC will execute faster.

• FPGA-based prototyping: When RTL is pretty stable, users can utilize an even faster hardware-based execution environment. This works especially well for IP that already exists in RTL form. Real-world interfaces are now getting to even higher speeds of 10s of MHz. Similar to acceleration and in-circuit emulation, when it comes to pure software execution on a processor, TLMs of a processor on a PC will still execute faster.

Silicon-based prototypes:

• The chip from the last project can still be used, especially for apps development. It is like the SDK in the pre-RTL case. However, the latest features of the development for the new chip are not available until the appropriate drivers, OS ports, and middleware become available.

• There is an actual silicon prototype once the chip is back from fabrication. Now users can run at real speeds, with all connections, but debug becomes harder because execution is hard to control: starting, stopping, and pausing execution at specific breakpoints is not as easy as in software-based execution, FPGA-based prototyping, and acceleration and emulation.
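
To put the execution speeds quoted in this section in perspective, the following Python sketch estimates how long a fixed workload would take on each pre-silicon prototype. It is an illustration only: the one-billion-cycle workload is hypothetical, each rate is a single assumed value picked from inside the range given in the text, and instruction rates of software virtual platforms are treated as if they were clock rates.

```python
# Representative rates chosen from inside the ranges quoted in the text.
# The specific values are assumptions; the text only gives orders of magnitude.
PROTOTYPE_SPEED_HZ = {
    "RTL simulation": 500,                      # "100s of Hz"
    "Architectural virtual platform": 100_000,  # "10s to low-100s of kHz"
    "Acceleration": 350_000,                    # "200 kHz to 500 kHz"
    "In-circuit emulation": 1_500_000,          # "1 to 2 MHz"
    "FPGA-based prototyping": 20_000_000,       # "10s of MHz"
    "Software virtual platform": 100_000_000,   # "50s of MHz to 100s of MHz"
}

# Hypothetical workload size, for illustration only.
WORKLOAD_CYCLES = 1_000_000_000


def wall_clock_seconds(cycles: int, speed_hz: float) -> float:
    """Time to execute `cycles` at `speed_hz`, ignoring bring-up and I/O stalls."""
    return cycles / speed_hz


def format_duration(seconds: float) -> str:
    """Render a duration in the largest unit that keeps the value at or above 1."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    # List the prototypes from slowest to fastest.
    for name, speed in sorted(PROTOTYPE_SPEED_HZ.items(), key=lambda kv: kv[1]):
        duration = format_duration(wall_clock_seconds(WORKLOAD_CYCLES, speed))
        print(f"{name:32s} {duration}")
```

With these assumed rates, the same workload takes roughly 23 days in RTL simulation and about 10 seconds on a software virtual platform, which is why software teams rarely wait for RTL simulation to run their code.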

How prototypes measure up

Software developers often create SDKs. Virtual platforms are more and more becoming a by-product of the hardware design flow, but the market is still in its early stages. The majority of the models are done by the IP providers and the hardware development teams. As Richard Goering pointed out in an article not too long ago, a new breed of modeling engineer is developing.

The rest of the prototypes are clear by-products of RTL and chip development. It is important to note that for FPGA-based prototyping, acceleration, and emulation, more and more companies now have dedicated teams parallel to hardware development and verification. The modeling engineer at the system level is an extension of those dedicated teams.


Figure 2: Prototype time of availability

To understand the benefits associated with each type of prototype, it is important to summarize the actual care-abouts derived from the different users and use models:

• Time of availability during a project: When can I get it after project start? Software virtual prototypes win here as the loosely timed transaction-level model (TLM) development effort is much lower than RTL development. Hybrid execution with a hardware-based engine alleviates re-modeling concerns for legacy IP that does not exist yet as a TLM.

• Speed: How fast does the prototype execute? Previous generation chips and actual samples are executing at actual target speed. Software virtual prototypes without timing annotation are next in line, followed by FPGA-based prototypes and in-circuit emulation and acceleration.

• Accuracy: How detailed is the hardware that is represented compared to the actual implementation? Software virtual prototypes based on TLMs, with their register accuracy, are sufficient for a number of software development tasks including driver development. However, with significant timing annotation, speed slows down so much that RTL in hardware-based prototypes often is actually faster.

• Capacity: How big can the executed design be? Here the different hardware-based execution engines differ greatly. Emulation is available in standard configurations of up to 2 billion gates, and standard products for FPGA-based prototyping are in the range of 18 to 30 million gates, although multiple boards can be connected for higher capacity. Software-based techniques for RTL simulation and virtual prototypes are limited only by the capabilities of the executing host. Hybrid connections to software-based virtual platforms allow additional capacity extensions.

• Development cost and bring-up time: How much effort needs to be spent to build the prototype on top of the traditional development flow? Here virtual prototypes are still expensive because they are not yet part of the standard flow. Emulation is well understood and bring-up is very predictable, on the order of weeks. FPGA-based prototyping from scratch is still a much bigger effort, often taking three to six months. FPGA bring-up can be sped up significantly when the software front-end of emulation can be shared.

• Replication cost: How much does it cost to replicate the prototype? This is the actual cost of the execution vehicle, not counting the bring-up cost and time. Pricing for RTL simulation has been under competitive pressure and is well understood. TLM execution is in a similar price range, but the hardware-based techniques of emulation and FPGA-based prototyping require more significant capital investment and can be measured in dollars per executed gate.

• Software debug, hardware debug, and execution control: How easily can software debuggers be attached for HW/SW analysis and how easily can the execution be controlled? Debugger attachment to software-based techniques is straightforward and execution control is excellent. The lack of speed in RTL simulation makes software debug feasible only for niche applications. For hardware debug, the different hardware-based engines differ: hardware debug in emulation is very powerful and comparable to RTL simulation, while in FPGA-based prototyping it is very limited. Hardware insight into software-based techniques is great, but the lack of accuracy in TLMs limits what can be observed. With respect to execution control, software-based execution allows users to efficiently start and stop the design, and users can selectively run only a subset of processors, enabling unique multi-core debug capabilities.

• System connections: How can the environment be included? In hardware, rate adapters enable speed conversions and a large number of connections are available as standard add-ons. RTL simulation is typically too slow to connect to the actual environment. TLM-based virtual prototypes execute fast enough and have virtual I/O to connect to real-world interfaces like USB, Ethernet, and PCI. This has become a standard feature of commercial virtual prototyping environments.

• Power analysis: Can users run power analysis on the prototype? How accurate is the power analysis? With accurate switching information at the RTL level, power consumption can be analyzed fairly accurately. Emulation adds the appropriate speed to execute long enough sequences to understand the impact of software. At the TLM level, annotation of power information allows early power-aware software development, but the results are by far not as accurate as at the RTL level.

• Environment complexity: How complex are the connections between the different engines? The more hardware and software engines are connected (as in acceleration), the more significant and harder to handle the complexity becomes, and this needs to be weighed against the value.
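
The capacity figures quoted in the list above lend themselves to a quick sizing estimate. The Python sketch below is a rough illustration only: the 100-million-gate design and the `utilization` parameter are hypothetical, and the calculation ignores the partitioning and inter-board connection effort that a real multi-board FPGA prototype requires.

```python
import math

# Capacity figures taken from the text.
FPGA_BOARD_GATES_LOW = 18_000_000    # low end of standard FPGA prototyping products
FPGA_BOARD_GATES_HIGH = 30_000_000   # high end of standard FPGA prototyping products
EMULATOR_MAX_GATES = 2_000_000_000   # largest standard emulation configuration


def boards_needed(design_gates: int, gates_per_board: int,
                  utilization: float = 1.0) -> int:
    """Number of FPGA boards needed for a design.

    `utilization` is a hypothetical derating factor (0 < utilization <= 1)
    for the share of a board's nominal capacity that is usable in practice.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    usable_gates = gates_per_board * utilization
    return math.ceil(design_gates / usable_gates)


def fits_in_emulator(design_gates: int) -> bool:
    """True if the design fits the largest standard emulation configuration."""
    return design_gates <= EMULATOR_MAX_GATES


if __name__ == "__main__":
    design = 100_000_000  # hypothetical 100-million-gate SoC
    print("Boards at 18M gates each:", boards_needed(design, FPGA_BOARD_GATES_LOW))
    print("Boards at 30M gates each:", boards_needed(design, FPGA_BOARD_GATES_HIGH))
    print("Fits in emulator:", fits_in_emulator(design))
```

For the hypothetical 100-million-gate design, this gives between four and six boards at full nominal capacity, while the same design fits comfortably in a standard emulation configuration.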

Summary and examples of combinations

While it is hard to put all user needs and prototyping capabilities in context in one overview, Figure 3 shows some of them in one capability/need matrix. It also indicates why hybrid combinations are very attractive.


Figure 3: User needs vs. prototyping capabilities


Depending on the users and use models, some of the needs will be more important, but in general Figure 3 indicates a couple of combined use models:

• The combination of RTL simulation and virtual prototyping is especially attractive for verification engineers who care about speed and accuracy in combination. Software debug may be prohibitively slow on RTL simulation itself, but when key blocks including the processor can be moved into virtual prototyping, the software development advantages can be utilized and the higher speed also improves verification efficiency.

• The combination of emulation/acceleration and virtual prototyping is attractive for software developers and HW/SW validation engineers. Processors mapped into hardware-based execution are limited to the speed of emulation or FPGA-based prototyping, but they can run faster on a virtual prototype. Conversely, massively parallel hardware, as used in the video and graphics engines central to the examples referred to above, executes faster in hardware-based execution than in a virtual prototype. For the wireless communications devices mentioned earlier, this combination can be very advantageous, letting users call graphics functions in the virtual prototype and having them execute in emulation or FPGA-based prototyping.

• In the age of IP reuse, a lot of blocks may exist in RTL but not at the TLM level. The combination of virtual prototyping with RTL simulation, emulation/acceleration, or FPGA-based prototyping can help to lower the potentially high extra development effort when TLMs do not exist yet.
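
The partitioning logic behind these combinations can be sketched in a few lines of Python. This is a simplified illustration, not a tool flow: the `Block` type, the block names, and the rule order are assumptions, and "RTL engine" stands for whichever of RTL simulation, emulation/acceleration, or FPGA-based prototyping a team uses.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A design block and the model forms it is available in (hypothetical type)."""
    name: str
    kind: str       # "processor", "parallel_engine", or "other"
    has_tlm: bool   # a transaction-level model exists
    has_rtl: bool   # RTL exists


def choose_engine(block: Block) -> str:
    """Pick an execution engine for one block of a hybrid prototype."""
    # Processors execute software fastest on a virtual prototype.
    if block.kind == "processor" and block.has_tlm:
        return "virtual prototype"
    # Massively parallel engines (graphics, video) run faster in hardware.
    if block.kind == "parallel_engine" and block.has_rtl:
        return "RTL engine"
    # Otherwise use whichever model form exists, preferring the TLM.
    if block.has_tlm:
        return "virtual prototype"
    if block.has_rtl:
        return "RTL engine"
    raise ValueError(f"no model available for block {block.name!r}")


def partition(blocks: list[Block]) -> dict[str, list[str]]:
    """Group block names by the engine chosen for them."""
    result: dict[str, list[str]] = {}
    for block in blocks:
        result.setdefault(choose_engine(block), []).append(block.name)
    return result


if __name__ == "__main__":
    soc = [
        Block("cpu", "processor", has_tlm=True, has_rtl=True),
        Block("gpu", "parallel_engine", has_tlm=True, has_rtl=True),
        Block("legacy_dma", "other", has_tlm=False, has_rtl=True),  # reused IP, RTL only
        Block("uart", "other", has_tlm=True, has_rtl=False),
    ]
    print(partition(soc))
```

In this sketch the processor and the UART land in the virtual prototype, while the graphics engine and the RTL-only legacy block are mapped to the RTL engine, mirroring the three combinations described above.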


Prototyping is fast becoming a mandatory requirement for design flows. However, it is clear from the analysis above that no prototype will fit all requirements for all users. Standards like Accellera SCE-MI to connect the hardware and software-based execution domains are enabling the use of hybrid prototypes, which will allow users to reap the individual advantages of prototypes in combination with other prototyping techniques.































