Verification complexity, coupled with short time-to-market windows and scarce engineering resources, makes the need for fast simulation run times increasingly critical. Already, functional verification consumes around 60 or even 70 percent of the overall design cycle. As electronic systems continue to become larger and more complex, verification productivity lags behind, creating a crippling bottleneck. For this reason, electronic systems companies demand dramatic improvements in verification productivity. Faster testbenches are needed that enable more and longer test cases to be run in less time, allowing more design requirements to be covered and more bugs to be uncovered.
A game-changing technology has evolved to meet this urgent demand: co-emulation. Co-emulation employs a transaction-based methodology for writing testbenches that can be used not only for software simulation but also for hardware-assisted acceleration. Thus, adopters enjoy the productivity boost that verification methodologies like OVM or UVM have already delivered for developing testbenches, together with a dramatic speed-up of testbench execution through emulation. For instance, think of viewing a full frame of graphics in a matter of minutes instead of a day of simulation.
Hardware-assisted testbench acceleration
With today’s advanced transaction-level testbenches, the best approach to hardware-assisted speed-up in testbench execution is to have certain testbench components — the lower pin-level components like drivers and monitors — synthesized into real hardware and running inside the emulator together with the DUT, while other non-synthesizable testbench components — the higher transaction-level components like generators, scoreboards, and coverage collectors — remain in software running inside the simulator. Communication between simulator and emulator is consequently transaction-based, not cycle-based, which reduces communication overhead and increases performance: data exchange is infrequent and information-rich, while high-frequency pin activity is confined to the emulator, where it runs at full clock rates.
The goal of this approach is to boost run-time performance while maximizing reuse between pure simulation-based verification and hardware-assisted acceleration. Co-emulation enables truly single-source, transaction-level testbenches to work interchangeably for both simulation and acceleration. In acceleration mode, substantial run-time improvements can be achieved while retaining all simulator verification capabilities and integrations, including, in particular, support for modern coverage-driven, constrained-random, and assertion-based techniques, as well as prevalent verification methodologies such as UVM.
Several requirements are at play when devising a transaction-based acceleration methodology. First, it must adhere to the principles of co-emulation, which implies partitioning a testbench into a synthesizable HDL side and a distinct HVL side, handled by separate tools running on two different physical devices — i.e., emulator and workstation — and interacting at the transaction level. The HDL side, then, must bear the limitations of modern-day synthesis technology, and communication with the HVL side must be fast and efficient so as to minimize the impact on raw emulator performance.
Today’s transaction-based testbenches, like UVM testbenches, have a layered foundation that already exhibits a separation between timed and untimed (or partially timed) aspects of the testbench. As illustrated in Figure 1, a transactor layer forms the bridge between the cycle-accurate signal level of abstraction near the DUT and the transaction level of abstraction in the rest of the testbench. A co-emulation flow enforces this separation and requires that the transactor layer components are included on the HDL side to run alongside the DUT on the emulator. It further requires that the HDL and HVL sides are completely separated hierarchies with no cross module or signal references, and with the code on the HVL side strictly untimed.
This means that the HVL side cannot include any explicit time advancement statements, which may occur only on the HDL side. Abstract event synchronizations and waits for abstract events are permitted on the untimed HVL side, and it is still time aware in the sense that the current time as communicated with every context switch from HDL to HVL side can be read. As a result of the HVL-HDL partitioning, performance can be maximized because testbench and communication overhead is reduced and all intensive pin wiggling is targeted to run at emulation speeds.
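As a hedged illustration of these rules, the snippet below contrasts what is and is not permitted on the untimed HVL side (all names are hypothetical):

```systemverilog
// Hypothetical HVL-side class: untimed, yet event- and time-aware.
class scoreboard_checker;
  event item_ready;              // abstract event: legal on the HVL side

  task wait_for_item();
    @(item_ready);               // waiting on an abstract event: legal
    // #10ns;                    // explicit time advancement: NOT legal here;
                                 // delays belong on the HDL side only
    $display("checked at time %0t", $time);  // reading current time: legal
  endtask
endclass
```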
Another important requirement is that the methodology yields single-source testbenches for both simulation and acceleration, as already indicated earlier. This means that the HVL-HDL partitioning must function the same whether it is in co-emulation or in simulation alone, yet without the use of hooks like compile-time or run-time switches that would disable entire branches of code and effectively amount to two separate code bases. It also implies that the benefits of using a high-level verification language like SystemVerilog and verification methodologies like UVM for creating modular, reusable verification components and testbenches must be preserved, along with the associated simulator capabilities for analysis and debug.
The key to achieving this proves to be the application of what is known in the object-oriented world as the remote proxy design pattern. In this pattern, access to a remote object (e.g., a component on the HDL side) is controlled by a surrogate in the application domain (e.g., a component on the HVL side) that uses an indirect reference to uniquely access the remote object. Figure 2 illustrates this, where driver, responder, and monitor components act as proxies on the HVL side for the real transactors on the HDL side implementing synthesizable driver, responder, and monitor BFMs, respectively. Communication between each transactor and its proxy occurs through a remote procedure invocation mechanism using BFM-like task and function calls, which represent transactions. This modeling practice, in effect, enables a testbench acceleration methodology that is orthogonal to the verification methodology used (e.g., UVM).
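A minimal sketch of the proxy side of this pattern, with hypothetical names (this is illustrative, not the actual Veloce TBX API): the HVL-side driver proxy holds only an indirect reference — here a virtual interface handle to an assumed `driver_bfm` interface — and forwards each transaction as a single task call.

```systemverilog
// Hypothetical HVL-side driver proxy. It never touches DUT signals
// directly; it only invokes tasks declared inside the HDL-side BFM.
class driver_proxy;
  virtual driver_bfm bfm;        // indirect reference to the HDL-side BFM

  function new(virtual driver_bfm bfm);
    this.bfm = bfm;
  endfunction

  task send_write(input bit [31:0] addr, input bit [31:0] data);
    bfm.drive_write(addr, data); // one remote task call = one transaction
  endtask
endclass
```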
Figure 2: Transaction-based testbench with transactor/BFM proxies
In summary, a hardware-assisted testbench acceleration methodology can be defined in terms of three high-level steps.
- Employ two distinct HVL and HDL top level module hierarchies;
- Identify the timed testbench portions and model for synthesis under the HDL top level hierarchy;
- Implement a transaction-level interface between the HVL and HDL top level hierarchies.
HVL and HDL Top Level Modules
As the conventional single top-level testbench architecture is not suited for co-emulation, the first step is to rearrange and create dual HVL and HDL top level module hierarchies. This is conceptually quite simple, as shown in Figure 3. The HDL side must be synthesizable and should contain essentially all clock synchronous code, namely the RTL DUT, clock and reset generators, and the BFM code for driving and sampling DUT interface signals. The HVL side should contain all other (untimed) testbench code including the various transaction-level testbench generation and analysis components and proxies for the HDL transactors.
Figure 3: Separated HVL and HDL top level module hierarchies
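A hedged sketch of this dual-top arrangement follows; module, interface, and instance names are invented for illustration. The virtual interface binding shown is the one sanctioned link between the two hierarchies.

```systemverilog
// Hypothetical HDL top: synthesizable side, runs on the emulator.
module hdl_top;
  bit clk;
  initial forever #5 clk = ~clk;   // clock generation stays on the HDL side
  driver_bfm drv_bfm(.clk(clk));   // synthesizable BFM interface instance
  my_dut dut(.clk(clk) /* plus remaining DUT port connections */);
endmodule

// Hypothetical HVL top: untimed side, runs on the workstation.
module hvl_top;
  initial begin
    driver_proxy proxy = new(hdl_top.drv_bfm); // bind proxy to BFM instance
    // build and run the transaction-level testbench here
  end
endmodule
```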
This modeling paradigm is facilitated by virtue of advancements made in synthesis technology across multiple tools. For example, Mentor Graphics’ Veloce TBX acceleration solution provides technology that can synthesize not only SystemVerilog RTL but also implicit FSMs, initial and final blocks, named events and wait statements, import and export DPI-C functions and tasks, system tasks, memory arrays, behavioral clock and reset specifications along with variable clock delays, assertions, and more. All supported constructs can be mapped on a hardware accelerator, thereby offering the maximum HDL modeling flexibility, and all models synthesized with Veloce TBX run at full emulator clock rate for high performance. Moreover, they can be simulated natively on any IEEE 1800 SystemVerilog compliant simulator. This synthesis advancement was a precursor to the SCE-MI 2 standard developed within Accellera to enable effective development of “emulation-friendly” transactors.
Beyond hardware-assisted acceleration, there are other good reasons to adopt a dual, top-level testbench architecture. For instance, it can facilitate the use of multi-processor platforms for simulation, the use of compile and run-time optimization techniques, or the application of good software engineering practices for the creation of highly portable, configurable VIP.
Forming the bridge between the timed signal level and untimed transaction level of abstraction, transactor layer testbench components convert “what is being transferred” into “how it must be transferred,” or vice versa, in accordance with a given interface protocol. The timed portion of such a component is reminiscent of a conventional BFM.
In SystemVerilog object-oriented testbenches, this is commonly modeled inside classes. The DUT pins are bundled inside SystemVerilog interfaces and accessed directly from within these classes using the virtual interface construct. Virtual interfaces thus act as the link between the dynamic object-oriented testbench and the static SystemVerilog module hierarchy.
With regard to co-emulation, BFMs are naturally timed and must be part of the HDL top level module hierarchy, while dynamic class objects are generally not synthesizable and must be part of the HVL hierarchy. In addition, a transactor layer component usually has some high level code next to its BFM portion that is not synthesizable either, for example a transaction-level interface to upstream components in the testbench layer. All BFMs must therefore be surgically extracted and modeled instead as synthesizable SystemVerilog HDL modules or interfaces, also referred to as HDL transactors or HDL BFMs.
With the HDL modeling constructs supported by Mentor Graphics’ Veloce TBX acceleration solution, it is possible without much difficulty to write powerful state machines to implement synthesizable BFMs.
Furthermore, when modeling these HDL BFMs as SystemVerilog interfaces, one can continue to utilize virtual interfaces to bind the dynamic HVL and static HDL sides. The key difference with conventional SystemVerilog object-oriented testbenches is that the BFMs have moved from the HVL to the HDL side and the HVL-HDL connection must now be a transaction-level link between testbench objects and HDL BFM interfaces. That is, testbench objects may no longer access signals in an interface directly, but only indirectly by calling functions and tasks declared inside an HDL BFM interface (and representing transactions). This yields the testbench architecture depicted in Figure 2. It works natively in simulation and has been demonstrated to work also in co-emulation — i.e., with Mentor Graphics’ Veloce TBX acceleration solution.
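A hedged example of such an HDL BFM, modeled as a SystemVerilog interface with invented signal and task names: the task body is a small clocked state machine, so all pin wiggling runs at emulator clock rate, while the HVL side sees only the task call.

```systemverilog
// Hypothetical synthesizable driver BFM modeled as an interface.
// No class objects or dynamic types; tasks are simple clocked sequences.
interface driver_bfm(input bit clk);
  bit        valid;
  bit [31:0] addr, data;

  // One call = one transaction, invoked remotely via a virtual interface.
  task drive_write(input bit [31:0] a, input bit [31:0] d);
    @(posedge clk);
    valid <= 1'b1;
    addr  <= a;
    data  <= d;
    @(posedge clk);
    valid <= 1'b0;
  endtask
endinterface
```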
Transaction-Level HVL-HDL Interface
With the timed and untimed portions of a testbench thus fully partitioned, what remains is to model transaction-based communication between the two sides. As described above, the binding of virtual interface handles on the HVL side to concrete interface instances on the HDL side enables a flexible transaction transport mode for HVL-HDL communication provided that HDL BFMs are implemented as SystemVerilog interfaces in the HDL hierarchy. The flexibility stems from the fact that user-defined tasks and functions in these interfaces form the API.
Following the aforementioned remote proxy design pattern, components on the HVL side acting as proxies to HDL BFM interfaces can call relevant tasks and functions declared inside the BFMs via virtual interface handles to drive and sample DUT signals, initiate BFM threads, configure BFM parameters or retrieve BFM status. This remote task/function call mechanism is based for the most part on the established Accellera SCE-MI 2 function model, so it has the same kind of performance benefits as SCE-MI 2. By retaining the original transactor layer components as the BFM proxies — minus the extracted BFMs themselves — impact on the original object-oriented testbench is minimized. The proxies form a thin layer in place of the original transactor layer, which allows all other testbench layer components to remain intact. This offers the maximum leverage of existing verification capabilities and methodologies.
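Besides time-consuming tasks, zero-time functions can serve the configuration and status uses mentioned above. A hypothetical sketch (names invented):

```systemverilog
// Hypothetical BFM interface exposing zero-time functions that an
// HVL-side proxy can call to configure the BFM or retrieve its status.
interface monitor_bfm(input bit clk);
  bit        enabled;
  bit [15:0] pkt_count;

  function void set_enabled(bit on);    // configure BFM parameters
    enabled = on;
  endfunction

  function bit [15:0] get_pkt_count();  // retrieve BFM status
    return pkt_count;
  endfunction
endinterface
```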
Functional verification takes up 60 to 70 percent of the overall design cycle, and design complexity continues to rise. Over the last several years, these facts have driven the development and widespread adoption of advanced testbench methodologies like OVM and UVM. As described in this article, Mentor Graphics’ Veloce TBX acceleration solution provides a compelling testbench reuse methodology that dramatically improves verification throughput and productivity. Again, think of viewing a full frame of graphics in a matter of minutes using simulation acceleration compared to a day of pure simulation. By implementing this straightforward single-source, dual-top testbench methodology, users can target either pure simulation or accelerated simulation — both based on IEEE standards. Significantly, this methodology delivers much higher throughput to meet today’s design complexity challenges while maintaining the benefits of testbench methodologies and coverage-driven, constrained-random, and assertion-based verification techniques.