VMM-based multi-layer framework for system-level verification

by Ashok Chandran, Sajeev Thomas and Saj Kapoor, Analog Devices, TechOnline India - May 25, 2011


Introduction

Verification based on the Verification Methodology Manual (VMM) is a proven methodology for implementing block-level verification environments. Leveraging the block-level verification components at the system level significantly improves verification quality and reduces the time required to meet coverage goals at system level. A system-level testbench, however, brings a host of challenges that need to be addressed, including runtime, randomization quality, system memory management, multiple register access interfaces, clock domains and random stability. A block-to-system reuse methodology should be simple and scalable.

For a system-on-chip (SoC) with proprietary cores and system interfaces, coding an assembly-language test that can exercise the different modes of each peripheral component is not a scalable solution. Nor does it fit well with a VMM-based flow, where several simulation threads access a peripheral together. For example, while one thread is configuring a peripheral, another may be reading from the same register space to check the interrupt status. This behavior cannot be modeled in an assembly test, which has only a single instruction stream per core. The methodology therefore replaces the core with a bus functional model (BFM) driving directly onto the system interface bus. Each block testbench runs in its own thread and can access its respective peripheral component. Use of the VMM register abstraction layer (RAL) ensures that block-level testbenches undergo minimal behavioral change when migrating to system level.

A system testbench needs to program the system components according to each peripheral's requirements. For example, a universal asynchronous receiver/transmitter (UART) block needs the direct memory access (DMA) engine configured and memories initialized before it can transmit. Since the system architecture is common to all peripherals, it makes sense to provide a uniform platform with utility tasks that configure the system to each peripheral's requirements. The multi-layer architecture ensures system support for each peripheral component, and each layer implements randomization to maximize system and peripheral coverage.

The environment is optimized for performance through thread management, conditional compilation and plug-and-play support for block-level testbenches. This implies a bottom-up approach in which each block-level testbench adheres to a basic but broad set of guidelines to ease integration.

This article describes the traditional method used in system-level verification and how the new approach improves on it, explaining the layered architecture with an overview of the advantages it presents. Techniques to improve runtime with proper threading and memory management are described, as well as methods to overcome long compilation times using separate compilation and multi-core compilation techniques. Finally, the article touches on how this testbench avoids random stability issues using the VMM record/playback mechanism and supports test dumping in assembly format.

Design details

The design considered here is a complex, general-purpose SoC containing multiple cores, an interrupt controller, a large set of proprietary peripheral modules, L1/L2/L3 memories, memory controllers, DMA engines and several intellectual property (IP) blocks. Some blocks can access system memory through built-in DMA engines; others use the system DMA engines. Although not shown explicitly, there are several layers of arbitration between a peripheral and memory.

The chip also has multiple clocks, power domains and a system crossbar for arbitration as shown in the simplified representation in Figure 1. The external pins are shared across several peripherals using a multiplexing scheme.

Figure 1 - Design overview

Verification challenges at system level

System-level verification aims to verify the different combinations of system configurations along with peripheral modes. This raises many interesting questions, a few of which are listed below:

• Are we exercising each mode of the peripheral under every system configuration that may affect it?
• Are there connectivity problems that become visible only under a particular system/peripheral configuration?
• Are all modules hooked up to the proper clock domains?
• Are there bandwidth issues in the system when multiple peripherals contend for system resources?
• Does each DMA/peripheral have read and write access to the entire memory space?
• Are all registers within the system accessible?
• Are we generating traffic patterns that validate that the system really supports the intended use cases?
• Has a block assumed behavior that is not valid in the system, or is a block seeing behavior for which it was not designed?

Verifying each of the above cases using directed tests is not viable for such a complex SoC due to the large number of combinations involved.

Using block-level VMM testbenches will provide good coverage at system level, but as the number of blocks grows, a whole new set of challenges arises.

 

• System programming must match the peripheral mode. For example, the DMA needs to be programmed in memory-write mode for a peripheral in receive mode. System configuration information is encapsulated within the peripheral transaction classes when the block-level testbench operates at system level. This layered approach provides a common methodology implemented across all blocks.

• Large numbers of block sub-environments and threads running together will slow down simulation. Careful thread management is required to ensure that only active testbenches consume CPU cycles.

• A huge testbench and a complex design lead to very long compile times, which becomes a severe development bottleneck. Advanced tool features have to be explored to overcome this obstacle.

• The register access mechanism has to be common across the block and system testbenches. VMM RAL provides exactly this, eliminating issues when the physical interface for register access changes between block and system level.

• Reproducing the test case that caused the failure is a must for validating the fix.

Multi-layer framework

The testbench components operate at different layers. The layered architecture shown in Figure 2 enables block-to-system reuse and plug-and-play capability at system level. The different layers are described below, starting from the bottom.

VMM register abstraction layer

This layer handles the RAL-based register reads and writes. RAL plays a significant role in promoting block-to-system reuse of testbenches by making the testbench independent of the physical bus protocol. High-level RAL transactions are converted to physical-layer transactions inside the RAL access layer. Physical-layer transactions implement the physical protocol (e.g., APB, AHB or AXI): the physical layer uses the transactions provided by the RAL access layer to drive the bus signals.
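
As a minimal sketch, a protocol-independent register access through RAL might look like this (the ral_sys_chip model type and the uart block/ctrl register names are illustrative; the write()/read() status convention follows the RAL user guide):

// A minimal sketch assuming a RAL model generated from a register
// specification, with a block "uart" containing a "ctrl" register.
task configure_uart(ral_sys_chip ral_model, vmm_log log);
  vmm_rw::status_e status;
  bit [63:0] rdata;

  // High-level write: the RAL access layer converts this into
  // physical-protocol transactions (APB/AHB/AXI) behind the scenes,
  // so the same call works at both block and system level.
  ral_model.uart.ctrl.write(status, 32'h13);
  if (status != vmm_rw::IS_OK)
    `vmm_error(log, "CTRL register write failed");

  // Read back through the same protocol-independent path.
  ral_model.uart.ctrl.read(status, rdata);
endtask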

System component layer

This layer manages all system components, such as DMA engines, memories and interrupt controllers. The layer is analogous to a software API: it provides a set of tasks, objects, protocols and guidelines for managing the system. For example, a request for n data items is translated into the appropriate DMA programming and memory initialization with the required data, as well as handling of the DMA interrupts. The layer also adds data checks to verify data paths and signals interrupts to the peripheral layer.
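
A sketch of what such an API could look like, with purely illustrative class and task names:

// Purely illustrative sketch of a system-layer utility class that
// hides DMA programming and memory setup behind a single call.
class system_layer;
  vmm_log log = new("system_layer", "class");

  // Translate "peripheral X needs n data items" into the required
  // system configuration.
  task request_data(int peripheral_id, int n_items, bit is_receive);
    // 1. Allocate a memory region large enough for n_items
    //    (e.g., via the VMM Memory Allocation Manager).
    // 2. Program a free DMA channel: memory-read mode for a
    //    transmitting peripheral, memory-write mode for receive.
    // 3. Initialize the memory with random payload and record the
    //    expected data for end-of-test checking.
    // 4. Arm an interrupt handler that notifies the peripheral
    //    layer when the DMA completes.
  endtask
endclass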

Peripheral layer

This layer manages the block-level sub-environments (subenvs). It also synchronizes system events, such as interrupts and configuration, with the block-level subenv.

Figure 2 - Layered architecture

Methodology

This section elaborates on how the layered architecture is implemented and discusses other aspects that prove valuable in system-level verification.

Core-agnostic testbench

The cores used in most SoCs are either fully verified or reused with minimal changes. Even when the changes are significant, core verification requires a completely different approach from system-level verification. It was therefore decided to keep core verification separate from system verification. The testbench replaces the core and drives the system directly, as shown in Figure 3.

This approach provides much more controllability, supports block-level VMM testbenches without changes, and enables faster simulation due to reduced design overhead. It also allows coverage-graded assembly tests to be generated and run on the real core.

Figure 3 - Core replaced by BFM

Base class change from block- to system-level

The base class is transformed from vmm_data to a system-level base class, as shown in Figure 4. The class chosen for the transformation is the one most closely related to the peripheral and system configuration; it should carry the parameters that affect the system configuration, such as the peripheral direction and the interrupts that have been enabled. System features are now embedded inside this transaction class. In other words, the system layer is inserted into the generation stage; the coding style is illustrated in Example 1. A few noteworthy advantages include:

• Additional constraints can be added to the random system-level classes to relate peripheral modes to system modes, so the same set of classes can be used in the block and system testbenches without any code change.

• Use of VMM data macros ensures seamless change of class hierarchy.

• A unique peripheral_id identifies the peripheral within the system.

• The register functions within the system base class, together with the use of vmm_opts, ensure that no code change is required when the interrupt or the pad pin position changes. This proves very effective when porting the testbench across projects and when supporting multiple instances of a testbench in the system.
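
A minimal sketch of this coding style is shown below. The system_xaction base class, the dma_mode field and the DMA_MEM_WRITE value are illustrative assumptions; the vmm_data shorthand macros and vmm_opts are standard VMM facilities:

class uart_xaction extends system_xaction; // system-level base, not vmm_data
  // Peripheral fields that also affect the system configuration.
  rand bit       rx_not_tx;   // direction: receive or transmit
  rand bit [7:0] n_items;     // number of data items to move
  rand bit       dma_int_en;  // DMA completion interrupt enable

  // Relate peripheral mode to system mode: a receiving peripheral
  // needs the DMA in memory-write mode (dma_mode and DMA_MEM_WRITE
  // are assumed to come from the system_xaction base class).
  constraint sys_rel_c {
    rx_not_tx -> dma_mode == DMA_MEM_WRITE;
  }

  function new();
    super.new();
    // peripheral_id (from the base class) uniquely identifies this
    // peripheral; taking the value from vmm_opts means no code
    // change when the testbench is ported or instantiated twice.
    this.peripheral_id = vmm_opts::get_int("uart0_id", 0);
  endfunction

  // The vmm_data shorthand macros keep copy/compare/print working
  // even though the class hierarchy changed from vmm_data to the
  // system-level base class.
  `vmm_data_member_begin(uart_xaction)
    `vmm_data_member_scalar(rx_not_tx, DO_ALL)
    `vmm_data_member_scalar(n_items, DO_ALL)
    `vmm_data_member_scalar(dma_int_en, DO_ALL)
  `vmm_data_member_end(uart_xaction)
endclass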

Example 1 – Coding style for peripheral base class

Figure 4 - Base class change

Integrating and managing sub-environments

In the traditional VMM-based flow, all block-level sub-environments are added inside the top-level environment. The subenv's configure, start, stop, etc. are then called from vmm_env. This flow presents certain issues:

• The top-level environment gets cluttered when there is a large set of sub-environments
• Mixing the initialization tasks of all sub-environments causes portability issues
• Interdependencies between sub-environments degrade performance

In the new approach, the block-level sub-environments are instantiated hierarchically within the top-level VMM environment, as in the traditional flow. The difference is that management of each sub-environment is taken over by a transactor (xactor) derived from a common system-level class. The xactor forms the interface layer between the peripheral layer and the system component layer, and it phases the sub-environment exactly as it was phased at block level. System-level tasks are called from within the xactor to configure the system component support required by the peripheral, and system interrupt notifications received within the xactor are passed to the block testbench by notifications or function/task calls. The advantages of this approach are:

• The top level is freed from the task of initializing sub-environments.
• VMM xactor iterators provide plug-and-play capability. Once an xactor is instantiated, connectivity and consensus are added automatically; the only top-level change needed is calling the constructors of the xactor and the sub-environment.
• The testbench is more portable, since the entire phasing of the sub-environment is contained within the xactor.

Example 2 shows sample code for implementation of the peripheral xactor:
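
A minimal sketch of such an xactor follows; the system_xactor base class, the sys_layer handle and the subenv fields are illustrative assumptions rather than the exact production code:

class uart_io_xactor extends system_xactor; // common system-level base (assumed)
  uart_subenv subenv; // block-level sub-environment, reused unchanged

  function new(string inst, uart_subenv subenv);
    super.new("uart_io_xactor", inst);
    this.subenv = subenv;
  endfunction

  virtual task main();
    fork
      super.main(); // keep the base xactor machinery running
    join_none

    // Configure system support before the peripheral starts:
    // sys_layer.request_data() is the illustrative system component
    // layer API from the earlier sketch.
    sys_layer.request_data(peripheral_id, subenv.cfg.n_items,
                           subenv.cfg.rx_not_tx);

    // Phase the subenv exactly as it was phased at block level.
    subenv.start();
    // Wait here for the peripheral traffic to complete (e.g., via
    // consensus or a system interrupt notification), then wind down.
    subenv.stop();
    subenv.cleanup();
  endtask
endclass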

Example 2 – IO xactor implementation

Example 3 below demonstrates how adding the IO to the system vmm_env is greatly simplified. There is no need to connect channels or start/stop the xactors; this is done by VMM xactor iterators, which look for derived classes of the system xactor base class.
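
A sketch of the resulting top level (constructor arguments and the iterator usage are illustrative):

class system_env extends vmm_env;
  uart_subenv    uart_senv;
  uart_io_xactor uart_xact;

  virtual function void build();
    super.build();
    // Constructing the subenv and its managing xactor is the only
    // per-peripheral code at the top level (argument lists are
    // illustrative). No channels are connected and no start/stop
    // calls appear here.
    uart_senv = new("uart_senv", "uart0", this.end_vote);
    uart_xact = new("uart0", uart_senv);
  endfunction

  virtual task start();
    super.start();
    // Xactor iterators find every xactor derived from the common
    // system base class and start it (iterator API per the VMM
    // user guide; the usage shown is an assumption).
    begin
      vmm_xactor_iter it = new("/./", "/./");
      for (vmm_xactor x = it.first(); x != null; x = it.next()) begin
        system_xactor sx;
        if ($cast(sx, x)) sx.start_xactor();
      end
    end
  endtask
endclass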

Example 3 – Xactor plugged into env

End of test is decided by the state of the system as well as the peripheral state, dictated by the consensus passed from the sub-environments. The testbench looks at the consensus status of all active components before ending the test.
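
In VMM 1.1 terms this can be sketched with the environment's built-in end_vote consensus (the hook placement inside system_env is illustrative):

// Inside system_env: every managing xactor gets a vote on end of
// test; a vmm_subenv already votes through the consensus object
// passed to its constructor.
virtual function void build();
  super.build();
  this.end_vote.register_xactor(uart_xact);
endfunction

virtual task wait_for_end();
  super.wait_for_end();
  // Simulation ends only when system state and peripheral state
  // agree, i.e., all registered voters have consented.
  this.end_vote.wait_for_consensus();
endtask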

 

Specifying cross relations

Using VMM multi-stream scenarios is an efficient way to specify relations between peripheral transactions of different types, or between modes within the same peripheral transaction. Since all transactions are derived from a common base class, as described earlier, they can be passed through a common framework. A router class (derived from vmm_broadcast) routes each transaction to the respective transactor, which processes the packet. This flow is shown in Figure 5.

Figure 5 - Transaction flow

The router callbacks can also drop or modify packets as required. Modification is needed when pins are shared across multiple blocks through a multiplexing scheme; in such cases a transaction with a conflicting pin can be dropped. Since the router is aware of the transactions generated, it can call start_xactor() for only those peripherals that will be active in the test, thereby avoiding unnecessary threads.
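
A sketch of such a router; add_to_output() is the vmm_broadcast filtering hook (its exact signature may vary across VMM versions), and the routing rule shown is an illustrative assumption:

class periph_router extends vmm_broadcast;
  function new(vmm_channel source);
    super.new("periph_router", "router", source, 1);
  endfunction

  // Decide, per output channel, whether a transaction should be
  // forwarded; returning 0 drops it (e.g., on a pad-pin conflict).
  virtual protected function bit add_to_output(
      int unsigned decision_id, int unsigned output_id,
      vmm_data obj);
    system_xaction tr;
    if (!$cast(tr, obj)) return 0;
    // Route by peripheral_id: one output channel per peripheral.
    return (tr.peripheral_id == output_id) && !pin_conflict(tr);
  endfunction

  // Illustrative pin-conflict check for multiplexed pads.
  local function bit pin_conflict(system_xaction tr);
    return 0; // placeholder: compare against pins already claimed
  endfunction
endclass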

Functional coverage

Functional coverage is added for each system component, ensuring a good cross relation between block and system functional coverage. Since the block transactions and xactors are reused, functional coverage from the block-level testbenches can be reused at system level.
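
For example, a coverage model for the DMA component might cross the peripheral direction with the system DMA mode (the field names follow the earlier illustrative sketches):

class dma_coverage;
  // Sampled from the system-level transaction at generation time.
  bit       rx_not_tx;
  bit [1:0] dma_mode;

  covergroup system_cross_cg;
    cp_dir  : coverpoint rx_not_tx;
    cp_mode : coverpoint dma_mode;
    // The cross shows that each peripheral mode was exercised
    // under each system DMA mode.
    dir_x_mode : cross cp_dir, cp_mode;
  endgroup

  function new();
    system_cross_cg = new();
  endfunction

  function void sample(uart_xaction tr);
    rx_not_tx = tr.rx_not_tx; // fields assumed from the earlier sketch
    dma_mode  = tr.dma_mode;
    system_cross_cg.sample();
  endfunction
endclass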

Pipelined RAL access

For a multicore system, where any core can access the peripheral registers, there is no direct method to specify which core a RAL read/write should go through. An indirect approach is to specify the core via the data_id field of the read/write task; the disadvantage is that this usage of data_id cannot be enforced on block-level environments that may come from different sources.

Another solution is to map the transactions randomly to one core, with the restriction that all transactions to a given block use the same core interface; the assignment can also be made non-random. Even then, efficiency suffers because the RAL access task execute_single() takes only one transaction at a time, even though multiple interfaces are available. Pipelined RAL access is therefore switched on, optimizing interface usage by allowing all interfaces to operate together.

Memory management using VMM-MAM

For an IP block with its own DMA engine, system memory has to be configured directly, which requires allocation of memory segments. Using the VMM Memory Allocation Manager (MAM) for system memory allocation prevents conflicts between different blocks accessing memory and helps block-to-system reuse. By changing the memory pointer, the block testbench can access different system memories while running at system level. The VMM MAM also allocates random regions with the required memory alignment.
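
A minimal MAM usage sketch (the widths, offsets and region size are illustrative):

// Sketch: carve block buffers out of an L2 address range via the
// MAM, which hands out non-overlapping regions.
function void build_l2_allocator(vmm_log log);
  vmm_mam_cfg    cfg = new;
  vmm_mam        mam;
  vmm_mam_region region;

  cfg.n_bytes      = 4;        // 4 bytes per memory location
  cfg.start_offset = 'h0_0000;
  cfg.end_offset   = 'hF_FFFF;
  mam = new("l2_mam", cfg);

  // Each block testbench requests its buffers through the MAM, so
  // two blocks can never be granted overlapping memory regions.
  region = mam.request_region(1024);
  `vmm_note(log, $sformatf("buffer at 'h%h", region.get_start_offset()));
endfunction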

VMM record/playback

Transactions can be recorded using the VMM record feature. This makes it possible to save a test case and play it back even after the testbench has undergone several changes. Without a recording, the same seed may not hit the same scenario if the thread ordering has changed.
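
The record/playback idiom operates on vmm_channel; a sketch with an illustrative file name:

// Record every transaction flowing through a generator's output
// channel to a file, then replay the exact same stream later.
task record_test(vmm_channel chan);
  void'(chan.record("uart_test.trace"));
endtask

task replay_test(vmm_channel chan, vmm_log log);
  bit ok;
  // The factory object tells playback which transaction class to
  // allocate for each recorded entry.
  uart_xaction factory = new();
  chan.playback(ok, "uart_test.trace", factory);
  if (!ok) `vmm_error(log, "playback of recorded test failed");
endtask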

Test dumping using RAL callbacks

Configurations that are done dynamically can be printed in assembly-language format by appending RAL callbacks. This generates a static version of the test being executed. The callbacks can be plugged in before the configuration starts and removed once the configuration is done, as shown in Example 4 below.
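
A rough sketch of such a callback follows; the post_write() arguments follow the register-level RAL callback class (names and signature per the RAL user guide, abbreviated here), and the emitted assembly syntax is illustrative:

class asm_dump_cb extends vmm_ral_reg_callbacks;
  integer fd;
  function new(integer fd); this.fd = fd; endfunction

  // Called after every RAL register write: emit an equivalent
  // store instruction so the test can be replayed on the core.
  virtual task post_write(vmm_ral_reg rg,
                          bit [63:0] wdat,
                          vmm_ral::path_e path,
                          string domain,
                          ref vmm_rw::status_e status);
    $fdisplay(fd, "  W[0x%h] = 0x%h;  // %s",
              rg.get_address(), wdat, rg.get_name());
  endtask
endclass

// Appended to each register before configuration starts, e.g.
// rg.append_callback(cb), and removed once configuration is done
// (see the RAL user guide for the removal API).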

Example 4 – RAL dumping example

Fine-tuning performance

Performance is an important aspect of any system-level testbench. The huge increase in the size of the design database, coupled with the use of netlisted components, hurts compile-time performance as well. The testbench therefore implements several features to optimize compile and runtime performance:

VCS separate compile: Separately compiling the design and testbench provides a significant advantage during testbench development: the design need not be recompiled if only the testbench has changed. Packages can also be specified as a separate partition. The comparison is shown in Table 1.

Pre-compiled tests: With vmm_test, the user can pre-compile all tests and select the required one with the +vmm_test=<testname> option. vmm_opts can be used to specify further runtime options.
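
A pre-compiled test is a small class; a sketch with an illustrative test body:

// All such classes are linked into one simv executable, and
// +vmm_test=uart_basic selects this one at run time.
class uart_basic_test extends vmm_test;
  function new();
    super.new("uart_basic"); // name used on the command line
  endfunction

  virtual task run(vmm_env env);
    system_env senv;
    $cast(senv, env);
    // Test-specific configuration and constraints go here
    // (illustrative); then run the standard phases.
    senv.run();
  endtask
endclass
uart_basic_test uart_basic = new(); // registers with the test registry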

Conditional compilation: Individual testbenches can be enabled selectively using Verilog defines. This reduces testbench compile time while working on individual blocks and also enables subsystem verification at system level. The use of VMM xactor iterators helps simplify conditional-compile usage.
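
A sketch of the guarded instantiation inside the top-level environment (the define and class names are illustrative):

// Each block testbench is wrapped in its own define, so a
// subsystem build compiles only what it needs.
`ifdef ENABLE_UART_TB
  uart_subenv    uart_senv;
  uart_io_xactor uart_xact;
`endif

virtual function void build();
  super.build();
`ifdef ENABLE_UART_TB
  uart_senv = new("uart_senv", "uart0", this.end_vote);
  uart_xact = new("uart0", uart_senv);
`endif
  // No other code depends on the define: the xactor iterators
  // only ever see the xactors that were actually compiled in.
endfunction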

Thread management: The sub-environment and the xactor handling it are enabled only when the peripheral is used in the test. This ensures that only useful threads are active during simulation.

Sparse memory models: The memory models used are associative arrays, sparse arrays (using the VCS sparse pragma) or PLI-based, to reduce runtime memory usage. Use of the profiler helps point to potential memory issues.
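
An associative-array model keeps host memory proportional to the locations actually touched; a minimal sketch:

// Only addresses that are written consume simulator memory, which
// matters for a multi-gigabyte SoC address space.
class sparse_mem;
  bit [31:0] mem [bit [31:0]]; // associative array keyed by address

  function void write(bit [31:0] addr, bit [31:0] data);
    mem[addr] = data;
  endfunction

  function bit [31:0] read(bit [31:0] addr);
    // Unwritten locations return a recognizable default pattern.
    return mem.exists(addr) ? mem[addr] : 32'hDEAD_BEEF;
  endfunction
endclass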


Table 1 - Comparison of normal and separate compile

 

Conclusions

Since the design under verification was very complex, a block-to-system reuse methodology was needed to leverage the effort invested at block level. Advanced features of the VMM-based methodology were utilized to build a scalable architecture for the testbench. Setting up such a testbench was challenging; with VMM library support, the task was simplified and completed in a couple of months. The platform has integrated more than six block-level testbenches and added assertions and monitors for several more. Each block-level testbench was developed by an independent team, and legacy non-VMM testbenches were also incorporated with ease. Continued refinement of the flow has provided a stable verification platform for future generations of the design. The built-in test sequencer is also able to dump a meaningful assembly-language test, which can be run on the real core. The testbench further leverages advances in simulation technology to maximize the available CPU cycles while reducing development time through shorter compiles.


References


J. Bergeron, E. Cerny, A. Hunter, A. Nightingale, Verification Methodology Manual for SystemVerilog, Springer, 2005
VMM User Guide
VMM Register Abstraction Layer User Guide
VMM Primer: Memory Allocation Manager
VMM Primer: Performance Analyzer
Synopsys VCS documentation
http://www.vmmcentral.org


About the authors:

Ashok Chandran is a Senior Design Engineer at Analog Devices. He has been associated with Analog Devices for the past 3 years and has worked on design and SoC level verification in the Blackfin DSP processor design group at Bangalore.

Sajeev Thomas is a Senior Technical Lead at Analog Devices. He has been associated with Analog Devices for the past 10 years and has worked on the design and verification of various Blackfin and SHARC DSP processors.

Saj Kapoor is a Senior Engineering Manager at Analog Devices. He has been working with the DSP design group in Bangalore, India since 1995. During this time he has worked on multiple Blackfin and SHARC DSP products.
