This paper introduces the basic concepts of debugging and briefly explains their implementation in a System on Chip (SoC), collectively known as debug architecture, with reference to Nexus and ARM CoreSight standards.
With growing System on Chip (SoC) complexity, software intricacy is also increasing to fully exploit the hardware advances; so debugging the software is increasingly becoming a major challenge for developers. In order to aid the developers in debugging their software, a hardware ecosystem is built inside the SoC, which is commonly known as the debug architecture of the SoC. Before delving into the details of the debug architecture, let us understand the requirements of debugging.
Broadly the requirements of debugging can be stated as below.
* Observability of system registers and processor states with the capability to modify them out of code execution flow.
* Ability to halt and run the processor as per need.
* Obtaining information of various software threads running in a SoC so as to debug and tune the software for better performance. Provision for triggering the collection of such information on occurrence of a particular runtime event.
* Securing the system from unauthorized access using debug resources.
* Ability to debug in various low power modes of the system.
Halting the processor to access various states and parameters of the system is known as Static/Halting Mode debugging and accessing the system without disturbing the normal code execution flow is known as Dynamic/Monitor Mode debugging. In Dynamic/Monitor Mode debugging, generally a monitor program gets executed on the occurrence of a debug event. This program then communicates with the external debugger to perform the requested access without halting the system. This mode of debugging finds its use in real-time systems like engine controllers and servo mechanisms in hard drive controllers.
Obtaining information about running threads and the associated program and/or data flow is known as Tracing. In this debugging mode, data is output on a dedicated parallel interface instead of on the debugging interface.
Traditionally, software programs were debugged mainly using In-Circuit Emulators (ICE) and cycle-accurate software models of the system. In the ICE mode of debugging, the component to be debugged (generally the processor) was replaced by an emulator component which allowed access to various internal states and registers of the component while simultaneously performing the job of the replaced component. Although such a scheme was fast and efficient, it was costly due to the ICE module. The cycle-accurate software models, although cost effective, were slower compared to ICEs, and the effort to develop such models increased greatly with increasing complexity of the system. In order to overcome the limitations of both of these methods, various standards (e.g. Nexus, CoreSight) were created to develop a dedicated hardware ecosystem (debug architecture) as part of the complete system, offering assistance in debugging complex software and having the capabilities of ICE without too much associated cost.
The components of the debug architecture of a SoC can be classified into the following categories.
1. Debug Interface
2. Hardware ecosystem supporting debug
3. Trace mechanism
The debug interface is the port through which a debug monitor running on a PC can view and/or modify the internal state of a SoC. This interface is built on a standard communication protocol for receiving debug commands and sending the required response. Two commonly used debugging protocol standards are:
* Background Debug Mode (BDM)
* Joint Test Action Group (JTAG)
Background Debug Mode (BDM) has a single wire bi-directional debug interface along with a Background Debug controller (BDC). The external interface pin is pseudo open drain with a pull up and the communication is asynchronous. This protocol provides almost all debugging features (e.g. halt, run, read/write memories, tracing, etc.) except boundary scan. This protocol is generally found in small chips with limited pin count. ARM uses a similar protocol known as Serial Wire Debugging, where the external debugger communicates with the system through a 2-wire interface.
The Joint Test Action Group (JTAG) standard was initially developed for PCB manufacturing tests (e.g. continuity of tracks and connectivity of solder connections), where boundary scan architecture is used to test various on-board components (especially surface mounted components) which cannot be tested using a bed-of-nails tester. The JTAG interface generally consists of clock, data in, data out and mode select pins. The JTAG protocol is synchronous and has instructions to control IC pin state (EXTEST) as well as core logic inputs and outputs (INTEST). By extending these JTAG instructions to access various modules of a SoC, the JTAG standard came to be used in place of ICEs. The JTAG Test Access Port (TAP) was suitably modified to accommodate the extended instructions which allow access to various internal nodes of the SoC.
Hardware ecosystem supporting debug
The principal purpose of a hardware ecosystem for debugging is to fulfill all the requirements of debugging without affecting the performance of the SoC. One of the most common methods of debugging is halting the processor or getting system state at a particular point of code execution or memory access. These requirements are realized by Breakpoints and Watchpoints. Whenever a breakpoint is encountered, the code execution halts. Generally breakpoints are used when executing software from RAM. When a breakpoint is put in the debug software running on a PC, the debugger inserts an instruction at the memory location of the code where the breakpoint is located; so that the processor will halt when it executes this inserted code.
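The breakpoint insertion described above can be sketched as a small model. This is only an illustration under stated assumptions: RAM is modeled as a Python dictionary, instructions are assumed to be 2 bytes wide, and `BKPT_OPCODE`, `SoftwareBreakpoints` and `run` are hypothetical names, not part of any real debugger or instruction set.

```python
# Hypothetical model of a RAM-based software breakpoint.
BKPT_OPCODE = 0xFFFF   # placeholder "halt" instruction, not a real encoding
INSTR_SIZE = 2         # assumed fixed instruction width in bytes


class SoftwareBreakpoints:
    def __init__(self, ram):
        self.ram = ram      # address -> instruction word
        self.saved = {}     # address -> original instruction word

    def set(self, addr):
        # Save the original instruction, then overwrite it with the halt opcode.
        if addr not in self.saved:
            self.saved[addr] = self.ram[addr]
            self.ram[addr] = BKPT_OPCODE

    def clear(self, addr):
        # Restore the original instruction.
        self.ram[addr] = self.saved.pop(addr)


def run(ram, pc):
    """Fetch sequentially until the halt opcode is hit; return the halt address.

    Returns None if execution runs off the end of the modeled RAM.
    """
    while pc in ram:
        if ram[pc] == BKPT_OPCODE:
            return pc
        pc += INSTR_SIZE
    return None
```

Because the debugger has to overwrite the instruction in memory, this approach only works when the code is in writable memory, which is why the text associates it with execution from RAM.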
Watchpoints on the other hand don’t halt the processor execution when access to a particular memory location occurs. When a processor encounters a watchpoint, normally a trace message is emitted or the system state at that point gets reflected on the debug software. Watchpoint support is implemented in SoC in the form of watchpoint registers, which are programmed with values of address, control and data signals at which a watchpoint should occur. Comparison and mask circuitry compares the current values of the signals with that programmed in the watchpoint register and generates an output in case of a match, indicating occurrence of watchpoint. Watchpoints can be programmed to act as breakpoints when executing code from ROM. As the watchpoint registers in a debug architecture are limited, only a few (normally 2 or 3) breakpoints can be realized when debugging code loaded in ROM.
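The comparison and mask logic can be illustrated with a short sketch. The register layout below is an assumption made for the example (one value/mask pair each for address, control and data, with control bit 0 meaning "write access"); real watchpoint units define their own fields.

```python
from dataclasses import dataclass


@dataclass
class WatchpointRegister:
    # In each mask, 1 bits are compared and 0 bits are "don't care".
    addr_value: int
    addr_mask: int
    ctrl_value: int      # assumed encoding: bit 0 set = write access
    ctrl_mask: int
    data_value: int = 0
    data_mask: int = 0   # 0 means the data bus is ignored

    def matches(self, addr, ctrl, data):
        """Return True when the current bus signals hit this watchpoint."""
        return (
            ((addr ^ self.addr_value) & self.addr_mask) == 0
            and ((ctrl ^ self.ctrl_value) & self.ctrl_mask) == 0
            and ((data ^ self.data_value) & self.data_mask) == 0
        )
```

For example, a register with `addr_value=0x20000000` and `addr_mask=0xFFFFFFF0` matches any access in a 16-byte window, which is how a single register can watch a small data structure.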
The creation of a hardware ecosystem is generally governed by standards (e.g., Nexus and CoreSight). The following sections describe general debug architectures conforming to each of the above standards.
According to the Nexus standard, the debug architecture may consist of the following components.
1. JTAG Controller
2. Nexus Test Access Port (TAP) implemented on the SoC component to be debugged.
3. Nexus Port Controller
4. NPC Handshake
Figure 1, below, illustrates a simple Nexus based debug architecture.
Figure 1. Nexus Debug Architecture
The JTAG controller acts as the interface between the external debugger and the on chip debug support hardware. Different Nexus TAPs present in various modules of the SoC are accessed through this primary JTAG TAP. The JTAG controller is interfaced with the Nexus Port Controller (NPC) and other Nexus TAPs through the JTAG interface.
A Nexus Test Access Port is implemented on each SoC component required to be debugged. It consists of a JTAG interface for communicating with the JTAG controller, breakpoint/watchpoint control, trace control, memory for storing trace data and a module for transmitting the trace data to the Nexus Port Controller (NPC) through the Auxiliary interface. This hardware module makes it possible to put breakpoints/watchpoints in the code and to access the various processor and IP register spaces and the system memory. These tasks are accomplished by implementing the Nexus TAP in the processor and then accessing the rest of the system through the system bus connected to the processor. For a multi-core SoC, a Nexus TAP can be implemented on each of the processors for synchronous and independent debugging of each processor. Synchronization among multiple processors is achieved through cross-trigger channels among the Nexus TAPs present in the processors. Nexus TAPs support an external event input for starting the trace operation. Nexus TAPs can also be connected to the system bus in order to gain direct access to the system memory for memory snooping and fast code download.
The Nexus Port Controller (NPC) controls the trace port used for transmitting trace data. It contains a JTAG interface for communicating with the JTAG controller and arbitration/muxing of different auxiliary ports. The NPC has an external message and event interface on which trace data is sent and occurrence of an event is communicated.
The NPC Handshake module is responsible for debug entry/exit across low power modes of the SoC.
All the above Nexus components are collectively known as Nexus Development Interface (NDI). The reset to the NDI is kept separate from the system reset and can be optionally controlled through an external pin (e.g. JCOMP). NDI has an internal censorship feature to prevent unauthorized system access through the debug interface. As the topology of a Nexus based ecosystem is fixed, the external debugger should have prior information about the debug support infrastructures present in a SoC.
CoreSight is an ARM standard for creating the debug architecture for an ARM based system. The following are the primary components used in this architecture.
2. Debug Access Port (DAP)
3. ROM Table
4. Cross Trigger Interface (CTI)
5. Cross Trigger Matrix (CTM)
6. Embedded Trace Macrocell (ETM)
7. Trace Funnel
9. Embedded Trace Buffer (ETB)
10. Trace Port Interface Unit (TPIU)
Figure 2 and Figure 3 illustrate a simple CoreSight based debug architecture.
Figure 2. CoreSight Debug Architecture
Figure 3. CoreSight Trace Architecture
All the above listed components except for the Debug Access Port (DAP) and ROM table are collectively known as CoreSight Components. The ROM table, present as part of DAP, lists the memory mapped address of all CoreSight components present in a SoC. It is to be noted that one ROM table can point to another ROM table.
EmbeddedICE contains watchpoint control and status registers to facilitate watchpoint functionality on ARM cores which can also act as breakpoints when debugging from ROM or Flash.
DAP acts as a communication interface between the external debugger and the SoC while acting as a bridge between the external debug clock and the multiple clock domains of the cores in the SoC. It accesses the cores present in a SoC through a Debug bus, thus communicating with them at the highest possible frequency, rather than at the slowest frequency as in the case of boundary scan. For very fast code download and memory mapped peripheral access, DAP can be connected to the system bus, thus making the external debugger a bus master. In Figure 2, DAP is connected to the AMBA AXI bus through AXI-APB and AHB-AXI bridges. The external interface of the DAP can be a full-fledged JTAG port or a reduced pin port known as Serial Wire Debug.
Cross Trigger Matrix (CTM) and Cross Trigger Interface (CTI) are present to facilitate synchronous starting and stopping of the cores. CTI provides an interface to communicate any trigger to the CTM, which then broadcasts the trigger to all other CTI present, to synchronize the operation among various cores.
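The broadcast behaviour can be modeled in a few lines. This is a behavioural sketch only: the class and method names are hypothetical, triggers are reduced to the two events "halt" and "run", and real CTI channel mapping and gating are not modeled.

```python
class CrossTriggerMatrix:
    """Forwards a trigger from one CTI to every other attached CTI."""

    def __init__(self):
        self.interfaces = []

    def attach(self, cti):
        self.interfaces.append(cti)
        cti.ctm = self

    def broadcast(self, source, event):
        for cti in self.interfaces:
            if cti is not source:
                cti.receive(event)


class CrossTriggerInterface:
    """Sits next to one core and relays triggers to and from the CTM."""

    def __init__(self, core_name):
        self.core_name = core_name
        self.ctm = None
        self.halted = False

    def raise_trigger(self, event):
        # The local core saw a debug event: apply it locally, then broadcast.
        self._apply(event)
        self.ctm.broadcast(self, event)

    def receive(self, event):
        self._apply(event)

    def _apply(self, event):
        # Simplified: "halt" stops the core, any other event resumes it.
        self.halted = (event == "halt")
```

With three CTIs attached to one CTM, a call to `raise_trigger("halt")` on any one of them leaves all three cores halted, which is the synchronous stop described above.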
Provisions are included in the architecture to debug from reset and partial power cycles, by separating the reset signal for the CoreSight components; and by ignoring power down signals or maintaining the components in a separate power domain respectively.
The CoreSight architecture also contains authentication registers to restrict or prevent unauthorized system access through the debug interface.
In the CoreSight architecture, the topology of the CoreSight components present is detected by the external software. This is done by reading the ROM table present in the DAP to access the location of all present components. Thereafter the Peripheral ID of the ROM table is compared against the list of saved system descriptions. In case of a match, the saved system description is used. Otherwise the debugger identifies each component along with their supported interfaces (master or slave) and control mechanisms. Thereafter the debugger drives all the master interfaces of the components to detect all the slave components attached to them; thus detecting the topology of the CoreSight components present. After topology detection, the description is saved for later use. Any undetected component is ignored by the external debugger.
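The discovery flow above can be sketched as follows. The data model is an assumption made for illustration: each ROM table is a list of `(kind, address)` entries where `kind` is either `"rom"` (a nested ROM table) or `"component"`, and saved system descriptions are a dictionary keyed by Peripheral ID. The real ROM table entry format is defined by the CoreSight specification and is not reproduced here, and the step of driving master interfaces to find attached slaves is left out.

```python
def walk_rom_table(rom_tables, base, seen=None):
    """Return the addresses of all components reachable from the table at base."""
    seen = set() if seen is None else seen
    if base in seen:
        # Guard against tables that point back at each other.
        return []
    seen.add(base)
    found = []
    for kind, addr in rom_tables[base]:
        if kind == "rom":
            # One ROM table can point to another: descend into it.
            found.extend(walk_rom_table(rom_tables, addr, seen))
        else:
            found.append(addr)
    return found


def discover(rom_tables, base, peripheral_id, saved_descriptions):
    """Use a saved description when the Peripheral ID is known, else walk."""
    if peripheral_id in saved_descriptions:
        return saved_descriptions[peripheral_id]
    description = walk_rom_table(rom_tables, base)
    saved_descriptions[peripheral_id] = description   # keep for later sessions
    return description
```

A component that no ROM table entry points to never appears in the returned list, which mirrors the statement that undetected components are ignored by the debugger.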
The Trace mechanism is present in almost all debug architectures to facilitate real-time debug, code profiling and tuning of embedded software. The following sections describe the trace mechanism present in the Nexus and CoreSight architectures.
Nexus Trace Mechanism
In the Nexus architecture, all the tracing hardware resides in the Nexus TAPs of the SoC modules. Nexus TAPs consist of trace control, Message FIFO and Message transmitter blocks for supporting tracing functions. The traces are sent to the Nexus Port Controller (NPC) through the Auxiliary interface. The NPC accepts the traces from the various Nexus TAPs and sends out the traces of one Nexus TAP at a time, by prioritizing the trace inputs. The Nexus architecture provides for the following types of trace messages.
1. Program Trace
2. Data Trace
3. Ownership Trace
4. Watchpoint Match Message
Program traces display program flow discontinuities (direct and indirect branches, exceptions, etc.), allowing the external debugger to interpolate what sequence of events occurred between the discontinuities.
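How a debugger can fill in the instructions between discontinuities is sketched below. The message format is deliberately simplified and is an assumption of this example, not the Nexus message encoding: each message is a pair `(sequential_count, branch_target)`, where `sequential_count` is the number of instructions executed since the previous message (including the branch itself), and instructions are assumed to be 4 bytes wide.

```python
INSTR_SIZE = 4  # assumed fixed instruction width in bytes


def reconstruct_flow(start_pc, messages):
    """Rebuild the list of executed addresses from branch messages.

    messages: iterable of (sequential_count, branch_target) pairs.
    """
    pc = start_pc
    executed = []
    for count, target in messages:
        # Instructions between discontinuities ran sequentially.
        for _ in range(count):
            executed.append(pc)
            pc += INSTR_SIZE
        # The last of them was the branch; continue from its target.
        pc = target
    return executed
```

Starting at `0x100` with messages `[(3, 0x200), (2, 0x100)]`, the function returns the addresses `0x100, 0x104, 0x108, 0x200, 0x204`. Only two messages had to leave the chip to describe five executed instructions, which is why program trace is comparatively cheap in trace bandwidth.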
Data traces display the data associated with a memory write/read. Such a provision helps in debugging data coherency errors and corruption of shared memory locations.
Ownership traces provide visibility of the process IDs or operating system tasks being activated. This provision helps in debugging and profiling multi-threaded software and in estimating the real-time budget used by the various threads.
Watchpoint Match messages are sent on a programmed watchpoint occurrence.
Normally only Program trace, Ownership trace and Watchpoint Match messages are implemented in a trace ecosystem, because Data trace is more costly to implement: it requires a larger data capture module, more trace buffers and a faster trace port.
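The NPC arbitration mentioned at the start of this section (one Nexus TAP at a time, chosen by priority) can be sketched as below. The fixed-priority scheme, where a lower number wins, and all names are assumptions for illustration; the text does not specify how priorities are assigned.

```python
from collections import deque


class NexusPortController:
    """Toy model: per-TAP message queues drained in fixed priority order."""

    def __init__(self, priorities):
        # priorities: TAP name -> priority, lower number wins (assumed scheme)
        self.priorities = priorities
        self.queues = {name: deque() for name in priorities}

    def push(self, tap, message):
        self.queues[tap].append(message)

    def drain(self):
        """Return (tap, message) pairs, sending one TAP's traces at a time."""
        out = []
        while any(self.queues.values()):
            pending = [t for t, q in self.queues.items() if q]
            tap = min(pending, key=self.priorities.get)
            while self.queues[tap]:
                out.append((tap, self.queues[tap].popleft()))
        return out
```

A strict priority scheme like this one can starve low-priority TAPs under heavy load, which is one reason the Message FIFO inside each TAP matters.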
CoreSight Trace Mechanism
The CoreSight architecture introduces the concept of software trace and hardware trace mechanisms, while allowing the flexibility of having both types of trace mechanisms under a unified architecture.
Software traces are generally generated by the software being debugged. In this scheme, the software dumps the trace data on a system memory location, while a separate process empties the trace data and sends them to the external debugger through available communication channel (e.g. JTAG, ARM Debug Comms Channel). The trace data collected by this method has a component of indeterminism due to the number of cycles taken for the emptying task to access the trace data on a system memory and sending the same over the communication channel. CoreSight architecture alleviates the problem by incorporating an Instrumentation Trace Macrocell (ITM). Using ITM, the software traces can be sent to a dedicated trace buffer (on-chip or off-chip) with deterministic cycle time. Such instrumentation traces also help in understanding the execution context of various software threads.
The Hardware trace mechanism consists of an Embedded Trace Macrocell (ETM) attached to the module to be traced. ETM can be designed to provide both program tracing and data tracing functionalities.
Various asynchronous, heterogeneous trace streams from different trace sources can be combined via a single trace port or trace buffer, using a Trace Funnel.
Trace streams are stored in trace buffers implemented on-chip (e.g. Embedded Trace Buffers) or sent through an interface (e.g., Trace Port Interface Unit) to be stored in an off-chip buffer. Both kinds of buffer can be fed from a single trace stream by using a replicator, which replicates an incoming trace stream onto its two output ports.
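The roles of the funnel and the replicator can be illustrated with a small data-flow sketch. It assumes each trace source is a list of payloads, that the funnel tags every item with its source name so the streams can be separated again later, and that sources are served round-robin; the actual arbitration and framing used by CoreSight hardware are not modeled.

```python
from itertools import zip_longest


def funnel(streams):
    """Merge {source_id: [payload, ...]} into one stream of (source_id, payload)."""
    tagged = [[(sid, p) for p in items] for sid, items in streams.items()]
    merged = []
    # Round-robin over the sources; exhausted sources yield None and are skipped.
    for row in zip_longest(*tagged):
        merged.extend(item for item in row if item is not None)
    return merged


def replicator(stream):
    """Copy one incoming stream onto two outputs (e.g. towards ETB and TPIU)."""
    return list(stream), list(stream)
```

For instance, `funnel({"etm0": ["a", "b"], "itm": ["x"]})` gives `[("etm0", "a"), ("itm", "x"), ("etm0", "b")]`, and passing that result through `replicator` yields two independent copies, one for an on-chip buffer and one for the trace port.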
Figure 3 depicts an example of a CoreSight Trace ecosystem.
This paper explains generic debug architectures based on established standards like Nexus and CoreSight. Based on the system, the debug architecture can be modeled as per need. Additional debug capabilities (e.g. like that of a logic analyzer) can also be implemented, which will allow program/data trace based on trigger events, nested trigger events and delay based trigger mechanisms.
1. Single Core or Multi Core: Debug Made Easy With Nexus
2. IEEE Nexus 5001 Standard Version 2.0
3. Debug and Trace for Multicore SoCs (ARM Whitepaper)
4. CoreSight Architecture Specification v1.0 (ARM)