Automated physical datapath helps meet custom power, performance and area goals with predictable results and shorter design schedule
Custom datapath designers working on microprocessors, digital signal processors (DSPs) and graphics processors have to meet aggressive performance, power and area (PPA) targets. They use datapath techniques such as tiling to lay out highly regular structures for commonly used building blocks in processor designs, such as adders, multipliers, coders and decoders.
In tiling, the designer partitions specific functions and arranges the library cells for those functions into rows and columns. Each cell is placed relative to other cells; for example, cell A is on the same row as cell B and to its left. Tiling enables custom layouts that can meet rigorous PPA targets. However, tiling requires a lot of manual work to make sure that the structure is regular and that the cells are optimal and address multi-mode multi-corner, signal integrity (SI) and multi-voltage requirements.
Creating tiled structures and keeping them optimal throughout the design flow is time consuming and results in longer design cycles. At the same time, designers of mobile and multimedia SoCs at 40nm and below are also using processor cores. These “mainstream” designers would like to target that same custom performance but without the penalty of longer design cycles and extensive manual intervention. Additionally, datapath techniques are being extended to design structures such as register banks, clock trees and multiplexers. In summary, mainstream designers need an automated datapath solution that allows them to meet custom design objectives with predictable and shorter design schedules.
What are custom datapath designs?
The logic in digital designs is typically classified into two categories: control blocks and datapath blocks. Control blocks are random in nature and are handled well by standard synthesis and place-and-route tools. Datapath blocks perform Boolean (AND, OR, and XOR) or arithmetic (ADD, SHIFT, and MULTIPLY) operations. In datapath designs, Boolean or arithmetic bitwise data operations are performed in parallel on each bit of a bus. Each operation corresponds to a dedicated function, for example, adder, multiplier, register and multiplexer. Figure 1 shows an example of a datapath block.
Figure 1: Datapath block
Datapath blocks, shown in Figure 1, require regular structures that are not well supported by synthesis and place-and-route tools, as they do not consider the regularity of datapath cells as a cost function. Therefore, designers implement datapath blocks manually by using tiling structures where the blocks are partitioned into dedicated sections on the floorplans.
The designer creates a datapath structure and then places it in a bit-sliced style, as shown in Figure 2. Cells that are operating on one bit are placed in horizontal rows, and each row is repeated and abutted vertically for each bit. It is the designer’s responsibility to ensure that the regularity of the structure is preserved in all the physical design stages. This is not a trivial task.
Figure 2: Datapath structure placed in a bit-sliced style
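The bit-sliced arrangement described above can be illustrated with a minimal Python sketch. It is only a conceptual model, not part of any tool flow: the function name `place_bit_sliced`, the cell names and the dimensions are all hypothetical, and real placement also has to honor site rows, orientation and legalization rules that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    name: str
    x: float
    y: float


def place_bit_sliced(slice_cells, num_bits, row_height):
    """Repeat one bit slice vertically, once per bit.

    slice_cells: list of (cell_name, width) in left-to-right order
                 for a single bit (hypothetical data).
    """
    placements = []
    for bit in range(num_bits):
        x = 0.0
        y = bit * row_height  # each bit's row abuts the row below it
        for cell_name, width in slice_cells:
            placements.append(Placement(f"{cell_name}[{bit}]", x, y))
            x += width  # cells within a row abut horizontally
    return placements


# Hypothetical slice: a mux, an adder cell and a register per bit.
slice_cells = [("mux", 2.0), ("adder", 3.0), ("reg", 2.5)]
layout = place_bit_sliced(slice_cells, num_bits=4, row_height=1.5)
```

Because every row is generated from the same slice, cells of the same function share an x coordinate across all bits, which is the regularity the designer otherwise has to preserve by hand.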
Benefits of custom datapath design
Custom datapath structures, though non-trivial to create, help implement highly optimized designs. The biggest benefits are better performance, lower power and smaller area. By maintaining the regularity of cells, designers are able to achieve multi-gigahertz clock frequencies and to reduce clock skew and power. For high-speed designs, reducing parasitic (RC) delays and skew is critical.
Placing cells closer reduces area and wire length, and mitigates routing congestion.
Aligning cells and pins creates straighter wires and reduces vias and jogs. At advanced process nodes, this helps reduce cost and improve manufacturability. Custom datapath structures once created are quite frequently used as intellectual property (IP), with predictable results.
Costs of custom datapath design
It is quite common to see datapath designers use their unique flow to implement datapath structures. This flow, typically a combination of a proprietary language and GUI, helps meet PPA targets but is hard to share across teams and to provide as IP to external customers. This is because the combination of a standalone custom datapath stage and a traditional place-and-route (P&R) tool yields a complex and inefficient flow.
Designers first create a datapath structure and then need to ensure that it is preserved through the full place and route flow in the context of the rest of the design implementation. As shown in Figure 3, this flow requires many optimization loops to meet the targeted PPA. Changes in cell selection made in one part of the design affect other parts of the design and, in turn, might alter the structure of the physical datapath. Picking cells with the correct drive specifications and size (height/width) to fix timing or logic design rule constraints without disturbing the datapath structure requires extensive design and tool knowledge. The bottom line is that in the custom flow, the cost of design changes is very high.
Design knowledge requirements
Custom datapath designers need to have detailed knowledge of datapath blocks, which requires access to the design at an early stage of the flow. Not every function or block is suitable for structured placement. Designers need to know the dataflow, input-output connections and loads when making major cell placement decisions. Datapath designs typically have wide buses (64, 128 or 256 bits); therefore, knowing the fan-in logic of the input and output loads, as well as the bus widths, is important from a routing resources standpoint.
Implementing a high-performance custom datapath design is time consuming, primarily due to the handcrafted placement stage and the iterations between custom datapath and traditional place and route. Depending on the design complexity, the number and size of datapath blocks, and the technology node, the development schedules of custom datapath design teams are long and hard to predict.
What do designers want?
Designers developing cores for processors, DSPs or other applications do so knowing that their IP will be used in multiple products at different process nodes. These designers want IP that is portable, easy to use and predictable, and that meets the PPA targets of the end users. Mainstream designers, especially those developing mobile and multimedia applications, require powerful processing engines and the capacity to handle increasing graphics content.
To meet these requirements, they use on-chip processor cores, DSP cores and small memories. Mainstream designers want a solution that helps them create datapath structures for the IP blocks as well as handle standard cells in the design seamlessly. These designers also want a solution that can help them create register banks, clock structures, multiplexers, crossbar switches, etc. that are more efficient and help meet PPA objectives. In summary, designers want an automated solution based on traditional place and route tools that delivers custom PPA but with a shorter and predictable design schedule.
Automated datapath solution
Physical datapath technology in IC Compiler provides logic designers with a predictable, production-proven flow in a single unified environment. With the support of Design Compiler during the front-end synthesis phase, RTL designers can create datapath structures and then pass them to IC Compiler for place and route. As needed, designers can also create datapath structures directly in IC Compiler, specifying the relative column and row positions of instances with simple built-in Tcl commands. These specifications are called relative placement (RP) constraints. During placement, legalization and optimization, datapath structures are physically preserved and are placed as a single entity in the context of the rest of the design.
Relative placement is usually applied to datapath blocks and registers but can also be applied to any cells in the design that require some regularity. Examples of commonly used datapath structures are adders, register banks, coders, decoders and multiplexers. Figure 4 displays the design matrix as an RP group after placement.
Figure 4: Structured placement of physical datapath using RP constraints
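To make the idea of column and row constraints concrete, the following Python sketch models an RP group as a small grid and resolves it to coordinates. It is a conceptual illustration under stated assumptions, not IC Compiler's algorithm and not its Tcl command set: the function `resolve_rp_group`, the cell names and the sizing rule (each column is as wide as its widest cell, and all rows share one height) are hypothetical.

```python
def resolve_rp_group(cells, origin, row_height):
    """Turn (column, row) constraints into coordinates.

    cells:  dict mapping cell name -> (column, row, width)
    origin: (x, y) of the group's lower-left corner
    Returns a dict mapping cell name -> (x, y).
    """
    num_cols = max(col for col, _, _ in cells.values()) + 1

    # Assumption: a column is as wide as the widest cell it holds.
    col_width = [0.0] * num_cols
    for col, _, width in cells.values():
        col_width[col] = max(col_width[col], width)

    # Each column starts where the previous column ends.
    col_x = [0.0] * num_cols
    for col in range(1, num_cols):
        col_x[col] = col_x[col - 1] + col_width[col - 1]

    ox, oy = origin
    return {
        name: (ox + col_x[col], oy + row * row_height)
        for name, (col, row, _) in cells.items()
    }


# Hypothetical 2x2 group: A is on the same row as B and to its left.
group = {
    "A": (0, 0, 2.0),
    "B": (1, 0, 1.0),
    "C": (0, 1, 3.0),
    "D": (1, 1, 1.0),
}
placed = resolve_rp_group(group, origin=(10.0, 20.0), row_height=1.5)
moved = resolve_rp_group(group, origin=(50.0, 5.0), row_height=1.5)
```

Only the origin changes between `placed` and `moved`; the offsets between cells stay identical, which mirrors how an RP group is treated as a single entity while the rest of the design is placed around it.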
Physical datapath examples
ARM cores are used extensively in high-performance mobile and communication products. The ARM Cortex-A8 processor’s NEON unit handles multimedia workloads such as video encode/decode, 2D/3D graphics, gaming, audio processing and image processing. In the NEON unit, the 19 blocks highlighted in Figure 5 used IC Compiler RP constraints for a physical datapath implementation that reduced total negative slack (TNS) by 20x.
In this example, RP constraints were used during clock tree synthesis to build clock structures with straighter routes. Figure 6 shows stacked groups of flops driven by the same leaf nets. The clock structure described using RP constraints reduced clock buffer area by 30 percent, the number of clock buffers by 10 percent and clock switching power by 8 percent.
Here, RP constraints were used to create a 32-to-1 multiplexer, using 4-to-1 and 2-to-1 multiplexers. The 128-bit wide 32-to-1 multiplexer structure with structured placement and straight routes shown in Figure 7 delivered 50 percent better timing and 30 percent smaller area.
Figure 7: Physical datapath - Crossbar switch
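The article does not say how the 32-to-1 multiplexer was decomposed. As a worked example, the sketch below assumes one plausible tree: two stages of 4-to-1 multiplexers followed by a single 2-to-1 multiplexer. The function `mux_tree` and the stage ordering are assumptions made for illustration only; the point is to show how many cells per bit such a structure contains and why it suits a row-and-column arrangement.

```python
def mux_tree(num_inputs, stage_radices):
    """Count the muxes needed at each stage of a multiplexer tree.

    stage_radices: inputs per mux at each stage, e.g. [4, 4, 2].
    Returns a list of (radix, mux_count) per stage.
    """
    stages = []
    signals = num_inputs
    for radix in stage_radices:
        if signals % radix != 0:
            raise ValueError(f"{signals} signals do not divide by {radix}")
        signals //= radix  # each mux reduces `radix` signals to one
        stages.append((radix, signals))
    if signals != 1:
        raise ValueError("stages do not reduce the inputs to one output")
    return stages


# Assumed decomposition of a 32-to-1 mux: 4-to-1, 4-to-1, then 2-to-1.
stages = mux_tree(32, [4, 4, 2])
muxes_per_bit = sum(count for _, count in stages)
total_muxes = muxes_per_bit * 128  # 128-bit wide structure
```

Under this assumed decomposition, each bit needs eight 4-to-1 muxes, then two more 4-to-1 muxes, then one 2-to-1 mux: 11 cells per bit and 1,408 cells for the 128-bit structure. Every bit repeats the same small tree, so the cells map naturally onto a regular grid.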
While datapath designers have been using custom techniques such as tiling to achieve aggressive PPA targets, flow complexity and long project schedules make custom techniques unsuitable for mainstream designers. IC Compiler’s physical datapath capability has enabled designers to achieve custom PPA targets with predictable and shorter time-to-results. Physical datapath has found increasing usage with mainstream designers, as evidenced by the design examples presented.
With the semiconductor industry moving toward 40/32/28nm nodes, aggressive PPA targets, greater use of on-chip IP and shorter design schedules, the usage of physical datapath technology is expected to continue to grow.
About the author:
Jafar Safdar is a product marketing manager for IC Compiler at Synopsys. Over the past 15 years, he has held various application engineering and marketing positions at Synopsys. He holds a master’s degree in electrical engineering from California State University, Northridge.