With designs becoming increasingly complex by the day and transistor geometries shrinking, almost all functional domains across SoC design teams are finding it hard to sign off their functions, and Static Timing Analysis (STA) timing closure is no exception. STA timing closure has always been an important and critical part of SoC design, and lower technology nodes have only compounded the challenges for STA teams.
As the VLSI industry has moved to lower technology nodes, shrinking transistor sizes and interconnect lengths have disturbed the ratio of cell delay to interconnect delay. This leads to the requirement of signing off the SoC at multiple corners. After timing signoff at multiple Process, Voltage, Temperature (PVT) corners, silicon fabricated at submicron technology nodes shows an appreciable increase in yield in terms of meeting the timing specifications of the design.
However, timing closure at multiple PVT corners is in itself a huge challenge for the physical design team. This article discusses these challenges and touches upon the methodologies available to overcome them. We discuss in detail our solution for reducing the number of optimization corners in order to achieve efficient and coherent timing closure in minimum time. But before this, let us briefly discuss the need for multiple PVT corners in timing signoff.
Cell delays and interconnect delays are governed by the manufacturing Process (P), operating Voltage (V) and ambient Temperature (T) of the die. These factors determine the physical properties of cells and interconnects, such as the W/L ratio of cells and the Resistance (R) and Capacitance (C) values of interconnects. At the 180-nm technology node and above, timing signoff at the worst and best standard cell PVT corners with 2 RC extraction corners, namely Cmax Rmin (Cmax) and Cmin Rmax (Cmin), was sufficient. Along similar lines, at the 90-nm node 2 additional process corners, Best Hot (best process and voltage at maximum temperature) and Worst Cold (worst process and voltage at minimum temperature), were introduced for robust timing signoff, specifically for hold timing signoff, as hold is skew dependent. The RC corners for these 2 process corners were similarly Cmax at min temperature and Cmin at max temperature respectively.
At 90-nm technology and above, a timing path is predominantly governed by cell delays. Below the 90-nm node, however, the contribution of interconnect delay to a timing path is significant, and the Coupling Capacitance (Cc) component of net delay can significantly alter slack values at the endpoint of a timing path. The RC corners have to be split up so that the contribution of each component, Ground Capacitance (Cg) and Coupling Capacitance (Cc), is accounted for separately. So on top of the 2 conventional RC corners, Cmax and Cmin, we have 2 more foundry-specified RC corners:
- XTALK (Cc is max, Cg is min, R is min)
- Delay (Cc is min, Cg is max, R is max)
Another important thing to note here is that the interconnects and logic cells on a single die can be at different process corners (for example, interconnects manufactured at the worst process and logic cells at the best process, and vice versa). The possible cell corners and RC corners are shown in the tables below.
So, to summarize, working at technology nodes below 90 nm requires timing signoff at 4 PVT cell corners (worst hot, worst cold, best hot and best cold) and 4 RC extraction corners (Cmax, Cmin, Xtalk and Delay). In all we have 4 x 4 = 16 corners for a single timing mode/view. If we have 8 STA modes for a design, then in all we have 8 x 16 = 128 runs for the design.
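The run count above can be reproduced with a short enumeration. This is only an illustrative sketch: the corner labels follow the text, while the mode names (`mode0` ... `mode7`) and the function name `signoff_runs` are hypothetical placeholders.

```python
from itertools import product

# Corner names taken from the text; spelling of the labels is our own choice.
CELL_CORNERS = ["worst_hot", "worst_cold", "best_hot", "best_cold"]
RC_CORNERS = ["cmax", "cmin", "xtalk", "delay"]


def signoff_runs(num_modes):
    """Return every (mode, cell corner, RC corner) combination to analyze."""
    modes = [f"mode{i}" for i in range(num_modes)]  # hypothetical mode names
    return list(product(modes, CELL_CORNERS, RC_CORNERS))


corners_per_mode = len(list(product(CELL_CORNERS, RC_CORNERS)))
total_runs = len(signoff_runs(8))
print(corners_per_mode, total_runs)  # 16 128
```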
The first solution to avoid such an exhausting analysis for a single mode is to look for a corner that forms a superset of the rest of the corners. However, a graphical distribution of slack values for a design block across all 16 corners shows that none of the 16 corners is a complete superset of the others, leaving us with no option but to sign off the design at all 16 corners.
However, this is easier said than done.
This simple calculation of the number of runs for a block poses serious challenges for the physical design team in the following ways:
* The memory and run time requirements of the Placement and Routing (PnR) tool for multi-mode multi-corner optimization across these STA runs would be huge.
* It would be very tedious for an STA engineer to analyze the same mode across 16 process/RC corners.
* The iteration time between STA and PnR would increase, thus increasing the design cycle time.
There are good methodologies that focus on reducing the number of signoff corners by increasing either the signoff uncertainty or the signoff clock and data derates, and then signing off timing in selected corners only. While these approaches have definite advantages in reducing the number of signoff corners, they can lead to either over-fixing (if we aggressively increase uncertainty/derates) or under-fixing (if we do not keep sufficient margin to cover the other corners). Moreover, these approaches may not give a significant reduction in the number of signoff corners.
A silver lining amid all the challenges listed above is that the situation is not that bad for setup timing analysis. Setup timing violations depend primarily on the delay of the timing path (cell delays and interconnect delays, combinational and sequential arcs). These delays differ significantly between cell PVT corners (worst corners have considerably greater delays than best corners). For setup timing, where the worst corners are a complete superset of the best corners, the choice is between the worst cold and worst hot standard cell corners to find the most critical corner for setup analysis. (Conventionally, the worst hot corner has larger delays, but at lower technology nodes worst cold can have larger delays because the threshold voltage of the MOS transistor comes into the picture and the transistor gets slower at lower temperature due to the temperature inversion phenomenon.)
When it comes to RC extraction corners, Cmin is never more critical than the other 3 RC corners. So for multi-mode multi-corner setup optimization we can select the 2 worst cell corners and the Cmax RC extraction corner (plus the Xtalk corner if necessary) to meet most of the setup paths in the design.
But the situation is completely different for hold timing. As hold is skew driven, it is very difficult to judge which combination of process cell corner and RC extraction corner out of the 16 combinations would have most of the hold violations in the design. As the slack distribution plots for hold violations show, none of the 16 combinations is a superset of the others (4 plots are shown here for convenience). The challenge is to find the optimum number of optimization corners so that an appreciable number of violations are fixed without compromising the memory and runtime requirements of the timing and placement tools. This task becomes more daunting because extraction corners depend heavily on the design layout. Even within the same design, different blocks are found to have different RC combinations that yield the maximum violations, and the same holds across different designs. The graphs shown below represent the slack distribution of a design in two different RC corners while keeping the cell corner common.
Here each graph shows the slack at each endpoint for the corner combinations specified on the x and y axes. The frequency of blue dots both above and below the unity slope line indicates that some endpoints are more critical in the x-axis corner while an equally considerable number are more critical in the y-axis corner. Thus no RC corner is a superset of another RC corner.
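The same check that the scatter plots make visually can be sketched numerically. The snippet below is a minimal illustration, assuming per-endpoint hold slacks have already been exported from the timing reports into dictionaries; the endpoint names and slack values are made up, and `compare_corners` and `is_superset` are hypothetical helper names.

```python
def compare_corners(slack_x, slack_y):
    """Count endpoints that are more critical in each of two corners.

    slack_x, slack_y: dict mapping endpoint name -> hold slack in that corner.
    Only endpoints present in both corners are compared.
    """
    common = slack_x.keys() & slack_y.keys()
    worse_in_x = sum(1 for ep in common if slack_x[ep] < slack_y[ep])
    worse_in_y = sum(1 for ep in common if slack_y[ep] < slack_x[ep])
    return worse_in_x, worse_in_y


def is_superset(slack_x, slack_y):
    """True if corner x is at least as critical as corner y at every shared endpoint."""
    _, worse_in_y = compare_corners(slack_x, slack_y)
    return worse_in_y == 0


# Made-up slacks (ns) for the same cell corner in two RC corners.
cmax = {"ff1/D": -0.03, "ff2/D": 0.01, "ff3/D": -0.02}
xtalk = {"ff1/D": -0.01, "ff2/D": -0.02, "ff3/D": -0.02}

print(compare_corners(cmax, xtalk))  # (1, 1)
print(is_superset(cmax, xtalk), is_superset(xtalk, cmax))  # False False
```

Non-zero counts on both sides correspond to dots on both sides of the unity slope line, which is the situation described above.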
So our focus here is to find a generic approach that helps us pick a few optimization corners out of all the signoff corners, such that by fixing timing violations in only these few corners with the APR tool, most of the timing violations are fixed in one go. Our methodology finds the optimum number of corners for hold timing signoff and multi-mode multi-corner hold optimization.
We took 2 design blocks and did a comprehensive hold analysis across all 16 corners individually.
Selecting the most critical corners for optimization does not necessarily solve this issue. Instead, we can look for the corners that have the maximum number of common violations with the other 15 corners. The magnitude of the violations can be taken care of by adding extra pessimism to the optimization runs through uncertainties.
1. For this we prepared a 16 x 16 matrix in which an element m(i,j) gives the number of common violations between the ith and jth corner combinations.
2. For each corner (row/column), the highest (in red) and second highest (in green) numbers of common violations were considered, excluding the diagonal elements (i = j). For example, for the best cmax corner we looked for the corner having the largest number of common violations with best cmax and marked this row-column element in red. Similarly, the corner having the second highest number of common violations with best cmax was marked in green. This exercise was repeated for all 16 corners to find the two corners having the highest number of common violations.
3. In the next step we considered the one best process corner, among the 8 (highlighted in blue), having the largest number of common violations with each of the 8 worst process corners; for example, best xtalk (in blue) has the maximum number of common violations with each of the 8 worst corners. Similarly, we considered the one worst process corner, among the 8 (highlighted in purple), having the largest number of common violations with each of the 8 best corners. As shown in the figure, worst cold xtalk (in purple) has the maximum number of common violations with each of the 8 best corners. Please note that this case may already be covered by Step 2 above, but in our case the violations in the worst process and best process corners were not correlating. In some designs one of the best corners can have the largest number of common violations with the worst corners and can be marked with a different color code.
Now, across all rows/columns, the corner with the maximum number of red, green and blue/purple elements is our best choice for hold optimization. In our case, this gave us "best xtalk" and "worst cold xtalk" as the hold optimization corners.
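Steps 1 and 2 and the final tally can be sketched as follows. This is a simplified illustration, not the exact flow we ran: it assumes the violating endpoints of each corner are available as sets, uses made-up data for 4 corners instead of 16, and leaves out the cross-process marking of Step 3. The function names are hypothetical.

```python
from collections import Counter


def common_violation_matrix(violations):
    """violations: dict corner -> set of violating endpoints.

    Returns m where m[i][j] is the number of endpoints violating in both
    corner i and corner j (the diagonal holds each corner's own count).
    """
    corners = list(violations)
    return {i: {j: len(violations[i] & violations[j]) for j in corners}
            for i in corners}


def rank_corners(matrix, top_n=2):
    """Mark the top_n off-diagonal entries of each row (red/green in the text)
    and tally how many marks each corner collects."""
    marks = Counter()
    for i, row in matrix.items():
        # Ignore the diagonal and corners with no common violations at all.
        others = [(count, j) for j, count in row.items() if j != i and count > 0]
        others.sort(key=lambda item: item[0], reverse=True)  # ties keep input order
        for _, j in others[:top_n]:
            marks[j] += 1
    return marks.most_common()


# Made-up violating endpoints per corner.
violations = {
    "best_xtalk": {"a", "b", "c", "d"},
    "best_cmin": {"a", "b"},
    "worst_cold_xtalk": {"c", "d", "e"},
    "worst_cold_cmax": {"d", "e"},
}
matrix = common_violation_matrix(violations)
ranking = rank_corners(matrix)
print(ranking[:2])  # [('best_xtalk', 3), ('worst_cold_xtalk', 2)]
```

Corners at the top of the ranking are the candidates for hold optimization; how ties are broken is a choice of this sketch.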
After that we fixed the hold violations in these two corners, best xtalk and worst cold xtalk. A 16 x 16 matrix was then made again with the same rules as the first.
Corners fixed: Best xtalk and worst cold xtalk
Step 2 was followed again, and this time the corner with the maximum number of common violations was found to be best cmin.
The first 2 sets of fixes plus a third set of fixes on best cmin were sourced across all corners, with extremely positive results.
Corners fixed: Best cmin, worst cold xtalk, best xtalk
The matrix formed after this third level of hold fixing showed that, on average, more than 98% of the original violations of each of the 16 corners were fixed. The only violations remaining were the uncommon or mutually exclusive violations.
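A fix percentage of this kind can be computed per corner by comparing the violating endpoints before and after the fixes. The sketch below assumes such endpoint sets are available; the data is invented and does not reproduce our 98% result, and `percent_fixed` is a hypothetical helper name. New violations introduced by the fixes are deliberately not counted here.

```python
def percent_fixed(before, after):
    """before/after: dict corner -> set of violating endpoints.

    Returns dict corner -> percentage of that corner's original violations
    that are no longer violating after the fixes.
    """
    result = {}
    for corner, original in before.items():
        if not original:
            result[corner] = 100.0  # nothing to fix in this corner
            continue
        still_violating = original & after.get(corner, set())
        result[corner] = 100.0 * (len(original) - len(still_violating)) / len(original)
    return result


# Made-up endpoint sets for two corners.
before = {"best_cmax": {"a", "b", "c", "d"}, "worst_hot_delay": {"a", "e"}}
after = {"best_cmax": {"d"}, "worst_hot_delay": set()}

fixed = percent_fixed(before, after)
average = sum(fixed.values()) / len(fixed)
print(fixed, average)
```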
We were able to narrow down from 16 corners to 3 corners that can be part of the MMMC hold optimization, thereby reducing the tool run time and memory requirements and also reducing the number of hold violations to a large extent. The exercise can be repeated further to improve the percentage of fixed hold violations. The same methodology can also be extended across multiple STA modes to find the mode and corner combinations having the most common violations among multiple modes and multiple corners. Moreover, if after any iteration a diagonal element (i = j) is found to be 0, then that particular corner combination can be removed from the list of final timing signoff corners, because this means that the hold violations in that particular corner have been covered by some other corners. The biggest advantage of the above methodology is that it is not based on any presumption about which corner would be the most critical for a design.
Hold violations are very much skew dependent, so in different designs hold violations can be more critical in any of the signoff corners. With the number of signoff corners increasing below the 90-nm technology node, fixing hold violations is proving to be a big challenge for designers. Multi-mode multi-corner optimization is always a solution to this problem, but current EDA tools are not very efficient in terms of optimization quality and run time as the number of optimization corners increases. So we need a methodology that can reduce the number of optimization corners. The above methodology of fixing hold violations in the corners with the largest number of common violations is very generic and technology independent. Using this methodology, hold violations across multiple corners and modes can be fixed in a very efficient way.
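The diagonal check described above can be sketched in a few lines. The matrix layout (a dictionary of dictionaries keyed by corner name) and the function name `remaining_signoff_corners` are assumptions made for illustration, and the numbers are invented.

```python
def remaining_signoff_corners(matrix):
    """Split corners by their diagonal entry m[i][i], i.e. their own
    remaining violation count after an iteration of fixes."""
    kept = [c for c in matrix if matrix[c][c] > 0]
    dropped = [c for c in matrix if matrix[c][c] == 0]
    return kept, dropped


# Made-up post-fix matrix for two corners.
matrix_after_fix = {
    "best_xtalk": {"best_xtalk": 0, "best_cmin": 0},
    "best_cmin": {"best_xtalk": 0, "best_cmin": 3},
}
kept, dropped = remaining_signoff_corners(matrix_after_fix)
print(kept, dropped)  # ['best_cmin'] ['best_xtalk']
```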
About the authors:
Ashish Goel is a Lead Design Engineer at Freescale Semiconductor, India. He has 11 years of industry experience in various fields of VLSI such as Static Timing Analysis, RTL design, physical design and formal technologies. He has been with Freescale for the last three years. Ashish also holds multiple patents in the field of FPGA architecture.
Amol Agarwal is a Senior Design Engineer at Freescale with more than 5 years of experience. He works in the physical design team at Freescale, with STA and synthesis as his areas of specialization. He has been involved in several block-level and chip-level designs in technologies ranging from 250 nm to 40 nm.
Prateek Gupta has been a Design Engineer at Freescale for more than a year and has been actively engaged in timing closure activities for various design blocks at the 90-nm and 55-nm technology nodes.