Digital blocks contain combinational and sequential circuits. Sequential circuits are storage cells whose outputs reflect the past sequence of their input values, while the output of a combinational circuit depends only on its present inputs. Latches and flip-flops are the most commonly used storage elements.
This paper is divided into four parts. The first part discusses the advantages and disadvantages of latches compared to flip-flops. The second part describes some unique properties of latches that make them useful in high-frequency design. The third part covers the timing-analysis complexities of latch-based design and how to deal with them during the course of a design. Finally, the paper discusses the challenges that latch-based design poses for hierarchical timing closure and partitioning, and some solutions.
Latches Vs Flip Flops
A latch is a level-sensitive storage cell that is transparent to signals passing from the D input to the Q output when enabled, and that holds on Q the value D had at the moment the enable went false. The enabled state is also called the transparent state. Depending on the polarity of the enable input, we call latches positive-level or negative-level.
In comparison, a flip-flop is an edge-triggered device that changes state on the rising or falling edge of a control signal such as a clock. A rising-edge-triggered flip-flop samples its input only at the rising edge of the clock and holds the sampled value until the next rising edge. Designers generally prefer flip-flops over latches because this edge-triggered property makes timing behavior simple and eases design interpretation.
Latch-based designs have small die size and are more successful in high-speed designs where the clock frequency is in the GHz range. In flip-flop-based high-speed designs, controlling clock skew is a big problem, but latches ease this problem. Hence the use of flip-flops can limit design performance in cases where the frequency of the design is limited by the slowest path. Also, when process variation is considered, latch-based design is dramatically more variation-tolerant, resulting in better yield and/or allowing more aggressive clocking than the equivalent design with flip-flops.
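To make the level-sensitive versus edge-triggered distinction concrete, the following minimal Python sketch models both cells on a sampled clock. It is only a behavioral illustration, not a description of any library cell; the function names simulate_latch and simulate_flip_flop and the sample waveforms are assumptions introduced here.

```python
def simulate_latch(enable, data, initial=0):
    """Positive-level latch: Q follows D while enable is 1, holds otherwise."""
    q = initial
    out = []
    for en, d in zip(enable, data):
        if en:              # transparent state: D passes straight to Q
            q = d
        out.append(q)       # when enable is 0, the last value is held
    return out


def simulate_flip_flop(clock, data, initial=0):
    """Rising-edge flip-flop: Q takes D only on a 0 -> 1 clock transition."""
    q = initial
    prev_clk = 0
    out = []
    for clk, d in zip(clock, data):
        if clk and not prev_clk:   # rising edge detected
            q = d
        prev_clk = clk
        out.append(q)
    return out


# Hypothetical sample waveforms, one entry per time step.
clock = [0, 1, 1, 1, 0, 0, 1, 1]
data = [1, 1, 0, 1, 0, 1, 0, 0]

latch_q = simulate_latch(clock, data)
ff_q = simulate_flip_flop(clock, data)
```

With these waveforms the latch output follows D throughout each high phase of the clock, while the flip-flop output changes only at the two rising edges.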
Time Borrowing with Latches
The unique property that enables the above advantages is time borrowing. A level-sensitive latch is transparent for the duration of an active clock pulse. The time-borrowing technique can relax the normal edge-to-edge timing requirements of synchronous designs. A long combinational path that determines the maximum frequency of the design can borrow time from a shorter path in a subsequent latch-to-latch stage to meet its timing (Figure 1).
Figure 1. Timing paths in a sample circuit
Figure 1 has two timing paths: Path 1 runs from the positive-edge-triggered register (1), through logic A, to a negative-level latch (2), while Path 2 runs from the latch, through logic cloud B, to a positive-edge-triggered register (3). Let us examine this simple example to illustrate borrowing time to compensate for the delay through logic cloud A. Depending on the delay incurred by logic A in Path 1, there are two timing-analysis scenarios that decide how much time we can borrow (Figures 3 and 4).
Figure 2. Timing relationships set by the system clock
In Case A, data from logic A arrives at latch (2) before the falling edge of the clock at the latch. In this case, the latch behaves like a flip-flop, and analysis is simple. We do not need to borrow any time to achieve our timing goal.
Figure 3. When Logic A is fast enough, no borrowing is necessary
In Case B, the negative clock edge enables the latch before the signal from Logic A arrives at the latch input. The latch therefore goes transparent and, for a while, passes a not-yet-settled value from Logic A through toward Logic B. But that is fine. What matters is that the new state from Logic A reaches Logic B and passes through it in time to meet the setup requirement of register (3). So if the propagation delay of Logic B is short, we can, in effect, let Logic A have some of the time reserved for Logic B, and the circuit will still work. We say that Logic A borrows this extra time in order to complete its propagation. When Path 2 is timed, the timing analysis treats the end of the borrowed time as the start point for analyzing Logic B’s delay.
Figure 4. Logic A borrows time from Logic B
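Case A and Case B can also be worked through numerically. The following Python sketch assumes an idealized version of the circuit in Figure 1: a 50% duty-cycle clock, register (1) launching at time 0, the negative-level latch open from half the period to the full period, register (3) capturing at the full period, and zero setup time, skew, and cell delay. The function name analyze_time_borrowing and the BorrowResult fields are hypothetical and do not correspond to the output of any STA tool.

```python
from dataclasses import dataclass


@dataclass
class BorrowResult:
    case: str            # "A" (no borrowing) or "B" (borrowing)
    borrowed: float      # time Logic A borrows from Path 2
    path2_start: float   # time at which the Path 2 analysis starts
    slack_path2: float   # setup slack at register (3)
    meets_timing: bool


def analyze_time_borrowing(period, delay_a, delay_b):
    """Idealized two-segment analysis of Path 1 and Path 2.

    Assumptions (not from any tool): 50% duty cycle, launch at t = 0,
    latch transparent on [period / 2, period], capture at t = period,
    zero setup time, clock skew, and cell delay.
    """
    open_edge = period / 2.0    # falling edge opens the negative-level latch
    close_edge = float(period)  # rising edge closes it and clocks register (3)

    if delay_a > close_edge:
        raise ValueError("Logic A arrives after the latch closes; Path 1 fails")

    # Case A: data is waiting when the latch opens, so nothing is borrowed.
    # Case B: data arrives while the latch is open; the overshoot is borrowed.
    borrowed = max(0.0, delay_a - open_edge)
    case = "B" if borrowed > 0 else "A"

    # Path 2 starts at the opening edge plus whatever Path 1 borrowed.
    path2_start = open_edge + borrowed
    slack = close_edge - (path2_start + delay_b)
    return BorrowResult(case, borrowed, path2_start, slack, slack >= 0)


fast_a = analyze_time_borrowing(period=10, delay_a=4, delay_b=4)    # Case A
slow_a = analyze_time_borrowing(period=10, delay_a=7, delay_b=2)    # Case B, passes
too_slow = analyze_time_borrowing(period=10, delay_a=7, delay_b=4)  # Case B, fails
```

In this idealized model the borrowing case meets timing exactly when delay_a + delay_b does not exceed the period, which is the sense in which Logic A uses time reserved for Logic B.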
During STA, timing reports are generated according to Case A or Case B. In reality, the timing when the latch is enabled is the same as if the latch were simply a transparent delay element (Figure 5).
Figure 5. When the latch is enabled, it essentially becomes a passive delay
Time-borrowing in OCV/Xtalk timing analysis
In an ideal scenario, the time given to the start point should equal the time borrowed at the latch. But as technology shrinks, on-chip variation (OCV), signal integrity, and other sources of uncertainty come into the picture. To make the analysis more accurate, we also use common-path pessimism removal (CPPR). These factors complicate the relationship between the time borrowed and the time given to the start point. As a result, timing analysis of latches becomes more challenging.
Let us consider Case B again. With the above factors included, an interesting relationship emerges between the time borrowed and the time given to the start point. The variables that shape it are clock uncertainty, clock-path pessimism due to OCV, clock derates, and so on.
Let U be the clock uncertainty applied while timing Path 1, C the CPPR between the register and the latch, and T the time borrowed by Path 1 while constraining logic A. When Path 2 is timed, the time given to the start point is T adjusted for U and C, for the following reasons.
The uncertainty applied in Path 1 is the uncertainty on the latch's clock path, which is not real pessimism while the latch is transparent. So we remove that latch clock uncertainty from the start time. Similarly, the CPPR pessimism is recovered in the time given to the start point, because the latch's launch clock path in Path 2 is taken with the same path type (early or late). If clock derates are applied in the design, then while timing Path 2 for setup, early derates rather than late derates are used for the latch's clock path, so that it again matches the latch's capture clock path in Path 1.
Normally, EDA tools behave very pessimistically while timing Path 2 because they do not apply clock-path pessimism removal, but keeping that pessimism is not correct. The CPPR of Path 1 is also removed while calculating the time given to the start point, and retaining that pessimism is again not correct. In actuality, while the latch is transparent it acts as a combinational cell, and CPPR should be considered between the start point of Path 1 and the endpoint of Path 2. But the tools give extra-pessimistic results here because they do not check whether each pessimism is actually warranted.
Another approach is to use the smaller of the CPPR values of Path 1 and Path 2. This is not the most accurate method, but it provides a further level of pessimism removal, which can be justified as follows.
Comparing the common clock path of the register and the latch in timing Path 1 with the common clock path of the latch and the endpoint (the second register) in timing Path 2 gives an idea of the minimum common clock path between the register and the final endpoint.
The most accurate, though least preferred, way to judge the timing of latches (once it is certain that the latch will be transparent while timing this path) is to force the latch transparent by applying a case analysis on the enable pin of the latch. After this, the EDA tool can time the two segments as one complete path. We prefer this approach least because the latch may not always be transparent: in the best-case condition, Path 1 may not need time borrowing. The tool will then miss all the paths that do not require time borrowing, as well as the hold-time check at the latch endpoint.
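The caveat about forcing the latch transparent can be illustrated with a toy comparison. The Python sketch below assumes the same idealized clocking as the circuit in Figure 1 (50% duty cycle, launch at time 0, latch opening at half the period, capture at the full period, zero setup time and cell delay). It is not a model of any EDA tool; the function names are hypothetical.

```python
def arrival_two_segment(period, delay_a, delay_b):
    """Arrival time at register (3) when the latch is modeled faithfully.

    Data cannot leave the latch before its opening edge, so Path 2 launches
    at the later of the opening edge and the arrival of Logic A's output.
    """
    open_edge = period / 2.0
    launch = max(open_edge, delay_a)
    return launch + delay_b


def arrival_forced_transparent(delay_a, delay_b):
    """Arrival time when the latch is treated as a zero-delay wire."""
    return delay_a + delay_b


def compare_models(period, delay_a, delay_b):
    """Report whether each model meets setup at register (3), and if they agree."""
    real = arrival_two_segment(period, delay_a, delay_b)
    forced = arrival_forced_transparent(delay_a, delay_b)
    return {
        "real_meets": real <= period,
        "forced_meets": forced <= period,
        "models_agree": real == forced,
    }


# Borrowing case: data arrives while the latch is open, so both views agree.
borrowing = compare_models(period=10, delay_a=7, delay_b=2)

# Non-borrowing case: data arrives early and waits for the opening edge.
# The forced-transparent view is optimistic and hides a failing Path 2.
non_borrowing = compare_models(period=10, delay_a=2, delay_b=6)
```

In the second call the faithful model launches Path 2 at the opening edge and misses the capture edge, while the forced-transparent model reports a pass. This is the kind of non-borrowing path that a single combined path does not cover.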
Partitioning challenges with Latch Based Design
Beyond the timing challenges explained above, a few further challenges arise in hierarchical design when blocks have latch-based interfaces. The timing tools need some help to understand when it is possible to borrow time across a block boundary.
Figure 6. A block interfaces to external latches
The first challenge is to enable time borrowing for ports that are budgeted for timing. While timing a block, the ports that come from or go to latches at the top level of the SoC can be modeled by using their proper I/O delays and the level_sensitive option in the EDA tool. Consider Path 2 in Figure 6. Without the level_sensitive option, this path could be critical at block level. With the output delay at the output port defined with the level_sensitive option, the timing tool can borrow time from the input stage of the next block, which relaxes the timing on the output port.
Figure 7. A block using latches interfaces to external registers
Next, consider the case (Figure 7) in which the latches are inside the block instead of outside it. For Path 1, nothing special is needed to close the block, but the designer must define all types of clock latency (rise/fall and min/max) for the block pin CLK. This helps to correctly calculate the time given to the start point based on OCV and CPPR. With this in place, there will be no surprises when the block is merged at the top level.
Another challenge arises when timing models are used for top-level execution. We expect time borrowing to remain possible through boundary latches. This can be achieved with grey-box extracted timing models (ETMs), which preserve the boundary latches when the ETM library models are generated.
Latches are very beneficial for high-speed SoC designs, but their use adds challenges in static timing analysis, especially with hierarchical design. Limitations of current EDA tools increase the complexity of latch-based design. Latches can be used more widely in SoCs with careful analysis and by applying some of the techniques listed in this paper. These techniques may not completely remove the complexities, but they can definitely reduce them.