Synthesis challenges with scan compression logic

by Abhishek Mahajan, Amol Agarwal and Vikramjeet Singh, Freescale Semiconductor India Ltd., TechOnline India - August 10, 2011

Scan chains are designed for testing, connecting the sequential elements of a chip in serial order. But shrinking geometries and rising complexity put millions of transistors onto a single SoC, so the ratio of the total number of sequential elements to the total number of available scan IOs keeps increasing. Conventional scan structures are therefore insufficient for complex SoCs because of the high tester cost. Compression logic was developed to solve this problem, but it poses new challenges in scan stitching. We discuss these challenges and their solutions.

All modern SoCs use scan structures to detect manufacturing faults in the design. Scan chains are designed for testing, connecting the sequential elements of a chip in serial order. However, ever-shrinking geometries and the increasing complexity of modern SoCs put millions of transistors onto a single chip. As a result, the ratio of the total number of sequential elements to the total number of available scan IOs is continuously increasing.

Thus, conventional scan structures are not sufficient to support these complex SoCs because of the huge tester cost (tester time). Compression logic was developed to solve this problem, but it imposes new challenges in scan stitching during the logic synthesis stage. We will discuss these challenges and their solutions in detail, but first let us understand the need for compression logic using an example.

Figure 1 below shows a simple scan structure in which sequential elements are stitched in serial order.

Figure 1
 
To discuss the various parameters that govern the scan structure, consider a design with the following configuration:

Available tester memory = 1 Meg vectors per tester channel (fixed memory available on tester)

Available scan input/output ports = 5 + 5
Number of flip-flops per chain = 200 (total flops = 1000)
Number of patterns required to test the design completely = 2400

Therefore, tester memory required = 200 * 2400 ~= 0.48 Meg per tester channel.

 
In the above case, the required tester memory is less than the available memory, so the design is testable. But as modern SoCs grow (i.e., as the count of sequential elements in the design increases), the available tester memory may no longer be sufficient. Consider another design with 20k flops and more scan IOs available at the top level.

Available tester memory = 1 Meg vectors per tester channel

Available scan input/output ports = 10 + 10 (limited number of test pins on package)
Number of flip-flops per chain = 2000 (total flops = 20000)
Number of patterns required to test the design completely = 2400 (a conservative number, since the pattern count also increases with design size)

Therefore, tester memory required = 2000 * 2400 ~= 4.8 Meg per tester channel.

 
In the above example, the patterns do not fit into the available tester memory, so the simple scan structure shown above is not sufficient to test the design completely. This problem can be solved by scan compression.
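The arithmetic above can be checked with a short Python sketch. It follows the article's simplified model (vectors per tester channel = flops per chain × number of patterns, ignoring capture cycles and load/unload overlap) and assumes that 1 Meg means 1,000,000 vectors, which matches the 0.48 Meg figure for 200 × 2400. The function names are illustrative only. The extra helper `min_chains_to_fit` estimates how many balanced chains the 20k-flop design would need for its shift data to fit the memory, which is far more than the 10 scan ports the package provides.

```python
import math

# Assumption: "1 Meg" = 1,000,000 vectors, consistent with 200 * 2400 ~= 0.48 Meg above.
TESTER_MEMORY = 1_000_000  # vectors per tester channel


def memory_per_channel(total_flops: int, scan_chains: int, patterns: int) -> int:
    """Vectors needed per tester channel: (flops per chain) * (number of patterns)."""
    flops_per_chain = math.ceil(total_flops / scan_chains)  # balanced chains assumed
    return flops_per_chain * patterns


def min_chains_to_fit(total_flops: int, patterns: int, memory: int = TESTER_MEMORY) -> int:
    """Smallest number of balanced chains whose shift data fits in the tester memory."""
    max_flops_per_chain = memory // patterns  # 1,000,000 // 2400 = 416 flops
    return math.ceil(total_flops / max_flops_per_chain)


for name, flops, chains in [("design 1", 1000, 5), ("design 2", 20000, 10)]:
    needed = memory_per_channel(flops, chains, 2400)
    print(f"{name}: {needed / 1e6:.2f} Meg per channel, fits: {needed <= TESTER_MEMORY}")
# design 1: 0.48 Meg per channel, fits: True
# design 2: 4.80 Meg per channel, fits: False

print(min_chains_to_fit(20000, 2400))  # 49
```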
 

Compression logic concept


Compression logic emerged as a solution to the problem of testing chips for manufacturing faults within the available tester memory. In this structure, the chip-level chains are divided into a larger number of shorter internal chains, which are driven and observed through a limited number of chip-level scan-in and scan-out ports: the stimulus is stored on the tester in compressed form and decompressed on chip (scan-in), and the responses are compressed on chip before they are shifted out (scan-out). From here onwards, we will refer to this compression and decompression logic as CDL.
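To make the idea concrete, here is a deliberately simplified Python model of a CDL. It assumes a broadcast-style decompressor (each scan-in port fans out to a fixed group of internal chains) and an XOR space compactor (each scan-out port carries the XOR of a group of internal chain outputs). These are generic structures chosen for illustration, not the CDL used by the authors; production decompressors are typically more elaborate, and a real compactor must also deal with unknown (X) values and error masking.

```python
from functools import reduce
from operator import xor
from typing import List


def broadcast_decompress(scan_in_bits: List[int], chains_per_port: int) -> List[int]:
    """Toy decompressor: every scan-in pin value is copied to a group of internal chains."""
    return [bit for bit in scan_in_bits for _ in range(chains_per_port)]


def xor_compact(chain_out_bits: List[int], chains_per_port: int) -> List[int]:
    """Toy compactor: every scan-out pin carries the XOR of a group of internal chains."""
    groups = [chain_out_bits[i:i + chains_per_port]
              for i in range(0, len(chain_out_bits), chains_per_port)]
    return [reduce(xor, group, 0) for group in groups]


# One shift cycle with 10 scan-in pins, 10 scan-out pins and 100 internal chains.
ports, factor = 10, 10
stimulus = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]            # one bit per scan-in pin
internal_in = broadcast_decompress(stimulus, factor)  # one bit per internal chain
print(len(internal_in))                               # 100

captured = [(i * 7) % 3 % 2 for i in range(ports * factor)]  # arbitrary stand-in responses
print(len(xor_compact(captured, factor)))                    # 10
```

Broadcasting constrains which stimulus combinations can be applied (all chains in a group receive the same bit), which is one reason the ATPG pattern count can grow when compression is used.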

Figure 2: CDL

The compression logic solves the tester memory problem, as shown below:

(Compression factor = 10)

Available tester memory = 1 Meg vectors per tester channel (fixed memory available on tester)
Available scan-in ports = 10 (limited number of test pins on package)
Available scan-out ports = 10 (limited number of test pins on package)
Total number of internal scan chains = 100
Number of flip-flops per chain = 200 (total flops = 20000)
Number of patterns required to test the design completely = 2400

Therefore, tester memory required = 200 * 2400 ~= 0.48 Meg per tester channel.

As the required tester memory (0.48 Meg) is now less than the available 1 Meg, the design is testable.
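The compressed configuration can be checked with a short sketch. It assumes, as the example does, that the number of internal chains equals the number of scan-in ports multiplied by the compression factor, and that the pattern count stays at 2400; in practice compression can increase the pattern count somewhat. `compressed_memory` is an illustrative name, and 1 Meg is again taken as 1,000,000 vectors.

```python
import math
from typing import Tuple

TESTER_MEMORY = 1_000_000  # assumption: 1 Meg = 1,000,000 vectors per tester channel


def compressed_memory(total_flops: int, scan_ports: int,
                      compression_factor: int, patterns: int) -> Tuple[int, int, int]:
    """Return (internal chains, flops per internal chain, vectors per tester channel)."""
    internal_chains = scan_ports * compression_factor        # 10 ports x 10 = 100 chains
    chain_length = math.ceil(total_flops / internal_chains)  # balanced chains assumed
    return internal_chains, chain_length, chain_length * patterns


chains, length, vectors = compressed_memory(20000, 10, 10, 2400)
print(chains, length, vectors, vectors <= TESTER_MEMORY)  # 100 200 480000 True

uncompressed = math.ceil(20000 / 10) * 2400  # 4,800,000 vectors without the CDL
print(uncompressed // vectors)               # 10, i.e. the compression factor
```

Because every pin now feeds ten shorter internal chains, the per-channel shift data shrinks by exactly the compression factor in this idealized model.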
 

What is the problem?

Scan compression logic is a must-have feature in all modern complex SoCs. However, it introduces new challenges in scan stitching during the logic synthesis stage. As shown in Figure 3, scan chains are stitched from the output of the CDL (its scan-in side) to the input of the CDL (its scan-out side): the scan-in pin of the compression logic is connected to the scan input of the first flop in each chain.

Figure 3

To conduct DFT checks successfully, only one capture should happen per window. Violating this condition leads to a drop in test coverage, because not all flops are then independently controllable. Throughout this article, we consider the window shown in Figure 4.

Figure 4

When the scan chains are stitched, hold violations may occur because the flops of the CDL and those in the rest of the design may be clocked by different clocks (due to different functional clock domains in the design), which may have a long uncommon clock path. To cover all possible violations, the following cases are listed with reference to Figure 5.

Figure 5: Launch-capture flop

Figure 6

Therefore, violations will happen when:

1. The clocks reaching the launch and capture flops are skewed. This happens for posedge-posedge and negedge-negedge pairs of flops in the following two scenarios, shown in Figure 6.

a) Flops stitched in a scan chain are clocked by the same clock, but because of skew between the clocks reaching the launch and capture flops, the clock arrives at the capture flop well after it arrives at the launch flop.

b) Two flops of different clock domains in a scan chain are clocked by different clocks, and due to OCV there may be enough skew between the clocks that launch and capture occur in the same window. This is the most likely case when one of the flops is inside the CDL and the other is outside it.

2. Launch and capture happen in a single window. This case arises when the launch flop is a posedge flop and the capture flop is a negedge flop. Here, even with no skew between the clock edges, two captures happen in a single clock cycle (Figure 6). Since scan stitching is done after logic synthesis, the designer does not know, at the time of CDL coding, the nature (positive or negative edge triggered) of the first or last flop in the scan chain.
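The two cases above can be expressed as a small per-pair check. This is a simplified single-corner sketch, not the authors' flow: clock arrival times stand in for skew (no OCV derating), the posedge-to-negedge case is flagged by edge type alone as the text describes, and the remaining pairs get a standard hold check (data arrival versus capture clock arrival plus hold time). A negedge-launch, posedge-capture pair is given half a clock period of extra margin. The `Flop` class, `check_pair` function and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Flop:
    name: str
    edge: str             # "pos" or "neg" (active clock edge)
    clock_arrival: float  # ns, clock latency at the flop's clock pin


def check_pair(launch: Flop, capture: Flop, period: float,
               clk_to_q: float, path_delay: float, hold: float) -> str:
    """Classify one scan-chain launch/capture pair (simplified, single corner)."""
    # Case 2: posedge launch into negedge capture -> two captures in one window.
    if launch.edge == "pos" and capture.edge == "neg":
        return "violation: launch and capture in the same window"
    # A negedge launch into a posedge capture happens half a period after the
    # capture edge it is checked against, which adds T/2 of hold margin.
    offset = period / 2 if (launch.edge == "neg" and capture.edge == "pos") else 0.0
    data_arrival = offset + launch.clock_arrival + clk_to_q + path_delay
    hold_required = capture.clock_arrival + hold
    slack = data_arrival - hold_required
    # Case 1: skew between the launch and capture clocks eats the hold margin.
    if slack < 0:
        return f"violation: hold slack {slack:.2f} ns"
    return f"ok: hold slack {slack:.2f} ns"


timing = dict(period=10.0, clk_to_q=0.15, path_delay=0.10, hold=0.05)
cdl_flop = Flop("cdl_out_0", "pos", 0.4)  # hypothetical CDL flop with an early clock
print(check_pair(cdl_flop, Flop("core_ff_a", "pos", 1.6), **timing))
# violation: hold slack -1.00 ns
print(check_pair(cdl_flop, Flop("core_ff_b", "neg", 0.4), **timing))
# violation: launch and capture in the same window
print(check_pair(Flop("core_ff_c", "neg", 0.4), Flop("core_ff_d", "pos", 1.6), **timing))
# ok: hold slack 4.00 ns
```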

What are the various techniques available?

Designers may employ various techniques to circumvent this problem. Some of them are described below:

1. Custom CDL: In this method, scan chains are stitched with a dummy CDL, and the CDL is then modified according to the first flop of each scan chain so that there are no violations. The CDL is then synthesized separately and merged with the netlist created earlier.

Advantage: This methodology adds no lockup flops at the boundary of the CDL, as the CDL is configured to match the scan stitching every time.

Disadvantage: As the implementation cycle progresses, new flops are added and the CDL has to be modified every time.
 
2. Feedback Method: In this method, the number of posedge and negedge flops in the design is calculated, and the CDL is then generated depending on the number of scan chains. The following example illustrates how this method differs from the Custom CDL method. Consider a design with 4000 flops: 3700 posedge flops and 300 negedge flops. Scan chains (~100 flops/chain) are stitched and the distribution shown in Figure 7 is obtained.

Figure 7

The CDL is then generated for this flop combination such that there is no posedge-negedge flop pair at the CDL interface, and synthesis is forced to place the flops in the scan chains as per the RTL. This can be automated with scripts.

Advantage: The advantage of this method over the Custom CDL method is that the whole synthesis is done in one run.

Disadvantage: Once the feedback is implemented, the RTL of the CDL is more or less stable, but if there is an abrupt change in the number of negative-edge triggered flops, the whole cycle has to be repeated.

3. Using flops already existing in the design: In this method, the RTL of the CDL does not change, and every synthesis run uses the same CDL. After scan stitching, the scan chains are reordered to eliminate the chances of violations.

Advantages: Even if the number of flops changes abruptly with a new RTL release, the DFT team does not have to create a new CDL, as the CDL code is fixed.

Disadvantages: Although this method is more efficient than the Custom CDL and Feedback methods, there may be coverage loss when the scan chains are reordered to eliminate violations. Coverage loss can be understood through the following example:

Figure 8

Whenever posedge and negedge flops are paired in that order, a launch-capture violation occurs, because launch and capture happen at edges 2 and 3 as shown in Figure 8. To remove this violation, we can reorder the chain so that an existing design flop sits between these two flops. Although this removes the earlier violation, we will not be able to check the data received at the inserted flop, which results in coverage loss.

4. Dummy Flop addition: This method addresses the disadvantages of the methods above. A dummy lockup flop is added wherever violations are expected. Because the added flop is not a design flop, there is no coverage loss, and the problem discussed for method 3 (moving existing design flops) does not arise.

Advantages: Highly efficient, as no extra effort is required from the DFT and synthesis teams. The problem of coverage loss is also resolved.

Disadvantages: Extra cells are added, which can be a hindrance in extremely power-critical designs where even the leakage of these few cells contributes significantly to the total leakage of the design.
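The detection-and-insertion step of the Dummy Flop addition method can be sketched as below. Assumptions: each scan chain is given as an ordered list of (name, edge) pairs with the CDL boundary flops at both ends, so CDL-to-chain and chain-to-CDL pairs are checked too; only the posedge-launch, negedge-capture case is detected, by edge type (skew-related violations would in practice come from static timing analysis); and the clock and edge of the dummy flop are left abstract, since the article does not specify them. All names are hypothetical.

```python
from typing import List, Tuple

# (instance name, active edge); edge is "pos", "neg", or "dummy" for inserted cells.
Chain = List[Tuple[str, str]]


def expected_violations(chain: Chain) -> List[int]:
    """Indices i where chain[i] (posedge launch) feeds chain[i + 1] (negedge capture)."""
    return [i for i in range(len(chain) - 1)
            if chain[i][1] == "pos" and chain[i + 1][1] == "neg"]


def insert_dummy_flops(chain: Chain) -> Tuple[Chain, int]:
    """Insert one dummy lockup flop after the launch flop of each expected violation."""
    marked = set(expected_violations(chain))
    stitched: Chain = []
    for i, flop in enumerate(chain):
        stitched.append(flop)
        if i in marked:
            stitched.append((f"dummy_lockup_{i}", "dummy"))
    return stitched, len(marked)


chain = [("cdl_scan_in", "pos"), ("ff0", "neg"), ("ff1", "neg"),
         ("ff2", "pos"), ("ff3", "pos"), ("cdl_scan_out", "neg")]
print(expected_violations(chain))  # [0, 4]
stitched, added = insert_dummy_flops(chain)
print(added, [name for name, _ in stitched])
# 2 ['cdl_scan_in', 'dummy_lockup_0', 'ff0', 'ff1', 'ff2', 'ff3', 'dummy_lockup_4', 'cdl_scan_out']
```

Because dummies are added only where a violation is expected, the number of extra cells scales with the number of flagged pairs rather than with chain length.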
 

Conclusion: The table below summarizes the various techniques discussed:

Table: Summary of the techniques discussed

Scan compression logic is a must-have feature in complex SoCs, and adding it creates challenges in scan stitching during synthesis. Many methodologies are available to handle these challenges, but the Dummy Flop addition methodology has a clear edge over the others. It involves some trade-off, as lockup flops are added, but we have observed that the number of extra lockup flops is very small (<<0.1% of total sequential elements), because lockup flops are added only in those chains where capture violations are expected. This approach minimizes iterations between the DFT and synthesis design teams and thus results in faster design closure.

 

About the authors:

Abhishek Mahajan and Amol Agarwal are senior design engineers while Vikramjeet Singh is a design engineer, all of them with Freescale Semiconductor India Ltd.
