



# **L15: Custom and ASIC VLSI Integration**



#### **Acknowledgement:**

J. Rabaey, A. Chandrakasan, B. Nikolic, "Digital Integrated Circuits: A Design Perspective" Prentice Hall, 2003.

**Curt Schurgers** 



# **Moore's Law**





In 1965, Gordon Moore was preparing a speech and made a memorable observation. When he started to graph data about the growth in memory chip performance, he realized there was a striking trend. Each new chip contained roughly twice as much capacity as its predecessor, and each chip was released within 18-24 months of the previous chip. If this trend continued, he reasoned, computing power would rise exponentially over relatively brief periods of time.





# Layout 101







# **Custom Design/Layout**





To register files / Cache

- Hand-crafting the layout to achieve maximum clock rates (> 1 GHz)
- Exploits regularity in datapath structure to optimize interconnects



# The ASIC Approach







## Standard Cell Example





Each library cell (FF, NAND, NOR, INV, etc.), along with its size variations (drive strength of the gate), is fully characterized across temperature, loading, etc.



# **Standard Cell Layout Methodology**



#### 2-level metal technology



#### **Current Day Technology**



Cell-structure hidden under interconnect layers

- With limited interconnect layers, dedicated routing channels between rows of standard cells are needed
- Width of the cell allowed to vary to accommodate complexity
- Interconnect plays a significant role in speed of a digital circuit



## Verilog to ASIC Layout (the push button approach)





L15: 6.111 Spring 2004



# The "Design Closure" Problem





# Wire-to-wire capacitance causes inter-wire delay dependencies









#### Iterative Removal of Timing Violations (white lines)





256×32 (or 8192 bit) SRAM Generated by hard-macro module generator


 Generate highly regular structures (entire memories, multipliers, etc.) with a few lines of code

 Verilog models for memories automatically generated based on size
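As a rough illustration of how a behavioral memory model can be produced from nothing but its size, here is a minimal Python sketch. The function name `generate_sram_verilog`, the module naming scheme, and the port list are assumptions made for this example; they do not correspond to any particular vendor's module generator, and a real hard-macro generator also produces layout and timing data.

```python
def generate_sram_verilog(depth, width):
    """Return Verilog text for a depth x width single-port synchronous RAM model.

    Hypothetical helper for illustration only, not a real generator's output.
    """
    # Number of address bits needed to select one of `depth` words.
    addr_bits = max(1, (depth - 1).bit_length())
    return (
        f"module sram_{depth}x{width} (\n"
        f"    input  wire clk,\n"
        f"    input  wire we,\n"
        f"    input  wire [{addr_bits - 1}:0] addr,\n"
        f"    input  wire [{width - 1}:0] din,\n"
        f"    output reg  [{width - 1}:0] dout\n"
        f");\n"
        f"    reg [{width - 1}:0] mem [0:{depth - 1}];\n"
        f"    always @(posedge clk) begin\n"
        f"        if (we) mem[addr] <= din;\n"
        f"        dout <= mem[addr];\n"
        f"    end\n"
        f"endmodule\n"
    )


# The 256 x 32 memory from the slide: 8 address bits, 8192 bits of storage.
print(generate_sram_verilog(256, 32))
print("total bits:", 256 * 32)
```

Changing the two arguments is enough to obtain a model for a different memory size, which is the point of the generator approach.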



# **Clock Distribution**





For a 1 GHz clock, the skew budget is 100 ps. Variations along different paths arise from:

- Device: V<sub>T</sub>, W/L, etc.
- Environment: V<sub>DD</sub>, °C
- Interconnect: dielectric thickness variation

**Clock skew, courtesy Alpha** 



**IBM Clock Routing** 



# The Power Supply Wires are Not Ideal!





# The IR-drop problem causes internal power supply voltage to be less than the external source

#### **Courtesy David Blaauw**
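The IR-drop effect can be estimated with a simple resistive model. The sketch below assumes a single supply rail fed from one end, with equal resistance per segment and a DC current drawn at each tap; the function name and all numeric values are hypothetical and chosen only for illustration.

```python
def rail_voltages(vdd, seg_resistance, tap_currents):
    """Voltage seen at each tap of a supply rail fed from one end.

    Assumes a simple resistive ladder: every rail segment has the same
    resistance, and each segment carries the current of all taps downstream.
    """
    voltages = []
    v = vdd
    remaining = sum(tap_currents)  # current flowing through the first segment
    for i_tap in tap_currents:
        v -= seg_resistance * remaining  # V = I * R drop across this segment
        voltages.append(v)
        remaining -= i_tap  # this tap's current leaves the rail here
    return voltages


# Hypothetical numbers: 1.8 V supply, 0.1 ohm per segment, 100 mA per tap.
for k, v in enumerate(rail_voltages(1.8, 0.1, [0.1, 0.1, 0.1])):
    print(f"tap {k}: {v:.3f} V")
```

The tap farthest from the supply pad sees the lowest voltage, because its current passes through every segment of the rail.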

## Analog Circuits: Clock Frequency Multiplication (Phase Locked Loop)



PLL





- Divider divides down VCO frequency
- Loop filter ⇒ extracts phase error information
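To see why the loop multiplies the reference frequency, consider a heavily simplified phase-domain model. The sketch below is an assumption-laden illustration, not the circuit on the slide: the loop filter is reduced to a plain gain, the VCO is linear, and all names and numbers (`f_free`, `k_vco`, the 10 MHz reference) are hypothetical.

```python
import math


def simulate_pll(f_ref, n_div, f_free, k_vco, dt, steps):
    """Simplified phase-domain PLL model; returns the final VCO frequency in Hz.

    f_free is the VCO free-running frequency and k_vco its gain in Hz per
    radian of phase error. The loop filter is modeled as a pure gain.
    """
    phi_ref = 0.0  # reference phase (rad)
    phi_div = 0.0  # phase at the divider output (rad)
    f_vco = f_free
    for _ in range(steps):
        phi_ref += 2 * math.pi * f_ref * dt
        phi_div += 2 * math.pi * (f_vco / n_div) * dt  # divider scales VCO frequency
        phase_error = phi_ref - phi_div                # phase detector output
        f_vco = f_free + k_vco * phase_error           # control signal steers the VCO
    return f_vco


# Hypothetical values: 10 MHz reference, divide-by-100, VCO free-running at 900 MHz.
f_out = simulate_pll(f_ref=10e6, n_div=100, f_free=900e6,
                     k_vco=1e8, dt=1e-9, steps=5000)
print(f"VCO frequency after settling: {f_out / 1e6:.3f} MHz")
```

In lock, the divided VCO frequency equals the reference, so the VCO runs at `n_div` times the reference frequency. With a pure-gain loop filter a constant phase error remains; practical loop filters are designed to shape this behavior.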

#### Used widely in digital systems for clock synthesis (a standard IP block in most ASIC flows)

#### **Courtesy M. Perrott**





- There are a large number of implementations of the same functionality
- Each implementation represents a different point in the area-time-power design space
- Behavioral transformations allow exploring the design space at a high level

#### **Optimization metrics:**

1. Area of the design
2. Throughput or sample time T<sub>S</sub>
3. Latency: clock cycles between the input and associated output change
4. **Power** consumption
5. Energy of executing a task









# **Conventional Multiplication**

$Z = X \cdot Y$

$$
\begin{array}{cccccccc}
 & & & & X_3 & X_2 & X_1 & X_0 \\
 & & & & Y_3 & Y_2 & Y_1 & Y_0 \\
\hline
 & & & & X_3 \cdot Y_0 & X_2 \cdot Y_0 & X_1 \cdot Y_0 & X_0 \cdot Y_0 \\
 & & & X_3 \cdot Y_1 & X_2 \cdot Y_1 & X_1 \cdot Y_1 & X_0 \cdot Y_1 & \\
 & & X_3 \cdot Y_2 & X_2 \cdot Y_2 & X_1 \cdot Y_2 & X_0 \cdot Y_2 & & \\
 & X_3 \cdot Y_3 & X_2 \cdot Y_3 & X_1 \cdot Y_3 & X_0 \cdot Y_3 & & & \\
\hline
Z_7 & Z_6 & Z_5 & Z_4 & Z_3 & Z_2 & Z_1 & Z_0
\end{array}
$$
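The partial-product array can be mirrored in a few lines of Python. This is a minimal sketch for unsigned operands; the function names are hypothetical and the 4-bit default simply matches the 4 × 4 array shown here.

```python
def partial_products(x, y, bits=4):
    """Return the shifted partial-product rows of x * y for unsigned operands."""
    rows = []
    for i in range(bits):
        y_i = (y >> i) & 1            # bit i of the multiplier
        rows.append((x * y_i) << i)   # row i is X AND-ed with Y_i, shifted left by i
    return rows


def array_multiply(x, y, bits=4):
    """Multiply by summing the partial-product rows."""
    return sum(partial_products(x, y, bits))


rows = partial_products(13, 11)
print(rows)                    # one entry per multiplier bit
print(array_multiply(13, 11))  # equals 13 * 11
```

Rows for multiplier bits that are 0 contribute nothing. When the multiplier is a constant, those rows can be dropped at design time, leaving only fixed shifts and adds.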

**Constant multiplication (becomes hardwired shifts and adds)**



Transform: Canonical Signed Digits (CSD)

Canonical signed digit representation is used to increase the number of zeros. It uses digits {-1, 0, 1} instead of only {0, 1}.
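A minimal Python sketch of the idea follows, assuming a non-negative integer constant and using the standard non-adjacent-form recoding. The function names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(n):
    """Recode a non-negative integer into signed digits {-1, 0, 1}, LSB first.

    Uses the non-adjacent form, in which no two neighboring digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            n -= d           # after this, n is divisible by 4 or by 2
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, constant):
    """Multiply x by a constant using one shift and add/subtract per nonzero digit."""
    return sum(d * (x << i) for i, d in enumerate(to_csd(constant)))


# 7 is 111 in binary (three nonzero digits) but 8 - 1 in CSD (two nonzero digits).
print(to_csd(7))
print(csd_multiply(13, 7))
```

Each nonzero digit costs one adder or subtractor in the hardwired multiplier, so fewer nonzero digits means less hardware.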





# **Algebraic Transformations**















#### **Retiming is the action of moving delays around in a system**

Delays have to be moved from ALL inputs to ALL outputs or vice versa



**Cutset retiming:** A cutset is a set of edges whose removal splits the graph into two disjoint partitions. To retime, delays are moved from the edges entering one partition to the edges leaving it, or vice versa.

Benefits of retiming:

- Modify critical path delay
- Reduce total number of registers


# **Retiming Example: FIR Filter**





<u>Note:</u> here we use a first-cut analysis that assumes the delay of a chain of operators is the sum of their individual delays. This is only an approximation.
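The effect of retiming on an FIR filter can be checked numerically. The sketch below assumes the example compares a direct-form FIR filter with its retimed, transposed-form equivalent; the figure is not reproduced here, so that pairing is an assumption, as are the function names and the delay values. The delay functions use the same first-cut model as the note: delays along a chain simply add.

```python
def fir_direct(h, xs):
    """Direct form: delay line on the input, then a chain of adders."""
    taps = [0] * len(h)  # taps[0] holds the newest sample
    out = []
    for x in xs:
        taps = [x] + taps[:-1]
        out.append(sum(c * t for c, t in zip(h, taps)))
    return out


def fir_transposed(h, xs):
    """Transposed form: registers sit between the adders."""
    state = [0] * (len(h) - 1)
    out = []
    for x in xs:
        out.append(h[0] * x + (state[0] if state else 0))
        # Each register captures its tap product plus the next register's value.
        state = [h[i + 1] * x + (state[i + 1] if i + 1 < len(state) else 0)
                 for i in range(len(state))]
    return out


def direct_form_delay(n_taps, t_mult, t_add):
    """First-cut critical path: one multiplier followed by n_taps - 1 adders."""
    return t_mult + (n_taps - 1) * t_add


def transposed_form_delay(n_taps, t_mult, t_add):
    """First-cut critical path: one multiplier followed by at most one adder."""
    return t_mult + min(n_taps - 1, 1) * t_add


h = [2, -1, 3, 4]
xs = [1, 4, -2, 5, 0, 7]
print(fir_direct(h, xs) == fir_transposed(h, xs))
print(direct_form_delay(4, 2.0, 1.0), transposed_form_delay(4, 2.0, 1.0))
```

Both forms compute the same output sequence, while the estimated critical path of the transposed form no longer grows with the number of taps.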



#### Pipelining, Just Another Transformation (Pipelining = Adding Delays + Retiming)







# **The Power of Transforms: Lookahead**







# **Scan Testing**







### Trends: "Chip in a Day" (Matlab/Simulink to Silicon...)





#### Map algorithms directly to silicon - bypass writing Verilog!

#### **Courtesy of R. Brodersen**

# Trends: Watermarking of Digital Designs



**Fingerprinting** is a technique to deter people from illegally redistributing legally obtained IP by enabling the author of the IP to uniquely identify the original buyer of the resold copy.

The essence of the **watermarking** approach is to encode the author's signature. The selection, encoding, and embedding of the signature must result in minimal performance and storage overhead.



Same functionality, same area, same performance; watermark of 4768 bits embedded

#### (courtesy of G. Qu, M. Potkonjak)