# A Novel SST Transmitter with Mutually Decoupled Impedance Self-Calibration and Equalization

Shuai Chen<sup>1,2,3</sup>, Liqiong Yang<sup>1,3</sup>, Hua Jing<sup>1,2,3</sup>, Feng Zhang<sup>1,3</sup>, Zhuo Gao<sup>1,3</sup>

<sup>1</sup>Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

<sup>2</sup>Graduate University of Chinese Academy of Sciences, Beijing, China

<sup>3</sup>Loongson Technologies Corporation Limited, Beijing, China

{chenshuai, yangliqiong, jinghua, gaozhuo}@ict.ac.cn

*Abstract***—A low-power source-synchronous source-series-terminated (SST) transmitter (Tx) in 65 nm CMOS technology is presented. The Tx, comprising nine data/control channels, a forwarded-clock channel and one PLL, dissipates only 26.2 mW/channel while exhibiting a 750 mV differential eye height at 6.4 Gbps. The SST drivers save ¾ of the output-stage power of CML drivers; moreover, the proposed topology controls impedance self-calibration and equalization independently. To implement a half-rate architecture, the PVT-tolerant PLL provides a pair of quadrature clocks at 3.2 GHz with 2.5 ps rms cycle-to-cycle jitter.**

## I. INTRODUCTION

As the bandwidth of high-speed serial links required by processor interconnect technologies such as HyperTransport [1] has increased aggressively, up to 51.2 GB/s, power consumption has become a major concern in SerDes system design. Current-mode logic (CML) drivers have been applied in many SerDes circuits [2-3], but they have drawbacks such as static power dissipation and a limited range of supported termination voltages. Source-series-terminated (SST) drivers can overcome these disadvantages [4]: they consume only ¼ of the output-stage power of CML drivers and support high-swing termination voltages. Besides low-power operation, maintaining signal integrity is another key point in SerDes circuit design. To maintain good signal integrity and minimize reflections, the driver's impedance needs to be calibrated to match the transmission line impedance in spite of process and temperature variations. In addition, a feed-forward equalizer (FFE) is commonly used to mitigate channel attenuation. However, implementing impedance calibration and equalization independently and efficiently remains a challenge in SST driver design.
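The ¼ figure can be recovered with a short worked comparison. This is a sketch under stated assumptions that are not taken from the text: both drivers run from the same supply $V_{DD}$, both use $50\Omega$ source termination per side, the receiver presents a $100\Omega$ differential termination, and both deliver the same differential peak amplitude $V_{pk}$. In a CML driver the tail current sees the near-end and far-end terminations in parallel ($25\Omega$), whereas in an SST driver the supply current flows through the $50\Omega + 100\Omega + 50\Omega$ series path, of which only the $100\Omega$ receiver termination develops the output voltage.

$$
I_{\text{CML}} = \frac{V_{pk}}{25\,\Omega}, \qquad
I_{\text{SST}} = \frac{V_{pk}}{100\,\Omega}, \qquad
\frac{P_{\text{SST}}}{P_{\text{CML}}} = \frac{V_{DD}\, I_{\text{SST}}}{V_{DD}\, I_{\text{CML}}} = \frac{1}{4}
$$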

We present a novel SST transmitter whose impedance self-calibration and equalization control are mutually decoupled. The pull-up and pull-down impedances are self-calibrated separately to tolerate all process variations. The 2-tap FFE has eight programmable settings. The power efficiency of our transmitter is as low as 4.1 mW/Gbps/channel at 6.4 Gbps.
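The efficiency figure follows directly from the per-channel power and the data rate quoted in the abstract:

$$
\frac{26.2\ \text{mW/channel}}{6.4\ \text{Gbps}} \approx 4.09\ \text{mW/Gbps/channel} \approx 4.1\ \text{mW/Gbps/channel}
$$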

This paper is organized as follows. Section II discusses previous SST transmitter works. Section III describes the architecture of the presented SST transmitter. Finally, the results and conclusions are summarized in Section IV and Section V.

## II. PREVIOUS WORKS

Recent SST transmitter works have been presented in [5-7], as shown in Fig. 1. The output stage of an SST driver contains a pull-up and a pull-down branch, each implemented with a PMOS or NMOS transistor followed by a series polysilicon resistor. The MOS to polysilicon resistance ratio is determined by the optimum trade-off between linearity accuracy and area.



Figure 1. Previous SST transmitter architecture

As illustrated in Fig. 1(a), Philpott *et al.* [5] employed 64 selectable resistors series-connected to all SST slices to adjust the impedance, but the additional FETs brought a voltage headroom penalty [7]. Another SST driver, shown in Fig. 1(b), was proposed by Menolfi *et al.* [6]; it achieved impedance matching by enabling a certain number of SST slices and controlled equalization by supplying the enabled slices with different data taps. A disadvantage of this topology was that the equalization tuning was affected by the number of enabled slices, which meant that the equalization tuning and the impedance matching were interdependent. Kossel *et al.* [7] presented an improved method based on [6], which controlled equalization inside each slice, as shown in Fig. 1(c). The equalization was no longer affected by the number of enabled slices. However, it had two drawbacks: 1) the impedance calibration could not be carried out automatically; 2) the pull-up and pull-down resistances were always adjusted in the same direction, which is incorrect in skewed process corners, for example when PMOS is in the fast corner but NMOS is in the slow corner.

Supported by the National Basic Research 973 Program of China (No.2005CB321600), the National High Technology Development 863 Program of China (No.2008AA110901, 2009AA01Z125), and the National Natural Science Foundation of China (No.60803029, 60673146, 60736012, 60801045).

## III. TRANSMITTER DESIGN

We propose an SST transmitter that overcomes the disadvantages mentioned above; its architecture is illustrated in Fig. 2. The transmitter contains a forwarded-clock channel, nine data/control channels, one PLL and an impedance calibration cell. The forwarded-clock channel (CLK) is identical to the data/control channels (CAD/CTL) so that Tx skew and jitter track between the data and the receiver sampling clock. In each channel, the 2-tap FFE block generates differential main-tap and post-cursor-tap data streams, and the level shifter divides the transmitter into two voltage domains to save power. The thick-oxide SST output stage operates at 1.2 V (VDD) and the other, thin-oxide devices work at the 1 V core voltage.



Figure 2. Proposed transmitter architecture

The PLL provides two quadrature clocks: one is used to serialize the data streams and the other is sent to the receiver as a forwarded clock. A 90-degree phase difference is kept between the forwarded clock and the data to simplify clock recovery in the receiver, as required in [1]. The impedance calibration cell performs the calibration automatically, and the 10-bit pull-up/down calibration codes (5 bits each) are routed to all channels. In the following paragraphs, the key components, namely the SST output stage, the impedance calibration cell and the PLL, are described in detail.

### *A. SST Output Stage*

As shown in Fig. 3(a), the SST output stage consists of 15 identical parallel slices, which are partitioned into four segments. All slices are enabled in our topology, unlike the partially enabled scheme in [6-7]. Because the total parallel output impedance must remain $50\Omega$ to match the transmission line impedance, the impedance of a single slice needs to be adjusted to $750\Omega$, which is 15 times $50\Omega$. Each slice comprises a fixed SST branch (4x) and five programmable binary-weighted SST branches (sized from 1x to 16x) whose input signals are gated by NAND/NOR gates. The always-enabled fixed branch bounds the maximum output impedance in order to refine the calibration accuracy. The five programmable branches, which are controlled by the calibration codes $U_{\text{code}}<0:4>$ and $D_{\text{code}}<0:4>$, are used to adjust the slice impedance to $750\Omega$. These codes are obtained from the impedance calibration cell.
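The slice arithmetic above can be sketched in a few lines of Python. This is a minimal illustrative model rather than the authors' sizing: it assumes each branch behaves as an ideal conductance proportional to its weight, and the 1x branch resistance `r_unit` and the function names are hypothetical.

```python
N_SLICES = 15        # parallel slices in the output stage (from the text)
Z_LINE = 50.0        # transmission line impedance in ohms (from the text)
FIXED_WEIGHT = 4     # always-enabled fixed branch, 4x (from the text)


def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def slice_resistance(code, r_unit):
    """Resistance of one pull-up or pull-down leg of a slice.

    code   : 5-bit calibration code (0..31); bit i enables the branch of
             weight 2**i, so the enabled weight equals the code value.
    r_unit : hypothetical resistance of a 1x branch, in ohms.
    """
    if not 0 <= code <= 31:
        raise ValueError("code must fit in 5 bits")
    return r_unit / (FIXED_WEIGHT + code)


# Each slice must be N_SLICES times the line impedance.
slice_target = N_SLICES * Z_LINE                      # 750 ohms
total = parallel([slice_target] * N_SLICES)           # back to about 50 ohms

if __name__ == "__main__":
    print("slice target:", slice_target, "ohm, total:", round(total, 6), "ohm")
    # With an assumed 15 kohm unit branch, mid-code 16 lands on 750 ohms.
    print("code 16:", slice_resistance(16, 15000.0), "ohm")
```

In this model the fixed 4x branch caps the leg resistance at `r_unit / 4` when the code is zero, which is the bounding role the text assigns to the always-enabled branch.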



Figure 3. SST output stage (a) and equalization control (b)

The two-tap equalization is implemented by assigning each of the four segments either the main-tap or the post-cursor-tap data stream, as shown in Fig. 3(b). The slice input multiplexers select among the eight equalization settings. Changing the equalization setting does not change the impedance of any slice, so the total output impedance remains $50\Omega$.

Equalization tuning is achieved by controlling the slice input signals, while the impedance adjustment is done inside each slice, so this architecture removes the dependency between the two functions.
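The decoupling can be illustrated numerically. The snippet below is a minimal sketch, not the authors' circuit: it assumes the usual SST FFE behavior in which slices on the post-cursor tap are driven by the inverted previous bit, so that on a repeated bit `post_slices` slices oppose the remaining ones. The function names are hypothetical, and the text does not state the segment sizes, so the number of post-cursor slices is treated as a free parameter.

```python
import math

N_SLICES = 15        # slices in the output stage (from the text)
SLICE_OHMS = 750.0   # calibrated impedance of one slice (from the text)


def deemphasis_db(post_slices, n_slices=N_SLICES):
    """De-emphasis of a repeated bit relative to a transition bit, in dB.

    On a transition every slice drives the same way (full swing); on a
    repeated bit the post-cursor slices oppose the main-tap slices, leaving
    a net (n_slices - 2 * post_slices) / n_slices of the full swing.
    """
    if not 0 <= 2 * post_slices < n_slices:
        raise ValueError("post-cursor slices must be fewer than half")
    ratio = (n_slices - 2 * post_slices) / n_slices
    return 20.0 * math.log10(ratio)


def output_impedance(n_slices=N_SLICES, slice_ohms=SLICE_OHMS):
    """All slices stay enabled, so the impedance ignores the tap assignment."""
    return slice_ohms / n_slices


if __name__ == "__main__":
    for k in range(8):
        print(k, "post-cursor slices:", round(deemphasis_db(k), 2), "dB,",
              output_impedance(), "ohm")
```

Under this model, three of the 15 slices on the post-cursor tap give about -4.4 dB, the equalization level used for the eye diagram in Section IV, while the output impedance stays at $50\Omega$ for every setting.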

### *B. Impedance Calibration Cell*

The impedance calibration is carried out automatically during transmitter initialization. The calibration circuit, depicted in Fig. 4, uses a current-mirror topology to suppress power supply noise. The resistances of the pull-up and pull-down dummy branches are each calibrated to $750\Omega$ separately to tolerate all process variations. The calibration principle is as follows. A reference current $I_1$, immune to PVT variations, is produced from the external reference resistance $R_{ext}$. The Ucodes/Dcodes generated by counters control the changes of $V_{mid, 2}$. When a set of codes makes $V_{mid, 2}$ equal to $V_{DD}/2$, that set is exactly what impedance matching requires, and the codes are latched. This is because, at that point, $I_1$ is copied accurately by the mirrors to the pull-up/down dummy branches and the resistances of the dummy branches equal $R_{ext}$ ($750\Omega$). After the dummy branches are calibrated to $750\Omega$, the cell is powered down to save power. The latched codes are sent to the pull-up/down branches (identical to the dummy branches) of each slice in all channels. With these codes, the total output impedance of the SST transmitter is adjusted to $50\Omega$.
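The counter-based search can be sketched as follows for the pull-down leg; the pull-up leg would be handled analogously. This is a behavioral sketch under stated assumptions, not the circuit in Fig. 4: the reference current is assumed to be $(V_{DD}/2)/R_{ext}$, the dummy branch is modeled as the ideal conductance sum of a fixed 4x branch plus the enabled binary-weighted branches, and `r_unit` (the 1x branch resistance at a given process corner) and the function names are hypothetical.

```python
VDD = 1.2        # output-stage supply in volts (from the text)
R_EXT = 750.0    # external reference resistance in ohms (from the text)


def v_mid_pulldown(code, r_unit):
    """Voltage across the dummy pull-down branch for a given Dcode.

    Assumes the mirrored reference current equals (VDD / 2) / R_EXT, so
    v_mid equals VDD / 2 exactly when the branch resistance equals R_EXT.
    """
    i_ref = (VDD / 2) / R_EXT
    r_dummy = r_unit / (4 + code)      # fixed 4x branch plus enabled weights
    return i_ref * r_dummy


def calibrate_pulldown(r_unit):
    """Increment the 5-bit Dcode from 0 until v_mid drops to VDD / 2 or below."""
    for code in range(32):
        if v_mid_pulldown(code, r_unit) <= VDD / 2:
            return code                # this code would be latched
    raise RuntimeError("branch resistance is outside the calibration range")


if __name__ == "__main__":
    for r_unit in (12500.0, 17000.0):  # hypothetical fast and slow corners
        code = calibrate_pulldown(r_unit)
        print(r_unit, "->", code, round(r_unit / (4 + code), 1), "ohm")
```

Because the search stops at the first code that crosses the threshold, the latched resistance in this model always ends up at or slightly below $R_{ext}$, with a residual error set by the step size of the binary-weighted branches.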



Figure 4. Impedance calibration cell

### *C. PLL*

The architecture of the PVT-tolerant PLL is shown in Fig. 5. We use a low-dropout regulator (LDO) to isolate analog blocks such as the bandgap, the current array and the voltage-current converter from noise on the 1.8 V power supply. We also use a current-controlled oscillator (ICO) to reduce the sensitivity to power supply noise and temperature. The switched low-pass filter (SLPF) switches the PLL between the calibration state and normal operation.



The PLL open-loop calibration is performed during PLL start-up to remove the impact of process variation on the oscillator. The calibration flow is shown in Fig. 6. First, the PLL loop is opened and the VCO control voltage is preset to $V_L$ or $V_H$. (The values of $V_L$ and $V_H$ must guarantee that the oscillator works properly in spite of voltage and temperature (VT) variation after calibration.) Then $f_{div}$ is compared with $f_{ref}$ in the ICO calibration cell. If $f_{div}$ is lower than $f_{ref}$ at both $V_L$ and $V_H$, the ICO\_ctrl<0:4> code is incremented by one and the oscillator moves to the next higher tuning curve. The comparison and curve shifting are repeated until $f_{div}$ is lower than $f_{ref}$ at $V_L$ and higher at $V_H$, at which point the calibration is finished. The calibration reduces the VCO gain from 3 GHz/V to 350 MHz/V. Post-layout simulation also shows that the PLL achieves 2.5 ps rms cycle-to-cycle jitter at 3.2 GHz with a power dissipation of less than 5 mW.
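The calibration flow described above can be sketched as a simple search over tuning curves. This is a behavioral sketch, not the authors' implementation: the linear oscillator model, the base frequency, the per-code frequency step and the values of `V_L` and `V_H` are all hypothetical, and the oscillator frequency is compared directly with the 3.2 GHz target instead of comparing divided clocks. Only the 350 MHz/V per-curve gain and the 5-bit code width come from the text.

```python
F_TARGET = 3.2e9       # target oscillator frequency in Hz (from the text)
K_CURVE = 350e6        # per-curve gain in Hz/V after calibration (from the text)
V_L, V_H = 0.3, 0.7    # hypothetical preset control voltages in volts


def ico_freq(v_ctrl, code, f_base=2.0e9, f_step=60e6):
    """Hypothetical linear ICO model: each code selects a higher curve."""
    return f_base + code * f_step + K_CURVE * v_ctrl


def calibrate_ico(freq=ico_freq):
    """Step through the 5-bit ICO_ctrl codes until the target is bracketed."""
    for code in range(32):
        f_low, f_high = freq(V_L, code), freq(V_H, code)
        if f_low < F_TARGET < f_high:
            return code                # target lies inside this curve's range
        if f_high > F_TARGET:
            # Too fast at both voltages: a case the described flow omits.
            raise RuntimeError("oscillator too fast on the lowest usable curve")
        # Too slow at both voltages: move up to the next curve.
    raise RuntimeError("no tuning curve brackets the target frequency")


if __name__ == "__main__":
    code = calibrate_ico()
    print("selected code:", code)
    print("range: %.3f to %.3f GHz"
          % (ico_freq(V_L, code) / 1e9, ico_freq(V_H, code) / 1e9))
```

In this model adjacent curves overlap because the frequency span between `V_L` and `V_H` (140 MHz) exceeds the assumed step between codes (60 MHz), so some curve always brackets the target; a real design would need a comparable overlap margin.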



I-CLK is distributed to each CAD/CTL channel through a clock tree, which keeps the channel-to-channel skew caused by clock propagation delay below 5 ps in post-layout simulation. To preserve the 90-degree phase difference between I-CLK and Q-CLK, an identical distribution is applied to Q-CLK for the CLK channel.

## IV. SIMULATION RESULTS

We use S-parameters measured with an Agilent VNA to characterize the 20 cm PCB trace. The transfer response ($S_{21}$) of the trace is plotted in Fig. 7. The loss at 3.2 GHz (the Nyquist frequency for 6.4 Gbps data) is 6.6 dB.


Figure 7. Transfer response  $(S_{21})$  of the channel

The post-layout simulation eye diagram at the far end of the trace is shown in Fig. 8. With -4.4 dB equalization, the differential eye height is approximately 750 mV and the horizontal eye opening is 0.93 UI.



Figure 8. Post-layout simulation eye diagram with -4.4dB equalization

The simulation results for the impedance calibration cell in different corners are shown in Fig. 9. Before the calibration starts, the Ucodes and Dcodes are initialized to '11111' and '00000', respectively. During the pull-down calibration, $V_{mid}$ decreases as the Dcodes are incremented step by step. Once $V_{mid}$ falls below $V_{DD}/2$, the proper Dcodes have been found and are latched. The Ucodes are obtained in a similar way. After the calibration, the cell's Ucodes and Dcodes are set to '11111' and '00000' again to power down the cell. The calibration error of the impedance is less than $\pm 2\%$.



Figure 9. Pull-up and pull-down resistance self-calibration curves

Table I summarizes the design of our transmitter and compares it with related works.

TABLE I. DESIGN SUMMARY AND COMPARISON

| Reference | [2]<sup>1)</sup> | [5]<sup>2)</sup> | [6]<sup>2)</sup> | [7]<sup>2)</sup> | Our work<sup>2)</sup> |
|---|---|---|---|---|---|
| CMOS technology | 90 nm | 65 nm | 65 nm | 65 nm | 65 nm |
| Max data rate [Gb/s] | 10 | 20 | 16 | 8.5 | 6.4 |
| Differential eye height [V] | 0.9 | 0.3 | 0.5 | 1 | 0.75 |
| Power efficiency [mW/Gbps] | 17.4 | 8.3 | 3.6 | 11.3 | 4.1 |
| Area<sup>3)</sup> [mm<sup>2</sup>] | | 0.025 | 0.013 | 0.0648 | 0.032 |
| Mutually decoupled<sup>4)</sup> | | N | N | N | Y |
| Respective<sup>5)</sup> | | Y | N | N | Y |

1) CML driver, 2) SST driver

3) The area of one transmitter channel

4) Impedance self-calibration and equalization are mutually decoupled

5) Pull-up and pull-down resistances are calibrated respectively

## V. CONCLUSION

A low-power, high-swing, source-synchronous SST transmitter is designed in ST Micro 65 nm CMOS technology. It outputs a differential 750 mVpp signal over a 20 cm PCB trace at 6.4 Gbps, and its power efficiency is as low as 4.1 mW/Gbps/channel. Moreover, compared with previous SST works, our SST transmitter can 1) self-calibrate the pull-up and pull-down impedances separately to tolerate all process variations; 2) control the impedance self-calibration and the equalization independently. The transmitter will be used as the HyperTransport Link physical interface in our next-generation processors.

## REFERENCES

- [1] HyperTransport Specification 3.10, First Release [Online]. Available: http://www.hypertransport.org/docs/twgdocs/HTC20051222-00046-0028.pdf
- [2] A. Rylyakov and S. Rylov, "A low power 10 Gb/s serial link transmitter in 90-nm CMOS," in *Compound Semiconductor Integrated Circuit Symposium, 2005. CSIC '05. IEEE*, 2005, 4 pp.
- [3] H. Higashi *et al.*, "A 5-6.4-Gb/s 12-channel transceiver with pre-emphasis and equalization," *Solid-State Circuits, IEEE Journal of*, vol. 40, pp. 978-985, 2005.
- [4] K. Abugharbieh *et al.*, "An Ultralow-Power 10-Gbits/s LVDS Output Driver," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 57, pp. 262-269, 2010.
- [5] R. A. Philpott *et al.*, "A 20Gb/s SerDes transmitter with adjustable source impedance and 4-tap feed-forward equalization in 65nm bulk CMOS," in *Custom Integrated Circuits Conference, 2008. CICC 2008. IEEE*, 2008, pp. 623-626.
- [6] C. Menolfi *et al.*, "A 16Gb/s Source-Series Terminated Transmitter in 65nm CMOS SOI," in *Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International*, 2007, pp. 446-614.
- [7] M. Kossel *et al.*, "A T-Coil-Enhanced 8.5 Gb/s High-Swing SST Transmitter in 65 nm Bulk CMOS With 16 dB Return Loss Over 10 GHz Bandwidth," *Solid-State Circuits, IEEE Journal of*, vol. 43, pp. 2905-2920, 2008.