# A Low-Power 0.13µm CMOS OC-48 SONET and XAUI Compliant SERDES

R. Wadhwa, A. Aggarwal, J. Edwards, M. Ehlert, J. Hoehn, G. Miao, K. Lakshmikumar, J. Khoury

Multilink Technology Corporation, Somerset, NJ, USA

## Abstract

The design of a continuous-rate octal 1.0 to 3.2 Gb/s serializer/deserializer circuit that meets SONET and XAUI requirements is presented. The performance of the SERDES surpasses the stringent OC-48 jitter generation and jitter tolerance specifications. This is achieved with a master-slave PLL tuning scheme and meticulous attention to layout and isolation techniques. Implemented in a 0.13 µm digital CMOS technology, the part exhibits less than 5 mUI r.m.s. jitter, and the 1.2 mm² transceiver dissipates 160 mW.

## Introduction

The build-out of networks that carry voice, data and Internet Protocol traffic has created a great need for multi-gigabit transceivers. This paper describes the design of an octal Serializer/Deserializer (SERDES) macro that covers data rates from 1.0 to 3.2 Gb/s. At 2.488 Gb/s the device exceeds the OC-48 SONET jitter generation and jitter tolerance specifications. Novel design techniques are used in the chip, particularly in the phase-locked loops (PLLs) of the transmitter and receiver circuits, to achieve low jitter generation in a harsh ASIC environment while dissipating little power. The device is implemented in a standard 0.13 µm digital CMOS process and has been laid out for integration with flip-chip-compatible ASICs.

The block diagram of the SERDES is shown in Figure 1. A single master-slave transmit PLL drives all eight transmitters to minimize power dissipation and to facilitate synchronization of the transmitters. In the receive direction, each channel has an independent analog PLL based clock and data recovery (CDR) block followed by a demultiplexer. Use of analog CDRs eliminates the need to route multi-phase high-speed clocks across eight receiver channels as would be required by digital or analog phase rotator approaches [1,2]. The eight receive CDRs share a common receive master PLL to achieve low jitter generation as will be described later.

The octal SERDES contains programmable registers to configure and monitor the transceivers and to activate modes for production testing and characterization. Since a high-speed, low-jitter clock is central to any SERDES implementation, the architecture of a PLL that generates such a clock is described first, in Section I. The serializer is described in Section II and the deserializer in Section III. Layout considerations for flip-chip packaging are discussed in Section IV, followed by experimental results in Section V and conclusions in Section VI.

## I. PLL Architecture

One of the biggest challenges in the PLL design is to achieve low jitter while maintaining a wide lock range from 1.0 to 3.2 Gb/s over all process conditions and temperatures. This requirement is challenging in any CMOS process, but in 0.13 µm technology the challenge is even greater because the voltage tuning range that can be applied to the voltage-controlled oscillator (VCO) is less than 1 V. A wide lock range requires the VCO to have a large gain; low jitter generation, however, calls for a small VCO gain. These conflicting requirements can be reconciled in several ways.
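The tension can be quantified with a back-of-the-envelope estimate. The sketch below assumes an ideal linear VCO whose whole tuning span must fit within the control swing. Taking the swing as a full 1.0 V is optimistic (the text only says it is less than 1 V), and the 1.5x process/temperature margin is a hypothetical figure chosen for illustration.

```python
# Back-of-the-envelope estimate of the VCO gain needed for a wide lock range.
# Assumptions (hypothetical, for illustration only): a linear VCO, a tuning
# swing of at most 1.0 V, and an optional margin factor that widens the span
# to absorb process and temperature shifts of the tuning curve.

F_MIN_HZ = 1.0e9   # lower end of the lock range (from the text)
F_MAX_HZ = 3.2e9   # upper end of the lock range (from the text)


def required_vco_gain(f_min_hz, f_max_hz, v_swing, margin=1.0):
    """Minimum VCO gain in Hz/V that covers [f_min, f_max] within v_swing volts."""
    if v_swing <= 0:
        raise ValueError("tuning swing must be positive")
    return margin * (f_max_hz - f_min_hz) / v_swing


ideal = required_vco_gain(F_MIN_HZ, F_MAX_HZ, v_swing=1.0)
with_margin = required_vco_gain(F_MIN_HZ, F_MAX_HZ, v_swing=1.0, margin=1.5)
print(f"ideal:       {ideal / 1e9:.2f} GHz/V")
print(f"with margin: {with_margin / 1e9:.2f} GHz/V")
```

Even the ideal figure of about 2.2 GHz/V means that 1 mV of noise on the control line moves the VCO frequency by roughly 2.2 MHz, which is why a low-jitter design would prefer a much smaller gain.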

One approach is to use a digitally programmable VCO as discussed in [3] to perform a coarse tuning at power up to remove process variations that affect the center frequency of the VCO. However, the gain of the VCO must still remain relatively high to accommodate temperature variations.

In this design, a continuous master-slave calibration scheme is used. The linearized model of the master-slave PLL technique is depicted in Figure 2. The master PLL, shown on top, is designed with a very high-gain VCO so that it can lock over the entire 1.0 to 3.2 GHz frequency range for all process, temperature and power supply conditions. The tuning signal of the master VCO serves as the pedestal, or continuous coarse tuning control, for the slave VCO, which is similar to the master VCO but has a lower gain. The numerical subscripts 1 and 2 in Figure 2 and in the equations below refer to the master and slave, respectively. The pedestal signal from the master places the slave VCO at nearly the correct operating frequency, so the slave PLL only needs to fine-tune the VCO through a low-gain path. The gain of the slave VCO can therefore be made much smaller than that of the master VCO. The effect of noise from the master on the slave can be analyzed with the help of Figure 2. Perhaps the single largest source of noise in a ring-oscillator-based PLL is the phase noise of the VCO, indicated as $\theta_n$. The pedestal output signal from the master PLL will have a noise component $V_{n1}$, due primarily to the inherent phase noise $\theta_{n1}$ of the master VCO. The resulting transfer function is

$$\frac{V_{n1}}{\theta_{n1}} = \left[\frac{s}{K_{o1}}\right] \left[\frac{2\xi_1 \omega_{n1} s + \omega_{n1}^2}{s^2 + 2\xi_1 \omega_{n1} s + \omega_{n1}^2}\right] \tag{1}$$

IEEE 2003 CUSTOM INTEGRATED CIRCUITS CONFERENCE, 26-1-1

where $K_{o1}$, $\omega_{n1}$ and $\xi_1$ are the VCO gain, natural frequency and damping coefficient of the master loop, respectively.

The overall transfer function from the master VCO phase noise to the output phase of the slave is

$$\frac{\theta_{o2}}{\theta_{n1}} = \frac{K_{o2}}{K_{o1}} \frac{s^2}{s^2 + 2\xi_2 \omega_{n2} s + \omega_{n2}^2} \frac{2\xi_1 \omega_{n1} s + \omega_{n1}^2}{s^2 + 2\xi_1 \omega_{n1} s + \omega_{n1}^2} \tag{2}$$

Equation (2) shows that the transfer function is the product of the slave loop's high-pass response and the master loop's low-pass response, scaled by $K_{o2}/K_{o1}$. If the low-pass corner frequency is lower than the high-pass corner frequency, the contribution of $\theta_{n1}$ is heavily attenuated. If, on the other hand, the low-pass corner is higher than the high-pass corner, $\theta_{n1}$ passes only through a narrow band-pass region and is attenuated by $K_{o2}/K_{o1}$. Further attenuation of phase noise and other noise on the pedestal signal can be achieved by adding a low-pass filter in the path from the master to the slave, shown with dotted lines in Figure 2. As the experimental results below show, the master-slave PLL arrangement provides a clock with extremely low jitter. The area and power penalty of the master PLL is small when amortized over multiple channels.
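The two regimes can be seen numerically by evaluating the magnitude of Eq. (2) over frequency. In the sketch below, the gain ratio $K_{o2}/K_{o1} = 1/5$ borrows the factor-of-five reduction quoted for the transmit PLL in Section II; the natural frequencies and damping factors are hypothetical values chosen only to put the master corner below (case A) or above (case B) the slave corner.

```python
import math


def master_to_slave_gain(f_hz, k_ratio, fn1_hz, zeta1, fn2_hz, zeta2):
    """|theta_o2 / theta_n1| from Eq. (2), evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f_hz
    wn1 = 2 * math.pi * fn1_hz
    wn2 = 2 * math.pi * fn2_hz
    # High-pass response of the slave loop
    hp2 = s**2 / (s**2 + 2 * zeta2 * wn2 * s + wn2**2)
    # Low-pass (closed-loop) response of the master loop
    lp1 = (2 * zeta1 * wn1 * s + wn1**2) / (s**2 + 2 * zeta1 * wn1 * s + wn1**2)
    return abs(k_ratio * hp2 * lp1)


def peak_gain(**params):
    """Largest |theta_o2 / theta_n1| on a log grid from 1 kHz to 1 GHz."""
    freqs = [10 ** (3 + k * 0.01) for k in range(601)]
    return max(master_to_slave_gain(f, **params) for f in freqs)


K_RATIO = 1 / 5  # K_o2 / K_o1 (hypothetical, see lead-in)

# Case A: master low-pass corner well below the slave high-pass corner
peak_a = peak_gain(k_ratio=K_RATIO, fn1_hz=100e3, zeta1=1.0, fn2_hz=2e6, zeta2=1.0)
# Case B: master corner above the slave corner, leaving a band-pass region
peak_b = peak_gain(k_ratio=K_RATIO, fn1_hz=5e6, zeta1=1.0, fn2_hz=500e3, zeta2=1.0)

print(f"case A peak: {peak_a:.4f} ({20 * math.log10(peak_a):.1f} dB)")
print(f"case B peak: {peak_b:.4f} ({20 * math.log10(peak_b):.1f} dB)")
```

With these assumed values, case A stays near -40 dB at every frequency, while case B peaks at roughly 0.23 (about -13 dB): the band-pass region passes master noise essentially scaled by $K_{o2}/K_{o1}$, plus a little loop peaking.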

## II. Serializer

The transmit path consists of a serializer and a high-speed clock source provided by the frequency-synthesizing PLL (FSPLL), as shown in Figure 3. The FSPLL uses the master-slave approach described previously, and in this octal design the eight transmitters share a single master-slave FSPLL. The slave PLL shown in Figure 3 is a conventional design using a phase-frequency detector (PFD), a charge pump, a resistor-capacitor loop filter and a voltage-to-current converter that drives the three-stage ring oscillator. The master PLL, not shown, provides a pedestal signal that is scaled by $\beta$ and added to $\alpha$ times the slave control signal. In this particular design, the effective gain of the slave VCO was reduced by a factor of five.
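The way the pedestal and the slave's own control signal combine can be sketched with a linear VCO model. Every number below is hypothetical: the master gain, the free-running frequency, a 2% center-frequency mismatch between master and slave, and the choice $\beta = 1$, $\alpha = 1/5$, which is just one way of realizing the stated factor-of-five reduction.

```python
# Linear model of the master-slave tuning arrangement (illustrative only).
# Hypothetical parameters: both VCO cores share gain K_VCO and free-running
# frequency F_FREE, except that the slave's free-running frequency is offset
# by MISMATCH. The slave is tuned by BETA * v_master + ALPHA * v_slave.

K_VCO = 2.5e9     # Hz/V, hypothetical master VCO gain
F_FREE = 0.9e9    # Hz, hypothetical free-running frequency at 0 V
MISMATCH = 0.02   # hypothetical 2 % slave/master frequency offset
BETA = 1.0        # pedestal scaling
ALPHA = 0.2       # fine-tune scaling: effective slave gain is K_VCO / 5


def master_control(f_target):
    """Control voltage at which the locked master VCO sits at f_target."""
    return (f_target - F_FREE) / K_VCO


def slave_fine_control(f_target):
    """Fine-tune voltage the slave loop must supply once the pedestal is applied."""
    v_ped = master_control(f_target)
    f_from_pedestal = F_FREE * (1 + MISMATCH) + K_VCO * BETA * v_ped
    # The remaining frequency error is corrected through the low-gain path.
    return (f_target - f_from_pedestal) / (K_VCO * ALPHA)


effective_slave_gain = K_VCO * ALPHA
for f in (1.0e9, 2.488e9, 3.2e9):
    print(f"{f / 1e9:.3f} GHz: v_master = {master_control(f):.3f} V, "
          f"v_slave = {slave_fine_control(f) * 1e3:+.1f} mV")
print(f"slave fine-tune gain: {effective_slave_gain / 1e9:.2f} GHz/V")
```

In this model the slave needs only about -36 mV of fine tuning at every rate, so its gain can be five times lower than the master's without losing lock range; a larger master/slave mismatch would demand proportionally more fine-tune swing.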

The serializer is implemented with a tree of 2:1 multiplexers, as shown in Figure 3, to create the serialized data stream. This tree structure saves power because the final 2:1 multiplexer operates on both edges of a half-rate clock, and each preceding level of multiplexing uses a clock at half the rate of the level that follows it. This half-rate serializer architecture is perhaps the best compromise between speed and performance [4]. The one risk of a half-rate architecture is that any duty-cycle distortion in the clock introduces deterministic jitter on the output serial data. For this reason, a duty-cycle correction circuit has been added in the clock path, and experimental results show no deterministic jitter due to the half-rate approach.

If the bandwidth of the medium that carries the serial data is low, inter-symbol interference (ISI) is introduced, partially closing the data eye at the receiver. To alleviate ISI, a programmable first-order symbol-rate pre-emphasis function has been added. The transfer function of the equalizer is

$$H(z) = 1 - \lambda z^{-1} \tag{3}$$

where $\lambda$ can be programmed from 0 to 0.375. As shown in Figure 3, a scaled version of the previous data bit is subtracted from the present one in the final stage of the output buffer.
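The transmit datapath of this section can be sketched behaviorally as a tree of 2:1 multiplexers that interleaves parallel lanes into one stream, followed by the first-order pre-emphasis of Eq. (3). The 8-bit parallel width and the NRZ levels of ±1 are assumptions for illustration (the text does not state the transmit mux ratio), and the model ignores clocking, tracking only bit order and unnormalized symbol amplitude.

```python
def mux2(a, b):
    """2:1 multiplexer: interleave two equal-rate streams (a first, then b)."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def serialize(lanes):
    """Tree of 2:1 muxes; lanes[i] holds bit i of every parallel word.

    Each stage halves the number of streams and doubles their rate. Pairing
    stream i with stream i + n/2 makes the output emerge as bit 0, 1, 2, ...
    """
    streams = [list(lane) for lane in lanes]
    if len(streams) & (len(streams) - 1):
        raise ValueError("number of lanes must be a power of two")
    while len(streams) > 1:
        half = len(streams) // 2
        streams = [mux2(streams[i], streams[i + half]) for i in range(half)]
    return streams[0]


def pre_emphasize(symbols, lam):
    """Apply H(z) = 1 - lam * z^-1 to a sequence of NRZ symbols."""
    if not 0.0 <= lam <= 0.375:
        raise ValueError("lambda outside the programmable range 0..0.375")
    out, prev = [], 0.0  # assume the line idles at 0 before the first symbol
    for x in symbols:
        out.append(x - lam * prev)
        prev = x
    return out


words = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]]  # two 8-bit words
lanes = [[w[i] for w in words] for i in range(8)]
bits = serialize(lanes)
nrz = [1.0 if b else -1.0 for b in bits]
tx = pre_emphasize(nrz, lam=0.375)
print(bits)
print(tx)
```

With the maximum $\lambda = 0.375$, a bit that follows a transition is sent at 1.375 times the nominal swing and a repeated bit at 0.625 times it, a ratio of about 6.8 dB between high-frequency and low-frequency content.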

## III. De-Serializer

The block diagram of the receiver is shown in Figure 4. The CDR uses the master-slave PLL structure described previously to minimize its jitter generation. All eight receivers share the master PLL, and the eight CDRs are slaves. The master produces eight copies of the pedestal signal, in this case a current. During frequency acquisition, the CDR loop is configured as a frequency synthesizer using a conventional PFD and locks to an externally provided reference clock. Once the VCO frequency is within 250 ppm of the desired value, the data phase detector (PD) replaces the PFD and the CDR locks to the data stream. The PD is a classical Alexander-type circuit [5], but it operates at one quarter of the data rate. The VCO generates eight equally spaced phases that sample 4 data bits at their transitions and at their centers. Early/late decisions are filtered and applied to the VCO to track the data. Simultaneously, the PD retimes four data bits, forming the first step of the deserialization process. The second step is a 4:8 demultiplexing. This quarter-rate architecture was chosen to minimize power dissipation.
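The acquisition hand-off and the quarter-rate early/late logic can be sketched as follows. The 250 ppm threshold comes from the text; the sample layout (data samples `data[k]`, each followed by an edge sample `edges[k]` taken midway to `data[k+1]`), the sign convention and the function names are assumptions made for illustration.

```python
def within_ppm(f_vco_hz, f_target_hz, limit_ppm=250.0):
    """True once the VCO is close enough to hand over from the PFD to the data PD."""
    return abs(f_vco_hz - f_target_hz) / f_target_hz * 1e6 <= limit_ppm


def alexander_votes(data, edges):
    """Net bang-bang (Alexander) phase decision over one or more quarter-rate cycles.

    data:  N + 1 retimed data samples (the last one belongs to the next cycle)
    edges: N edge samples; edges[k] is taken between data[k] and data[k + 1]
    Returns +1 per 'clock late' decision and -1 per 'clock early' decision.
    """
    if len(data) != len(edges) + 1:
        raise ValueError("need exactly one more data sample than edge samples")
    net = 0
    for k, e in enumerate(edges):
        d_prev, d_next = data[k], data[k + 1]
        if d_prev == d_next:
            continue      # no transition: no phase information
        if e == d_next:
            net += 1      # edge sample already sees the new bit: clock is late
        else:
            net -= 1      # edge sample still sees the old bit: clock is early
    return net


# One quarter-rate cycle: 4 data bits plus the first bit of the next cycle.
print(within_ppm(2.4886e9, 2.48832e9))                 # about 113 ppm off
print(alexander_votes([0, 1, 1, 0, 1], [1, 1, 0, 1]))  # three late decisions
```

Only bit boundaries with a transition produce a decision, which is why the filtered early/late stream depends on the data's transition density.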

The received data is terminated on chip and then applied to the input buffer. Although an on-chip resistor may not be as precise as an external one, the advantages of on-chip termination far outweigh any small degradation in return loss. The bond-wire and/or solder-bump inductance, together with the capacitance of the ESD structures, forms an LC low-pass filter that introduces ISI. Because the on-chip terminating resistor appears in parallel with the parasitic capacitance, it lowers the quality factor of the LC filter, which improves the frequency response and hence reduces ISI.
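The damping argument can be checked with a simple second-order model. The component values are hypothetical: a 50 Ω line, 1 nH of bump or bond-wire inductance, and 0.5 pF of ESD and pad capacitance. In the off-chip case the 50 Ω resistor sits ahead of the inductance, so the chip input is an unterminated LC driven from a 25 Ω Thevenin source; in the on-chip case the resistor sits directly across the capacitance.

```python
import math

Z0 = 50.0        # ohm, hypothetical line / source impedance
R_TERM = 50.0    # ohm, termination resistor
L_PKG = 1e-9     # H, hypothetical bump or bond-wire inductance
C_PAD = 0.5e-12  # F, hypothetical ESD + pad capacitance


def q_external_termination(rs, l, c):
    """Q of a series L into a shunt C, driven from rs (termination before the L).

    H(s) = 1 / (1 + s*rs*c + s^2*l*c)  ->  Q = sqrt(l/c) / rs
    """
    return math.sqrt(l / c) / rs


def q_onchip_termination(rs, r, l, c):
    """Q of a series L into r || c, driven from rs (termination across the C).

    H(s) = r / (r + rs + s*(l + r*rs*c) + s^2*l*r*c)
    """
    w0 = math.sqrt((r + rs) / (l * r * c))
    w0_over_q = 1 / (r * c) + rs / l   # s coefficient after normalizing
    return w0 / w0_over_q


rs_ext = Z0 * R_TERM / (Z0 + R_TERM)   # Thevenin source seen by the chip: 25 ohm
q_ext = q_external_termination(rs_ext, L_PKG, C_PAD)
q_on = q_onchip_termination(Z0, R_TERM, L_PKG, C_PAD)
print(f"termination off chip: Q = {q_ext:.2f}")
print(f"termination on chip:  Q = {q_on:.2f}")
```

Under these assumptions Q falls from about 1.8 to about 0.7: the off-chip case is underdamped enough to peak and ring, while the on-chip case is close to a maximally flat response.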

The input buffer must have a high gain-bandwidth product to amplify the incoming data before it is applied to the CDR. To minimize power dissipation and to provide some equalization, active inductor loads are used instead of resistors, as shown in Figure 5. The output impedance of the buffer can be approximated as

$$Z(s) = \frac{1 + sRC_{gs2}}{s^2 RC_{gs2} C_L + s(C_{gs2} + C_L) + g_{m2}} \tag{4}$$

For nominal circuit parameters, the transfer function exhibits a zero and a pair of complex-conjugate poles, resulting in a small amount of boost near the band edge. This reduces the rise and fall times of the data at the CDR input, improving timing jitter and jitter tolerance. The simulated worst-case bandwidth of the buffer is 4 GHz, and the passband gain is 8 dB.
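A numerical look at Eq. (4) shows the shape described above. The device values below are hypothetical, picked only so that the poles are complex and the boost lands near the 4 GHz band edge; they are not the values used in the chip.

```python
import cmath
import math

GM2 = 10e-3      # S, hypothetical transconductance of the load device
R_G = 300.0      # ohm, hypothetical gate resistor of the active inductor
CGS2 = 150e-15   # F, hypothetical gate-source capacitance
CL = 300e-15     # F, hypothetical load capacitance


def z_load(f_hz):
    """Active-inductor load impedance of Eq. (4) at frequency f_hz."""
    s = 2j * math.pi * f_hz
    num = 1 + s * R_G * CGS2
    den = s**2 * R_G * CGS2 * CL + s * (CGS2 + CL) + GM2
    return num / den


# Zero and poles of Eq. (4)
f_zero = 1 / (2 * math.pi * R_G * CGS2)
a, b, c = R_G * CGS2 * CL, CGS2 + CL, GM2
disc = b * b - 4 * a * c
poles = [(-b + sgn * cmath.sqrt(disc)) / (2 * a) for sgn in (1, -1)]
complex_poles = disc < 0

# Sweep 100 MHz to 20 GHz and compare the peak with the low-frequency value 1/gm2
freqs = [1e8 * 10 ** (k / 200) for k in range(461)]
mags = [abs(z_load(f)) for f in freqs]
boost = max(mags) / (1 / GM2)
f_peak = freqs[mags.index(max(mags))]

print(f"zero at {f_zero / 1e9:.2f} GHz, complex poles: {complex_poles}")
print(f"|poles| = {abs(poles[0]) / (2 * math.pi * 1e9):.2f} GHz")
print(f"peak boost {20 * math.log10(boost):.1f} dB at {f_peak / 1e9:.2f} GHz")
```

With these values the zero sits near 3.5 GHz, the pole pair near 4.3 GHz, and the impedance rises roughly 2.5 dB above its low-frequency value of $1/g_{m2}$ before rolling off.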

## IV. Chip Layout

The octal SERDES macro is laid out for flip-chip packaging to be compatible with large digital VLSI devices. A flip-chip layout imposes certain constraints that do not normally affect conventional wire-bond layouts. The solder bumps in a flip-chip device are placed over the entire area of the chip in a regular array to provide mechanical rigidity to the package and to prevent voids in the underfill material. Although the solder bumps use only the top-level metal, no circuitry was placed under them, to minimize sources of crosstalk and mismatch, since metal coverage over active devices degrades transistor matching [6]. As a result, the total area occupied by the SERDES increased modestly. In addition, to minimize channel-to-channel crosstalk, every transmitter and every receiver was given separate solder bumps for its power supply connections. The resulting layout is pad limited.

## V. Experimental Results

The overall octal macro measures 2 mm × 6.75 mm. A test chip with two transceivers was fabricated to validate the design; its photograph is shown in Figure 8. The solder bumps of the flip-chip layout were bussed on chip to pads for wire-bond connections to the test package. The part was tested up to a data rate of 3.9 Gb/s. The transmit eye diagram for $2^{31}-1$ PRBS data, shown in Figure 6, has an r.m.s. jitter of less than 3.5 ps and a peak-to-peak jitter of 34 ps. The jitter generated by the transmitter, integrated over a band from 12 kHz to 10 MHz, is 1.8 ps r.m.s., far exceeding SONET OC-48 requirements. The receiver's jitter tolerance at a bit error rate of $10^{-11}$ far exceeds the OC-48 template, as shown in Figure 7. Finally, the receiver operated without errors at 3.125 Gb/s with a $2^{31}-1$ PRBS data input modulated by a 5 MHz jitter source that closed the eye by 80%.
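The jitter figures above convert between picoseconds and unit intervals as shown in the small helper below. The OC-48 line rate of 2.48832 Gb/s and the XAUI lane rate of 3.125 Gb/s are standard values; the 10 mUI (0.01 UI) r.m.s. comparison figure is the commonly cited GR-253 jitter generation limit and should be checked against the specification for a given application. Reading the 80% eye closure as 0.8 UI of peak-to-peak jitter is an assumption.

```python
OC48_RATE = 2.48832e9   # b/s, SONET OC-48 line rate
XAUI_RATE = 3.125e9     # b/s, XAUI lane rate


def ps_to_mui(jitter_ps, bit_rate):
    """Convert a jitter value in picoseconds to milli-unit-intervals."""
    ui_ps = 1e12 / bit_rate
    return 1e3 * jitter_ps / ui_ps


def ui_to_ps(ui, bit_rate):
    """Convert a fraction of a unit interval to picoseconds."""
    return ui * 1e12 / bit_rate


jgen_mui = ps_to_mui(1.8, OC48_RATE)    # transmitter jitter generation
sj_pp_ps = ui_to_ps(0.80, XAUI_RATE)    # 80 % eye closure at 3.125 Gb/s
print(f"jitter generation: {jgen_mui:.2f} mUI rms (limit 10 mUI rms)")
print(f"80% closure at 3.125 Gb/s: {sj_pp_ps:.0f} ps p-p")
```

The 1.8 ps figure works out to about 4.48 mUI, which agrees with the 4.47 mUI in Table I to within the rounding of 1.8 ps, and the 80% closure corresponds to about 256 ps of peak-to-peak jitter.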

## VI. Conclusions

The design of an octal SERDES that exceeds OC-48 requirements as well as XAUI specifications is fundamentally tied to obtaining low jitter generation in the transmit and receive PLLs. The use of the master-slave PLL technique to lower the gain of the VCOs was key to achieving the performance obtained.

## Acknowledgements

The authors would like to thank Ruben Recinos, Jonas Baublys, Carlos Carvalho, Jose Matos and Norbert Love for the physical layout and Robert Schell for help with the experimental results. They also thank Phil Johnson and Gong Gu for their contributions.

## References

- [1] D. Zheng, X. Jin, E. Cheung, M. Rana, G. Song, Y. Jiang, Y-Hsutu, B. Wu, "A Quad 3.125 Gb/s/Channel Transceiver with Analog Phase Rotators," *ISSCC 2002*, pp. 70-71, 445.
- [2] F. Yang, J. O'Neill, P. Larsson, D. Inglis, J. Othmer, "A 1.5V 86 mW/ch 8-Channel 622-3125Mb/s/ch CMOS SerDes Macrocell with Selectable Mux/Demux Ratio," *ISSCC 2002*, pp. 68-69, 445.
- [3] W. Wilson, U-K Moon, K. Lakshmikumar, L. Dai, "A CMOS Self Calibrating Frequency Synthesizer," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 10, pp. 1437-1444, Oct. 2000.
- [4] J. Khoury, K. Lakshmikumar, "High-Speed Serial Transceivers for Data Communication Systems," *IEEE Communications Magazine*, vol. 39, no. 7, pp. 160-165, July 2001.
- [5] J.D.H. Alexander, "Clock Recovery from Random Binary Signals," *Electronics Letters*, vol. 11, pp. 541-542, Oct. 1975.
- [6] H. Tuinhout, M. Pelgrom, R. Penning de Vries, M. Vertregt, "Effects of Metal Coverage on MOSFET Matching," *IEEE Electron Devices Meeting*, Dec. 1996, pp. 735-738.

TABLE I: SUMMARY OF EXPERIMENTAL RESULTS

| Parameter                              | Experimental value     |
|----------------------------------------|------------------------|
| Data rate range                        | 1.0 – 3.9 Gb/s         |
| Input sensitivity at $10^{-12}$ BER    | 50 mV differential p-p |
| Transmitter jitter generation (OC-48)  | 4.47 mUI r.m.s.        |
| Transmit data eye opening              | 0.89 UI @ 3.125 Gb/s   |
| Minimum receiver eye opening           | 0.20 UI @ 3.125 Gb/s   |
| Active area (per transceiver)          | 1.2 mm²                |
| Power supplies                         | 1.2 V, 1.8 V           |
| Transceiver power dissipation          | 160 mW @ 3.125 Gb/s    |



Figure 1: Chip Block Diagram



Figure 2: Master-Slave PLL Architecture



Figure 3: Transmitter Block Diagram



Figure 4: Receiver Block Diagram



Figure 5: Receiver Input Buffer



Figure 6: Transmitter data eye for a PRBS $2^{31}-1$ pattern at 3.2 Gb/s (vertical: 100 mV/div; horizontal: 50 ps/div; jitter: 34 ps p-p, 3.5 ps r.m.s.)





Figure 7: Receiver Jitter Tolerance for OC-48 applications

Figure 8: Chip Microphotograph