

# System Integration Issues

- Communicating FSMs
- Clocking, theory and practice

# Encoding numbers

6.111 Fall 2017 Lecture 7 1 6.111 Fall 2017 Lecture 7

# 1GB RAM





# Memory Controller



#### **FSM**



6.111 Fall 2017 Lecture 7 3 6.111 Fall 2017 Lecture 7

#### Glitchy Solution

```
module (
  input req, clk,
  output reg ras, mux, cas
                                               REO
reg [3:0] state, next_state:
parameter [3:0] STATE_0 = 0; // 0000
                                               RAS
parameter [3:0] STATE_1 = 1; // 0001
parameter [3:0] STATE_2 = 2; // 0010
                                               MUX
parameter [3:0] STATE_3 = 3; // 0011
parameter [3:0] STATE_4 = 4; // 0100
                                               CAS
                                                       State 0 State 1
                                                                               State 0
always @(posedge clk) state <= next_state;
                                                                   State
                                                                            State 4
always @ * begin
                                                                      State
  case (state)
                                                                       3
     STATE 0: next state = reg ? STATE 1 : STATE 0;
     STATE_1: next_state = STATE_2;
     STATE_2: next_state = STATE_3;
     STATE 3: next state = STATE 4;
     STATE_4: next_state = STATE_0;
     default: next state = state 0;
  endcase
end
assign\ ras\ =\ !((state==STATE\_1)\ |\ |(state==STATE\_2)\ |\ |(state==STATE\_3)\ |\ |(state==STATE\_4));\\
assign mux = (state==STATE_2)||(state==STATE_3)||(state==STATE_4);
assign cas = !((state==STATE_3)||(state==STATE_4));
endmodul e
```

6.111 Fall 2017 Lecture 7

# Toward FSM Modularity

• Consider the following abstract FSM:



• Suppose that each set of states  $a_x...d_x$  is a "sub-FSM" that produces exactly the same outputs.

Lecture 7

Can we simplify the FSM by removing equivalent states?
 No! The outputs may be the same, but the next-state transitions are not.

6,111 Fall 2017

• This situation closely resembles a procedure call or function call in software...how can we apply this concept to FSMs?

Acknowledgements: Rex Min

# Registered FSM Outputs are Glitch-Free



- Move output generation into the sequential always block
- Calculate outputs based on <u>next</u> state
- Delays outputs by one clock cycle. Problematic in some application.

6.111 Fall 2017 Lecture 6 6

# The Major/Minor FSM Abstraction



- Subtasks are encapsulated in minor FSMs with common reset and clock
- Simple communication abstraction:
  - START: tells the minor FSM to begin operation (the call)
  - BUSY: tells the major FSM whether the minor is done (the return)
- The major/minor abstraction is great for...
  - Modular designs (always a good thing)
  - Tasks that occur often but in different contexts
  - Tasks that require a variable/unknown period of time
  - Event-driven systems

6.111 Fall 2017 Lecture 7

# Inside the Major FSM



#### Variations:

- Usually don't need both Step 1 and Step 3
- One cycle "done" signal instead of multi-cycle "busy"

6.111 Fall 2017 Lecture 7

#### Inside the Minor FSM



6.111 Fall 2017 Lecture 7 10

# Optimizing the Minor FSM

Good idea: de-assert BUSY one cycle early





11

A Four-FSM Example TICK START Operating Scenario: Minor FSM A Major FSM is triggered BUSY by TICK START Minors A and B are started simultaneously Minor FSM B BUSY<sub>B</sub> Major FSM Minor C is started once both A and B complete TICKs arriving before the STARTC completion of C are BUSY<sub>C</sub> ignored Assume that BUSY and BUSY R BUSY +BUSY both rise before either minor  $BUSY_A + BUSY_B$ TICK FSM completes. Otherwise, we loop forever! BUSY BUSY R  $ST_{AB}$ TICK IDLE START BUSYABUSY BUSY  $ST_c$ BUSY<sub>C</sub> START  $\mathsf{BUSY}_{\mathcal{C}}$ BUSY<sub>c</sub> 6,111 Fall 2017 12

# Four-FSM Sample Waveform



# Clocking and Synchronous Communication



#### Clock Skew



# Low-skew Clocking in FPGAs



## Goal: use as few clock domains as possible

Suppose we wanted clocks at f/2, f/4, f/8, etc.: No! don't do it this way reg clk2,clk4,clk8,clk16; always @(posedge clk) clk2 <= ~clk2;</pre> always @(posedge clk2) clk4 <= ~clk4; always @(posedge clk4) clk8 <= ~clk16;</pre> always @(posedge clk8) clk16 <= ~clk16; No vsync! CLK CLK2 CLK4 CLK8 CLK16

#### Solution: 1 clock, many enables

Use one (high speed) clock, but create enable signals to select a subset of the edges to use for a particular piece of sequential logic

```
reg [3:0] count;
    always @(posedge clk) count <= count + 1;</pre>
                                                        // counts 0..15
    wire enb2 = (count[0] == 1'b1);
    wire enb4 = (count[1:0] == 2'b11);
                                                     always @(posedge clk)
    wire enb8 = (count[2:0] == 3'b111);
                                                       if (enb2) begin
                                                         // get here every 2<sup>nd</sup> cycle
    wire enb16 = (count[3:0] == 4'b1111);
                                                       end
                                                             10
                                                                 11
  ENB<sub>2</sub>
  FNB4
  ENB8
  ENB16
                   †= clock edge selected by enable signal
6.111 Fall 2017
                                        Lecture 7
```

# Using External Clocks

Very hard to have synchronous communication

Sometimes you need to communicate synchronously with circuitry outside of the FPGA (memories, I/O, ...)

between clk and clk16 domains

Problem: different delays along internal paths for DATA and CLK change timing relationship

#### Solutions:

6.111 Fall 2017

- 1) Bound internal delay from pin to internal rea; add that delay to setup time  $(t_{SU})$  specification
- 2) Make internal clock edge aligned with external clock edge (but what about delay of pad and clock driver)



# 1) Bound Internal Data Delay

Solution: use registers built into the IOB pin interface:



Virtex-II IOB Block

20

6,111 Fall 2017 Lecture 7 19 6.111 Fall 2017 Lecture 7

### 2) Align external and internal clocks



# Generating Other Clock Frequencies (again)

The labkit has a 27MHz crystal (37ns period). Use DCM to generate other frequencies e.g., 65MHz to generate 1024x768 VGA video.



Vivado uses a Clock Wizard to simplify clock generation.

### Example: Labkit ZBT interface

The upper DCM is used to generate the de-skewed clock for the external ZBT memories. The feedback loop for this DCM includes a 2.0 inch long trace on the labkit PCB and matches in distance all of the PCB traces from the FPGA to the ZBT memories. The propagation delay from the output of the upper DCM back to its CLKFB input should be almost exactly the same as the propagation delay from the DCM output to the ZBT memories.



The lower DCM is used to ensure that the fpga\_clock signal, which clocks all of the FPGA flip-flops, is in phase with the reference clock (clock\_27mhz).

6.111 Fall 2017 Lecture 7 22

# Verilog to generate 65MHz clock

```
// use FPGA's digital clock manager to produce a
// 65MHz clock (actually 64.8MHz)
wire clock_65mhz_unbuf,clock_65mhz;
DCM vclk1(.CLKIN(clock_27mhz),.CLKFX(clock_65mhz_unbuf));
// synthesis attribute CLKFX_DIVIDE of vclk1 is 10
// synthesis attribute CLKFX_MULTIPLY of vclk1 is 24
// synthesis attribute CLK_FEEDBACK of vclk1 is NONE
// synthesis attribute CLKIN_PERIOD of vclk1 is 37
BUFG vclk2(.0(clock_65mhz),.I(clock_65mhz_unbuf));
```

$$f_{CLKFX} = \left(\frac{24}{10}\right)(27MHz) = 64.8MHz$$

6.111 Fall 2017 Lecture 7 23 6.111 Fall 2017 Lecture 7 24

21

### RESETing to a known state

Just after configuration, all the registers/memories are in a known state (eg, default value for regs is 0). But you may need to include a RESET signal to set the initial state to what you want. Note the Verilog initial block only works in simulation and has no effect when synthesizing hardware.

Solution: have your logic take a RESET signal which can be asserted on start up and by an external push button:

6.111 Fall 2017 Lecture 7 25

# Encoding numbers

It is straightforward to encode positive integers as a sequence of bits. Each bit is assigned a weight. Ordered from right to left, these weights are increasing powers of 2. The value of an n-bit number encoded in this fashion is given by the following formula:

$$v = \sum_{i=0}^{n-1} 2^i b_i$$



Oftentimes we will find it convenient to cluster groups of bits together for a more compact notation. Two popular groupings are clusters of 3 bits and 4 bits.

6.111 Fall 2017



#### Debugging: making the state visible

To figure out what your circuit is doing it can be very useful to include logic that makes various pieces of state visible to the outside world. Some suggestions:

- turn the leds on and off to signal events, entry into particular pieces of code, etc.
- use the 16-character fluorescent display to show more complex state information
- drive useful data onto the ANALYZER pins and use the adapters to hook them up to the logic analyzer. Include your master clock signal and the configure the logic analyzer to sample the data on the non-active edge of the clock (to avoid setup and hold problems introduced by I/O pad delays). The logic analyzer can capture thousands of cycles of data and display the results in useful ways (including interpreting multi-bit data as samples of an analog waveform).

6.111 Fall 2017 Lecture 7 26

## Binary Representation of Numbers

How to represent negative numbers?

- Three common schemes:
  - sign-magnitude, ones complement, twos complement
- <u>Sign-magnitude</u>: MSB = 0 for positive, 1 for negative
  - Range:  $-(2^{N-1}-1)$  to  $+(2^{N-1}-1)$
  - Two representations for zero: 0000... & 1000...
  - Simple multiplication but complicated addition/subtraction
- Ones complement: if N is positive then its negative is  $\overline{N}$ 
  - Example: 0111 = 7, 1000 = -7
  - Range:  $-(2^{N-1}-1)$  to  $+(2^{N-1}-1)$
  - Two representations for zero: 0000... & 1111...
  - Subtraction is addition followed by end-around carry (subtraction is different from addition unit)

6,111 Fall 2017 Lecture 7

28

## Representing negative integers

To keep our arithmetic circuits simple, we'd like to find a representation for negative numbers so that we can use a single operation (binary addition) when we wish to find the sum of two integers, independent of whether they are positive are negative.

We certainly want A + (-A) = 0. Consider the following 8-bit binary addition where we only keep 8 bits of the result:

$$\begin{array}{c} 11111111\\ + \ \ \, \underline{00000001}\\ 00000000\end{array}$$

which implies that the 8-bit representation of -1 is 11111111. More generally



# Signed integers: 2's complement



8-bit 2's complement example:

$$11010110 = -2^7 + 2^6 + 2^4 + 2^2 + 2^1 = -128 + 64 + 16 + 4 + 2 = -42$$

If we use a two's complement representation for signed integers, the same binary addition mod  $2^n$  procedure will work for adding positive and negative numbers (don't need separate subtraction rules). The same procedure will also handle unsigned numbers!

By moving the implicit location of "decimal" point, we can represent fractions too:

$$1101.0110 = -2^3 + 2^2 + 2^0 + 2^{-2} + 2^{-3} = -8 + 4 + 1 + 0.25 + 0.125 = -2.625$$

6.111 Fall 2017 Lecture 7

# Sign extension

Consider the 8-bit 2's complement representation of:

$$42 = 00101010$$
  $-5 = \sim 00000101 + 1$   $= 11111010 + 1$   $= 11111011$ 

42 = 0000000000101010

What is their 16-bit 2's complement representation?

# Using Signed Arithmetic in Verilog

reg signed [63:0] data; wire signed [7:0] vector; input signed [31:0] a; function signed [128:0] alu;

16'hC501 //an unsigned 16-bit hex value 16'shC501 //a signed 16-bit hex value

#### Use care with signed arithmetic!

wire signed [7:0] total;

wire [3:0] counter; // max value 15, counting widgets off the mfg line wire signed [5:0] available;

assign total = available + counter; // does this give the correct answer? NO! counter = 4'b1111 is treated as -1. Need to "append" a leading zero

32

assign total = available + {1'b0, counter}; // or use \$unsigned()
assign total = available + \$unsigned(counter);

6.111 Fall 2017 Lecture 7 31 6.111 Fall 2017 Lecture 7

# Using Signed Arithmetic in Verilog

"<<<" and ">>>" tokens result in arithmetic (signed) left and right shifts: multiple by 2 and divide by 2.

Right shifts will maintain the sign by filling in with sign bit values during shift

```
wire signed [3:0] value = 4'b1000; // -8
value >> 2 // results in 0010 or 2
value >>> 2 // results in 1110 or -2
```

6.111 Fall 2017 Lecture 7 33

# Verilog Grading

#### Logistics

- Verilog submission with 2 days after lab checkoff. Lab must be checkoff first.
- Resubmission for regrade permitted for Lab 2 and Lab 3 only (email grader for regrading)

#### Grading

- Proper use of blocking and non-blocking assignments
- Readable Code with comments and consistent indenting
- Use of default in case statement
- Use of parameter statements for symbolic name and constants (state==5 vs state==DATA\_READY)
- Parameterized modules when appropriate
- Readable logical flow, properly formatted (see "Verilog Editors")
- No long nested if statements.
- 20% off for each occurrence.

6.111 Fall 2017 Lecture 7

# Nexys 4 - DDR



# Low Cost FPGA Boards



- Basys3
  - Artix-7 FPGA
  - 12 bit VGA
  - Switches/LEDs
  - **\$89** (9/2017)
  - Vivado Webpack



- PYNQ-Z1
  - 650MHz dual core Cortex A9
  - Artix-7
  - 512MB DDR3
  - **\$65** (9/2017)
  - Vivado Webpack

6,111 Fall 2017 Lecture 7 35 6,111 Fall 2017 Lecture 7