PUSH (R1)
PUSH (R2)
LD (BP, -12, R0)
LD (BP, -16, R1)
CMPEQ (R0, R1, R2)
BT (R2, L1)

When the CMPEQ is executed, assuming no interrupts, where does the value for R0 come from? How about the value for R1? (The choices would be from the register file or bypassed from one of the pipeline stages.)
loop: LD(R31, status, R0)
      BEQ(R0, loop, R31)
      ADD (R0, R1, R2)

The following pipeline diagram illustrates the execution of this instruction sequence on a standard 5-stage pipelined Beta:
ADD(R31,R31,R31)    | NOP
ADD(R1,R2,R1)
LD(R1,4,R1)
SUB(R1,R5,R6)
ORC(R1,123,R1)
SHL(R1,R1,R1)

. = 0x200
ADDC(R31,10,R0)
ADD(R2,R0,R1)
CMPLE(R0,R1,R2)
BT(R2,Loop,R31)

. = 0x100
Loop: ADD(R1,R2,R3)
      CMPLEC(R3,100,R0)
      BT(R0,Loop,R31)
      SHLC(R3,1,R3)

. = 0x60
LD(R31,124,R0)
ADDC(R0,1,R0)
ST(R0,124,R31)

. = 0x60
LD(R31,-1,R0)
ADDC(R0,1,R0)

. = 0x100
ADD(...)
MUL(...)
SUB(...)                    | Interrupt here

. = 0x0
IHANDLER: ADDC(SP,4,SP)     | PUSH(XP)
          ST(XP,-4,SP)
          ...

ADDC(R31, 3, R0)
SUBC(R0, 1, R1)
MUL(R0, R1, R2)
XOR(R0, R2, R3)
ST(R3, 0x1000, R31)

ADDC(R31, 3, R0)       | R0 = 3
SUBC(R0, 1, R1)        | R1 = 2, R0 bypassed from ALU
MUL(R0, R1, R2)        | R2 = 6, R0 bypassed from MEM, R1 bypassed from ALU
XOR(R0, R2, R3)        | R3 = 5, R0 bypassed from WB, R2 bypassed from ALU
ST(R3, 0x1000, R31)    | 5 is stored in location 0x1000, R3 bypassed from ALU

ADDC(R31, 3, R0)       | R0 = 3
SUBC(R0, 1, R1)        | R1 = 2, R0 bypassed from ALU
MUL(R0, R1, R2)        | R2 = 6, R0 bypassed from MEM, R1 bypassed from ALU
XOR(R0, R2, R3)        | R3 = 6, R0 bypassed from WB (as 0), R2 bypassed from ALU
ST(R3, 0x1000, R31)    | 6 is stored in location 0x1000, R3 bypassed from ALU

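The bypass annotations in the two traces above follow from how many instructions separate a reader from the producer of its operand. The following is a minimal sketch in Python (not Beta assembly), assuming a 5-stage pipeline (IF, RF, ALU, MEM, WB) with full bypassing and no stalls. The names `bypass_source` and `run` are hypothetical helpers, and the `wb_bypass_broken` flag is an assumed way of modelling the second trace, in which the WB bypass delivers 0 instead of R0.

```python
# Sketch only: assumes a 5-stage pipeline (IF, RF, ALU, MEM, WB), full bypassing,
# and no stalls. bypass_source and run are hypothetical names, not Beta tools.

def bypass_source(distance):
    """Source of an operand produced `distance` instructions before the reader."""
    return {1: "ALU", 2: "MEM", 3: "WB"}.get(distance, "register file")

def run(wb_bypass_broken=False):
    """Recompute the traced values; returns (R1, R2, R3)."""
    r0 = 0 + 3                                # ADDC(R31, 3, R0)
    r1 = r0 - 1                               # SUBC(R0, 1, R1): R0 from ALU
    r2 = r0 * r1                              # MUL(R0, R1, R2): R0 from MEM, R1 from ALU
    r0_seen = 0 if wb_bypass_broken else r0   # XOR reads R0 through the WB bypass
    r3 = r0_seen ^ r2                         # XOR(R0, R2, R3)
    return r1, r2, r3                         # R3 is the value ST writes to 0x1000
```

Calling `run()` reproduces the first trace and `run(wb_bypass_broken=True)` the second; `bypass_source(3)` gives the source of R0 for the XOR.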
LDR(.+8,LP)
BR(f,r31)
LONG(.+4)

foo: LONG( 0 )
     LD( foo, R0 )
     ADD( R1, R2, R3 )

To execute this sequence correctly the pipeline diagram must look like this:

The stall occurs when the ADD and the LD attempt to use the WB stage at the same time, forcing the ADD instruction to remain in a wait stage during t5.
S1: ADD(R1, R2, R3)
    SUB(R2, R3, R4)
    CMPLT(R3, R4, R5)

S2: ADD(R1, R2, R3)
    NOP
    SUB(R2, R3, R4)
    NOP
    CMPLT(R3, R4, R5)

S3: ADD(R1, R2, R3)
    NOP
    SUB(R2, R3, R4)
    CMPLT(R3, R4, R5)

ADDC(R31, 10, R0)
SUBC(R0, 5, R1)
ANDC(R0, 6, R2)
ORC(R0, 7, R3)
CMPLTC(R0, 11, R4)

The CMPLTC will be the first instruction to fetch the new value of R0. All the preceding instructions will be using the previous value(s) of R0. The ADDC instruction is in the Write Back stage while ORC is in the Register File stage, so the new R0 is not written back in time for the ORC to read it.

For the working Beta, S1, S2, and S3 all compute the same results. Initially: Reg[ R1 ] = -1, Reg[ R2 ] = 1, Reg[ R3 ] = 5, Reg[ R4 ] = -1
ADD( R1, R2, R3 )      Reg[ R3 ] = Reg[ R1 ] + Reg[ R2 ] = (-1) + 1 = 0
SUB( R2, R3, R4 )      Reg[ R4 ] = Reg[ R2 ] - Reg[ R3 ] = 1 - 0 = 1
CMPLT( R3, R4, R5 )    Reg[ R5 ] = (Reg[ R3 ] < Reg[ R4 ]) = (0 < 1) = 1

so Reg[ R5 ] = 1 for all three cases. For the Buba (italics denote cases in which the Buba differs from a working Beta because the most recently calculated result is not being used):
S1: ADD( R1, R2, R3 )      Reg[ R3 ] = Reg[ R1 ] + Reg[ R2 ] = (-1) + 1 = 0
    | new value of Reg[ R3 ] not available yet
    SUB( R2, R3, R4 )      Reg[ R4 ] = Reg[ R2 ] - Reg[ R3 ] = 1 - 5 = -4
    | new values of Reg[ R3 ] and Reg[ R4 ] not available yet
    CMPLT( R3, R4, R5 )    Reg[ R5 ] = (Reg[ R3 ] < Reg[ R4 ]) = (5 < -1) = 0
    Reg[ R5 ] = 0

S2: ADD( R1, R2, R3 )      Reg[ R3 ] = Reg[ R1 ] + Reg[ R2 ] = (-1) + 1 = 0
    NOP
    | new value of Reg[ R3 ] not available yet
    SUB( R2, R3, R4 )      Reg[ R4 ] = Reg[ R2 ] - Reg[ R3 ] = 1 - 5 = -4
    NOP
    | new value of Reg[ R4 ] not available yet (but Reg[ R3 ] is available)
    CMPLT( R3, R4, R5 )    Reg[ R5 ] = (Reg[ R3 ] < Reg[ R4 ]) = (0 < -1) = 0
    Reg[ R5 ] = 0

S3: ADD( R1, R2, R3 )      Reg[ R3 ] = Reg[ R1 ] + Reg[ R2 ] = (-1) + 1 = 0
    NOP
    | new value of Reg[ R3 ] not available yet
    SUB( R2, R3, R4 )      Reg[ R4 ] = Reg[ R2 ] - Reg[ R3 ] = 1 - 5 = -4
    | new values of Reg[ R3 ] and Reg[ R4 ] not available yet
    CMPLT( R3, R4, R5 )    Reg[ R5 ] = (Reg[ R3 ] < Reg[ R4 ]) = (5 < -1) = 0
    Reg[ R5 ] = 0
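The Buba traces above can be checked mechanically. The following is a minimal sketch in Python, not a model of the actual hardware. It assumes, as inferred from the traces, that without bypass paths a result becomes readable only by instructions issued more than `gap` slots after its producer (`gap = 3` here), while `gap = 0` gives the sequential behaviour of a working Beta. The names `simulate`, `OPS`, `INIT`, and the tuple encoding of instructions are hypothetical.

```python
# Sketch only. Assumption (inferred from the traces above): with no bypass paths,
# a result is readable only by instructions issued more than `gap` slots later.
# simulate, OPS, and INIT are hypothetical names.

OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "CMPLT": lambda a, b: int(a < b),
}

def simulate(program, regs, gap=3):
    """Run (op, ra, rb, rc) tuples; None is a NOP. gap=0 gives sequential semantics."""
    regs = dict(regs)
    pending = []  # (first slot that can see the write, register, value), in issue order
    for slot, instr in enumerate(program):
        # Make visible every write whose delay has elapsed.
        while pending and pending[0][0] <= slot:
            _, reg, value = pending.pop(0)
            regs[reg] = value
        if instr is None:
            continue
        op, ra, rb, rc = instr
        pending.append((slot + gap + 1, rc, OPS[op](regs[ra], regs[rb])))
    for _, reg, value in pending:  # drain the writes still in flight
        regs[reg] = value
    return regs

INIT = {"R1": -1, "R2": 1, "R3": 5, "R4": -1}
ADD = ("ADD", "R1", "R2", "R3")
SUB = ("SUB", "R2", "R3", "R4")
CMPLT = ("CMPLT", "R3", "R4", "R5")
S1 = [ADD, SUB, CMPLT]
S2 = [ADD, None, SUB, None, CMPLT]
S3 = [ADD, None, SUB, CMPLT]
```

For example, `simulate(S2, INIT)["R5"]` gives the Buba result for S2, and `simulate(S2, INIT, gap=0)["R5"]` gives the working-Beta result.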
ADD(R3, R4, R5)
SUB(R5, R6, R7)
ADD(R1, R2, R3)
MUL(R7, R1, R2)
ADD(R4, R3, R5)
CMPLE(R7, R8, R9)
DIV(R7, R8, R10)
BEQ(R5, done)
ADDC(R1, 1, R5)

ADD( R3, R4, R5 )
NOP
NOP
NOP                   | Reg[R5] has not yet been updated
SUB( R5, R6, R7 )
ADD( R1, R2, R3 )
NOP
NOP                   | Reg[R7] has not yet been updated
MUL( R7, R1, R2 )
ADD( R4, R3, R5 )
CMPLE( R7, R8, R9 )
DIV( R7, R8, R10 )
NOP                   | Reg[R5] has not yet been updated
BEQ( R5, done )
NOP
ADDC( R1, 1, R5 )

The NOP after the BEQ instruction is necessary so that ADDC will only be executed if the branch is not taken.
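The NOP placement above follows a simple rule that can be sketched in Python. This is an illustration under stated assumptions, not the course's reference solution: a reader must sit more than `gap` slots after the producer of any register it reads (`gap = 3`, inferred from the listing), and one NOP follows every BEQ. The function name `insert_nops` and the `(op, sources, destination)` encoding are hypothetical.

```python
# Sketch only. Assumptions: a register is readable only more than `gap` slots after
# the instruction that writes it, and one NOP follows each BEQ.
# insert_nops and the tuple encoding are hypothetical.

def insert_nops(program, gap=3):
    """program: list of (op, source_registers, destination_or_None). Returns opcodes."""
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for op, srcs, dst in program:
        # Earliest slot at which all source registers are readable.
        need = max((last_write[r] + gap + 1 for r in srcs if r in last_write), default=0)
        out.extend(["NOP"] * max(0, need - len(out)))
        if dst is not None:
            last_write[dst] = len(out)
        out.append(op)
        if op == "BEQ":
            out.append("NOP")  # assumed slot after the branch
    return out

PROGRAM = [
    ("ADD", ["R3", "R4"], "R5"),
    ("SUB", ["R5", "R6"], "R7"),
    ("ADD", ["R1", "R2"], "R3"),
    ("MUL", ["R7", "R1"], "R2"),
    ("ADD", ["R4", "R3"], "R5"),
    ("CMPLE", ["R7", "R8"], "R9"),
    ("DIV", ["R7", "R8"], "R10"),
    ("BEQ", ["R5"], None),
    ("ADDC", ["R1"], "R5"),
]
```

`insert_nops(PROGRAM)` returns the opcode sequence with NOPs in the same positions as the listing above.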
XAdr: ADDC(SP,4,SP)
      ST(R0,-4,SP)
      ...

First, consider this code fragment:
. = 0x1234
start: CMPLTC(R1,0,R2)
       SUB(R3,R2,R3)
       XOR(R0,R3,R0)
       MUL(R1,R2,R3)
       SHLC(R1,2,R4)

skip: BR(next)
      CMPLTC(R1,0,R2)
      ADD(R3,R2,R3)
next: XOR(R0,R3,R0)
      MUL(R1,R2,R3)
      SHLC(R1,2,R4)

Complete the diagram for normal execution of the instructions starting at skip.
X: BR(Y)
Y: BR(X)