6.033 Discussion Suggestions (Coordinating Multiple Instruction Issue)

6.033--Computer System Engineering

Suggestions for classroom discussion of:

Coordinating multiple instruction issue
by J. H. Saltzer, May 7, 1996, from 1994 recitation notes
Modern high-performance RISC chips issue several instructions at a time
(for example, the PowerPC 604 issues six), leading to a requirement to
take measures to assure that there is no interference among the
parallel instructions.  The method used in these chips has not, to my
knowledge, been published, but it would be possible, and surprisingly
simple, to do this job using a version history algorithm, the
mark-point-sequential coordination scheme described in 6.033 chapter
on Atomicity.

The problem of multiple instruction issue is that instruction 4 may
take as input a register than is being modified by instruction 2; to
the extent that such dependencies exist the later instruction must
wait for the earlier one to complete.  To the extent that such
dependencies don't exist, the several instructions can be done
completely in parallel.

One way of thinking about register dependencies in a multiple-issue CPU
is to consider the several registers to be a data base and the several
parallel instructions to be atomic operations.  The usual correctness
goal (that the result be as if those operations had been done in some
sequential order) has to be strengthened slightly:

   the result must be the same as if those operations had been done
   in the order that they appeared in the original program.

The actual executions can be done in parallel or in any convenient order
so long as that requirement is met. With that model in mind, one can
bring to bear any appropriate apparatus of atomicity and coordination.

Suppose that there are six identical instruction processing units
(IPU's) and 32 registers.  Imagine that for each of the 32 registers
there is a daisy chain that runs through each of the IPU's, plus a 33rd
such daisy chain for control.  Each daisy chain starts at a source of
ONEs, then visits the IPU that will execute the earliest instruction in
the program, then the IPU that will execute the second instruction, and
so on.  There are pull-downs in each IPU such that if a chain is opened
by an IPU that it is traversing, it will carry a ZERO beyond that
point.

At the beginning of an instruction cycle, each IPU closes all of the
register daisy chains that pass through it and it opens the control
chain.  Then, each IPU decodes its incoming instruction far enough to
figure out which registers it will read or modify.  For each such
register, it opens the corresponding daisy chain. As soon as all
registers have been marked in this fashion, the IPU closes the control
daisy chain.

Having completed that step, the IPU is now free to proceed with
execution as soon as it sees that both the control chain and the
register chains for all registers from which it will read are carrying
ONEs. And as soon as an IPU has finished executing its instruction, it
is expected to close the register daisy chains corresponding to those
registers it used.

The first IPU is always free to proceed, because all of the daisy chains
feeding it are directly connected to sources of ONEs.  Later IPU's that
use input values from registers that no earlier IPU will use
are also free to proceed as soon as the control ONE reaches them, to
prove that all earlier IPU's have had a chance to mark the registers they
need.  An IPU that intends to use a register that one or more
earlier IPU's needs must wait until the register's daisy chain
once again carries a ONE, indicating that all IPU's earlier in the chain
that had marked it have completed their use.

This scheme is a direct hardware implementation of a variation of 6.033
chapter seven's mark-point-sequentional coordination method.  The
opening of a register daisy chain corresponds to marking a version in
the version history, and the closing of the control daisy chain is the
announcement that marking is complete. There are two significant
differences:

    - the sequential ordering of the operations is fixed by the routing
    of the daisy chains, rather than by the order of atomic operation
    creation.
    - because a single register stands in for the entire history, it is
    necessary to mark read as well as write intentions.
Comments and suggestions: Saltzer@mit.edu