## A Video Controller and Distributed Frame Buffer for the J-Machine <sup>1</sup>

Eric McDonald <sup>2</sup> ericldab@ai.mit.edu, NE43-610 June 7, 1993

We are currently in the test and assembly phase of a distributed, scalable video system for the J-Machine. The J-Machine is a fine-grain concurrent computer composed of up to 65,536 36-bit Message-Driven Processors (MDPs) that communicate through a low-latency network [1]. Our goal is to provide high-bandwidth, multi-buffered, high-resolution video output for the J-Machine. The video system is also designed to scale, so that it can meet the varying demands of different J-Machine configurations and users.

The video system consists of two types of peripheral boards that connect to the edges of the main J-Machine processor array. Figure 1 shows one possible configuration of a small video system, made up of an array of Pixel Storage Boards (PSBs) and a single Video Controller (VC) board.

The aggregate PSB array, or "Distributed Frame Buffer" (DFB), is responsible for receiving pixel data from the J-Machine processor array and storing it in video RAM (VRAM). Each PSB contains two independent processing nodes. The number of PSBs in a given system can vary from two to sixteen in powers of two (i.e., from four to thirty-two independent nodes).

The VC board is responsible for collecting the pixel data from the DFB and displaying it on an RGB monitor. The pixel data must be retrieved for every screen refresh cycle; no screen buffering is performed by the VC, although multiple buffers are maintained in the DFB. The target video specification provides for a screen of up to 1280 x 1024 pixels, with 24-bit pixels plus eight bits of overlay per pixel. The RGB video is generated according to RS-343 sync standards [2]. There is only one VC board per video system, but multiple VC boards can be synchronized if several video systems are installed in a single J-Machine. This permits the generation of high-resolution stereoscopic images in conjunction with a stereo-ready monitor and stereo glasses.



Figure 1: Small video system for the J-Machine

As shown in Figure 1, the entire DFB is ultimately responsible for providing four pixel streams. Most of the video system is able to work at one-quarter the pixel rate of 120 MHz because the VC latches in four pixels (128 bits) every 30 MHz clock cycle, instead of one pixel every 120 MHz cycle. The need to provide four pixel streams is why the minimal video system consists of four PSB nodes plus the VC. If more than four PSB nodes are used, each pixel stream is shared in a time-multiplexed fashion.
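The rate arithmetic in this paragraph can be checked with a short sketch. The constants come from the text (120 MHz pixel rate, four streams, 32 bits per pixel from 24-bit pixels plus eight overlay bits); the function names are illustrative only, and the even split of nodes across streams is an assumption rather than a documented rule.

```python
PIXEL_CLOCK_HZ = 120_000_000  # pixel rate stated in the text
STREAMS = 4                   # pixel streams feeding the VC
BITS_PER_PIXEL = 32           # 24-bit pixel plus 8 bits of overlay


def latch_clock_hz(pixel_clock_hz=PIXEL_CLOCK_HZ, streams=STREAMS):
    """Clock rate at which the VC latches one pixel from every stream."""
    return pixel_clock_hz // streams


def bits_per_latch(streams=STREAMS, bits_per_pixel=BITS_PER_PIXEL):
    """Width of the word latched by the VC on each slow clock cycle."""
    return streams * bits_per_pixel


def peak_bytes_per_second(pixel_clock_hz=PIXEL_CLOCK_HZ,
                          bits_per_pixel=BITS_PER_PIXEL):
    """Peak pixel bandwidth into the VC while a scan line is being drawn."""
    return pixel_clock_hz * bits_per_pixel // 8


def nodes_per_stream(psb_nodes, streams=STREAMS):
    """Assumption: PSB nodes are divided evenly among the pixel streams."""
    if psb_nodes % streams != 0:
        raise ValueError("node count must be a multiple of the stream count")
    return psb_nodes // streams
```

With these numbers the VC latches 128 bits at 30 MHz, and a full system of thirty-two nodes would place eight nodes on each stream.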

The block diagram in Figure 2 illustrates the various modules in a single PSB node. An MDP is used primarily to receive changing pixel data from the J-Machine network. An incoming message to the MDP causes the corresponding address and pixel value to be driven onto the External Memory Interface (EMI) bus. In the EMI's intended usage, the data would be stored at the given address in external DRAM, but in a PSB node the address and data pair is instead directed into a 64-entry FIFO.<sup>3</sup> The data is ultimately stored at the given address in the dual-ported VRAM.
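The decoupling role of the FIFO can be illustrated with a small behavioral model. This is a sketch under assumptions, not a description of the hardware: the class and method names are hypothetical, the VRAM is modeled as a dictionary, and raising an error on overflow is a modeling choice, since the text does not say what the board does when the FIFO fills.

```python
from collections import deque


class WriteFifo:
    """Behavioral model of the FIFO between the EMI bus and the VRAM."""

    def __init__(self, depth=64):
        self.depth = depth      # 64 entries, as stated in the text
        self.entries = deque()  # queued (address, pixel) pairs

    def push(self, address, pixel):
        """Accept an EMI write. The EMI side cannot stall, so it never waits."""
        if len(self.entries) >= self.depth:
            # Assumption: overflow is treated as an error in this model.
            raise OverflowError("FIFO full: EMI write would be lost")
        self.entries.append((address, pixel))

    def drain_one(self, vram, vram_ready):
        """Retire the oldest write if the VRAM can accept it this cycle."""
        if vram_ready and self.entries:
            address, pixel = self.entries.popleft()
            vram[address] = pixel
            return True
        return False
```

Writes are retired in arrival order, so a later write to the same address always wins, as it would with a direct memory write.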

A shift register on the serial port of the VRAM simultaneously provides pixel data from one row of the VRAM to one of the four pixel streams leading into the VC. The serial port can be accessed in this way by the VC without interruption until the end of a scan line, at which point the VC broadcasts a new row address to all PSB nodes. The nodes must then transfer the indicated row of VRAM data into their respective serial port shift registers. A multiplexer is needed to provide either the pixel address from the FIFO or the address broadcast by the VC to the VRAM.

<sup>1</sup> This research is supervised by Professor W.J. Dally and is supported in part by the Defense Advanced Research Projects Agency under contracts F19628-92-C-0045 and N00014-91-J-1698, and by a National Science Foundation Presidential Young Investigator Award, grant MIP-8657531, with matching funds from General Electric Corporation, IBM Corporation, and AT&T.

<sup>2</sup> Initial design work on this project was done by Sasan Zamani.

<sup>3</sup> A FIFO is needed because the VRAM might not be ready to accept a write operation, and the EMI interface wasn't designed to allow stalls.

Figure 2: PSB Node Block Diagram

Figure 3: VC Block Diagram

Each PSB node contains two banks of 256 KPixel VRAM at thirty-two bits per pixel. This results in a total VRAM storage of two megapixels in the minimal system, or up to sixteen megapixels in a full system. This storage will normally be segmented to provide multiple frame buffers useful in animation applications.
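The storage totals follow directly from the per-bank figure, as the sketch below shows. It assumes that one KPixel is 1024 pixels (which reproduces the two and sixteen megapixel totals in the text) and that frames pack into VRAM with no row-alignment padding, so the frame counts are upper bounds. All names are illustrative.

```python
KPIXEL = 1024                   # assumption: binary kilo, matches the text's totals
PIXELS_PER_BANK = 256 * KPIXEL  # 256 KPixel per VRAM bank
BANKS_PER_NODE = 2
NODES_PER_PSB = 2


def total_pixels(psb_boards):
    """Total VRAM capacity, in pixels, of a DFB with the given board count."""
    return psb_boards * NODES_PER_PSB * BANKS_PER_NODE * PIXELS_PER_BANK


def max_full_frames(psb_boards, width=1280, height=1024):
    """Upper bound on whole frame buffers, assuming tightly packed frames."""
    return total_pixels(psb_boards) // (width * height)
```

Under these assumptions, the minimal two-board system holds one full 1280 x 1024 frame, and a sixteen-board system holds at most twelve; lower-resolution frames leave room for more buffers.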

A block diagram for the Video Controller board is shown in Figure 3. Once again, an MDP is used as a communications bridge between the J-Machine and the peripheral board, and data destined for certain ranges of external memory is directed into a FIFO instead. Depending on the address, one of three modules receives and interprets the data coming from the FIFO.

The XferAddr Lookup Block (XLB) maintains up to sixteen 1024-entry tables in SRAM. Each entry contains an eighteen-bit address that is broadcast to all PSB nodes before every scan line, requesting that they perform a particular row transfer in their VRAMs. The XLB is told how many tables are filled with valid addresses, and it sequences through the tables from frame to frame. If the application is using only a single frame buffer, only one table will be used.
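A minimal software model of the table sequencing might look like the following. The table count, table size, and address width come from the text; the class, its methods, and the wrap-around order (frame number modulo the number of valid tables) are assumptions made for illustration.

```python
TABLES = 16                # up to sixteen tables
ENTRIES_PER_TABLE = 1024   # one entry per scan line
ADDR_MASK = (1 << 18) - 1  # eighteen-bit row-transfer address


class XferAddrLookup:
    """Behavioral model of the XLB's per-scan-line address tables."""

    def __init__(self):
        self.tables = [[0] * ENTRIES_PER_TABLE for _ in range(TABLES)]
        self.valid_tables = 1  # a single frame buffer uses one table

    def load(self, table, line, row_address):
        """Store the transfer address for one scan line of one table."""
        self.tables[table][line] = row_address & ADDR_MASK

    def set_valid_tables(self, count):
        """Tell the XLB how many tables hold valid addresses."""
        if not 1 <= count <= TABLES:
            raise ValueError("valid table count must be between 1 and 16")
        self.valid_tables = count

    def address_for(self, frame, line):
        """Address broadcast before the given scan line of the given frame.

        Assumption: tables are visited in order and wrap around.
        """
        return self.tables[frame % self.valid_tables][line]
```

With two valid tables, successive frames alternate between table 0 and table 1, which corresponds to double buffering.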

The Serial Clock Module (SCM) is responsible for generating the clock signals that are fed into the serial ports of the VRAM banks. In configurations with more than four PSB nodes, these signals also implicitly manage the pixel streams and eliminate bus contention. Because there are up to 64 VRAM banks in a system, up to 64 ECL clock signals are generated and individually hardwired to the different PSB nodes. The clock patterns, and therefore the DFB access patterns, are retrieved from an SRAM maintained in the SCM.
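The contention-free property can be expressed as a simple check on a clock pattern. The sketch below is hypothetical throughout: the text does not give the contents of the pattern SRAM or the mapping of banks to streams, so the round-robin schedule and the rule that bank `b` drives stream `b % 4` are assumptions chosen only to make the invariant concrete.

```python
STREAMS = 4  # pixel streams leading into the VC


def stream_of(bank, streams=STREAMS):
    """Assumed mapping: bank b drives stream b modulo the stream count."""
    return bank % streams


def round_robin_pattern(banks, streams=STREAMS):
    """Hypothetical schedule: each time slot clocks one bank per stream."""
    if banks % streams != 0:
        raise ValueError("bank count must be a multiple of the stream count")
    turns = banks // streams
    return [[turn * streams + s for s in range(streams)]
            for turn in range(turns)]


def is_contention_free(pattern, streams=STREAMS):
    """True if every slot drives each stream from exactly one bank."""
    for slot in pattern:
        driven = sorted(stream_of(bank, streams) for bank in slot)
        if driven != list(range(streams)):
            return False
    return True
```

A pattern that clocked two banks on the same stream in one slot would fail the check, which is the condition the hardwired clock signals are meant to rule out.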

The Microprocessor Unit (MPU) interface is needed to communicate with the MPU contained in the video controller chip, a Brooktree Bt463. The Bt463 is the analog workhorse of the video system. It contains three 528 x 8 color look-up tables and three 8-bit D/A converters. The chip supports multiple window types (e.g. 24-plane true color, 8-plane pseudo color, and 12-plane true color, each with a separate color map) in a single frame. The part also supports bit plane masking and several testability features.

An initial design of the PSB board was fabricated and debugged, and a second version has now been fabricated. The first version of the VC board is scheduled to be sent out for fabrication within two weeks, and a functional system is anticipated within ten weeks.

## References

- [1] W. J. Dally et al., "The J-Machine: A fine-grain concurrent computer," in *Proceedings of IFIP 89 Conference*, 1989. VLSI memo 89-532.
- [2] EIA Engineering Dept., "Electrical Performance Standards For High Resolution Monochrome Closed Circuit Television Camera," Technical Report EIA-343-A, Electronic Industries Association, September 1969.