|I believe that future innovations in Computer Engineering are likely to result from developments in three areas: Parallel Computer Architecture, Wireless Communications, and VLSI/RF Technology.|
|I therefore selected undergraduate and graduate courses, and their associated projects, in these three areas.|
Dealing With Complexity:
Autonomic and Intelligent On-Chip Communication Networks
When you think of state-of-the-art multicore processors and the effort and time involved in their design and verification, the sheer complexity is immediately apparent.
On-chip complexity grows with rapid transistor scaling and has reached the point where designs are rigid and intricate, and where power and performance depend on the workload and the environment.
Paul Horn, IBM’s Senior Vice President, stated that the primary problem that can prevent the progression to the next era of computing is COMPLEXITY.
The performance improvements expected from multicore processors depend on research and innovation that address communication efficiency.
The network-on-chip is a communication fabric with many proven advantages over the traditional bus, but it does not guarantee the best power/performance trade-off for every workload.
To address the “Dealing with Complexity” challenge, the on-chip communication network must tune and adapt to the workloads.
I believe that, to realize this goal of autonomic and intelligent computing, three areas of research must converge: machine intelligence and learning, bio-inspired autonomic computing, and parallel architectures with on-chip communication networks.
My research focus lies at this convergence point, specifically targeting a self-organizing, self-adaptive, and self-tuning communication network in multicore systems.
|A 36-core Research Chip Prototype Demonstrating Snoopy Coherence On A Scalable Mesh NoC with In-Network Ordering|
In the many-core era, scalable coherence and scalable on-chip interconnects are crucial for shared memory processors.
While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy intrinsically relies on ordered interconnects which do not scale well. However, directory-based coherence does not scale beyond tens of cores due to excessive directory area overhead or inaccurate sharer tracking which degrades performance.
Prior techniques for supporting ordering over arbitrary unordered networks suffer from practicality issues and are unsuitable for full multicore chip designs.
SCORPIO has an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to ensure distributed global ordering. Message delivery on the network is decoupled from the ordering, allowing messages to arrive in any order and at any time, and still be correctly ordered.
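The decoupling of delivery from ordering can be illustrated with a minimal sketch. It is not the SCORPIO implementation: the class `OrderedReceiver`, the notion of a numbered time window, and the rule of releasing messages in ascending source-ID order within a window are all assumptions made for illustration. The only idea taken from the text is that a fixed-latency notification tells every node which sources sent a message, so each node can release messages in the same global order regardless of when they actually arrive.

```python
class OrderedReceiver:
    """One node's view of the ordering logic (hypothetical sketch).

    Notifications are assumed to arrive with fixed latency, so every node
    sees the same set of sending sources for each time window. Messages
    themselves may arrive in any order and at any time.
    """

    def __init__(self):
        self.notifications = {}  # window -> sorted list of announced source ids
        self.pending = {}        # (window, source id) -> buffered message
        self.next_window = 0     # oldest window not yet fully released
        self.cursor = 0          # position within that window's source list

    def notify(self, window, sources):
        # Assumed ordering rule: ascending source id within a window.
        self.notifications[window] = sorted(sources)

    def deliver(self, window, source, message):
        # Arrival only buffers the message; it never decides the order.
        self.pending[(window, source)] = message

    def drain(self):
        """Release every message that is next in the global order."""
        released = []
        while self.next_window in self.notifications:
            order = self.notifications[self.next_window]
            while self.cursor < len(order):
                key = (self.next_window, order[self.cursor])
                if key not in self.pending:
                    return released  # stall until the expected message arrives
                released.append(self.pending.pop(key))
                self.cursor += 1
            self.next_window += 1
            self.cursor = 0
        return released


if __name__ == "__main__":
    node = OrderedReceiver()
    node.notify(0, {2, 0})
    node.notify(1, {1})
    node.deliver(1, 1, "C")   # arrives first, but is ordered last
    node.deliver(0, 2, "B")
    node.deliver(0, 0, "A")
    print(node.drain())
```

Two nodes that receive the same notifications release the same sequence even if their messages arrive in different orders, which is the property a snoopy protocol needs from the interconnect.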
With practicality, timing, area, and power as top concerns, the architecture is designed to plug-and-play with existing multicore IP and reap substantial performance benefits from snoopy coherence on a scalable network.
Full-system 36- and 64-core simulations on SPLASH-2 and PARSEC benchmarks show an average application runtime reduction of 24.1% and 12.9%, in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively.
The SCORPIO architecture is incorporated in an 11 mm-by-13 mm chip prototype, fabricated in IBM 45 nm SOI technology and comprising 36 commercial Power Architecture cores with private L1 and L2 caches interfacing with the NoC via ARM AMBA, along with two Cadence on-chip DDR2 controllers.
The chip prototype achieves a post-synthesis operating frequency of 1 GHz (833 MHz post-layout) with an estimated power of 28.8 W (768 mW per tile), while the network consumes only 10% of tile area and 19% of tile power.
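The reported figures can be related to one another with some simple arithmetic. The sketch below uses only numbers quoted above. The variable names are illustrative, and attributing the difference between chip power and total tile power to logic outside the tiles is an assumption, not something the text states.

```python
# Figures quoted in the text above.
CORES = 36
CHIP_POWER_W = 28.8
TILE_POWER_W = 0.768                    # 768 mW per tile
NETWORK_SHARE_OF_TILE_POWER = 0.19
RUNTIME_REDUCTION_VS_DIRECTORY = 0.241  # 24.1% average runtime reduction

# Power drawn by all tiles together.
tiles_total_w = CORES * TILE_POWER_W

# Assumption: the remainder is consumed outside the tiles
# (for example, memory controllers and I/O).
outside_tiles_w = CHIP_POWER_W - tiles_total_w

# Network power per tile implied by the 19% share.
network_per_tile_w = NETWORK_SHARE_OF_TILE_POWER * TILE_POWER_W

# A runtime reduction of r corresponds to a speedup of 1 / (1 - r).
speedup_vs_directory = 1.0 / (1.0 - RUNTIME_REDUCTION_VS_DIRECTORY)

if __name__ == "__main__":
    print(f"tiles total:        {tiles_total_w:.2f} W")
    print(f"outside tiles:      {outside_tiles_w:.2f} W")
    print(f"network per tile:   {network_per_tile_w * 1000:.0f} mW")
    print(f"speedup vs. directory: {speedup_vs_directory:.2f}x")
```

Under these assumptions the tiles account for roughly 27.6 W of the 28.8 W estimate, the network draws about 146 mW per tile, and the 24.1% runtime reduction corresponds to a speedup of about 1.32x over the distributed directory baseline.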