As computer technology has progressed by leaps and bounds over the last few decades, we have reached a point where further performance enhancements cannot be achieved by feature-size scaling alone because of inherent physical limitations. Increasing the operating frequency of traditional high-performance computing systems now yields only diminishing returns because of the well-known memory wall, instruction-level parallelism (ILP) wall, and power wall. We have therefore entered a phase in which multiple computing cores are brought together in a single processor to boost performance by exploiting the inherent parallelism in complex applications. This has shifted the focus from computation to communication. The interconnection network between processors and memory determines the memory latency and memory bandwidth, which have a greater bearing on system performance than the compute power of the processors themselves. Providing an efficient communication infrastructure between the multiple cores and the off-chip memory has become increasingly imperative. The Network-on-Chip (NoC) paradigm enables an efficient and scalable infrastructure for future interconnection networks. Worst-case network throughput is an important performance metric of interconnection networks and is of particular concern to real-time systems. The existing model for throughput analysis assumes that the router nodes do not constrain the throughput, and hence that the ideal throughput of the network is determined entirely by congestion in the channels (links). In many recently proposed NoCs, however, the actual router design has a significant impact on the delivered throughput. This work therefore re-examines the worst-case throughput problem with the router model taken into account.
In the first part of this thesis, we present an extended framework for analytically evaluating the throughput constrained by both the routers and the channels, under typical network and router settings and commonly used routing algorithms, including dimension-order routing (DOR) and ROMM. In the second part, we address the major performance bottleneck in heterogeneous chip multiprocessors, namely the memory bandwidth between the on-chip system and off-chip memory. We use multi-criteria optimization to derive efficient memory controller placement schemes that optimize off-chip memory access for general heterogeneous applications, and we extend the approach to domain-specific applications. Experimental results demonstrate that, compared with existing schemes, our method can accelerate such applications by improving the average network latency and link utilization with minimal change to network components.
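The channel-only throughput model mentioned in the abstract can be illustrated with a small sketch. The code below is not the thesis's framework: it assumes a k x k mesh, XY dimension-order routing, uniform random traffic (rather than a worst-case pattern), unit-bandwidth channels, and routers that never limit throughput. The function names `xy_route`, `max_channel_load`, and `ideal_throughput` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import product


def xy_route(src, dst):
    """Return the directed channels used by XY dimension-order routing."""
    x, y = src
    dx, dy = dst
    path = []
    # Route fully along the X dimension first.
    while x != dx:
        nx = x + (1 if dx > x else -1)
        path.append(((x, y), (nx, y)))
        x = nx
    # Then route along the Y dimension.
    while y != dy:
        ny = y + (1 if dy > y else -1)
        path.append(((x, y), (x, ny)))
        y = ny
    return path


def max_channel_load(k):
    """Maximum channel load on a k x k mesh under uniform traffic.

    Assumption: every node injects one unit of traffic, spread evenly
    over all k*k destinations (including itself).
    """
    nodes = list(product(range(k), repeat=2))
    rate = 1.0 / len(nodes)
    load = defaultdict(float)
    for s in nodes:
        for d in nodes:
            for channel in xy_route(s, d):
                load[channel] += rate
    return max(load.values())


def ideal_throughput(k):
    """Channel-limited throughput as a fraction of unit channel bandwidth."""
    return 1.0 / max_channel_load(k)
```

Under these assumptions the most heavily loaded channels are the ones crossing the middle of the mesh, and the resulting bound ignores any limit imposed by the routers, which is the simplification the thesis revisits.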
| Date of Award | 2016 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
Performance analysis and optimizations for network on chip paradigm
Mohan, V. (Author). 2016
Student thesis: Master's thesis