On-chip traffic regulation to reduce coherence protocol cost on a microthreaded many-core architecture with distributed caches

Qiang Yang, Jian Fu, Raphael Poss, Chris Jesshope

Research output: Contribution to journal › Journal article › peer-review

3 Citations (Scopus)

Abstract

When hardware cache coherence scales to many cores on chip, oversaturated traffic in the shared memory system may offset the benefits of massive hardware concurrency. In this article, we investigate the cost of a write-update protocol in terms of on-chip memory network traffic, and its adverse effects on system performance, on a multithreaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure. We find that in the context of massive concurrency, by introducing a write-merging buffer with 0.46% area overhead to each core, applications with good locality and concurrency are boosted by 18.74% in performance on average. Other applications also benefit from this addition and achieve a throughput increase of up to 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiencies compared to other solutions.
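The write-merging buffer described in the abstract can be sketched functionally: pending stores to the same cache line are coalesced locally and flushed as a single update message, so the coherence network carries one write-update per dirty line rather than one per store. The sketch below is illustrative only (the class name, line size, and counters are assumptions, not the paper's implementation):

```python
LINE_SIZE = 64  # bytes per cache line (illustrative value)

class WriteMergingBuffer:
    """Coalesces pending stores per cache line, then emits a single
    write-update message per dirty line to the on-chip network."""

    def __init__(self):
        self.pending = {}       # line base address -> {offset: byte value}
        self.messages_sent = 0  # write-update messages put on the network

    def store(self, addr, value):
        # Merge this store into the pending image of its cache line.
        line = addr - (addr % LINE_SIZE)
        self.pending.setdefault(line, {})[addr % LINE_SIZE] = value

    def flush(self):
        # One update message per dirty line, regardless of store count.
        self.messages_sent += len(self.pending)
        self.pending.clear()

# Without merging, 8 stores to one line would emit 8 update messages;
# with merging, the same 8 stores emit 1 message at flush time.
buf = WriteMergingBuffer()
for i in range(8):
    buf.store(0x1000 + i, i)
buf.flush()
print(buf.messages_sent)  # -> 1
```

This captures only the traffic-reduction idea: contiguous or repeated writes to one line collapse into a single coherence message, which is where the reported network-pressure relief comes from.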

Original language: English
Article number: 103
Journal: ACM Transactions on Embedded Computing Systems
Volume: 13
Issue number: 3s
Publication status: Published - Mar 2014
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2014 ACM.

Keywords

  • C.4.0 [performance of systems]: design studies
  • Design
  • Distributed cache
  • Experimentation
  • Hardware coherence
  • Many-core system
  • Massive parallelism
  • On-chip memory network
  • Performance
  • Write combination

