Improved Dynamic Cache Sharing for Communicating Threads on a Runtime-Adaptable Processor
J.J. Hoozemans (TU Delft - Computer Engineering)
A.F. Lorenzon (Universidade Federal do Rio Grande do Sul)
A.C. Schneider Beck (Universidade Federal do Rio Grande do Sul)
Stephan Wong (TU Delft - Computer Engineering)
Abstract
Multi-threaded applications execute their threads on different cores, each with its own local cache, and need to share data among the threads. Shared caches are used to avoid lengthy and costly main memory accesses. The degree of cache sharing is a trade-off between reducing misses and increasing hit latency. Dynamic caches have been proposed to adapt this trade-off to the workload type. Similarly, dynamic processors aim to execute workloads as efficiently as possible by balancing between exploiting instruction-level parallelism (ILP) and thread-level parallelism (TLP). To support this, they consist of multiple processing components and caches with adaptable interconnects between them. Depending on the workload characteristics, these interconnects can join the components into a single large core that exploits ILP, or split them into multiple smaller cores that run multiple threads (exploiting TLP). In this paper, we propose a cache system that further exploits this additional connectivity of a dynamic VLIW processor by forwarding cache accesses to multiple cache blocks while the processor is running in multi-threaded ('split') mode. Additionally, only requests to global data are broadcast, while accesses to local data are kept private. This improves hit rates similarly to existing cache-sharing schemes while reducing the penalty of stalling the other subcores. Local accesses are recognized by identifying memory accesses made relative to the stack frame pointer. Results show that our cache achieves miss-rate reductions similar to those of shared caches (up to 90%, 26% on average), and reduces the number of broadcast accesses by 21%.
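To illustrate the local/global classification, the C sketch below approximates the idea in software. The paper's hardware recognizes local accesses by checking whether an access is computed relative to the stack frame pointer; this sketch instead uses an address-range check against the running thread's stack region, which is a simplifying assumption. All names here (subcore_ctx_t, is_local_access, should_broadcast) and the struct layout are illustrative, not the paper's actual interface.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical per-subcore state: the bounds of the running
     * thread's stack region (stacks grow downward here). */
    typedef struct {
        uint32_t stack_base; /* highest address of the thread's stack */
        uint32_t stack_ptr;  /* current stack/frame pointer           */
    } subcore_ctx_t;

    /* An access is treated as local (thread-private) when its address
     * lies inside the thread's own stack region, i.e. it would have
     * been computed relative to the stack frame pointer. */
    static bool is_local_access(const subcore_ctx_t *ctx, uint32_t addr)
    {
        return addr >= ctx->stack_ptr && addr < ctx->stack_base;
    }

    /* Only accesses classified as global are broadcast to the cache
     * blocks of the other subcores; local accesses stay in the private
     * cache block, so the peer subcores are not stalled by them. */
    static bool should_broadcast(const subcore_ctx_t *ctx, uint32_t addr)
    {
        return !is_local_access(ctx, addr);
    }

    int main(void)
    {
        subcore_ctx_t ctx = { .stack_base = 0x8000000u,
                              .stack_ptr  = 0x7FFF000u };
        printf("%d\n", should_broadcast(&ctx, 0x7FFF800u)); /* 0: stack-local */
        printf("%d\n", should_broadcast(&ctx, 0x1000000u)); /* 1: global data */
        return 0;
    }

Filtering stack-relative accesses out of the broadcast path is what reduces the stalling penalty on the other subcores: only accesses that could plausibly hit shared data ever leave the private cache block.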