Design space exploration for a Local Object Store

Abstract

Modern Integrated Circuit (IC) technology allows processor manufacturers to produce complex designs with up to a few billion transistors. Technology limitations and the end of voltage and frequency scaling have pushed computer architecture toward multicore designs and more specialized hardware solutions. These trends have increased memory bandwidth pressure, exposing modern computer systems to bandwidth limitations. As a result, a large and growing fraction of chip area is devoted to managing the available memory bandwidth efficiently, especially off-chip bandwidth. In addition, the failure of Dennard scaling, along with the end of multicore scaling, gave rise to the Dark Silicon era, in which the percentage of transistors in an IC that can be powered on simultaneously is decreasing. Thus, under a fixed power and thermal budget, reducing a processor's main memory accesses lowers operational costs and raises the potential for higher computing performance. To tackle this problem, we investigate offloading software functionality to hardware, motivated by the falling cost of hardware. Based on our observations of modern programming languages, we hypothesize that object caching can improve the main processor's handling of objects, which are the most commonly used memory structure in modern languages. After formulating the concept of a Local Object Store (LOS), this master's thesis presents our progress on developing the framework for our proof of concept. A LOS is expected to decrease the number of main memory references by leveraging the "infant mortality" property of objects in real-life applications, that is, the tendency of most objects to die shortly after they are created.
Our proposed design supports the basic object manipulation operations while maintaining reference counters for the local objects. Therefore, compared with a cache memory, a LOS permits the construction and destruction of short-lived objects without polluting the rest of the memory hierarchy. Our framework consists of a Smart Pointer library, two modified tracing tools that allow us to obtain traces from microbenchmarks, and a basic simulator for a Cache-LOS memory unit. Moreover, we develop and trace three microbenchmarks and, using our framework, simulate a 39% average reduction in main memory accesses for a Cache-LOS memory architecture compared to a cache-only memory configuration. These promising results show the benefits of an integrated Cache-LOS for manipulating object structures.
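
To make the reference-counting idea concrete, the following is a minimal sketch of how a LOS-like store might avoid main memory traffic for short-lived objects. It is not the simulator developed in this thesis: the class `LocalObjectStore`, the helper `run`, the operation names, the oldest-first spill policy, and the one-access-per-spill cost are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class LocalObjectStore:
    """Toy model (hypothetical): a small store of reference-counted objects."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.refcounts = OrderedDict()  # object id -> reference count, oldest first
        self.memory_accesses = 0        # traffic that leaves the store

    def allocate(self, obj_id):
        # Assumed policy: when full, spill the oldest resident object to memory.
        if len(self.refcounts) >= self.capacity:
            self.refcounts.popitem(last=False)
            self.memory_accesses += 1
        self.refcounts[obj_id] = 1

    def retain(self, obj_id):
        if obj_id in self.refcounts:
            self.refcounts[obj_id] += 1
        else:
            self.memory_accesses += 1   # object was spilled; counter lives in memory

    def release(self, obj_id):
        if obj_id not in self.refcounts:
            self.memory_accesses += 1   # object was spilled; counter lives in memory
            return
        self.refcounts[obj_id] -= 1
        if self.refcounts[obj_id] == 0:
            del self.refcounts[obj_id]  # object dies locally, no write-back


def run(trace, capacity):
    """Replay a list of (operation, object id) pairs and return memory accesses."""
    los = LocalObjectStore(capacity)
    for op, obj_id in trace:
        getattr(los, op)(obj_id)
    return los.memory_accesses


# Three objects that die right after creation, and three that never die.
short_lived = [(op, i) for i in range(3) for op in ("allocate", "release")]
long_lived = [("allocate", i) for i in range(3)]
```

With a capacity of two objects, replaying `short_lived` in this toy model produces no memory accesses, because each object is destroyed before the store fills up, whereas replaying `long_lived` forces one spill. The sketch only illustrates the mechanism and says nothing about the magnitude of the reduction reported above.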