pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

Ferreira, Joao Dinis; Falcao, Gabriel; Gomez-Luna, Juan; Alser, Mohammed; Orosa, Lois; Sadrosadati, Mohammad; Kim, Jeremie S.; Oliveira, Geraldo F.; Shahroodi, Taha

doi:10.1109/MICRO56248.2022.00067

Title

pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

Author

Ferreira, Joao Dinis (ETH Zürich)
Falcao, Gabriel (Universidade de Coimbra)
Gomez-Luna, Juan (ETH Zürich)
Alser, Mohammed (ETH Zürich)
Orosa, Lois (ETH Zürich)
Sadrosadati, Mohammad (ETH Zürich)
Kim, Jeremie S. (ETH Zürich)
Oliveira, Geraldo F. (ETH Zürich)
Shahroodi, Taha (TU Delft Computer Engineering)

Date

2022

Abstract

Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory device. PuM yields high performance and energy efficiency, but existing PuM techniques support a limited range of operations. As a result, current PuM architectures cannot efficiently perform some complex operations (e.g., multiplication, division, exponentiation) without large increases in chip area and design complexity. To overcome these limitations of existing PuM architectures, we introduce pLUTo (processing-using-memory with lookup table (LUT) operations), a DRAM-based PuM architecture that leverages the high storage density of DRAM to enable the massively parallel storing and querying of lookup tables (LUTs). The key idea of pLUTo is to replace complex operations with low-cost, bulk memory reads (i.e., LUT queries) instead of relying on complex extra logic. We evaluate pLUTo across 11 real-world workloads that showcase the limitations of prior PuM approaches and show that our solution outperforms optimized CPU and GPU baselines by an average of 713× and 1.2×, respectively, while simultaneously reducing energy consumption by an average of 1855× and 39.5×. Across these workloads, pLUTo outperforms state-of-the-art PiM architectures by an average of 18.3×. We also show that different versions of pLUTo provide different levels of flexibility and performance at different additional DRAM area overheads (between 10.2% and 23.1%). pLUTo's source code and all scripts required to reproduce the results of this paper are openly and fully available at https://github.com/CMU-SAFARI/pLUTo.
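
The key idea stated in the abstract, replacing a complex operation with bulk lookup-table reads, can be illustrated with a small software analogy. The sketch below is not taken from the paper or its repository: the function names (`build_lut`, `pack_operands`, `bulk_query`) and the choice of 4-bit multiplication are assumptions made purely for illustration, and the lookups run on a CPU rather than inside DRAM.

```python
# Software analogy only: pLUTo performs the table reads inside DRAM.
# All names below are hypothetical and chosen for illustration.

BITS = 4                      # assumed operand width
MASK = (1 << BITS) - 1        # low-order BITS bits


def build_lut(func, input_bits):
    """Precompute func for every possible input of the given bit width."""
    return [func(index) for index in range(1 << input_bits)]


def pack_operands(a, b, bits=BITS):
    """Concatenate two operands into a single table index."""
    return (a << bits) | b


def bulk_query(lut, indices):
    """Replace per-element arithmetic with one table read per element."""
    return [lut[index] for index in indices]


# 4-bit x 4-bit multiplication stored as a 256-entry table.
mul_lut = build_lut(lambda index: (index >> BITS) * (index & MASK), 2 * BITS)

pairs = [(3, 5), (15, 15), (0, 9), (7, 2)]
indices = [pack_operands(a, b) for a, b in pairs]
products = bulk_query(mul_lut, indices)
print(products)  # [15, 225, 0, 14]
```

Once the table is built, every multiplication becomes a single indexed read, which is the property the abstract describes exploiting at DRAM scale. The table grows exponentially with operand width (here 2^8 entries), so the sketch only makes sense for narrow operands.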

To reference this document use:

http://resolver.tudelft.nl/uuid:f177de1e-892c-4aa4-81c7-80882cff03bb

DOI

https://doi.org/10.1109/MICRO56248.2022.00067

Publisher

IEEE

Embargo date

2023-07-01

ISBN

978-1-6654-6272-3

Source

Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022

Event

55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022, 2022-10-01 → 2022-10-05, Chicago, United States

Series

Proceedings of the Annual International Symposium on Microarchitecture, MICRO, 1072-4451, 2022-October

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project: https://www.openaccess.nl/en/you-share-we-take-care. Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Part of collection

Institutional Repository

Document type

conference paper

Rights

Files

PDF

pLUTo_Enabling_Massively_ ... Tables.pdf

1.4 MB