ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
Volume 7 - 2024 | doi: 10.3389/frai.2024.1268317

LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

Provisionally accepted
  • 1 Purdue University, West Lafayette, United States
  • 2 Materials Science Division, Argonne National Laboratory (DOE), Argonne, Illinois, United States

The final, formatted version of the article will be published soon.

    In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, this approach faces two challenges: a highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method that jointly applies layer replication and mixed precision quantization to improve the performance of DNNs mapped to area-constrained NVM-based IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming (MILP) to search the replication-quantization design space, using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves a 2.8-9× improvement in latency and an 11.8-19× improvement in throughput, with minimal (<1%) degradation in accuracy.
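
    To make the joint replication-quantization selection concrete, below is a minimal, hypothetical sketch in Python (using the PuLP MILP library) of the kind of problem an MILP stage like LRMP's could solve: each layer picks exactly one (replication factor, bit precision) option from a candidate table, the pipeline bottleneck latency is minimized, and total crossbar area stays within a budget. The option tables, latency/area numbers, and the bottleneck objective are illustrative assumptions for exposition, not the authors' actual formulation.

    # Hypothetical sketch: per-layer option selection as an MILP.
    # NOT the paper's formulation; all numbers below are made up.
    import pulp

    # Candidate options per layer: (replication, bits, latency, area)
    options = {
        "conv1": [(1, 8, 100, 4), (2, 8, 50, 8), (2, 4, 40, 4)],
        "conv2": [(1, 8, 400, 16), (4, 8, 100, 64), (4, 4, 80, 32)],
        "fc":    [(1, 8, 20, 2), (1, 4, 15, 1)],
    }
    AREA_BUDGET = 64  # total crossbar tiles available (assumed unit)

    prob = pulp.LpProblem("lrmp_sketch", pulp.LpMinimize)

    # x[l][i] = 1 if layer l uses option i; T = pipeline bottleneck latency
    x = {l: [pulp.LpVariable(f"x_{l}_{i}", cat="Binary")
             for i in range(len(opts))] for l, opts in options.items()}
    T = pulp.LpVariable("bottleneck", lowBound=0)

    prob += T  # objective: minimize the slowest pipeline stage
    for l, opts in options.items():
        prob += pulp.lpSum(x[l]) == 1  # each layer picks exactly one option
        # T must be at least the latency of the chosen option for layer l
        prob += T >= pulp.lpSum(o[2] * v for o, v in zip(opts, x[l]))
    # Total area of the chosen options must fit the crossbar budget
    prob += pulp.lpSum(o[3] * v
                       for l, opts in options.items()
                       for o, v in zip(opts, x[l])) <= AREA_BUDGET

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    for l, opts in options.items():
        choice = next(o for o, v in zip(opts, x[l]) if v.value() > 0.5)
        print(l, "-> replication", choice[0], "bits", choice[1])

    Enumerating discrete (replication, precision) options per layer keeps the problem linear; the RL component described in the abstract would plausibly shape or prune this candidate space, with latency and area entries supplied by the hardware-informed model.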

    Keywords: RRAM, in-memory computing, analog accelerator, quantization, reinforcement learning, mixed integer linear programming

    Received: 27 Jul 2023; Accepted: 17 Jun 2024.

    Copyright: © 2024 Nallathambi, Bose, Haensch and Raghunathan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Abinand Nallathambi, Purdue University, West Lafayette, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.