Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors offer higher density than conventional memory technologies, making them well suited to managing the extreme model sizes associated with LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs is growing rapidly, already surpassing the capacity of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture reduces area overhead by up to 39× and energy consumption by up to 18×. Compared to modern TPU/GPU systems, it achieves at least a 68× reduction in area-delay product and a significant 69% reduction in energy consumption.


Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2024), 18 Oct.
Main Author: Wang, Zhehui (Author)
Other Authors: Luo, Tao, Liu, Cheng, Liu, Weichen, Goh, Rick Siow Mong, Wong, Weng-Fai
Format: Online Article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM379099411
003 DE-627
005 20250306192449.0
007 cr uuu---uuuuu
008 241019s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2024.3483654  |2 doi 
028 5 2 |a pubmed25n1262.xml 
035 |a (DE-627)NLM379099411 
035 |a (NLM)39423084 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Zhehui  |e verfasserin  |4 aut 
245 1 0 |a Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar  |b A Synergy of Large and Small 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 22.10.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors offer higher density than conventional memory technologies, making them well suited to managing the extreme model sizes associated with LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs is growing rapidly, already surpassing the capacity of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture reduces area overhead by up to 39× and energy consumption by up to 18×. Compared to modern TPU/GPU systems, it achieves at least a 68× reduction in area-delay product and a significant 69% reduction in energy consumption
650 4 |a Journal Article 
700 1 |a Luo, Tao  |e verfasserin  |4 aut 
700 1 |a Liu, Cheng  |e verfasserin  |4 aut 
700 1 |a Liu, Weichen  |e verfasserin  |4 aut 
700 1 |a Goh, Rick Siow Mong  |e verfasserin  |4 aut 
700 1 |a Wong, Weng-Fai  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g PP(2024) vom: 18. Okt.  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:PP  |g year:2024  |g day:18  |g month:10 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2024.3483654  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2024  |b 18  |c 10