TR2026-018
LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention
- Koike-Akino, T., Chen, X., Liu, J., Wang, Y., Wang, P., Brand, M., "LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention", AAAI Conference on Artificial Intelligence, January 2026.
@inproceedings{Koike-Akino2026jan,
  author    = {Koike-Akino, Toshiaki and Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew},
  title     = {{LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention}},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = 2026,
  month     = jan,
  url       = {https://www.merl.com/publications/TR2026-018}
}
- , "LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention", AAAI Conference on Artificial Intelligence, January 2026.
-
Abstract:
Modern foundation models such as large language models (LLMs) require massive computational and memory resources. We propose a new framework that converts such LLMs into a reduced-dimension latent structure. Our method extends a local activation-aware tensor decomposition to a global attention-aware joint tensor decomposition. When reducing the latent dimension to obtain computationally and memory-efficient LLMs, our framework significantly improves model accuracy over existing model compression methods. We demonstrate the benefit on several benchmarks, including multi-modal reasoning tasks.
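
The paper's actual algorithm is not reproduced here. As a rough, hedged illustration of the general idea of activation-aware low-rank (latent) factorization, the sketch below factors a single weight matrix W into a low-rank product A @ B while weighting the reconstruction error by calibration activations X. The function name, calibration set, rank, and damping constant are all hypothetical choices for this sketch, not details taken from the paper.

# Minimal sketch (assumptions noted above): activation-aware low-rank factorization
# of one weight matrix. Not the paper's method; it only illustrates weighting a
# truncated SVD by activation statistics.
import torch

def activation_aware_lowrank(W: torch.Tensor, X: torch.Tensor, rank: int):
    """Factor W (d_out x d_in) into A @ B with latent dimension `rank`,
    approximately minimizing ||(W - A @ B) @ X.T|| rather than ||W - A @ B||.

    W    : (d_out, d_in) weight matrix
    X    : (n_samples, d_in) calibration activations feeding this layer
    rank : target latent dimension (rank << min(d_out, d_in))
    """
    # Whitening factor from the activation second-moment matrix (with damping).
    cov = X.T @ X / X.shape[0]
    cov = cov + 1e-6 * torch.eye(W.shape[1])
    L = torch.linalg.cholesky(cov)                     # cov = L @ L.T

    # Truncated SVD of the activation-weighted weight matrix.
    U, S, Vh = torch.linalg.svd(W @ L, full_matrices=False)

    # Keep the top-`rank` components and undo the whitening on the right factor.
    A = U[:, :rank] * S[:rank]                         # (d_out, rank)
    B = torch.linalg.solve_triangular(L, Vh[:rank], upper=False, left=False)  # Vh_r @ L^-1
    return A, B

# Usage: replace a dense map y = W x with the two smaller maps y = A (B x).
if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(512, 512)
    X = torch.randn(1024, 512)
    A, B = activation_aware_lowrank(W, X, rank=64)
    err = torch.norm((W - A @ B) @ X.T) / torch.norm(W @ X.T)
    print(f"activation-weighted relative error: {err.item():.3f}")

Per the abstract, the paper goes beyond such a per-matrix (local) decomposition to a global attention-aware joint decomposition across the attention projections; that joint step is not shown in this sketch.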




