TR2026-018

LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention


    •  Koike-Akino, T., Chen, X., Liu, J., Wang, Y., Wang, P., Brand, M., "LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention", AAAI Conference on Artificial Intelligence, January 2026.
      @inproceedings{Koike-Akino2026jan,
        author = {Koike-Akino, Toshiaki and Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew},
        title = {LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention},
        booktitle = {AAAI Conference on Artificial Intelligence},
        year = 2026,
        month = jan,
        url = {https://www.merl.com/publications/TR2026-018}
      }
  • Research Areas: Artificial Intelligence, Machine Learning

Abstract:

Modern foundation models such as large language models (LLMs) require massive computational and memory resources. We propose a new framework to convert such LLMs into a reduced-dimension latent structure. Our method extends a local activation-aware tensor decomposition to a global attention-aware joint tensor decomposition. When the latent dimension is reduced to realize computationally and memory-efficient LLMs, our framework significantly improves model accuracy over existing model compression methods. We demonstrate the benefit on several benchmarks, including multi-modal reasoning tasks.
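
As a rough illustration of the activation-aware decomposition idea mentioned in the abstract, the sketch below factorizes a single attention projection weight into a low-rank (latent) form while weighting the reconstruction error by calibration activations. The helper name activation_aware_lowrank, the Cholesky whitening of the activation second-moment matrix, and all dimensions are illustrative assumptions, not the paper's exact algorithm.

    # Minimal sketch (assumed, not the paper's method): activation-aware
    # low-rank factorization of one projection weight W. Instead of
    # minimizing ||W - A @ B||_F, we minimize ||X W - X (A @ B)||_F by
    # whitening W with the activation second-moment matrix before truncation.
    import numpy as np

    def activation_aware_lowrank(W: np.ndarray, X: np.ndarray, rank: int, eps: float = 1e-6):
        """Factor W (d_in x d_out) into A (d_in x rank) @ B (rank x d_out),
        weighting the reconstruction by calibration activations X (n x d_in)."""
        # Second-moment matrix of the calibration activations (regularized).
        cov = X.T @ X / X.shape[0] + eps * np.eye(W.shape[0])
        # Whitening factor S with S.T @ S = cov (transpose of the Cholesky factor).
        S = np.linalg.cholesky(cov).T
        # Truncated SVD of the whitened weight S @ W.
        U, s, Vt = np.linalg.svd(S @ W, full_matrices=False)
        Ur, sr, Vtr = U[:, :rank], s[:rank], Vt[:rank]
        # Un-whiten the left factor so A @ B approximates W in the
        # activation-weighted norm.
        A = np.linalg.solve(S, Ur * sr)   # d_in x rank
        B = Vtr                           # rank x d_out
        return A, B

    # Usage: compress a (hypothetical) query projection to a latent dimension of 64.
    d_in, d_out, n_calib = 512, 512, 2048
    rng = np.random.default_rng(0)
    W_q = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    X = rng.standard_normal((n_calib, d_in))
    A, B = activation_aware_lowrank(W_q, X, rank=64)
    err = np.linalg.norm(X @ W_q - X @ A @ B) / np.linalg.norm(X @ W_q)
    print(f"relative activation-space error: {err:.3f}")

This per-matrix (local) step is only a starting point; the paper's contribution is the global, attention-aware joint decomposition across the projection matrices, which the sketch does not attempt to reproduce.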