TR2025-119
Investigating Continuous Autoregressive Generative Speech Enhancement
- "Investigating Continuous Autoregressive Generative Speech Enhancement", Interspeech, August 2025.
@inproceedings{Yang2025aug,
  author = {Yang, Haici and Wichern, Gordon and Aihara, Ryo and Masuyama, Yoshiki and Khurana, Sameer and Germain, François G and {Le Roux}, Jonathan},
  title = {{Investigating Continuous Autoregressive Generative Speech Enhancement}},
  booktitle = {Interspeech},
  year = 2025,
  month = aug,
  url = {https://www.merl.com/publications/TR2025-119}
}
Abstract:
Following the success of autoregressive (AR) language models in predicting discrete tokens, it has become common practice for autoregressive audio and speech models to use discrete tokens generated by a neural audio codec. However, recent work has demonstrated that replacing discrete token probability modeling in an AR model with a continuous diffusion procedure can improve both model performance and efficiency for image generation. In this paper, we explore applying such a diffusion loss to replace discrete token modeling in an AR generative speech enhancement model. We examine several important design choices, including comparing standard AR models with masked AR models, and mel spectrograms with learned latents as the continuous feature representation. Our results demonstrate the potential of continuous AR speech enhancement, particularly in cases of severe noise.
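To make the core idea concrete, the following is a minimal sketch (not the authors' code) of what "replacing discrete token probability modeling with a diffusion loss" looks like for one continuous token: instead of a softmax over codec indices trained with cross-entropy, a small denoiser predicts the noise added to the continuous target (e.g. a mel frame or learned latent), conditioned on the AR model's hidden state. The toy denoiser, its shapes, and the cosine noise schedule are all illustrative assumptions.

```python
# Hedged sketch: a per-token denoising (epsilon-prediction) diffusion loss
# conditioned on an AR hidden state, replacing discrete-token cross-entropy.
# Shapes, the tiny MLP denoiser, and the schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 8   # dimensionality of one continuous token (e.g. a mel frame or latent)
H = 16  # dimensionality of the AR model's hidden state at that position

def denoiser(x_t, t, z, W):
    """Toy denoiser: predicts the noise eps from the noised token x_t,
    the diffusion timestep t, and the AR conditioning vector z."""
    inp = np.concatenate([x_t, [t], z])  # (D + 1 + H,)
    return np.tanh(inp @ W[0]) @ W[1]    # (D,) predicted noise

def diffusion_loss(x, z, W):
    """One Monte-Carlo sample of the diffusion loss for a single
    continuous token x conditioned on AR state z."""
    t = rng.uniform(0.0, 1.0)                 # continuous timestep in [0, 1]
    alpha_bar = np.cos(0.5 * np.pi * t) ** 2  # cosine schedule (assumption)
    eps = rng.standard_normal(D)              # ground-truth noise
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = denoiser(x_t, t, z, W)
    # MSE on the noise takes the place of cross-entropy on discrete tokens.
    return float(np.mean((eps_hat - eps) ** 2))

# Illustrative data: a target continuous token and an AR hidden state.
W = [rng.standard_normal((D + 1 + H, 32)) * 0.1,
     rng.standard_normal((32, D)) * 0.1]
x = rng.standard_normal(D)  # clean continuous token (the prediction target)
z = rng.standard_normal(H)  # AR model's conditioning for this position
loss = diffusion_loss(x, z, W)
print(f"diffusion loss sample: {loss:.4f}")
```

At inference, the same conditioned denoiser would be run over several reverse-diffusion steps to sample each continuous token, rather than drawing an index from a softmax.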