TR2025-119
Investigating Continuous Autoregressive Generative Speech Enhancement
- "Investigating Continuous Autoregressive Generative Speech Enhancement", Interspeech, August 2025.
@inproceedings{Yang2025aug,
  author = {Yang, Haici and Wichern, Gordon and Aihara, Ryo and Masuyama, Yoshiki and Khurana, Sameer and Germain, François G and {Le Roux}, Jonathan},
  title = {{Investigating Continuous Autoregressive Generative Speech Enhancement}},
  booktitle = {Interspeech},
  year = 2025,
  month = aug,
  url = {https://www.merl.com/publications/TR2025-119}
}
Abstract:
Following the success of autoregressive (AR) language models in predicting discrete tokens, it has become common practice for autoregressive audio and speech models to use discrete tokens generated by a neural audio codec. However, recent work has demonstrated that replacing discrete token probability modeling in an AR model with a continuous diffusion procedure can improve both model performance and efficiency for image generation. In this paper, we explore applying such a diffusion loss to replace discrete token modeling in an AR generative speech enhancement model. We examine several important design choices, including comparing standard AR models with masked AR models, and mel spectrograms with learned latents as the continuous feature representation. Our results demonstrate the potential of continuous AR speech enhancement, particularly in cases of severe noise.
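To make the core idea concrete, the following is a minimal sketch (not the authors' code) of what "replacing discrete token probability modeling with a diffusion loss" looks like for one continuous token: instead of a softmax over codec indices trained with cross-entropy, a small denoiser predicts the noise added to the continuous target (e.g. a mel frame or learned latent), conditioned on the AR model's hidden state. The toy denoiser, its shapes, and the cosine noise schedule are all illustrative assumptions.

```python
# Hedged sketch: a per-token denoising (epsilon-prediction) diffusion loss
# conditioned on an AR hidden state, replacing discrete-token cross-entropy.
# Shapes, the tiny MLP denoiser, and the schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 8   # dimensionality of one continuous token (e.g. a mel frame or latent)
H = 16  # dimensionality of the AR model's hidden state at that position

def denoiser(x_t, t, z, W):
    """Toy denoiser: predicts the noise eps from the noised token x_t,
    the diffusion timestep t, and the AR conditioning vector z."""
    inp = np.concatenate([x_t, [t], z])  # (D + 1 + H,)
    return np.tanh(inp @ W[0]) @ W[1]    # (D,) predicted noise

def diffusion_loss(x, z, W):
    """One Monte-Carlo sample of the diffusion loss for a single
    continuous token x conditioned on AR state z."""
    t = rng.uniform(0.0, 1.0)                 # continuous timestep in [0, 1]
    alpha_bar = np.cos(0.5 * np.pi * t) ** 2  # cosine schedule (assumption)
    eps = rng.standard_normal(D)              # ground-truth noise
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = denoiser(x_t, t, z, W)
    # MSE on the noise takes the place of cross-entropy on discrete tokens.
    return float(np.mean((eps_hat - eps) ** 2))

# Illustrative data: a target continuous token and an AR hidden state.
W = [rng.standard_normal((D + 1 + H, 32)) * 0.1,
     rng.standard_normal((32, D)) * 0.1]
x = rng.standard_normal(D)  # clean continuous token (the prediction target)
z = rng.standard_normal(H)  # AR model's conditioning for this position
loss = diffusion_loss(x, z, W)
print(f"diffusion loss sample: {loss:.4f}")
```

At inference, the same conditioned denoiser would be run over several reverse-diffusion steps to sample each continuous token, rather than drawing an index from a softmax.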