TR2019-099
WHAM!: Extending Speech Separation to Noisy Environments
"WHAM!: Extending Speech Separation to Noisy Environments", Interspeech, DOI: 10.21437/Interspeech.2019-2821, September 2019, pp. 1368-1372.
@inproceedings{Wichern2019sep,
  author    = {Wichern, Gordon and McQuinn, Emmett and Antognini, Joe and Flynn, Michael and Zhu, Richard and Crow, Dwight and Manilow, Ethan and Le Roux, Jonathan},
  title     = {WHAM!: Extending Speech Separation to Noisy Environments},
  booktitle = {Interspeech},
  year      = 2019,
  pages     = {1368--1372},
  month     = sep,
  doi       = {10.21437/Interspeech.2019-2821},
  url       = {https://www.merl.com/publications/TR2019-099}
}
Abstract:
Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
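The core dataset construction step described above, combining a two-speaker mixture with an ambient noise recording, amounts to rescaling the noise to a target speech-to-noise ratio (SNR) before summing. The sketch below illustrates this in plain Python; the signal values, the `mix_at_snr` helper, and the SNR range are illustrative assumptions, not the exact WHAM! generation scripts.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a signal (list of float samples)."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio is `snr_db`, then sum.

    `speech` and `noise` are equal-length lists of float samples.
    A gain of rms(speech) / (rms(noise) * 10^(snr_db/20)) brings the
    noise to exactly `snr_db` dB below the speech level.
    """
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]

# Hypothetical stand-ins: a sinusoid for the speech mixture, white noise
# for the ambient recording (real WHAM! uses wsj0-2mix and recorded noise).
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 0.3) for _ in range(8000)]

# Draw the SNR uniformly from an assumed range of -6 to +3 dB.
snr_db = random.uniform(-6.0, 3.0)
noisy = mix_at_snr(speech, noise, snr_db)
```

Rescaling the noise rather than the speech keeps the underlying wsj0-2mix speaker mixture unchanged, so separation labels remain valid for the noisy version.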
Related News & Events
NEWS: Jonathan Le Roux gives invited talk at CMU's Language Technologies Institute Colloquium
Date: December 9, 2022
Where: Pittsburgh, PA
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Senior Principal Research Scientist and Speech and Audio Senior Team Leader Jonathan Le Roux was invited by Carnegie Mellon University's Language Technologies Institute (LTI) to speak as part of the LTI Colloquium Series, a prestigious series of talks by experts from across the country on different areas of language technologies. His talk, entitled "Towards general and flexible audio source separation", presented an overview of techniques developed at MERL for robustly and flexibly decomposing and analyzing acoustic scenes, describing in particular the Speech and Audio Team's efforts to extend MERL's early speech separation and enhancement methods to more challenging environments and to more general, less supervised scenarios.
NEWS: MERL Speech & Audio Researchers Presenting 7 Papers and a Tutorial at Interspeech 2019
Date: September 15, 2019 - September 19, 2019
Where: Graz, Austria
MERL Contacts: Chiori Hori; Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Speech & Audio Team researchers will present 7 papers at the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), held in Graz, Austria, September 15-19, 2019. Topics include recent advances in end-to-end speech recognition, speech separation, and audio-visual scene-aware dialog. Takaaki Hori is also co-presenting a tutorial on end-to-end speech processing.
Interspeech is the world's largest and most comprehensive conference on the science and technology of spoken language processing. It gathers around 2000 participants from all over the world.