TR2005-023

A Companding Front End for Noise-Robust Automatic Speech Recognition

- Guinness, J., Raj, B., Schmidt-Nielsen, B., Turicchia, L., Sarpeshkar, R., "A Companding Front End for Noise-Robust Automatic Speech Recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2005, vol. 1, pp. 249-252.
  @inproceedings{Guinness2005mar,
    author = {Guinness, J. and Raj, B. and Schmidt-Nielsen, B. and Turicchia, L. and Sarpeshkar, R.},
    title = {{A Companding Front End for Noise-Robust Automatic Speech Recognition}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2005,
    volume = 1,
    pages = {249--252},
    month = mar,
    issn = {1520-6149},
    url = {https://www.merl.com/publications/TR2005-023}
  }
Research Area:

Speech & Audio

Abstract:

Feature computation for automatic speech recognition (ASR) systems has long been modeled on the human auditory system. Most current ASR systems model the critical-band response and equal-loudness characteristics of the auditory system. It has been postulated that more detailed models of the human auditory system can lead to more noise-robust speech recognition. An auditory phenomenon of particular relevance to robustness is simultaneous masking, whereby dominant frequencies suppress adjacent weaker frequencies. In this paper we present a companding-based model that mimics simultaneous masking in the front end of a speech recognizer. In an automotive digit-recognition task, the front end reduces the word error rate by 4.0% absolute (25% relative to Mel cepstra) at -5 dB SNR, at the cost of a 1.7% increase at 15 dB SNR.
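The abstract does not describe the front end's internal structure. To illustrate the general companding idea behind it (compress using the envelope of a broad band-pass filter, filter narrowly, then expand using the envelope of the narrow filter's output), here is a minimal single-channel sketch in Python/NumPy. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names (`bandpass_biquad`, `envelope`, `companding_channel`, `tone_amplitude`), the RBJ-cookbook biquads, the compression index n = 0.3, the filter Q values, and the 100 Hz envelope smoother. A tone alone at the channel centre comes out at roughly its input amplitude, because the compression and expansion gains cancel. Adding a louder 2 kHz tone raises the compression envelope but contributes little to the expansion envelope, so the 1 kHz component is attenuated. This is a simple form of the simultaneous (two-tone) suppression the abstract refers to.

```python
import numpy as np


def bandpass_biquad(x, fs, f0, q):
    """Band-pass biquad from the RBJ Audio EQ Cookbook (0 dB gain at f0)."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2.0 * np.cos(w0) / a0, (1.0 - alpha) / a0
    y = np.zeros_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for i, xi in enumerate(x):
        yi = b0 * xi + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xi
        y2, y1 = y1, yi
        y[i] = yi
    return y


def envelope(x, fs, cutoff_hz):
    """Full-wave rectify, then smooth with a one-pole low-pass (unity DC gain)."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)
    env = np.zeros_like(x)
    state = 0.0
    for i, xi in enumerate(np.abs(x)):
        state += a * (xi - state)
        env[i] = state
    return env


def companding_channel(x, fs, f0, n=0.3, q_broad=1.0, q_narrow=10.0,
                       env_cutoff_hz=100.0, eps=1e-8):
    """One hypothetical channel: broad filter -> compress -> narrow filter -> expand."""
    x = np.asarray(x, dtype=float)
    broad = bandpass_biquad(x, fs, f0, q_broad)
    # Compression: gain env^(n-1) with 0 < n < 1, driven by everything in the broad band.
    compressed = broad * (envelope(broad, fs, env_cutoff_hz) + eps) ** (n - 1.0)
    narrow = bandpass_biquad(compressed, fs, f0, q_narrow)
    # Expansion: gain env^((1-n)/n), driven only by what survives the narrow filter.
    return narrow * (envelope(narrow, fs, env_cutoff_hz) + eps) ** ((1.0 - n) / n)


def tone_amplitude(y, fs, f):
    """Amplitude of the component at frequency f (single-bin DFT projection)."""
    t = np.arange(len(y)) / fs
    return 2.0 / len(y) * np.abs(np.sum(y * np.exp(-2j * np.pi * f * t)))


fs = 16000
t = np.arange(int(0.25 * fs)) / fs
weak = 0.01 * np.sin(2 * np.pi * 1000 * t)     # probe tone at the channel centre
strong = 0.1 * np.sin(2 * np.pi * 2000 * t)    # louder neighbour: mostly passed by the broad
                                               # filter, largely rejected by the narrow one
half = len(t) // 2                             # skip the start-up transient
alone = tone_amplitude(companding_channel(weak, fs, 1000.0)[half:], fs, 1000.0)
masked = tone_amplitude(companding_channel(weak + strong, fs, 1000.0)[half:], fs, 1000.0)
print(f"1 kHz tone alone: {alone:.5f}, with 2 kHz neighbour: {masked:.5f}")
```

The key design choice is where each envelope is measured. The compression envelope is taken before the narrow filter and the expansion envelope after it, so energy that only the broad filter sees lowers the gain for everything in the channel. Changing the filter shapes or the value of n changes how strongly neighbouring components suppress each other. Only one channel is shown here.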

Related News & Events

NEWS ICASSP 2005: 4 publications by Anthony Vetro, Ajay Divakaran, Huifang Sun and others
Date: March 18, 2005
Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
MERL Contacts: Anthony Vetro; Huifang Sun
Brief
- The papers "Fast Adaptive Fuzzy Post-Filtering for Coding Artifacts Removal in Interlaced Video" by Nie, Y., Kong, H.-S., Vetro, A. and Barner, K., "Video Coding Using 3-D Dual-Tree Discrete Wavelet Transform" by Wang, B., Wang, Y., Selesnick, I. and Vetro, A., "A Companding Front End for Noise-Robust Automatic Speech Recognition" by Guinness, J., Raj, B., Schmidt-Nielsen, B., Turicchia, L. and Sarpeshkar, R. and "Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams" by Xie, L., Kennedy, L., Chang, S.-F., Divakaran, A., Sun, H. and Lin, C.-Y. were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
