TR2015-138

Robust speech recognition in unknown reverberant and noisy conditions


    •  Hsiao, R., Ma, J., Hartmann, W., Karafiat, M., Grezl, F., Burget, L., Szoke, I., Cernocky, J., Watanabe, S., Chen, Z., Mallidi, S.H., Hermansky, H., Tsakalidis, S., Schwartz, R., "Robust Speech Recognition in Unknown Reverberant and Noisy Conditions", IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), DOI: 10.1109/ARSU.2015.7404841, December 2015, pp. 533-538.
      @inproceedings{Hsiao2015dec,
        author = {Hsiao, R. and Ma, J. and Hartmann, W. and Karafiat, M. and Grezl, F. and Burget, L. and Szoke, I. and Cernocky, J. and Watanabe, S. and Chen, Z. and Mallidi, S.H. and Hermansky, H. and Tsakalidis, S. and Schwartz, R.},
        title = {Robust Speech Recognition in Unknown Reverberant and Noisy Conditions},
        booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)},
        year = 2015,
        pages = {533--538},
        month = dec,
        publisher = {IEEE},
        doi = {10.1109/ARSU.2015.7404841},
        url = {https://www.merl.com/publications/TR2015-138}
      }
  • Research Areas:

    Artificial Intelligence, Speech & Audio

Abstract:

In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The defining characteristic of the challenge is that a high-performance system must be developed without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are telephone and close-talking speech. Our approach to this challenge includes speech enhancement, neural network methods, and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.