TR2010-015

Ultrasonic Sensing for Robust Speech Recognition


Abstract:

In this paper, we present our work on ultrasonic sensing of speech for digit recognition. First, a set of spectral ultrasonic features is developed and tuned to achieve optimal performance on the digit recognition task. Using these features, we demonstrate an overall accuracy of 33.00% on a digit recognition task using HMMs with recordings from 6 speakers. The results indicate that ultrasonic sensing of speech is viable, but that further work is needed to achieve word accuracies that match those of audio. Finally, experimental results are presented which demonstrate that fusing information from ultrasonic and audio sources shows marginal improvements over audio-only performance.
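To make the pipeline described above concrete, the sketch below shows one plausible way to extract spectral features from an ultrasonic recording and classify digits with per-digit HMMs. It is an illustrative assumption, not the authors' implementation: the `hmmlearn` and `scipy` libraries, the carrier frequency, band width, frame sizes, and number of HMM states are all placeholder choices introduced here.

```python
# Illustrative sketch (not the paper's implementation): log-spectral features
# from the band around an assumed ultrasonic carrier, plus one Gaussian HMM
# per digit class. All constants below are placeholders, not values from the paper.
import numpy as np
from scipy.signal import stft
from hmmlearn import hmm

FS = 96_000          # sampling rate (assumed)
CARRIER_HZ = 40_000  # ultrasonic carrier frequency (assumed placeholder)
BAND_HZ = 2_000      # bandwidth kept on each side of the carrier (assumed)

def ultrasonic_log_spectra(signal, fs=FS):
    """STFT the signal and keep log-magnitudes of the bins within
    +/- BAND_HZ of the carrier as the per-frame feature vector."""
    freqs, _, spec = stft(signal, fs=fs, nperseg=512, noverlap=256)
    keep = np.abs(freqs - CARRIER_HZ) <= BAND_HZ
    return np.log(np.abs(spec[keep, :]) + 1e-8).T   # shape: (frames, bins)

def train_digit_models(features_by_digit, n_states=5):
    """Fit one diagonal-covariance Gaussian HMM per digit from lists of
    per-utterance feature matrices."""
    models = {}
    for digit, utterances in features_by_digit.items():
        X = np.vstack(utterances)
        lengths = [u.shape[0] for u in utterances]
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[digit] = m
    return models

def recognize(models, features):
    """Return the digit whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda d: models[d].score(features))

if __name__ == "__main__":
    # Synthetic stand-in data (random noise) just to show the flow end to end.
    rng = np.random.default_rng(0)
    train = {d: [ultrasonic_log_spectra(rng.standard_normal(FS // 2))
                 for _ in range(3)] for d in range(3)}
    models = train_digit_models(train)
    test = ultrasonic_log_spectra(rng.standard_normal(FS // 2))
    print("Recognized digit:", recognize(models, test))
```

A fusion scheme of the kind evaluated in the paper could, under the same assumptions, be approximated by combining the audio and ultrasonic HMM log-likelihoods with a weighted sum before taking the argmax.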


  • Related News & Events

    •  NEWS    ICASSP 2010: 9 publications by Anthony Vetro, Shantanu D. Rane and Petros T. Boufounos
      Date: March 14, 2010
      Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
      MERL Contacts: Anthony Vetro; Petros T. Boufounos
      Brief
      • The following papers were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP):
        • "Privacy and Security of Features Extracted from Minutiae Aggregates" by Nagar, A., Rane, S.D. and Vetro, A.
        • "Hiding Information Inside Structured Shapes" by Das, S., Rane, S.D. and Vetro, A.
        • "Ultrasonic Sensing for Robust Speech Recognition" by Srinivasan, S., Raj, B. and Ezzat, T.
        • "Reconstruction of Sparse Signals from Distorted Randomized Measurements" by Boufounos, P.T.
        • "Disparity Search Range Estimation: Enforcing Temporal Consistency" by Min, D., Yea, S., Arican, Z. and Vetro, A.
        • "Synthesizing Speech from Doppler Signals" by Toth, A.R., Raj, B., Kalgaonkar, K. and Ezzat, T.
        • "Spectrogram Dimensionality Reduction with Independence Constraints" by Wilson, K.W. and Raj, B.
        • "Robust Regression using Sparse Learning for High Dimensional Parameter Estimation Problems" by Mitra, K., Veeraraghavan, A.N. and Chellappa, R.
        • "Subword Unit Approaches for Retrieval by Voice" by Gouvea, E., Ezzat, T. and Raj, B.