TR2015-098

Uncertainty Propagation through Deep Neural Networks


    •  Abdelaziz, A.H., Watanabe, S., Hershey, J.R., Vincent, E., Kolossa, D., "Uncertainty Propagation Through Deep Neural Networks", Interspeech, September 2015, vol. 1 or 5, pp. 3561.
      @inproceedings{Abdelaziz2015sep,
        author = {Abdelaziz, A.H. and Watanabe, S. and Hershey, J.R. and Vincent, E. and Kolossa, D.},
        title = {Uncertainty Propagation Through Deep Neural Networks},
        booktitle = {Interspeech},
        year = 2015,
        volume = {1 or 5},
        pages = 3561,
        month = sep,
        isbn = {978-1-5108-1790-6},
        url = {https://www.merl.com/publications/TR2015-098}
      }
  • Research Areas:

    Artificial Intelligence, Speech & Audio

Abstract:

In order to improve ASR performance in noisy environments, distorted speech is typically pre-processed by a speech enhancement algorithm, which usually results in a speech estimate containing residual noise and distortion. We may also have some measures of uncertainty or variance of the estimate. Uncertainty decoding is a framework that utilizes this knowledge of uncertainty in the input features during acoustic model scoring. Such frameworks have been well explored for traditional probabilistic models, but their optimal use for deep neural network (DNN)-based ASR systems is not yet clear. In this paper, we study the propagation of observation uncertainties through the layers of the DNN-based acoustic model. Since this is intractable due to the nonlinearities of the DNN, we employ approximate propagation methods, including Monte Carlo sampling, the unscented transform, and the piecewise exponential approximation of the activation function, to estimate the distribution of acoustic scores. Finally, the expected value of the acoustic score distribution is used for decoding, which is shown to further improve the ASR accuracy on the CHiME database relative to a highly optimized DNN baseline.
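
The following is a minimal sketch, not the authors' implementation, of two of the approximate propagation methods named in the abstract: Monte Carlo sampling and the unscented transform. It makes several assumptions that the abstract does not state: the enhanced feature vector is Gaussian with a diagonal covariance, the acoustic model is a toy feed-forward network with sigmoid hidden layers and a softmax output, and the unscented transform is applied once to the whole network rather than layer by layer. All function names (`dnn_forward`, `mc_expected_posterior`, `ut_expected_posterior`) and the parameter `kappa` are hypothetical. The piecewise exponential approximation is not shown.

```python
import numpy as np


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dnn_forward(x, weights, biases):
    """Toy acoustic model (assumed): sigmoid hidden layers, softmax output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return softmax(h @ weights[-1] + biases[-1])


def mc_expected_posterior(mean, var, weights, biases, n_samples=1000, rng=None):
    """Monte Carlo estimate of the expected network output.

    Draws feature samples from N(mean, diag(var)), pushes each one
    through the network, and averages the resulting posteriors.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((n_samples, mean.shape[0]))
    samples = mean + np.sqrt(var) * noise
    return dnn_forward(samples, weights, biases).mean(axis=0)


def ut_expected_posterior(mean, var, weights, biases, kappa=1.0):
    """Unscented-transform estimate of the expected network output.

    Uses 2n + 1 deterministic sigma points for an n-dimensional
    Gaussian with diagonal covariance, then takes their weighted mean
    after the nonlinearity.
    """
    n = mean.shape[0]
    # For a diagonal covariance, the matrix square root is diagonal too.
    offsets = np.diag(np.sqrt((n + kappa) * var))
    sigma_points = np.vstack([mean, mean + offsets, mean - offsets])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return w @ dnn_forward(sigma_points, weights, biases)
```

In both cases the returned vector plays the role of the expected acoustic score that the abstract describes as being used for decoding. Monte Carlo sampling costs `n_samples` forward passes per frame, whereas this form of the unscented transform costs `2n + 1`, so which is cheaper depends on the feature dimension `n`.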