TR2016-114

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection


    •  Hayashi, T., Watanabe, S., Toda, T., Hori, T., Le Roux, J., Takeda, K., "Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection", Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), September 2016, pp. 35-39.
      @inproceedings{Hayashi2016sep,
        author = {Hayashi, Tomoki and Watanabe, Shinji and Toda, Tomoki and Hori, Takaaki and Le Roux, Jonathan and Takeda, Kazuya},
        title = {Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection},
        booktitle = {Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)},
        year = 2016,
        pages = {35--39},
        month = sep,
        url = {https://www.merl.com/publications/TR2016-114}
      }
Research Areas: Artificial Intelligence, Speech & Audio

Abstract:

In this study, we propose a new method for polyphonic sound event detection based on a Bidirectional Long Short-Term Memory Hidden Markov Model hybrid system (BLSTM-HMM). We extend the hybrid neural network/HMM model, which has achieved state-of-the-art performance in speech recognition, to the multi-label classification problem. This extension provides an explicit duration model for the output labels, unlike the straightforward application of a BLSTM-RNN. We compare the performance of our proposed method with conventional methods such as non-negative matrix factorization (NMF) and a standard BLSTM-RNN, using the DCASE2016 Task 2 dataset. Our proposed method outperformed the conventional approaches in both the monophonic and polyphonic tasks, achieving an average F1 score of 67.1% (error rate of 64.5%) in the event-based evaluation and an average F1 score of 76.0% (error rate of 50.0%) in the segment-based evaluation.
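To make the two ingredients named in the abstract concrete, the sketch below shows (1) a bidirectional LSTM that emits per-frame, per-class activity posteriors, which is the multi-label framing, and (2) per-class two-state Viterbi smoothing with "sticky" transitions, standing in for the explicit duration modeling that the HMM provides. This is not the authors' implementation; all layer sizes, the class count, the feature dimension, and the transition probability are illustrative assumptions.

    import numpy as np
    import torch
    import torch.nn as nn

    class BLSTMEventDetector(nn.Module):
        """Bidirectional LSTM emitting per-frame, per-class activity
        posteriors for polyphonic sound event detection.
        Layer sizes and class count are illustrative, not from the paper."""
        def __init__(self, n_features=40, n_classes=11, hidden=64):
            super().__init__()
            self.blstm = nn.LSTM(n_features, hidden, num_layers=2,
                                 batch_first=True, bidirectional=True)
            # One sigmoid output per event class -> multi-label classification
            self.out = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                 # x: (batch, frames, n_features)
            h, _ = self.blstm(x)
            return torch.sigmoid(self.out(h))  # (batch, frames, n_classes)

    def viterbi_smooth(post, p_stay=0.95):
        """Two-state (inactive/active) Viterbi smoothing of one class's
        frame posteriors; sticky self-transitions act as a crude duration
        model. p_stay is an assumed value, not taken from the paper."""
        T = len(post)
        emis = np.stack([np.log(1 - post + 1e-8), np.log(post + 1e-8)], axis=1)
        trans = np.log(np.array([[p_stay, 1 - p_stay],
                                 [1 - p_stay, p_stay]]))
        delta = np.zeros((T, 2))
        back = np.zeros((T, 2), dtype=int)
        delta[0] = emis[0]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + trans   # scores[i, j]: prev i -> cur j
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + emis[t]
        path = np.zeros(T, dtype=int)
        path[-1] = delta[-1].argmax()
        for t in range(T - 2, -1, -1):
            path[t] = back[t + 1, path[t + 1]]
        return path                                  # 0/1 activity per frame

    # Toy usage: posteriors for a 100-frame clip, then smooth one class.
    model = BLSTMEventDetector()
    posteriors = model(torch.randn(1, 100, 40))      # (1, 100, 11)
    activity = viterbi_smooth(posteriors[0, :, 0].detach().numpy())

In the paper's system, the smoothing role is played by HMMs whose emission scores come from the BLSTM posteriors; the sketch above only conveys the general shape of that posterior-then-decode pipeline.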