TR2017-021
End-to-end ASR without using morphological analyzer, pronunciation dictionary and language model
Watanabe, S., Hori, T., Hayashi, T., Kim, S., "End-to-end ASR without using morphological analyzer, pronunciation dictionary and language model", Acoustical Society of Japan Spring Meeting (ASJ), March 2017.
@inproceedings{Watanabe2017mar2,
  author    = {Watanabe, Shinji and Hori, Takaaki and Hayashi, Tomoki and Kim, Suyoun},
  title     = {End-to-end ASR without using morphological analyzer, pronunciation dictionary and language model},
  booktitle = {Acoustical Society of Japan Spring Meeting (ASJ)},
  year      = 2017,
  month     = mar,
  url       = {https://www.merl.com/publications/TR2017-021}
}
Abstract:
This paper introduces a Japanese end-to-end ASR system based on a joint CTC/attention scheme [1], which extends attention-based ASR [2] with multi-task learning to incorporate the Connectionist Temporal Classification (CTC) objective. Unlike conventional Japanese ASR systems based on a DNN/HMM hybrid [3] or end-to-end systems restricted to Japanese syllable characters (i.e., hiragana or katakana) [4], this method directly predicts a Japanese sentence over a standard Japanese character set including Kanji, hiragana, and katakana characters, Roman and Greek alphabets, Arabic numerals, and so on. Thus, the method does not use any pronunciation dictionary, which would require hand-crafted work by humans. In addition, since recognition is performed at the character level, it does not require a morphological analyzer to segment a character sequence into a word sequence. Finally, the attention decoder network itself provides a language-model-like function, unlike a Japanese end-to-end system based on CTC alone [5]. Therefore, it does not require a separate language model module, which keeps system construction and the decoding process very simple.
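To illustrate the joint CTC/attention scheme described above, the following is a minimal sketch (not the authors' implementation) of a multi-task objective that interpolates a CTC loss on the shared encoder outputs with the cross-entropy loss of an attention decoder over the character vocabulary. The class name, the weight mtl_weight, and the toy tensor shapes are illustrative assumptions.

```python
# Minimal sketch of a joint CTC/attention multi-task objective (assumed names).
import torch
import torch.nn as nn


class JointCTCAttentionLoss(nn.Module):
    def __init__(self, blank_id: int = 0, pad_id: int = -1, mtl_weight: float = 0.5):
        super().__init__()
        # mtl_weight plays the role of lambda in L = lambda * L_ctc + (1 - lambda) * L_att
        self.mtl_weight = mtl_weight
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.att = nn.CrossEntropyLoss(ignore_index=pad_id)

    def forward(self, ctc_log_probs, enc_lengths, att_logits, targets, target_lengths):
        # ctc_log_probs: (T, B, V) log-softmax outputs of the shared encoder (CTC branch)
        # att_logits:    (B, L, V) attention-decoder outputs, one step per target character
        # targets:       (B, L) character IDs (Kanji, kana, alphanumerics, ...), padded with -1
        ctc_targets = targets.clamp(min=0)  # padded entries beyond target_lengths are ignored by CTC
        loss_ctc = self.ctc(ctc_log_probs, ctc_targets, enc_lengths, target_lengths)
        loss_att = self.att(att_logits.transpose(1, 2), targets)  # CE expects (B, V, L)
        return self.mtl_weight * loss_ctc + (1.0 - self.mtl_weight) * loss_att


# Toy usage with random tensors: 2 utterances, 50 encoder frames,
# a 3000-character vocabulary, and target sequences of length <= 10.
B, T, L, V = 2, 50, 10, 3000
criterion = JointCTCAttentionLoss(mtl_weight=0.5)
ctc_log_probs = torch.randn(T, B, V).log_softmax(dim=-1)
att_logits = torch.randn(B, L, V)
targets = torch.randint(1, V, (B, L))
target_lengths = torch.tensor([10, 7])
targets[1, 7:] = -1  # pad the shorter target sequence
loss = criterion(ctc_log_probs, torch.full((B,), T), att_logits, targets, target_lengths)
```

The interpolation weight is a tuning hyperparameter; in this sketch both branches share the same character-level targets, so no pronunciation dictionary, morphological analyzer, or external language model enters the objective.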