TR2017-021
End-to-end ASR without using morphological analyzer, pronunciation dictionary and language model
Watanabe, S., Hori, T., Hayashi, T., Kim, S., "End-to-end ASR without using morphological analyzer, pronunciation dictionary and language model", Acoustical Society of Japan Spring Meeting (ASJ), March 2017.
@inproceedings{Watanabe2017mar2,
  author    = {Watanabe, Shinji and Hori, Takaaki and Hayashi, Tomoki and Kim, Suyoun},
  title     = {End-to-end ASR without using morphological analyzer, pronunciation dictionary and language model},
  booktitle = {Acoustical Society of Japan Spring Meeting (ASJ)},
  year      = 2017,
  month     = mar,
  url       = {https://www.merl.com/publications/TR2017-021}
}
Abstract:
This paper introduces a Japanese end-to-end ASR system based on a joint CTC/attention scheme [1], which extends attention-based ASR [2] with multi-task learning to incorporate the Connectionist Temporal Classification (CTC) objective. Unlike conventional Japanese ASR systems based on a DNN/HMM hybrid [3] or end-to-end systems restricted to Japanese syllable characters (i.e., hiragana or katakana) [4], this method directly predicts a Japanese sentence over a standard Japanese character set including Kanji, hiragana, and katakana characters, Roman and Greek alphabets, Arabic numerals, and so on. Thus, the method does not use any pronunciation dictionary, which would require hand-crafted work by humans. In addition, since recognition is performed at the character level, it does not require a morphological analyzer to segment a character sequence into a word sequence. Finally, the attention decoder network itself provides a language-model-like function, unlike a Japanese end-to-end system based on CTC alone [5]. Therefore, it does not require a separate language model module, which keeps system construction and the decoding process very simple.
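To illustrate the joint CTC/attention scheme described above, the following is a minimal sketch (not the authors' implementation) of a multi-task objective that interpolates a CTC loss on the shared encoder outputs with the cross-entropy loss of an attention decoder over the character vocabulary. The class name, the weight mtl_weight, and the toy tensor shapes are illustrative assumptions.

```python
# Minimal sketch of a joint CTC/attention multi-task objective (assumed names).
import torch
import torch.nn as nn


class JointCTCAttentionLoss(nn.Module):
    def __init__(self, blank_id: int = 0, pad_id: int = -1, mtl_weight: float = 0.5):
        super().__init__()
        # mtl_weight plays the role of lambda in L = lambda * L_ctc + (1 - lambda) * L_att
        self.mtl_weight = mtl_weight
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.att = nn.CrossEntropyLoss(ignore_index=pad_id)

    def forward(self, ctc_log_probs, enc_lengths, att_logits, targets, target_lengths):
        # ctc_log_probs: (T, B, V) log-softmax outputs of the shared encoder (CTC branch)
        # att_logits:    (B, L, V) attention-decoder outputs, one step per target character
        # targets:       (B, L) character IDs (Kanji, kana, alphanumerics, ...), padded with -1
        ctc_targets = targets.clamp(min=0)  # padded entries beyond target_lengths are ignored by CTC
        loss_ctc = self.ctc(ctc_log_probs, ctc_targets, enc_lengths, target_lengths)
        loss_att = self.att(att_logits.transpose(1, 2), targets)  # CE expects (B, V, L)
        return self.mtl_weight * loss_ctc + (1.0 - self.mtl_weight) * loss_att


# Toy usage with random tensors: 2 utterances, 50 encoder frames,
# a 3000-character vocabulary, and target sequences of length <= 10.
B, T, L, V = 2, 50, 10, 3000
criterion = JointCTCAttentionLoss(mtl_weight=0.5)
ctc_log_probs = torch.randn(T, B, V).log_softmax(dim=-1)
att_logits = torch.randn(B, L, V)
targets = torch.randint(1, V, (B, L))
target_lengths = torch.tensor([10, 7])
targets[1, 7:] = -1  # pad the shorter target sequence
loss = criterion(ctc_log_probs, torch.full((B,), T), att_logits, targets, target_lengths)
```

The interpolation weight is a tuning hyperparameter; in this sketch both branches share the same character-level targets, so no pronunciation dictionary, morphological analyzer, or external language model enters the objective.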