TR2022-024
Sequence Transduction with Graph-based Supervision
"Sequence Transduction with Graph-based Supervision", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9747788, April 2022, pp. 7212-7216.
@inproceedings{Moritz2022apr,
  author = {Moritz, Niko and Hori, Takaaki and Watanabe, Shinji and Le Roux, Jonathan},
  title = {Sequence Transduction with Graph-based Supervision},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {7212--7216},
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSP43922.2022.9747788},
  url = {https://www.merl.com/publications/TR2022-024}
}
Abstract:
The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similar to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for full-sum training. However, it remains largely unknown whether these rules are optimal and lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework for manipulating training lattices, e.g., for studying different transition rules, implementing different transducer losses, or restricting alignments. We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T, while also ensuring a strictly monotonic alignment, which allows better optimization of the decoding procedure. For example, the proposed CTC-like transducer achieves a 4.8% relative improvement on the test-other condition of LibriSpeech over an equivalent RNN-T based system.
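To make the lattice manipulation concrete, the following is a minimal illustrative sketch (not the paper's implementation, which uses a general graph representation of the labels) of the full-sum forward computation under two sets of transition rules: the standard RNN-T lattice, where a blank emission advances the time index and a label emission advances only the label index, and a CTC-like monotonic lattice, where every emission consumes one time frame so label transitions move diagonally. All per-node log-probabilities below are hypothetical toy values; `rnnt_forward` and `monotonic_forward` are illustrative names.

```python
# Illustrative sketch: full-sum (forward) loss over two transducer lattices.
# blank_lp[t, u] = log p(blank | t, u); label_lp[t, u] = log p(y_{u+1} | t, u).
# These toy log-probabilities are hypothetical, not from the paper.
import numpy as np

def rnnt_forward(blank_lp, label_lp):
    """Standard RNN-T lattice: blank advances t, a label advances u only."""
    T, U1 = blank_lp.shape
    U = U1 - 1
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            cands = []
            if t > 0:  # horizontal move: emit blank at node (t-1, u)
                cands.append(alpha[t - 1, u] + blank_lp[t - 1, u])
            if u > 0:  # vertical move: emit label y_u at node (t, u-1)
                cands.append(alpha[t, u - 1] + label_lp[t, u - 1])
            alpha[t, u] = np.logaddexp.reduce(cands)
    # Terminate with a final blank emission from the last node.
    return -(alpha[T - 1, U] + blank_lp[T - 1, U])

def monotonic_forward(blank_lp, label_lp):
    """CTC-like lattice: every emission consumes one frame, so a label moves
    diagonally (t, u) -> (t+1, u+1), making the alignment strictly monotonic."""
    T, U1 = blank_lp.shape
    U = U1 - 1
    alpha = np.full((T + 1, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(1, T + 1):
        for u in range(U + 1):
            cands = [alpha[t - 1, u] + blank_lp[t - 1, u]]  # horizontal: blank
            if u > 0:  # diagonal: label also advances time
                cands.append(alpha[t - 1, u - 1] + label_lp[t - 1, u - 1])
            alpha[t, u] = np.logaddexp.reduce(cands)
    return -alpha[T, U]

# Toy example: T=4 frames, U=2 labels, random per-node log-probabilities.
rng = np.random.default_rng(0)
T, U = 4, 2
blank_lp = np.log(rng.uniform(0.1, 0.9, size=(T, U + 1)))
label_lp = np.log(rng.uniform(0.1, 0.9, size=(T, U)))
print(rnnt_forward(blank_lp, label_lp), monotonic_forward(blank_lp, label_lp))
```

Only the recursion over lattice nodes differs between the two functions, which is the point of the graph-based formulation: the transition rules (and hence the set of admissible alignments) can be swapped without touching the rest of the training pipeline.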
Related News & Events
NEWS: MERL presenting 8 papers at ICASSP 2022
Date: May 22, 2022 - May 27, 2022
Where: Singapore
MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
Brief: MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.