NEWS MERL presenting 8 papers at ICASSP 2022

Date released: May 24, 2022

Date: May 22, 2022 - May 27, 2022
Where: Singapore
Description:

MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.

Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.

ICASSP is the flagship conference of the IEEE Signal Processing Society and the world's largest and most comprehensive technical conference focused on research advances and the latest technological developments in signal and information processing. The event attracts more than 2000 participants each year.
External Link: https://2022.ieeeicassp.org/
Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
Related Publications:
  Higuchi, Y., Moritz, N., Le Roux, J., Hori, T., "Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9746275, April 2022, pp. 7672-7676.
  @inproceedings{Higuchi2022apr,
  author = {Higuchi, Yosuke and Moritz, Niko and {Le Roux}, Jonathan and Hori, Takaaki},
  title = {{Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {7672--7676},
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSP43922.2022.9746275},
  url = {https://www.merl.com/publications/TR2022-026}
  }
  Moritz, N., Hori, T., Watanabe, S., Le Roux, J., "Sequence Transduction with Graph-based Supervision", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9747788, April 2022, pp. 7212-7216.
  @inproceedings{Moritz2022apr,
  author = {Moritz, Niko and Hori, Takaaki and Watanabe, Shinji and {Le Roux}, Jonathan},
  title = {{Sequence Transduction with Graph-based Supervision}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {7212--7216},
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSP43922.2022.9747788},
  url = {https://www.merl.com/publications/TR2022-024}
  }
  Slizovskaia, O., Wichern, G., Wang, Z.-Q., Le Roux, J., "Locate This, Not That: Class-Conditioned Sound Event DOA Estimation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9747604, April 2022, pp. 711-715.
  @inproceedings{Slizovskaia2022mar,
  author = {Slizovskaia, Olga and Wichern, Gordon and Wang, Zhong-Qiu and {Le Roux}, Jonathan},
  title = {{Locate This, Not That: Class-Conditioned Sound Event DOA Estimation}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {711--715},
  month = apr,
  doi = {10.1109/ICASSP43922.2022.9747604},
  url = {https://www.merl.com/publications/TR2022-023}
  }
  Petermann, D., Wichern, G., Wang, Z.-Q., Le Roux, J., "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9746005, April 2022, pp. 526-530.
  @inproceedings{Petermann2022apr,
  author = {Petermann, Darius and Wichern, Gordon and Wang, Zhong-Qiu and {Le Roux}, Jonathan},
  title = {{The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {526--530},
  month = apr,
  doi = {10.1109/ICASSP43922.2022.9746005},
  url = {https://www.merl.com/publications/TR2022-022}
  }
  Chang, X., Moritz, N., Hori, T., Watanabe, S., Le Roux, J., "Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9747375, April 2022, pp. 7322-7326.
  @inproceedings{Chang2022apr,
  author = {Chang, Xuankai and Moritz, Niko and Hori, Takaaki and Watanabe, Shinji and {Le Roux}, Jonathan},
  title = {{Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {7322--7326},
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSP43922.2022.9747375},
  url = {https://www.merl.com/publications/TR2022-021}
  }
  Peng, K.-C., "Iterative Self Knowledge Distillation -- From Pothole Classification To Fine-Grained And COVID Recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Gan, W.-S. and Ma, K. K., Eds., DOI: 10.1109/ICASSP43922.2022.9746470, April 2022, pp. 3139-3143.
  @inproceedings{Peng2022apr,
  author = {Peng, Kuan-Chuan},
  title = {{Iterative Self Knowledge Distillation --- From Pothole Classification To Fine-Grained And COVID Recognition}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  editor = {Gan, W.-S. and Ma, K. K.},
  pages = {3139--3143},
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSP43922.2022.9746470},
  issn = {1520-6149},
  isbn = {978-1-6654-0541-6},
  url = {https://www.merl.com/publications/TR2022-020}
  }
  Shah, A.P., Geng, S., Gao, P., Cherian, A., Hori, T., Marks, T.K., Le Roux, J., Hori, C., "Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2022, pp. 7732-7736.
  @inproceedings{Shah2022apr,
  author = {Shah, Ankit Parag and Geng, Shijie and Gao, Peng and Cherian, Anoop and Hori, Takaaki and Marks, Tim K. and {Le Roux}, Jonathan and Hori, Chiori},
  title = {{Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  pages = {7732--7736},
  month = apr,
  publisher = {IEEE},
  issn = {1520-6149},
  isbn = {978-1-6654-0540-9},
  url = {https://www.merl.com/publications/TR2022-019}
  }
  Yu, J., Wang, P., Koike-Akino, T., Orlik, P.V., "Multi-Modal Recurrent Fusion for Indoor Localization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP43922.2022.9746071, April 2022.
  @inproceedings{Yu2022apr,
  author = {Yu, Jianyuan and Wang, Pu and Koike-Akino, Toshiaki and Orlik, Philip V.},
  title = {{Multi-Modal Recurrent Fusion for Indoor Localization}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSP43922.2022.9746071},
  issn = {2379-190X},
  isbn = {978-1-6654-0540-9},
  url = {https://www.merl.com/publications/TR2022-018}
  }