EVENT MERL Contributes to ICASSP 2024

Date released: March 18, 2024

Date:

Sunday, April 14, 2024 - Friday, April 19, 2024
Location:

Seoul, South Korea
Description:

MERL has made numerous contributions to both the organization and technical program of ICASSP 2024, which is being held in Seoul, Korea from April 14-19, 2024.

Sponsorship and Awards

MERL is proud to be a Bronze Patron of the conference and will participate in the student job fair on Thursday, April 18. Please join this session to learn more about employment opportunities at MERL, including openings for research scientists, post-docs, and interns.

MERL is pleased to be the sponsor of two IEEE Awards that will be presented at the conference. We congratulate Prof. Stéphane G. Mallat, the recipient of the 2024 IEEE Fourier Award for Signal Processing, and Prof. Keiichi Tokuda, the recipient of the 2024 IEEE James L. Flanagan Speech and Audio Processing Award.

Jonathan Le Roux, MERL Speech and Audio Senior Team Leader, will also be recognized during the Awards Ceremony for his recent elevation to IEEE Fellow.

Technical Program

MERL will present 13 papers in the main conference on a wide range of topics including automated audio captioning, speech separation, audio generative models, speech and sound synthesis, spatial audio reproduction, multimodal indoor monitoring, radar imaging, depth estimation, physics-informed machine learning, and integrated sensing and communications (ISAC). Three workshop papers have also been accepted for presentation on audio-visual speaker diarization, music source separation, and music generative models.

Perry Wang is the co-organizer of the Workshop on Signal Processing and Machine Learning Advances in Automotive Radars (SPLAR), held on Sunday, April 14. It features keynote talks from leaders in both academia and industry, peer-reviewed workshop papers, and lightning talks from ICASSP regular tracks on signal processing and machine learning for automotive radar and, more generally, radar perception.

Gordon Wichern will present an invited keynote talk on analyzing and interpreting audio deep learning models at the Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), held on Monday, April 15. He will also appear in a panel discussion on interpretable audio AI at the workshop.

Perry Wang also co-organizes a two-part special session on Next-Generation Wi-Fi Sensing (SS-L9 and SS-L13) which will be held on Thursday afternoon, April 18. The special session includes papers on PHY-layer oriented signal processing and data-driven deep learning advances, and supports upcoming 802.11bf WLAN Sensing Standardization activities.

Petros Boufounos is participating as a mentor in ICASSP’s Micro-Mentoring Experience Program (MiME).

About ICASSP

ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 3000 participants.
MERL Contacts:

Petros T. Boufounos; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Hassan Mansour; Kieran Parsons; Joshua Rapp; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
External Link:

https://2024.ieeeicassp.org/
Research Areas:

Artificial Intelligence, Computational Sensing, Machine Learning, Robotics, Signal Processing, Speech & Audio
- Related Publications
  Bralios, D., Wichern, G., Germain, F.G., Pan, Z., Khurana, S., Hori, C., Le Roux, J., "Generation or Replication: Auscultating Audio Latent Diffusion Models", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10447705, March 2024, pp. 1156-1160.
  @inproceedings{Bralios2024mar,
  author = {Bralios, Dimitrios and Wichern, Gordon and Germain, François G and Pan, Zexu and Khurana, Sameer and Hori, Chiori and {Le Roux}, Jonathan},
  title = {{Generation or Replication: Auscultating Audio Latent Diffusion Models}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {1156--1160},
  month = mar,
  doi = {10.1109/ICASSP48485.2024.10447705},
  url = {https://www.merl.com/publications/TR2024-027}
  }
  Masuyama, Y., Wichern, G., Germain, F.G., Pan, Z., Khurana, S., Hori, C., Le Roux, J., "NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10448477, March 2024, pp. 1016-1020.
  @inproceedings{Masuyama2024mar,
  author = {Masuyama, Yoshiki and Wichern, Gordon and Germain, François G and Pan, Zexu and Khurana, Sameer and Hori, Chiori and {Le Roux}, Jonathan},
  title = {{NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {1016--1020},
  month = mar,
  doi = {10.1109/ICASSP48485.2024.10448477},
  url = {https://www.merl.com/publications/TR2024-026}
  }
  Pan, Z., Wichern, G., Germain, F.G., Khurana, S., Le Roux, J., "NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10446333, March 2024, pp. 11456-11460.
  @inproceedings{Pan2024mar,
  author = {Pan, Zexu and Wichern, Gordon and Germain, François G and Khurana, Sameer and {Le Roux}, Jonathan},
  title = {{NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {11456--11460},
  month = mar,
  doi = {10.1109/ICASSP48485.2024.10446333},
  url = {https://www.merl.com/publications/TR2024-025}
  }
  Sholokhov, A., Rapp, J., Nabi, S., Brunton, S., Kutz, N., Mansour, H., "Single-pixel imaging of dynamic flows using Neural ODE regularization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10447584, March 2024, pp. 2530-2534.
  @inproceedings{Sholokhov2024mar,
  author = {Sholokhov, Aleksei and Rapp, Joshua and Nabi, Saleh and Brunton, Steven and Kutz, Nathan and Mansour, Hassan},
  title = {{Single-pixel imaging of dynamic flows using Neural ODE regularization}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {2530--2534},
  month = mar,
  publisher = {IEEE},
  doi = {10.1109/ICASSP48485.2024.10447584},
  url = {https://www.merl.com/publications/TR2024-024}
  }
  Yataka, R., Wang, P., Boufounos, P.T., Takahashi, R., "Radar Perception with Scalable Connective Temporal Relations for Autonomous Driving", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10446449, March 2024, pp. 13266-13270.
  @inproceedings{Yataka2024mar,
  author = {Yataka, Ryoma and Wang, Pu and Boufounos, Petros T. and Takahashi, Ryuhei},
  title = {{Radar Perception with Scalable Connective Temporal Relations for Autonomous Driving}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {13266--13270},
  month = mar,
  publisher = {IEEE},
  doi = {10.1109/ICASSP48485.2024.10446449},
  issn = {2379-190X},
  isbn = {979-8-3503-4485-1},
  url = {https://www.merl.com/publications/TR2024-023}
  }
  Fujihashi, T., Kato, S., Koike-Akino, T., "Implicit Neural Representation for Low-Overhead Graph-Based Holographic-Type Communications", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10445857, April 2024.
  @inproceedings{Fujihashi2024apr,
  author = {Fujihashi, Takuya and Kato, Sorachi and Koike-Akino, Toshiaki},
  title = {{Implicit Neural Representation for Low-Overhead Graph-Based Holographic-Type Communications}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSP48485.2024.10445857},
  issn = {2379-190X},
  isbn = {979-8-3503-4485-1},
  url = {https://www.merl.com/publications/TR2024-022}
  }
  Fernandez-Menduina, S., Rapp, J., Mansour, H., Greiff, M., Parsons, K., "Tracking Beyond the Unambiguous Range with Modulo Single-Photon Lidar", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10446835, March 2024, pp. 6-10.
  @inproceedings{Fernandez-Menduina2024mar,
  author = {Fernandez-Menduina, Samuel and Rapp, Joshua and Mansour, Hassan and Greiff, Marcus and Parsons, Kieran},
  title = {{Tracking Beyond the Unambiguous Range with Modulo Single-Photon Lidar}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {6--10},
  month = mar,
  doi = {10.1109/ICASSP48485.2024.10446835},
  url = {https://www.merl.com/publications/TR2024-021}
  }
  Wang, P., Boufounos, P.T., "Monostatic DMG Passive Sensing with Hypothesis Testing", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10447134, March 2024, pp. 13381-13385.
  @inproceedings{Wang2024mar,
  author = {Wang, Pu and Boufounos, Petros T.},
  title = {{Monostatic DMG Passive Sensing with Hypothesis Testing}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {13381--13385},
  month = mar,
  publisher = {IEEE},
  doi = {10.1109/ICASSP48485.2024.10447134},
  issn = {2379-190X},
  isbn = {979-8-3503-4485-1},
  url = {https://www.merl.com/publications/TR2024-020}
  }
  Kato, S., Wang, P., Koike-Akino, T., Fujihashi, T., Mansour, H., Boufounos, P.T., "Object Trajectory Estimation with Multi-Band Wi-Fi Neural Dynamic Fusion", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10445972, March 2024, pp. 13261-13265.
  @inproceedings{Kato2024mar,
  author = {Kato, Sorachi and Wang, Pu and Koike-Akino, Toshiaki and Fujihashi, Takuya and Mansour, Hassan and Boufounos, Petros T.},
  title = {{Object Trajectory Estimation with Multi-Band Wi-Fi Neural Dynamic Fusion}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {13261--13265},
  month = mar,
  publisher = {IEEE},
  doi = {10.1109/ICASSP48485.2024.10445972},
  issn = {2379-190X},
  isbn = {979-8-3503-4485-1},
  url = {https://www.merl.com/publications/TR2024-019}
  }
  Liu, H., Baoueb, T., Fontaine, M., Le Roux, J., Richard, G., "GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10446058, March 2024, pp. 11611-11615.
  @inproceedings{Liu2024mar,
  author = {Liu, Haocheng and Baoueb, Teysir and Fontaine, Mathieu and {Le Roux}, Jonathan and Richard, Gaël},
  title = {{GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {11611--11615},
  month = mar,
  doi = {10.1109/ICASSP48485.2024.10446058},
  issn = {2379-190X},
  isbn = {979-8-3503-4485-1},
  url = {https://www.merl.com/publications/TR2024-014}
  }
  Baoueb, T., Liu, H., Fontaine, M., Le Roux, J., Richard, G., "SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10446830, March 2024, pp. 986-990.
  @inproceedings{Baoueb2024mar,
  author = {Baoueb, Teysir and Liu, Haocheng and Fontaine, Mathieu and {Le Roux}, Jonathan and Richard, Gaël},
  title = {{SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {986--990},
  month = mar,
  doi = {10.1109/ICASSP48485.2024.10446830},
  issn = {2379-190X},
  isbn = {979-8-3503-4485-1},
  url = {https://www.merl.com/publications/TR2024-013}
  }
  Hori, C., Wang, P., Rahman, M., Vaca-Rubio, C., Khurana, S., Cherian, A., Le Roux, J., "Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10447600, March 2024, pp. 13296-13300.
  @inproceedings{Hori2024mar,
  author = {Hori, Chiori and Wang, Pu and Rahman, Mahbub and Vaca-Rubio, Cristian and Khurana, Sameer and Cherian, Anoop and {Le Roux}, Jonathan},
  title = {{Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {13296--13300},
  month = mar,
  publisher = {IEEE},
  doi = {10.1109/ICASSP48485.2024.10447600},
  issn = {2379-190X},
  isbn = {979-8-3503-4485-1},
  url = {https://www.merl.com/publications/TR2024-012}
  }
  Wu, S.-L., Chang, X., Wichern, G., Jung, J.-W., Germain, F.G., Le Roux, J., Watanabe, S., "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP48485.2024.10447215, March 2024, pp. 316-320.
  @inproceedings{Wu2024mar,
  author = {Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, François G and {Le Roux}, Jonathan and Watanabe, Shinji},
  title = {{Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation}},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2024,
  pages = {316--320},
  month = mar,
  doi = {10.1109/ICASSP48485.2024.10447215},
  url = {https://www.merl.com/publications/TR2024-028}
  }
  Pan, Z., Wichern, G., Germain, F.G., Subramanian, A., Le Roux, J., "Late Audio-Visual Fusion for In-The-Wild Speaker Diarization", Hands-free Speech Communication and Microphone Arrays (HSCMA), DOI: 10.1109/ICASSPW62465.2024.10626914, April 2024, pp. 174-178.
  @inproceedings{Pan2024apr,
  author = {Pan, Zexu and Wichern, Gordon and Germain, François G and Subramanian, Aswin and {Le Roux}, Jonathan},
  title = {{Late Audio-Visual Fusion for In-The-Wild Speaker Diarization}},
  booktitle = {Hands-free Speech Communication and Microphone Arrays (HSCMA)},
  year = 2024,
  pages = {174--178},
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/ICASSPW62465.2024.10626914},
  isbn = {979-8-3503-7451-3},
  url = {https://www.merl.com/publications/TR2024-029}
  }
  Jeon, C.-B., Wichern, G., Germain, F.G., Le Roux, J., "Why does music source separation benefit from cacophony?", IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), DOI: 10.1109/ICASSPW62465.2024.10669899, March 2024, pp. 873-877.
  @inproceedings{Jeon2024mar,
  author = {Jeon, Chang-Bin and Wichern, Gordon and Germain, François G and {Le Roux}, Jonathan},
  title = {{Why does music source separation benefit from cacophony?}},
  booktitle = {IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA)},
  year = 2024,
  pages = {873--877},
  month = mar,
  publisher = {IEEE},
  doi = {10.1109/ICASSPW62465.2024.10669899},
  isbn = {979-8-3503-7451-3},
  url = {https://www.merl.com/publications/TR2024-030}
  }
  Koo, J., Wichern, G., Germain, F.G., Khurana, S., Le Roux, J., "Understanding and Controlling Generative Music Transformers by Probing Individual Attention Heads", IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), April 2024.
  @inproceedings{Koo2024apr,
  author = {Koo, Junghyun and Wichern, Gordon and Germain, François G and Khurana, Sameer and {Le Roux}, Jonathan},
  title = {{Understanding and Controlling Generative Music Transformers by Probing Individual Attention Heads}},
  booktitle = {IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA)},
  year = 2024,
  month = apr,
  url = {https://www.merl.com/publications/TR2024-032}
  }
