Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers

Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Gordon Wichern
Anoop Cherian
Tim K. Marks
Chiori Hori
Michael J. Jones
Kieran Parsons
Jing Liu
Daniel N. Nikovski
Suhas Lohit
Yoshiki Masuyama
Matthew Brand
Pu (Perry) Wang
Kuan-Chuan Peng
Philip V. Orlik
Diego Romeres
Moitreya Chatterjee
Hassan Mansour
Siddarth Jain
Petros T. Boufounos
Radu Corcodel
William S. Yerazunis
Pedro Miraldo
Arvind Raghunathan
Yebin Wang
Jianlin Guo
Hongbo Sun
Christoph Boeddeker
Chungwei Lin
Yanting Ma
Bingnan Wang
Stefano Di Cairano
Saviz Mowlavi
Anthony Vetro
Jinyun Zhang
Vedang M. Deshpande
Kaen Kogashi
Christopher R. Laughman
Dehong Liu
Alexander Schperberg
Abraham P. Vinod
Kenji Inomata
Kei Suzuki
Awards
AWARD: MERL team wins the Generative Data Augmentation of Room Acoustics (GenDARA) 2025 Challenge
Date: April 7, 2025
Awarded to: Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, and Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Yoshiki Masuyama; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 3 teams in the Generative Data Augmentation of Room Acoustics (GenDARA) 2025 Challenge, which focused on "generating room impulse responses (RIRs) to supplement a small set of measured examples and using the augmented data to train speaker distance estimation (SDE) models". The team was led by MERL intern Christopher Ick, and also included Gordon Wichern, Yoshiki Masuyama, François G. Germain, and Jonathan Le Roux.
The GenDARA Challenge was organized as part of the Generative Data Augmentation (GenDA) workshop at the 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), and held on April 7, 2025 in Hyderabad, India. Yoshiki Masuyama presented the team's method, "Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training".
The GenDARA challenge aims to promote the use of generative AI to synthesize RIRs from limited room data, as collecting or simulating RIR datasets at scale remains a significant challenge due to high costs and trade-offs between accuracy and computational efficiency. The challenge asked participants to first develop RIR generation systems capable of expanding a sparse set of labeled room impulse responses by generating RIRs at new source-receiver positions. They were then tasked with using this augmented dataset to train speaker distance estimation systems. Ranking was determined by overall performance on the downstream SDE task.

MERL's approach centered on a geometry-aware neural acoustic field model that was first pre-trained on a large external RIR dataset to learn generalizable mappings from 3D room geometry to room impulse responses. For each challenge room, the model was then fine-tuned on the small number of provided RIRs, enabling high-fidelity generation of RIRs at unseen source-receiver locations. These augmented RIR sets were subsequently used to train the SDE system, improving speaker distance estimation by providing richer and more diverse acoustic training data.
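The three-stage pipeline described above (adapt an acoustic-field model to sparse measurements, generate RIRs at unseen source-receiver positions, then estimate speaker distance from the resulting data) can be illustrated with a toy sketch. Everything below is hypothetical: `toy_rir` stands in for the adapted neural acoustic field, and `estimate_distance` replaces a trained SDE model with a simple direct-path heuristic; none of these names come from MERL's actual system.

```python
import math
import random

FS = 16000              # sample rate in Hz (assumed)
SPEED_OF_SOUND = 343.0  # m/s

def toy_rir(src, rcv, rt60=0.4, n=2048):
    """Toy room impulse response: a direct-path spike at the propagation
    delay plus exponentially decaying noise standing in for reverberation.
    In the real pipeline this would be the adapted neural acoustic field."""
    d = math.dist(src, rcv)
    delay = int(round(d / SPEED_OF_SOUND * FS))
    rir = [0.0] * n
    if delay < n:
        rir[delay] = 1.0 / max(d, 0.1)          # 1/r attenuation of the direct path
    decay = math.log(1000.0) / (rt60 * FS)      # -60 dB after rt60 seconds
    rng = random.Random(0)
    for t in range(delay + 1, n):
        rir[t] += 0.05 * rng.gauss(0.0, 1.0) * math.exp(-decay * (t - delay))
    return rir

def augment(measured_pairs, new_pairs, field=toy_rir):
    """Expand a sparse set of measured source-receiver pairs by querying
    the acoustic field at additional, unseen pairs."""
    return {pair: field(*pair) for pair in list(measured_pairs) + list(new_pairs)}

def estimate_distance(rir):
    """Trivial SDE stand-in: read the distance off the direct-path delay."""
    peak = max(range(len(rir)), key=lambda t: abs(rir[t]))
    return peak / FS * SPEED_OF_SOUND
```

For example, `estimate_distance(toy_rir((0.0, 0.0, 1.5), (3.43, 0.0, 1.5)))` recovers a distance close to 3.43 m; in the real system the dictionary returned by `augment` would supply the extra training pairs for a learned SDE model rather than a delay heuristic.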
AWARD: MERL Wins Awards at NeurIPS LLM Privacy Challenge
Date: December 15, 2024
Awarded to: Jing Liu, Ye Wang, Toshiaki Koike-Akino, Tsunato Nakai, Kento Oonishi, Takuya Higashi
MERL Contacts: Toshiaki Koike-Akino; Jing Liu; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Information Security
Brief: The Mitsubishi Electric Privacy Enhancing Technologies (MEL-PETs) team, a collaboration of MERL and Mitsubishi Electric researchers, won awards at the NeurIPS 2024 Large Language Model (LLM) Privacy Challenge. In the Blue Team track of the challenge, the team won the 3rd Place Award, and in the Red Team track, the Special Award for Practical Attack.
AWARD: University of Padua and MERL team wins the AI Olympics with RealAIGym competition at IROS 2024
Date: October 17, 2024
Awarded to: Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Robotics
Brief: The team, composed of the control group at the University of Padua and MERL's Optimization and Robotics team, ranked 1st out of the 4 finalist teams that reached the 2nd AI Olympics with RealAIGym competition at IROS 2024, which focused on control of under-actuated robots. The team comprised Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), the Technical University of Darmstadt, and Chalmers University of Technology.
The competition and award ceremony were hosted by the IEEE International Conference on Intelligent Robots and Systems (IROS) on October 17, 2024 in Abu Dhabi, UAE. Diego Romeres presented the team's method, based on a model-based reinforcement learning algorithm called MC-PILCO.
See All Awards for Artificial Intelligence
News & Events
NEWS: MERL hosts Boston AI Music Meetup
Date: March 19, 2026
Where: Cambridge, MA
MERL Contact: Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL hosted the Boston AI Music Meetup on March 19, 2026, bringing together researchers, musicians, and technologists from the local community to explore the intersection of artificial intelligence and music. The event featured talks on emerging approaches in AI-driven audio and creative tools, including a presentation by Elena Georgieva (NYU MARL) on improving audio quality for singing and speech using CLAP-based methods, as well as a talk by Ashvala Vinay (NoneType) on creative workflows using infinite canvas systems. Following the presentations, attendees participated in a networking session, fostering discussion and collaboration across academia and industry.
The Boston AI Music Meetup has been held monthly since 2024 (including a presentation on MERL’s music source separation work in May 2025), and has grown to include over 1,200 subscribers, attracting attendees from across the Northeast. It provides a forum for knowledge exchange and collaboration within the rapidly evolving AI music ecosystem, with discussions spanning music information retrieval, generative AI, and machine learning for creative practice.
NEWS: Toshiaki Koike-Akino delivers an invited talk as a panelist at OFC 2026
Date: March 17, 2026
MERL Contact: Toshiaki Koike-Akino
Research Areas: Artificial Intelligence, Communications, Machine Learning, Signal Processing
Brief: MERL researcher Toshiaki Koike-Akino will serve as a panelist at OFC 2026, the premier global event for optical communications and networking, to be held in Los Angeles, March 15-19.
Dr. Koike-Akino will participate in the special panel session titled “Machine Learning is Taking Over Optical Communications—But Which Algorithms Should We Use?” He will deliver a panel talk titled “Scaling AI with Light: AI Is Taking Over Optics — But Optics May Take Over AI.” His talk will discuss the growing synergy between AI and optical technologies, highlighting the emerging vision of leveraging optical physics not only as an application domain for AI, but also as a platform for scaling future AI systems.
See All News & Events for Artificial Intelligence
Research Highlights
- PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
- Quantum AI Technology
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
- Private, Secure, and Reliable Artificial Intelligence
- Steered Diffusion
- Sustainable AI
- Robust Machine Learning
- mmWave Beam-SNR Fingerprinting (mmBSF)
- Video Anomaly Detection
- Biosignal Processing for Human-Machine Interaction
- Task-aware Unified Source Separation (Audio Examples)
Internships
- CV0220: Internship - Visual Simultaneous Localization and Mapping (V-SLAM)
- ST0238: Internship - Multi-Modal Sensing and Understanding
- CI0213: Internship - Efficient Foundation Models for Edge Intelligence

See All Internships for Artificial Intelligence
Openings

See All Openings at MERL
Recent Publications
- Ryo Aihara, Yoshiki Masuyama, François G. Germain, Gordon Wichern, and Jonathan Le Roux, "Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations", IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), May 2026. TR2026-035

  @inproceedings{Aihara2026may2,
    author = {Aihara, Ryo and Masuyama, Yoshiki and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{Exploring Disentangled Neural Speech Codecs from Self-Supervised Representations}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
    year = 2026,
    month = may,
    url = {https://www.merl.com/publications/TR2026-035}
  }
- Ryo Aihara, Yoshiki Masuyama, Francesco Paissan, François G. Germain, Gordon Wichern, and Jonathan Le Roux, "SUNAC: Source-aware Unified Neural Audio Codec", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026. TR2026-032

  @inproceedings{Aihara2026may,
    author = {Aihara, Ryo and Masuyama, Yoshiki and Paissan, Francesco and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{SUNAC: Source-aware Unified Neural Audio Codec}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2026,
    month = may,
    url = {https://www.merl.com/publications/TR2026-032}
  }
- Sorachi Kato, Pu Wang, Takuya Fujihashi, and Andrew Markham, "Heatmap-to-SMPL Multi-View Radar Transformer for Multi-Person 3D Pose Estimation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026. TR2026-040

  @inproceedings{Kato2026may,
    author = {Kato, Sorachi and Wang, Pu and Fujihashi, Takuya and Markham, Andrew},
    title = {{Heatmap-to-SMPL Multi-View Radar Transformer for Multi-Person 3D Pose Estimation}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2026,
    month = may,
    url = {https://www.merl.com/publications/TR2026-040}
  }
- Yoshiki Masuyama, François G. Germain, Gordon Wichern, Chiori Hori, and Jonathan Le Roux, "Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026. TR2026-033

  @inproceedings{Masuyama2026may,
    author = {Masuyama, Yoshiki and Germain, François G and Wichern, Gordon and Hori, Chiori and {Le Roux}, Jonathan},
    title = {{Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2026,
    month = may,
    url = {https://www.merl.com/publications/TR2026-033}
  }
- Yoshiki Masuyama, Kohei Saijo, Francesco Paissan, Jiangyu Han, Marc Delcroix, Ryo Aihara, François G. Germain, Gordon Wichern, and Jonathan Le Roux, "FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2026. TR2026-034

  @inproceedings{Masuyama2026may2,
    author = {Masuyama, Yoshiki and Saijo, Kohei and Paissan, Francesco and Han, Jiangyu and Delcroix, Marc and Aihara, Ryo and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2026,
    month = may,
    url = {https://www.merl.com/publications/TR2026-034}
  }
- Ryo Hase, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons, and Jumpei Hato, "Evaluating Security Policy Compliance in Infrastructure as Code Generated by Large Language Models", International Symposium on Digital Forensics and Security, March 2026. TR2026-036

  @inproceedings{Ryo2026mar,
    author = {Hase, Ryo and Wang, Ye and Koike-Akino, Toshiaki and Liu, Jing and Parsons, Kieran and Hato, Jumpei},
    title = {{Evaluating Security Policy Compliance in Infrastructure as Code Generated by Large Language Models}},
    booktitle = {International Symposium on Digital Forensics and Security},
    year = 2026,
    month = mar,
    url = {https://www.merl.com/publications/TR2026-036}
  }
- Vineet Shenoy, Suhas Lohit, Hassan Mansour, Rama Chellappa, and Tim K. Marks, "Recovering Pulse Waves from Video Using Deep Unrolling and Deep Equilibrium Models", IEEE Transactions on Image Processing, March 2026. TR2026-031

  @article{Shenoy2026mar,
    author = {Shenoy, Vineet and Lohit, Suhas and Mansour, Hassan and Chellappa, Rama and Marks, Tim K.},
    title = {{Recovering Pulse Waves from Video Using Deep Unrolling and Deep Equilibrium Models}},
    journal = {IEEE Transactions on Image Processing},
    year = 2026,
    month = mar,
    url = {https://www.merl.com/publications/TR2026-031}
  }
- Kaen Kogashi, Anoop Cherian, and Meng-Yu Jennifer Kuo, "MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions", IEEE Winter Conference on Applications of Computer Vision (WACV), March 2026. TR2026-029

  @inproceedings{Kogashi2026mar,
    author = {Kogashi, Kaen and Cherian, Anoop and Kuo, Meng-Yu Jennifer},
    title = {{MMHOI: Modeling Complex 3D Multi-Human Multi-Object Interactions}},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2026,
    month = mar,
    url = {https://www.merl.com/publications/TR2026-029}
  }
Videos
Software & Data Downloads
- MMHOI Dataset: Modeling Complex 3D Multi-Human Multi-Object Interactions
- Open Vocabulary Attribute Detection Dataset
- Long-Tailed Online Anomaly Detection dataset
- Group Representation Networks
- Task-Aware Unified Source Separation
- Local Density-Based Anomaly Score Normalization for Domain Generalization
- Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
- Self-Monitored Inference-Time INtervention for Generative Music Transformers
- Transformer-based model with LOcal-modeling by COnvolution
- Sound Event Bounding Boxes
- Enhanced Reverberation as Supervision
- Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- Gear Extensions of Neural Radiance Fields
- Long-Tailed Anomaly Detection Dataset
- Neural IIR Filter Field for HRTF Upsampling and Personalization
- Target-Speaker SEParation
- Pixel-Grounded Prototypical Part Networks
- Steered Diffusion
- Hyperbolic Audio Source Separation
- Simple Multimodal Algorithmic Reasoning Task Dataset
- Partial Group Convolutional Neural Networks
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Nonparametric Score Estimators
- 3D MOrphable STyleGAN
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Goal directed RL with Safety Constraints
- Hierarchical Musical Instrument Separation
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- Online Feature Extractor Network
- MotionNet
- FoldingNet++
- Quasi-Newton Trust Region Policy Optimization
- Landmarks' Location, Uncertainty, and Visibility Likelihood
- Robust Iterative Data Estimation
- Gradient-based Nikaido-Isoda
- Discriminative Subspace Pooling
- MEL-PETs Defense for LLM Privacy Challenge
- Subject- and Dataset-Aware Neural Field for HRTF Modeling
- MEL-PETs Joint-Context Attack for LLM Privacy Challenge
- Learned Born Operator for Reflection Tomographic Imaging
- Embracing Cacophony