Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, as well as data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, as well as cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Quick Links
-
Researchers
Jonathan
Le Roux
Toshiaki
Koike-Akino
Ye
Wang
Gordon
Wichern
Anoop
Cherian
Tim K.
Marks
Chiori
Hori
Michael J.
Jones
Kieran
Parsons
Daniel N.
Nikovski
Devesh K.
Jha
François
Germain
Suhas
Lohit
Matthew
Brand
Philip V.
Orlik
Pu
(Perry)
WangDiego
Romeres
Petros T.
Boufounos
Moitreya
Chatterjee
Siddarth
Jain
Sameer
Khurana
Jing
Liu
Hassan
Mansour
Kuan-Chuan
Peng
William S.
Yerazunis
Radu
Corcodel
Arvind
Raghunathan
Hongbo
Sun
Yebin
Wang
Jianlin
Guo
Chungwei
Lin
Yanting
Ma
Pedro
Miraldo
Bingnan
Wang
Ryo
Aihara
Stefano
Di Cairano
Janek
Ebbers
Yoshiki
Masuyama
Saviz
Mowlavi
James
Queeney
Anthony
Vetro
Jinyun
Zhang
Jose
Amaya
Ankush
Chakrabarty
Vedang M.
Deshpande
Dehong
Liu
Alexander
Schperberg
Wataru
Tsujita
Abraham P.
Vinod
Na
Li
-
Awards
-
AWARD University of Padua and MERL team wins the AI Olympics with RealAIGym competition at IROS24 Date: October 17, 2024
Awarded to: Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, RoboticsBrief- The team composed of the control group at the University of Padua and MERL's Optimization and Robotic team ranked 1st out of the 4 finalist teams that arrived to the 2nd AI Olympics with RealAIGym competition at IROS 24, which focused on control of under-actuated robots. The team was composed by Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), Technical University of Darmstadt and Chalmers University of Technology.
The competition and award ceremony was hosted by IEEE International Conference on Intelligent Robots and Systems (IROS) on October 17, 2024 in Abu Dhabi, UAE. Diego Romeres presented the team's method, based on a model-based reinforcement learning algorithm called MC-PILCO.
- The team composed of the control group at the University of Padua and MERL's Optimization and Robotic team ranked 1st out of the 4 finalist teams that arrived to the 2nd AI Olympics with RealAIGym competition at IROS 24, which focused on control of under-actuated robots. The team was composed by Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), Technical University of Darmstadt and Chalmers University of Technology.
-
AWARD MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge Date: August 29, 2024
Awarded to: Yoshiki Masuyama, Gordon Wichern, Francois G. Germain, Christopher Ick, and Jonathan Le Roux
MERL Contacts: François Germain; Jonathan Le Roux; Gordon Wichern; Yoshiki Masuyama
Research Areas: Artificial Intelligence, Machine Learning, Speech & AudioBrief- MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama, and also included Gordon Wichern, Francois Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
The LAP Challenge workshop and award ceremony was hosted by the 32nd European Signal Processing Conference (EUSIPCO 24) on August 29, 2024 in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
The LAP challenge aims to explore challenges in the field of personalized spatial audio, with the first edition focusing on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). HRTFs with dense spatial grids are required for immersive audio experiences, but their recording is time-consuming. Although HRTF spatial upsampling has recently shown remarkable progress with approaches involving neural fields, HRTF estimation accuracy remains limited when upsampling from only a few measured directions, e.g., 3 or 5 measurements. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). RANF retrieves a subject whose HRTFs are close to those of the target subject at the measured directions from a library of subjects. The HRTF of the retrieved subject at the target direction is fed into the neural field in addition to the desired sound source direction. The team also developed a neural network architecture that can handle an arbitrary number of retrieved subjects, inspired by a multi-channel processing technique called transform-average-concatenate.
- MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama, and also included Gordon Wichern, Francois Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
-
AWARD Jonathan Le Roux elevated to IEEE Fellow Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & AudioBrief- MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Less than 0.1% of voting members are selected annually for this member grade elevation.
- MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
See All Awards for Artificial Intelligence -
-
News & Events
-
TALK [MERL Seminar Series 2024] Samuel Clarke presents talk titled Audio for Object and Spatial Awareness Date & Time: Wednesday, October 30, 2024; 1:00 PM
Speaker: Samuel Clarke, Stanford University
MERL Host: Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & AudioAbstract- Acoustic perception is invaluable to humans and robots in understanding objects and events in their environments. These sounds are dependent on properties of the source, the environment, and the receiver. Many humans possess remarkable intuition both to infer key properties of each of these three aspects from a sound and to form expectations of how these different aspects would affect the sound they hear. In order to equip robots and AI agents with similar if not stronger capabilities, our research has taken a two-fold path. First, we collect high-fidelity datasets in both controlled and uncontrolled environments which capture real sounds of objects and rooms. Second, we introduce differentiable physics-based models that can estimate acoustic properties of objects and rooms from minimal amounts of real audio data, then can predict new sounds from these objects and rooms under novel, “unseen” conditions.
-
TALK [MERL Seminar Series 2024] Zhaojian Li presents talk titled A Multi-Arm Robotic System for Robotic Apple Harvesting Date & Time: Wednesday, October 2, 2024; 1:00 PM
Speaker: Zhaojian Li, Mivchigan State University
MERL Host: Yebin Wang
Research Areas: Artificial Intelligence, Computer Vision, Control, RoboticsAbstract- Harvesting labor is the single largest cost in apple production in the U.S. Surging cost and growing shortage of labor has forced the apple industry to seek automated harvesting solutions. Despite considerable progress in recent years, the existing robotic harvesting systems still fall short of performance expectations, lacking robustness and proving inefficient or overly complex for practical commercial deployment. In this talk, I will present the development and evaluation of a new dual-arm robotic apple harvesting system. This work is a result of a continuous collaboration between Michigan State University and U.S. Department of Agriculture.
See All News & Events for Artificial Intelligence -
-
Research Highlights
-
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering -
Quantum AI Technology -
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models -
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling -
Steered Diffusion -
Sustainable AI -
Robust Machine Learning -
mmWave Beam-SNR Fingerprinting (mmBSF) -
Video Anomaly Detection -
Biosignal Processing for Human-Machine Interaction
-
-
Internships
-
CA0055: Internship - Human-Collaborative Loco-Manipulation Robots
MERL seeks graduate students passionate about robotics to contribute to the development of a framework for legged robots with manipulator arms to collaborate with human in executing various tasks. The work will involve multi-domain research including planning and control, manipulation, and possibly vision/perception. The methods will be implemented and evaluated in high performance simulators and (time-permitting) in actual robotic platforms. The results of the interns are expected to be published in top-tier robotic conferences and/or journal.
The internship should start in January 2025 (exact date is flexible) with an expected duration 3-6 months depending on agreed scope and intermediate progress.
Required Specific Experience
- Current/Past enrollment in a PhD program in Mechanical, Aerospace, Electrical Engineering, with a concentration in Robotics
- 2+ years of research in at least some of: machine learning, optimization, control, path planning, computer vision
- Experience in design and simulation tools for robotics such as ROS, Mujoco, Gazebo, Isaac Lab
- Strong programming skills in Python and/or C/C++
Additional Desired Experience
- Development of planning and control methods in robotic hardware platforms
- Acquisition and processing of multimodal sensor data, including force/torque and proprioceptive sensors
- Prior experience in human-robot interaction, legged locomotion, mobile manipulation
-
SA0045: Internship - Universal Audio Compression and Generation
We are seeking graduate students interested in helping advance the fields of universal audio compression and generation. We aim to build a single generative model that can perform multiple audio generation tasks conditioned on multimodal context. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are Ph.D. students with experience in some of the following: deep generative modeling, large language models, neural audio codecs. The internship typically lasts 3-6 months.
-
EA0070: Internship - Multi-modal sensor fusion
MERL is looking for a self-motivated intern to work on multi-modal sensor fusion for health condition monitoring and predictive maintenance of motor drive systems. The ideal candidate would be a Ph.D. candidate in electrical engineering or computer science with solid research background in signal processing and machine learning. Experience in motor drive system is a plus. The intern is expected to collaborate with MERL researchers to collect data, explore multi-modal data relationship, and prepare manuscripts for publications. The total duration is anticipated to be 3 months and the start date is flexible.
Required Specific Experience
- Experience with multi-modal sensor fusion.
See All Internships for Artificial Intelligence -
-
Recent Publications
- "Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation", Advances in Neural Information Processing Systems (NeurIPS), December 2024.BibTeX TR2024-157 PDF
- @inproceedings{Chen2024dec,
- author = {Chen, Xiangyu and Wang, Ye and Brand, Matthew and Wang, Pu and Liu, Jing and Koike-Akino, Toshiaki}},
- title = {Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation},
- booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
- year = 2024,
- month = dec,
- url = {https://www.merl.com/publications/TR2024-157}
- }
, - "SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models", British Machine Vision Conference (BMVC), November 2024.BibTeX TR2024-156 PDF
- @inproceedings{Chen2024nov,
- author = {Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew and Wang, Guanghui and Koike-Akino, Toshiaki}},
- title = {SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models},
- booktitle = {British Machine Vision Conference (BMVC)},
- year = 2024,
- month = nov,
- url = {https://www.merl.com/publications/TR2024-156}
- }
, - "DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels", Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, October 2024.BibTeX TR2024-146 PDF
- @inproceedings{Cornell2024oct,
- author = {Cornell, Samuele and Ebbers, Janek and Douwes, Constance and Martin-Morato, Irene and Harju, Manu and Mesaros, Annamaria and Serizel, Romain}},
- title = {DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels},
- booktitle = {Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop},
- year = 2024,
- month = oct,
- url = {https://www.merl.com/publications/TR2024-146}
- }
, - "Analyzing Inference Privacy Risks Through Gradients In Machine Learning", ACM Conference on Computer and Communications Security (CCS), October 2024.BibTeX TR2024-141 PDF
- @inproceedings{Li2024oct,
- author = {Li, Zhuohang and Lowy, Andrew and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Malin, Bradley and Wang, Ye}},
- title = {Analyzing Inference Privacy Risks Through Gradients In Machine Learning},
- booktitle = {ACM Conference on Computer and Communications Security (CCS)},
- year = 2024,
- month = oct,
- url = {https://www.merl.com/publications/TR2024-141}
- }
, - "Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection", European Conference on Computer Vision (ECCV), Leonardis, A. and Ricci, E. and Roth, S. and Russakovsky, O. and Sattler, T. and Varol, G., Eds., DOI: 10.1007/978-3-031-73347-5_27, September 2024, pp. 475-491.BibTeX TR2024-130 PDF Video Presentation
- @inproceedings{Hegde2024sep,
- author = {{Hegde, Deepti and Lohit, Suhas and Peng, Kuan-Chuan and Jones, Michael J. and Patel, Vishal M.}},
- title = {Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection},
- booktitle = {European Conference on Computer Vision (ECCV)},
- year = 2024,
- editor = {Leonardis, A. and Ricci, E. and Roth, S. and Russakovsky, O. and Sattler, T. and Varol, G.},
- pages = {475--491},
- month = sep,
- publisher = {Springer},
- doi = {10.1007/978-3-031-73347-5_27},
- issn = {0302-9743},
- isbn = {978-3-031-73346-8},
- url = {https://www.merl.com/publications/TR2024-130}
- }
, - "A Probability-guided Sampler for Neural Implicit Surface Rendering", European Conference on Computer Vision (ECCV), September 2024.BibTeX TR2024-129 PDF
- @inproceedings{Pais2024sep,
- author = {Pais, Goncalo and Piedade, Valter and Chatterjee, Moitreya and Greiff, Marcus and Miraldo, Pedro}},
- title = {A Probability-guided Sampler for Neural Implicit Surface Rendering},
- booktitle = {European Conference on Computer Vision (ECCV)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-129}
- }
, - "TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement", International Workshop on Acoustic Signal Enhancement (IWAENC), September 2024.BibTeX TR2024-126 PDF Software
- @inproceedings{Saijo2024sep2,
- author = {Saijo, Kohei and Wichern, Gordon and Germain, François G and Pan, Zexu and Le Roux, Jonathan}},
- title = {TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement},
- booktitle = {International Workshop on Acoustic Signal Enhancement (IWAENC)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-126}
- }
, - "Few-shot Transparent Instance Segmentation for Bin Picking", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2024.BibTeX TR2024-127 PDF
- @inproceedings{Cherian2024sep,
- author = {Cherian, Anoop and Jain, Siddarth and Marks, Tim K.}},
- title = {Few-shot Transparent Instance Segmentation for Bin Picking},
- booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-127}
- }
,
- "Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation", Advances in Neural Information Processing Systems (NeurIPS), December 2024.
-
Videos
-
Software & Data Downloads
-
DeepBornFNO -
Transformer-based model with LOcal-modeling by COnvolution -
Sound Event Bounding Boxes -
Enhanced Reverberation as Supervision -
Gear Extensions of Neural Radiance Fields -
Long-Tailed Anomaly Detection (LTAD) Dataset -
neural-IIR-field -
Target-Speaker SEParation -
Pixel-Grounded Prototypical Part Networks -
Steered Diffusion -
Hyperbolic Audio Source Separation -
Simple Multimodal Algorithmic Reasoning Task Dataset -
Partial Group Convolutional Neural Networks -
SOurce-free Cross-modal KnowledgE Transfer -
Audio-Visual-Language Embodied Navigation in 3D Environments -
Nonparametric Score Estimators -
3D MOrphable STyleGAN -
Instance Segmentation GAN -
Audio Visual Scene-Graph Segmentor -
Generalized One-class Discriminative Subspaces -
Goal directed RL with Safety Constraints -
Hierarchical Musical Instrument Separation -
Generating Visual Dynamics from Sound and Context -
Adversarially-Contrastive Optimal Transport -
Online Feature Extractor Network -
MotionNet -
FoldingNet++ -
Quasi-Newton Trust Region Policy Optimization -
Landmarks’ Location, Uncertainty, and Visibility Likelihood -
Robust Iterative Data Estimation -
Gradient-based Nikaido-Isoda -
Discriminative Subspace Pooling
-