Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover the areas of deep learning and artificial intelligence for object and action detection, classification and scene understanding, robotic vision and object manipulation, 3D processing and computational geometry, as well as simulation of physical systems to enhance machine learning systems.
Quick Links
-
Researchers

Anoop
Cherian

Tim K.
Marks

Michael J.
Jones

Suhas
Lohit

Chiori
Hori

Jonathan
Le Roux

Kuan-Chuan
Peng

Moitreya
Chatterjee

Hassan
Mansour

Siddarth
Jain

Matthew
Brand

Pedro
Miraldo

Radu
Corcodel

Petros T.
Boufounos

Daniel N.
Nikovski

Ye
Wang

Anthony
Vetro

Gordon
Wichern

William S.
Yerazunis

Toshiaki
Koike-Akino

Dehong
Liu

Arvind
Raghunathan

Abraham P.
Vinod

Pu
(Perry)
Wang
Avishai
Weiss

Stefano
Di Cairano

Lalit
Manam

Yoshiki
Masuyama

Kaen
Kogashi

Yanting
Ma

Philip V.
Orlik

Joshua
Rapp

Alexander
Schperberg

Huifang
Sun

Kento
Tomita

Yebin
Wang

Kenji
Inomata

Jin
Kato

Jing
Liu

Kei
Suzuki
-
Awards
-
AWARD Best Paper - Honorable Mention Award at WACV 2021 Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine LearningBrief- A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches which are obtained from a deep generative network. This is unlike prior approaches where the networks attempt to model image-level distributions and are unable to generalize outside training distributions. The key idea in this paper is that learning patch-level statistics is far easier. As the authors demonstrate, this model can then be used to efficiently solve challenging inverse problems in imaging such as compressive image recovery and inpainting even from very few measurements for diverse natural scenes.
- A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
-
AWARD MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine LearningBrief- MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
-
AWARD CVPR 2011 Longuet-Higgins Prize Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine LearningBrief- Paper from 10 years ago with the largest impact on the field: "Rapid Object Detection using a Boosted Cascade of Simple Features", originally published at Conference on Computer Vision and Pattern Recognition (CVPR 2001).
See All Awards for MERL -
-
News & Events
-
NEWS MERL Presents 7 Papers and 2 Workshops at CVPR 2026 Date: June 3, 2026 - June 7, 2026
Where: Colorado Convention Center, Denver, Colorado
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Kaen Kogashi; Suhas Lohit; Lalit Manam; Tim K. Marks; Pedro Miraldo; Kuan-Chuan Peng
Research Areas: Artificial Intelligence, Computer Vision, Machine LearningBrief- MERL researchers are proud to present 7 papers, including two highlight papers (top 3.6% of submissions), and 2 workshops at CVPR 2026. CVPR, taking place from June 3-7 in Denver, CO, USA, is a premier international conference in computer vision.
Papers with MERL Authors:
1. Point4Cast: Streaming Dynamic Scene Reconstruction and Forecasting by Xinhang Liu, Pedro Miraldo, Suhas Lohit, Huaizu Jiang, Naoko Sawada, Yu-Wing Tai, Chi-Keung Tang, and Moitreya Chatterjee (Highlight Paper)
2. Parallel Rigidity Matters for Bundle Adjustment by Lalit Manam and Venu Govindu (Highlight Paper)
3. Revisiting Monocular SLAM with Spatio-Temporal Scene Modeling by Valter Piedade, Lalit Manam, Masashi Yamazaki, and Pedro Miraldo
4. AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects by Danrui Li, Jiahao Zhang, Bernhard Egger, Moitreya Chatterjee, Suhas Lohit, Tim K. Marks, and Anoop Cherian
5. LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction by Tianye Ding, Yiming Xie, Yiqing Liang, Moitreya Chatterjee, Pedro Miraldo, and Huaizu Jiang
6. SoREL: Soft-Label Refurbishment with Ensemble Learning for Noisy Long-Tailed Classification by Jun-Wei Hsieh, Ying-Hsuan Wu, Yi-Kuan Hsieh, Xin Li, Kuan-Chuan Peng, Ming-Ching Chang (CVPR Findings paper)
7. MMHOI: Complex 3D Multi-Human-Object Interaction Understanding by Kaen Kogashi and Anoop Cherian (PhysHuman Workshop paper)
Workshops Co-Organized by MERL:
1. Multimodal Algorithmic Reasoning Workshop by Anoop Cherian, Suhas Lohit, Kuan-Chuan Peng, Honglu Zhou, Kevin Smith, and Josh Tenenbaum
2. The Third Workshop on Anomaly Detection with Foundation Models by Kuan-Chuan Peng, Ying Zhao, and Abhishek Aich
- MERL researchers are proud to present 7 papers, including two highlight papers (top 3.6% of submissions), and 2 workshops at CVPR 2026. CVPR, taking place from June 3-7 in Denver, CO, USA, is a premier international conference in computer vision.
-
EVENT MERL Contributes to ICASSP 2026 Date: Monday, May 4, 2026 - , May 8, 2026
Location: Barcelona, Spain
MERL Contacts: Wael H. Ali; Petros T. Boufounos; Chiori Hori; Jonathan Le Roux; Yanting Ma; Hassan Mansour; Yoshiki Masuyama; Joshua Rapp; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computational Sensing, Computer Vision, Machine Learning, Optimization, Signal Processing, Speech & AudioBrief- MERL has made numerous contributions to both the organization and technical program of ICASSP 2026, which is being held in Barcelona, Spain from May 4-8, 2026.
Sponsorship
MERL is proud to be a Silver Patron of the conference and will participate in the student job fair on Thursday, May 7. Please join this session to learn more about employment opportunities at MERL, including openings for research scientists, post-docs, and interns. MERL Distinguished Research Scientists Petros T. Boufounos and Jonathan Le Roux will also present a spotlight session on MERL’s research in signal processing on Tuesday, May 5 at 13:05. Finally, MERL will sponsor a photo booth on Thursday, May 7 and Friday, May 8, where ICASSP participants can take professional photos with friends and colleagues, which will be emailed to them.
MERL is also pleased to be the sponsor of two IEEE Awards that will be presented at the conference. We congratulate Prof. Nasir Ahmed, the recipient of the 2026 IEEE Fourier Award for Signal Processing, and Dr. Alex Acero, the recipient of the 2026 IEEE James L. Flanagan Speech and Audio Processing Award.
Technical Program
MERL is presenting 8 papers in the main conference on a wide range of topics including source separation, spatial audio, neural audio codecs, radar-based pose estimation, camera-based airflow sensing, radar array processing, and optimization. Another paper on neural speech codecs will be presented at the Low-Resource Audio Codec (LRAC) Satellite Workshop. MERL researchers will also present two articles published in IEEE Open Journal of Signal Processing (OJSP) on music source separation and head-related transfer function (HRTF) modeling. Finally, Speech and Audio Team members Yoshiki Masuyama and Jonathan Le Roux co-organized a Special Session on Neural Spatial Audio Processing, which will feature six oral presentations.
About ICASSP
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 4000 participants each year.
- MERL has made numerous contributions to both the organization and technical program of ICASSP 2026, which is being held in Barcelona, Spain from May 4-8, 2026.
See All News & Events for Computer Vision -
-
Research Highlights
-
Point4Cast: Streaming Dynamic Scene Reconstruction and Forecasting -
AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects -
SLAM-MER: Revisiting Monocular SLAM with Spatio-Temporal Scene Modeling -
Parallel Rigidity Matters for Bundle Adjustment -
LLMPhy: Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines -
SAC-GNC: SAmple Consensus for adaptive Graduated Non-Convexity -
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering -
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models -
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling -
Steered Diffusion -
Robust Machine Learning -
Video Anomaly Detection -
MERL Shopping Dataset -
Point-Plane SLAM
-
-
Internships
-
CV0225: Internship - Reconstruction/Novel View Synthesis of Dynamic Scenes
-
OR0299: Internship - Human-Robot Interaction
-
CI0213: Internship - Efficient Foundation Models for Edge Intelligence
See All Internships for Computer Vision -
-
Openings
See All Openings at MERL -
Recent Publications
- , "SoREL: Soft-Label Refurbishment with Ensemble Learning for Noisy Long-Tailed Classification", CVPR Findings, June 2026.BibTeX TR2026-075 PDF
- @inproceedings{Hsieh2026jun2,
- author = {Hsieh, Jun-Wei and Wu, Ying-Hsuan and Hsieh, Yi-Kuan and Li, Xin and Peng, Kuan-Chuan and Chang, Ming-Ching},
- title = {{SoREL: Soft-Label Refurbishment with Ensemble Learning for Noisy Long-Tailed Classification}},
- booktitle = {CVPR Findings},
- year = 2026,
- month = jun,
- url = {https://www.merl.com/publications/TR2026-075}
- }
- , "SoREL: Soft-Label Refurbishment with Ensemble Learning for Noisy Long-Tailed Classification Supplementary Material", CVPR Findings, June 2026.BibTeX TR2026-074 PDF
- @inproceedings{Hsieh2026jun,
- author = {Hsieh, Jun-Wei and Wu, Ying-Hsuan and Hsieh, Yi-Kuan and Li, Xin and Peng, Kuan-Chuan and Chang, Ming-Ching},
- title = {{SoREL: Soft-Label Refurbishment with Ensemble Learning for Noisy Long-Tailed Classification Supplementary Material}},
- booktitle = {CVPR Findings},
- year = 2026,
- month = jun,
- url = {https://www.merl.com/publications/TR2026-074}
- }
- , "AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2026.BibTeX TR2026-076 PDF Video Data Software
- @inproceedings{Li2026jun,
- author = {Li, Danrui and Zhang, Jiahao and Egger, Bernhard and Chatterjee, Moitreya and Lohit, Suhas and Marks, Tim K. and Cherian, Anoop},
- title = {{AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects}},
- booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
- year = 2026,
- month = jun,
- url = {https://www.merl.com/publications/TR2026-076}
- }
- , "Point4Cast: Streaming Dynamic Scene Reconstruction and Forecasting", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2026.BibTeX TR2026-077 PDF
- @inproceedings{Liu2026jun,
- author = {Liu, Xinhang and Miraldo, Pedro and Lohit, Suhas and Jiang, Huaizu and Sawada, Naoko and Tai, Yu-Wing and Tang, Chi-Keung and Chatterjee, Moitreya},
- title = {{Point4Cast: Streaming Dynamic Scene Reconstruction and Forecasting}},
- booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
- year = 2026,
- month = jun,
- url = {https://www.merl.com/publications/TR2026-077}
- }
- , "Robotic manipulators excel in structured envi- ronments but face substantial challenges in unstructured and dynamic settings. This paper presents SplatCtrl, a unified framework for real-time scene reconstruction and reactive robot motion generation to enable collision-free robotic arm control in previously unseen and continuously changing environments. Building on 3D Gaussian Splatting (3D-GS), we introduce a hybrid voxel-based filtering and dynamic Gaussian relocation strategy that supports efficient scene reconstruction from RGB- D streams while accommodating environmental changes. For safe and reactive control, we further propose a method for deriving continuous signed distance functions from isotropic Gaussians, providing stable and differentiable collision prob- ability estimates that bridge classical distance fields with the modern implicit representation. These continuous distance met- rics are incorporated into control barrier functions, resulting in a unified perception–action coupling framework that supports smooth and reliable real-time motion generation in response to scene changes. Experimental validation in simulation, on physical robot, and within shared human–robot workspace demonstrates the framework’s effectiveness, achieving inte- grated scene reconstruction and reactive control in uncertain, and dynamic environments.", 2026 IEEE International Conference on Robotics & Automation (ICRA), June 2026.BibTeX TR2026-069 PDF
- @inproceedings{Jain2026jun,
- author = {Jain, Siddarth and Choi, Ho Jin},
- title = {{Robotic manipulators excel in structured envi- ronments but face substantial challenges in unstructured and dynamic settings. This paper presents SplatCtrl, a unified framework for real-time scene reconstruction and reactive robot motion generation to enable collision-free robotic arm control in previously unseen and continuously changing environments. Building on 3D Gaussian Splatting (3D-GS), we introduce a hybrid voxel-based filtering and dynamic Gaussian relocation strategy that supports efficient scene reconstruction from RGB- D streams while accommodating environmental changes. For safe and reactive control, we further propose a method for deriving continuous signed distance functions from isotropic Gaussians, providing stable and differentiable collision prob- ability estimates that bridge classical distance fields with the modern implicit representation. These continuous distance met- rics are incorporated into control barrier functions, resulting in a unified perception–action coupling framework that supports smooth and reliable real-time motion generation in response to scene changes. Experimental validation in simulation, on physical robot, and within shared human–robot workspace demonstrates the framework’s effectiveness, achieving inte- grated scene reconstruction and reactive control in uncertain, and dynamic environments.}},
- booktitle = {2026 IEEE International Conference on Robotics \& Automation (ICRA)},
- year = 2026,
- month = jun,
- url = {https://www.merl.com/publications/TR2026-069}
- }
- , "LIDIA: Localizing In the Dark with Illumination-Awareness toward Perception-Aware Planning", IEEE International Conference on Robotics and Automation (ICRA), June 2026.BibTeX TR2026-071 PDF
- @inproceedings{Velentzas2026jun,
- author = {Velentzas, Iason, G and Kento, Tomita},
- title = {{LIDIA: Localizing In the Dark with Illumination-Awareness toward Perception-Aware Planning}},
- booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
- year = 2026,
- month = jun,
- url = {https://www.merl.com/publications/TR2026-071}
- }
- , "LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), May 2026.BibTeX TR2026-055 PDF
- @inproceedings{Ding2026may,
- author = {Ding, Tianye and Xie, Yiming and Liang, Yiqing and Chatterjee, Moitreya and Miraldo, Pedro and Jiang, Huaizu},
- title = {{LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction}},
- booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
- year = 2026,
- month = may,
- url = {https://www.merl.com/publications/TR2026-055}
- }
- , "Parallel Rigidity Matters for Bundle Adjustment", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), May 2026.BibTeX TR2026-053 PDF Video Presentation
- @inproceedings{Lalit2026may,
- author = {{Lalit, Manam and Govindu, Venu}},
- title = {{Parallel Rigidity Matters for Bundle Adjustment}},
- booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
- year = 2026,
- month = may,
- url = {https://www.merl.com/publications/TR2026-053}
- }
- , "SoREL: Soft-Label Refurbishment with Ensemble Learning for Noisy Long-Tailed Classification", CVPR Findings, June 2026.
-
Videos
-
Software & Data Downloads
-
Physics-Aware Assembly of Complex Industrial Objects -
Mitsubishi Electric Research framework for visual SLAM -
Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines -
MMHOI Dataset: Modeling Complex 3D Multi-Human Multi-Object Interactions -
Open Vocabulary Attribute Detection Dataset -
multi-view Radar object dEtection with 3D bounding boX diffusiOn -
SAmple Consensus for Adaptive Graduated Non-Convexity -
Long-Tailed Online Anomaly Detection dataset -
Group Representation Networks -
ComplexVAD Dataset -
Zero-Shot Image Conditioning for Text-to-Video Diffusion Models -
Gear Extensions of Neural Radiance Fields -
Long-Tailed Anomaly Detection Dataset -
Pixel-Grounded Prototypical Part Networks -
Steered Diffusion -
BAyesian Network for adaptive SAmple Consensus -
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
-
Explainable Video Anomaly Localization -
Simple Multimodal Algorithmic Reasoning Task Dataset -
Partial Group Convolutional Neural Networks -
SOurce-free Cross-modal KnowledgE Transfer -
Audio-Visual-Language Embodied Navigation in 3D Environments -
3D MOrphable STyleGAN -
Instance Segmentation GAN -
Audio Visual Scene-Graph Segmentor -
Generalized One-class Discriminative Subspaces -
Generating Visual Dynamics from Sound and Context -
Adversarially-Contrastive Optimal Transport -
MotionNet -
Street Scene Dataset -
FoldingNet++ -
Landmarks’ Location, Uncertainty, and Visibility Likelihood -
Gradient-based Nikaido-Isoda -
Circular Maze Environment -
Discriminative Subspace Pooling -
Kernel Correlation Network -
Fast Resampling on Point Clouds via Graphs -
FoldingNet -
MERL Shopping Dataset -
Joint Geodesic Upsampling -
Plane Extraction using Agglomerative Clustering -
MERL BRDF Database -
Understanding Dynamic Compute Allocation in Recurrent Transformers
-