Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover deep learning and artificial intelligence for object and action detection, classification, and scene understanding; robotic vision and object manipulation; 3D processing and computational geometry; and simulation of physical systems to enhance machine learning.
Researchers
Anoop Cherian
Tim K. Marks
Michael J. Jones
Chiori Hori
Suhas Lohit
Jonathan Le Roux
Hassan Mansour
Matthew Brand
Siddarth Jain
Moitreya Chatterjee
Devesh K. Jha
Radu Corcodel
Diego Romeres
Pedro Miraldo
Kuan-Chuan Peng
Ye Wang
Petros T. Boufounos
Anthony Vetro
Daniel N. Nikovski
Gordon Wichern
Dehong Liu
William S. Yerazunis
Sameer Khurana
Toshiaki Koike-Akino
Arvind Raghunathan
Avishai Weiss
Stefano Di Cairano
François Germain
Abraham P. Vinod
Jose Amaya
Yanting Ma
Yoshiki Masuyama
Philip V. Orlik
Joshua Rapp
Huifang Sun
Pu (Perry) Wang
Yebin Wang
Jing Liu
Naoko Sawada
Alexander Schperberg
Awards
AWARD: Best Paper - Honorable Mention Award at WACV 2021
Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL), and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches obtained from a deep generative network. This contrasts with prior approaches, in which networks attempt to model image-level distributions and are unable to generalize outside their training distributions. The key idea is that learning patch-level statistics is far easier. As the authors demonstrate, the model can then be used to efficiently solve challenging inverse problems in imaging, such as compressive image recovery and inpainting, even from very few measurements and for diverse natural scenes.
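The composition-of-patches model can be illustrated with a minimal sketch (illustrative code, not the authors' implementation): an image is treated as a grid of small patches, and recovery would optimize per-patch latent codes so the composed image matches the compressive measurements. Only the decompose/compose step is shown here; the patch generator and measurement operator are omitted, and the function names are our own.

```python
# Toy sketch of the composition-of-patches image model (not the paper's code).
# Images are plain lists of lists of numbers to keep the example dependency-free.

def split_into_patches(img, patch_size):
    """Split a 2D image into non-overlapping patch_size x patch_size patches."""
    h, w = len(img), len(img[0])
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patches.append([row[j:j + patch_size] for row in img[i:i + patch_size]])
    return patches

def compose_from_patches(patches, h, w, patch_size):
    """Reassemble a list of patches (row-major order) into an h x w image."""
    img = [[0] * w for _ in range(h)]
    k = 0
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            for di in range(patch_size):
                for dj in range(patch_size):
                    img[i + di][j + dj] = patches[k][di][dj]
            k += 1
    return img

# Round trip: splitting and recomposing reproduces the original image.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(img, 2)
assert compose_from_patches(patches, 4, 4, 2) == img
```

In the paper's setting, each patch would instead be the output of a learned generator applied to a low-dimensional code, and the codes would be optimized against the measurements.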
AWARD: MERL Researchers Win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV), held in Seoul, Korea. Their paper, "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method that, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
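The Gaussian log-likelihood idea can be sketched in a few lines (a simplified illustration with an isotropic Gaussian, not the paper's implementation): the network predicts a landmark's mean location and a variance, and the loss is the negative log-likelihood of the ground-truth location under that Gaussian, so the predicted variance acts as a learned uncertainty estimate.

```python
import math

# Illustrative sketch: negative log-likelihood of a ground-truth landmark
# location under an isotropic 2D Gaussian with predicted mean and variance.
def gaussian_nll(mu, sigma2, target):
    """NLL of `target` = (x, y) under N(mu, sigma2 * I)."""
    dx, dy = target[0] - mu[0], target[1] - mu[1]
    sq_dist = dx * dx + dy * dy
    # -log p(target) for the 2D isotropic Gaussian density.
    return math.log(2 * math.pi * sigma2) + sq_dist / (2 * sigma2)

# A confident (low-variance) prediction is penalized heavily when it is wrong.
good = gaussian_nll((10.0, 20.0), 1.0, (10.5, 20.5))
bad = gaussian_nll((10.0, 20.0), 1.0, (15.0, 25.0))
assert bad > good
```

Minimizing this loss jointly fits the landmark location and calibrates its uncertainty: the optimum trades off the log-variance term against the scaled squared error.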
AWARD: CVPR 2011 Longuet-Higgins Prize
Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine Learning
Brief: The prize recognizes the paper from ten years earlier with the largest impact on the field: "Rapid Object Detection using a Boosted Cascade of Simple Features", originally published at the Conference on Computer Vision and Pattern Recognition (CVPR 2001).
See All Awards for MERL

News & Events
TALK: [MERL Seminar Series 2024] Zhaojian Li presents a talk titled "A Multi-Arm Robotic System for Robotic Apple Harvesting"
Date & Time: Wednesday, October 2, 2024; 1:00 PM
Speaker: Zhaojian Li, Michigan State University
MERL Host: Yebin Wang
Research Areas: Artificial Intelligence, Computer Vision, Control, Robotics
Abstract: Harvesting labor is the single largest cost in apple production in the U.S. The surging cost and growing shortage of labor have forced the apple industry to seek automated harvesting solutions. Despite considerable progress in recent years, existing robotic harvesting systems still fall short of performance expectations, lacking robustness and proving inefficient or overly complex for practical commercial deployment. In this talk, I will present the development and evaluation of a new dual-arm robotic apple harvesting system. This work is the result of an ongoing collaboration between Michigan State University and the U.S. Department of Agriculture.
NEWS: MERL Papers and Workshops at CVPR 2024
Date: June 17, 2024 - June 21, 2024
Where: Seattle, WA
MERL Contacts: Petros T. Boufounos; Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Jonathan Le Roux; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Jing Liu; Kuan-Chuan Peng; Pu (Perry) Wang; Ye Wang; Matthew Brand
Research Areas: Artificial Intelligence, Computational Sensing, Computer Vision, Machine Learning, Speech & Audio
Brief: MERL researchers are presenting 5 conference papers and 3 workshop papers, and are co-organizing two workshops, at the CVPR 2024 conference, which will be held in Seattle, June 17-21. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details of MERL contributions are provided below.
CVPR Conference Papers:
1. "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models" by H. Ni, B. Egger, S. Lohit, A. Cherian, Y. Wang, T. Koike-Akino, S. X. Huang, and T. K. Marks
This work enables a pretrained text-to-video (T2V) diffusion model to be additionally conditioned on an input image (first video frame), yielding a text+image to video (TI2V) model. Other than using the pretrained T2V model, our method requires no ("zero") training or fine-tuning. The paper uses a "repeat-and-slide" method and diffusion resampling to synthesize videos from a given starting image and text describing the video content.
Paper: https://www.merl.com/publications/TR2024-059
Project page: https://merl.com/research/highlights/TI2V-Zero
2. "Long-Tailed Anomaly Detection with Learnable Class Names" by C.-H. Ho, K.-C. Peng, and N. Vasconcelos
This work aims to identify defects across various classes without relying on hard-coded class names. We introduce the concept of long-tailed anomaly detection, addressing challenges like class imbalance and dataset variability. Our proposed method combines reconstruction and semantic modules, learning pseudo-class names and utilizing a variational autoencoder for feature synthesis to improve performance in long-tailed datasets, outperforming existing methods in experiments.
Paper: https://www.merl.com/publications/TR2024-040
3. "Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling" by X. Liu, Y-W. Tai, C-T. Tang, P. Miraldo, S. Lohit, and M. Chatterjee
This work presents a new strategy for rendering dynamic scenes from novel viewpoints. Our approach is based on stratifying the scene into regions based on the extent of motion of the region, which is automatically determined. Regions with higher motion are permitted a denser spatio-temporal sampling strategy for more faithful rendering of the scene. Additionally, to the best of our knowledge, ours is the first work to enable tracking of objects in the scene from novel views - based on the preferences of a user, provided by a click.
Paper: https://www.merl.com/publications/TR2024-042
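The motion-aware sampling strategy behind Gear-NeRF can be illustrated with a toy budget-allocation sketch (function and parameter names are our own assumptions, not Gear-NeRF's API): regions with higher estimated motion receive a proportionally larger share of the spatio-temporal sample budget.

```python
# Hedged sketch of motion-aware sample allocation (illustrative only).
# Each scene region gets a share of the total ray-sample budget proportional
# to its motion score, so fast-moving regions are sampled more densely.

def allocate_samples(motion_scores, total_budget, min_per_region=1):
    """Distribute total_budget samples across regions by motion score."""
    total = sum(motion_scores)
    return [max(min_per_region, round(total_budget * s / total))
            for s in motion_scores]

# A mostly static scene with one fast-moving region dominates the budget.
print(allocate_samples([1, 1, 8], total_budget=100))  # -> [10, 10, 80]
```

Note that per-region rounding can slightly over- or under-spend the budget in general; a real scheduler would renormalize, and the actual method stratifies the scene automatically rather than taking scores as input.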
4. "SIRA: Scalable Inter-frame Relation and Association for Radar Perception" by R. Yataka, P. Wang, P. T. Boufounos, and R. Takahashi
Overcoming the limitations on radar feature extraction such as low spatial resolution, multipath reflection, and motion blurs, this paper proposes SIRA (Scalable Inter-frame Relation and Association) for scalable radar perception with two designs: 1) extended temporal relation, generalizing the existing temporal relation layer from two frames to multiple inter-frames with temporally regrouped window attention for scalability; and 2) motion consistency track with a pseudo-tracklet generated from observational data for better object association.
Paper: https://www.merl.com/publications/TR2024-041
5. "RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation" by Z. Yang, J. Liu, P. Chen, A. Cherian, T. K. Marks, J. L. Roux, and C. Gan
We leverage Large Language Models (LLM) for zero-shot semantic audio visual navigation. Specifically, by employing multi-modal models to process sensory data, we instruct an LLM-based planner to actively explore the environment by adaptively evaluating and dismissing inaccurate perceptual descriptions.
Paper: https://www.merl.com/publications/TR2024-043
CVPR Workshop Papers:
1. "CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation" by R. Dey, B. Egger, V. Boddeti, Y. Wang, and T. K. Marks
This paper proposes a new method for generating 3D faces and rendering them to images by combining the controllability of nonlinear 3DMMs with the high fidelity of implicit 3D GANs. Inspired by StyleSDF, our model uses a similar architecture but enforces the latent space to match the interpretable and physical parameters of the nonlinear 3D morphable model MOST-GAN.
Paper: https://www.merl.com/publications/TR2024-045
2. "Tracklet-based Explainable Video Anomaly Localization" by A. Singh, M. J. Jones, and E. Learned-Miller
This paper describes a new method for localizing anomalous activity in video of a scene given sample videos of normal activity from the same scene. The method is based on detecting and tracking objects in the scene and estimating high-level attributes of the objects such as their location, size, short-term trajectory and object class. These high-level attributes can then be used to detect unusual activity as well as to provide a human-understandable explanation for what is unusual about the activity.
Paper: https://www.merl.com/publications/TR2024-057
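The attribute-based approach can be sketched as follows (hypothetical attribute names and thresholds, not the paper's code): a tracklet is summarized by high-level attributes, and any attribute falling outside the ranges observed in normal activity yields a human-readable explanation.

```python
# Illustrative sketch of tracklet attributes for explainable anomaly detection.

def tracklet_attributes(boxes):
    """boxes: list of (cx, cy, w, h) per frame -> summary attributes."""
    n = len(boxes)
    # Mean per-frame displacement of the box center (L1 distance).
    speed = sum(abs(boxes[i][0] - boxes[i - 1][0]) +
                abs(boxes[i][1] - boxes[i - 1][1])
                for i in range(1, n)) / max(n - 1, 1)
    size = sum(w * h for _, _, w, h in boxes) / n
    return {"speed": speed, "size": size}

def explain_anomaly(attrs, normal):
    """Flag attributes outside the (min, max) ranges seen during training."""
    return [f"{k}={v:.1f} outside normal range {normal[k]}"
            for k, v in attrs.items()
            if not (normal[k][0] <= v <= normal[k][1])]

# Ranges estimated from normal videos of the scene (made-up values here).
normal = {"speed": (0.0, 5.0), "size": (100.0, 5000.0)}
fast = [(0, 0, 20, 40), (30, 0, 20, 40), (60, 0, 20, 40)]
print(explain_anomaly(tracklet_attributes(fast), normal))
```

Because the attributes are semantic (speed, size, trajectory, class), each flagged attribute doubles as the explanation of what is unusual, which is the property the paper emphasizes.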
MERL co-organized workshops:
1. "Multimodal Algorithmic Reasoning Workshop" by A. Cherian, K-C. Peng, S. Lohit, M. Chatterjee, H. Zhou, K. Smith, T. K. Marks, J. Mathissen, and J. Tenenbaum
Workshop link: https://marworkshop.github.io/cvpr24/index.html
2. "The 5th Workshop on Fair, Data-Efficient, and Trusted Computer Vision" by K-C. Peng, et al.
Workshop link: https://fadetrcv.github.io/2024/
CVPR Workshop Papers (continued):
3. "SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models" by X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand, G. Wang, and T. Koike-Akino
This paper proposes a generalized framework called SuperLoRA that unifies and extends different variants of low-rank adaptation (LoRA). Introducing new options with grouping, folding, shuffling, projection, and tensor decomposition, SuperLoRA offers high flexibility and demonstrates superior performance up to 10-fold gain in parameter efficiency for transfer learning tasks.
Paper: https://www.merl.com/publications/TR2024-062
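As background for SuperLoRA, plain low-rank adaptation (LoRA), which it generalizes, can be sketched in a few lines (a generic illustration, not MERL's SuperLoRA code): a frozen weight matrix W of shape d_out x d_in is adapted as W + B @ A, where B is d_out x r and A is r x d_in with small rank r, so only r * (d_out + d_in) parameters are trained instead of d_out * d_in.

```python
import random

# Generic LoRA sketch (pure Python, illustrative names and sizes).
def lora_delta(B, A):
    """Dense update B @ A from the two low-rank factors."""
    r, d_in = len(A), len(A[0])
    d_out = len(B)
    return [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d_in)]
            for i in range(d_out)]

d_out, d_in, r = 64, 64, 4
full_params = d_out * d_in          # 4096 parameters for a dense update
lora_params = r * (d_out + d_in)    # 512 -> 8x fewer in this toy setting
assert lora_params < full_params

# Common initialization: random B, zero A, so the update starts at zero
# and the adapted model initially matches the frozen one.
B = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_out)]
A = [[0.0] * d_in for _ in range(r)]
delta = lora_delta(B, A)
assert all(v == 0.0 for row in delta for v in row)
```

SuperLoRA's contribution, per the brief above, is to unify LoRA variants and add options such as grouping, folding, shuffling, projection, and tensor decomposition on top of this basic low-rank structure.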
See All News & Events for Computer Vision

Research Highlights
- PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
- Steered Diffusion
- Robust Machine Learning
- Video Anomaly Detection
- MERL Shopping Dataset
- Point-Plane SLAM
Internships
SA0044: Internship - Multimodal Scene Understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with flexible start date (not just Spring/Summer but throughout 2024) and duration (typically 3-6 months).
Required Specific Experience
- Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch are essential.
CV0084: Internship - Vital signs from video using computer vision and AI
MERL is seeking a highly motivated intern to conduct original research in estimating vital signs such as heart rate, heart rate variability, and blood pressure from video of a person. The successful candidate will use the latest methods in deep learning, computer vision, and signal processing to derive and implement new models, collect data, conduct experiments, and prepare results for publication, all in collaboration with MERL researchers. The candidate should be a Ph.D. student in computer vision with a strong publication record and experience in computer vision, signal processing, machine learning, and health monitoring. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI, and possess strong programming skills in Python and PyTorch. Start date is flexible; duration should be at least 3 months.
Required Specific Experience
- Ph.D. student in computer vision or related field.
- Strong programming skills in Python and PyTorch.
- Published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI.
CV0101: Internship - Multimodal Algorithmic Reasoning
MERL is looking for a self-motivated intern to research problems at the intersection of multimodal large language models and neural algorithmic reasoning. An ideal intern would be a Ph.D. student with a strong background in machine learning and computer vision. The candidate must have prior experience with training multimodal LLMs for solving vision-and-language tasks. Experience in participating in and winning mathematical Olympiads is desired. Publications in theoretical machine learning venues would be a strong plus. The intern is expected to collaborate with researchers in the computer vision team at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience with training large vision-and-language models
- Experience with solving mathematical reasoning problems
- Experience with programming in Python using PyTorch
- Enrolled in a PhD program
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
See All Internships for Computer Vision

Recent Publications
- "Insert-One: One-Shot Robust Visual-Force Servoing for Novel Object Insertion with 6-DoF Tracking", 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), October 2024. BibTeX TR2024-137 PDF
- @inproceedings{Chang2024oct,
- author = {Chang, Haonan and Boularias, Abdeslam and Jain, Siddarth},
- title = {Insert-One: One-Shot Robust Visual-Force Servoing for Novel Object Insertion with 6-DoF Tracking},
- booktitle = {2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)},
- year = 2024,
- month = oct,
- url = {https://www.merl.com/publications/TR2024-137}
- }
- "Autonomous Horizon-Based Optical Navigation on Near-Planar Cislunar Libration Point Orbits", 4th Space Imaging Workshop, October 2024. BibTeX TR2024-139 PDF
- @inproceedings{Shimane2024oct,
- author = {Shimane, Yuri and Ho, Koki and Weiss, Avishai},
- title = {Autonomous Horizon-Based Optical Navigation on Near-Planar Cislunar Libration Point Orbits},
- booktitle = {4th Space Imaging Workshop},
- year = 2024,
- month = oct,
- url = {https://www.merl.com/publications/TR2024-139}
- }
- "Autonomous Robotic Assembly: From Part Singulation to Precise Assembly", IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2024. BibTeX TR2024-133 PDF
- @inproceedings{Ota2024oct,
- author = {Ota, Kei and Jha, Devesh K. and Jain, Siddarth and Yerazunis, William S. and Corcodel, Radu and Shukla, Yash and Bronars, Antonia and Romeres, Diego},
- title = {Autonomous Robotic Assembly: From Part Singulation to Precise Assembly},
- booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems},
- year = 2024,
- month = oct,
- url = {https://www.merl.com/publications/TR2024-133}
- }
- "Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection", European Conference on Computer Vision (ECCV), Leonardis, A. and Ricci, E. and Roth, S. and Russakovsky, O. and Sattler, T. and Varol, G., Eds., DOI: 10.1007/978-3-031-73347-5_27, September 2024, pp. 475-491. BibTeX TR2024-130 PDF Video Presentation
- @inproceedings{Hegde2024sep,
- author = {Hegde, Deepti and Lohit, Suhas and Peng, Kuan-Chuan and Jones, Michael J. and Patel, Vishal M.},
- title = {Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection},
- booktitle = {European Conference on Computer Vision (ECCV)},
- year = 2024,
- editor = {Leonardis, A. and Ricci, E. and Roth, S. and Russakovsky, O. and Sattler, T. and Varol, G.},
- pages = {475--491},
- month = sep,
- publisher = {Springer},
- doi = {10.1007/978-3-031-73347-5_27},
- issn = {0302-9743},
- isbn = {978-3-031-73346-8},
- url = {https://www.merl.com/publications/TR2024-130}
- }
- "A Probability-guided Sampler for Neural Implicit Surface Rendering", European Conference on Computer Vision (ECCV), September 2024. BibTeX TR2024-129 PDF
- @inproceedings{Pais2024sep,
- author = {Pais, Goncalo and Piedade, Valter and Chatterjee, Moitreya and Greiff, Marcus and Miraldo, Pedro},
- title = {A Probability-guided Sampler for Neural Implicit Surface Rendering},
- booktitle = {European Conference on Computer Vision (ECCV)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-129}
- }
- "Few-shot Transparent Instance Segmentation for Bin Picking", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2024. BibTeX TR2024-127 PDF
- @inproceedings{Cherian2024sep,
- author = {Cherian, Anoop and Jain, Siddarth and Marks, Tim K.},
- title = {Few-shot Transparent Instance Segmentation for Bin Picking},
- booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-127}
- }
- "Disentangled Acoustic Fields For Multimodal Physical Scene Understanding", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2024. BibTeX TR2024-125 PDF
- @inproceedings{Yin2024sep,
- author = {Yin, Jie and Luo, Andrew and Du, Yilun and Cherian, Anoop and Marks, Tim K. and Le Roux, Jonathan and Gan, Chuang},
- title = {Disentangled Acoustic Fields For Multimodal Physical Scene Understanding},
- booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-125}
- }
- "Multi-Agent Formation Control using Epipolar Constraints", IEEE Robotics and Automation Letters, DOI: 10.1109/LRA.2024.3444690, Vol. 9, No. 12, pp. 11002-11009, September 2024. BibTeX TR2024-147 PDF
- @article{Roque2024sep,
- author = {Roque, Pedro and Miraldo, Pedro and Dimarogonas, Dimos},
- title = {Multi-Agent Formation Control using Epipolar Constraints},
- journal = {IEEE Robotics and Automation Letters},
- year = 2024,
- volume = 9,
- number = 12,
- pages = {11002--11009},
- month = sep,
- doi = {10.1109/LRA.2024.3444690},
- issn = {2377-3766},
- url = {https://www.merl.com/publications/TR2024-147}
- }
Videos

Software & Data Downloads
- ComplexVAD Dataset
- Gear Extensions of Neural Radiance Fields
- Long-Tailed Anomaly Detection (LTAD) Dataset
- Pixel-Grounded Prototypical Part Networks
- Steered Diffusion
- BAyesian Network for adaptive SAmple Consensus
- Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
- Simple Multimodal Algorithmic Reasoning Task Dataset
- Partial Group Convolutional Neural Networks
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- 3D MOrphable STyleGAN
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- MotionNet
- Street Scene Dataset
- FoldingNet++
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- MERL Shopping Dataset
- Joint Geodesic Upsampling
- Plane Extraction using Agglomerative Clustering