-
SA0044: Internship - Multimodal Scene Understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2024) and durations (typically 3-6 months).
Required Specific Experience
- Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
- Research Areas: Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
- Apply Now
-
OR0088: Internship - Robot Learning
MERL is looking for a highly motivated and qualified PhD student in the areas of machine learning and robotics to participate in research on advanced algorithms for learning control of robots and other mechanisms. A solid background and hands-on experience with various machine learning algorithms are expected, in particular with deep learning algorithms for image processing and object detection. Exposure to deep reinforcement learning and/or learning from demonstration is highly desirable. Familiarity with the use of machine learning algorithms for system identification of mechanical systems would be a plus, along with a background in other areas of automatic control. Solid experimental skills and hands-on experience in coding in Python, PyTorch, and OpenCV are required for the position. Some experience with ROS2 and familiarity with classical mechanics and computational physics engines would be helpful, but are not required. The position will provide opportunities for exploring fundamental problems in incremental learning in humans and machines, leading to publishable results. The duration of the internship is 3 to 5 months, with a flexible starting date.
Required Specific Experience
- Python, PyTorch, OpenCV
- Research Areas: Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics
- Host: Daniel Nikovski
- Apply Now
-
OR0087: Internship - Human-Robot Collaboration with Shared Autonomy
MERL is looking for a highly motivated and qualified intern to contribute to research in human-robot interaction (HRI). The ideal candidate is a Ph.D. student with expertise in robotic manipulation, perception, deep learning, probabilistic modeling, or reinforcement learning. We have several research topics available, including assistive teleoperation, visual scene reconstruction, safety in HRI, shared autonomy, intent recognition, cooperative manipulation, and robot learning. The selected intern will work closely with MERL researchers to develop and implement novel algorithms, conduct experiments, and present research findings. We publish our research at top-tier conferences. Start date is flexible, and the expected duration of the internship is 3-4 months. Interested candidates are encouraged to apply with their updated CV and list of publications.
Required Specific Experience
- Experience with ROS and deep learning frameworks such as PyTorch is essential.
- Strong programming skills in Python and/or C/C++
- Experience with simulation tools, such as PyBullet, Isaac Lab, or MuJoCo.
- Prior experience in human-robot interaction, perception, or robotic manipulation.
- Research Areas: Robotics, Computer Vision, Machine Learning
- Host: Siddarth Jain
- Apply Now
-
CV0051: Internship - Visual-LiDAR fused object detection and recognition
MERL is looking for a self-motivated intern to work on visual-LiDAR fused object detection and recognition using computer vision. Relevant topics in scope include (but are not limited to): open-vocabulary visual-LiDAR object detection and recognition, domain adaptation or generalization in visual-LiDAR object detection, data-efficient methods for visual-LiDAR object detection, and small object detection with visual-LiDAR input. Candidates with experience in LiDAR-based object recognition are strongly preferred. The ideal candidate would be a PhD student with a strong background in computer vision and machine learning, and is expected to have published at least one paper in a top-tier computer vision, machine learning, or artificial intelligence venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI. Proficiency in Python programming and familiarity with at least one deep learning framework are necessary. The intern will collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publications. The duration of the internship is ideally at least 3 months, with a flexible start date.
Required Specific Experience
- Experience with Python, PyTorch, and datasets with both images and LiDAR (e.g., the nuScenes dataset).
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Kuan-Chuan Peng
- Apply Now
-
CV0094: Internship - Instructional Video Generation
We seek a highly motivated intern to conduct original research in generative models for instructional video generation. We are interested in applications to various tasks such as video generation from text, images, and diagrams. The successful candidate will collaborate with MERL researchers to design and implement novel models, conduct experiments, and prepare results for publication. The candidate should be a PhD student (or recent graduate) in computer vision and machine learning with a strong publication record including at least one paper in a top-tier computer vision or machine learning venue such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, AAAI, or TPAMI. Strong programming skills, experience developing and implementing new models in deep learning platforms such as PyTorch, and broad knowledge of machine learning and deep learning methods are expected, including experience in the latest advances in video generation. Start date is flexible; duration should be at least 3 months.
Required Specific Experience
- Experience with video diffusion models, LLMs, and Vision-and-Language Models.
- Experience developing and implementing new models in PyTorch
- At least one paper in a top-tier computer vision or machine learning venue such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, AAAI, or TPAMI.
- Ph.D. student in computer vision or a related field.
- Research Areas: Computer Vision, Artificial Intelligence, Machine Learning
- Host: Tim Marks
- Apply Now
-
CV0063: Internship - Visual Simultaneous Localization and Mapping
MERL is looking for a self-motivated graduate student to work on Visual Simultaneous Localization and Mapping (V-SLAM). Based on the candidate’s interests, the intern can work on a variety of topics such as (but not limited to): camera pose estimation, feature detection and matching, visual-LiDAR data fusion, pose-graph optimization, loop closure detection, and image-based camera relocalization. The ideal candidate would be a PhD student with a strong background in 3D computer vision and good programming skills in C/C++ and/or Python. The candidate must have published at least one paper in a top-tier computer vision, machine learning, or robotics venue, such as CVPR, ECCV, ICCV, NeurIPS, ICRA, or IROS. The intern will collaborate with MERL researchers to derive and implement new algorithms for V-SLAM, conduct experiments, and report findings. A submission to a top-tier conference is expected. The duration of the internship and start date are flexible.
Required Specific Experience
- Experience with 3D Computer Vision and Simultaneous Localization & Mapping.
- Research Areas: Computer Vision, Robotics, Control
- Host: Pedro Miraldo
- Apply Now
-
CV0084: Internship - Vital signs from video using computer vision and AI
MERL is seeking a highly motivated intern to conduct original research in estimating vital signs such as heart rate, heart rate variability, and blood pressure from video of a person. The successful candidate will use the latest methods in deep learning, computer vision, and signal processing to derive and implement new models, collect data, conduct experiments, and prepare results for publication, all in collaboration with MERL researchers. The candidate should be a Ph.D. student in computer vision with a strong publication record and experience in computer vision, signal processing, machine learning, and health monitoring. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI, and to possess strong programming skills in Python and PyTorch. Start date is flexible; duration should be at least 3 months.
Required Specific Experience
- Ph.D. student in computer vision or related field.
- Strong programming skills in Python and PyTorch.
- Published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI.
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing, Computational Sensing
- Host: Tim Marks
- Apply Now
-
CV0101: Internship - Multimodal Algorithmic Reasoning
MERL is looking for a self-motivated intern to conduct research on problems at the intersection of multimodal large language models and neural algorithmic reasoning. An ideal intern would be a Ph.D. student with a strong background in machine learning and computer vision. The candidate must have prior experience with training multimodal LLMs for solving vision-and-language tasks. Experience participating in and winning mathematical Olympiads is desired. Publications in theoretical machine learning venues would be a strong plus. The intern is expected to collaborate with researchers in the computer vision team at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience with training large vision-and-language models
- Experience with solving mathematical reasoning problems
- Experience with programming in Python using PyTorch
- Enrolled in a PhD program
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Anoop Cherian
- Apply Now
-
CV0064: Internship - Robust Estimation for Computer Vision
MERL is looking for a self-motivated graduate student to work on robust estimation in computer vision. Based on the candidate’s interests, the intern can work on a variety of topics such as (but not limited to) camera pose estimation, 3D registration, camera calibration, pose-graph optimization, and transformation averaging. The ideal candidate would be a PhD student with a strong background in 3D computer vision, RANSAC, and graduated non-convexity algorithms, and good programming skills in C/C++ and/or Python. The candidate must have published at least one paper in a top-tier computer vision, machine learning, or robotics venue, such as CVPR, ECCV, ICCV, NeurIPS, ICRA, or IROS. The intern will collaborate with MERL researchers to derive and implement new robust estimation algorithms, conduct experiments, and report findings. A submission to a top-tier conference is expected. The duration of the internship and start date are flexible.
Required Specific Experience
- Experience with 3D computer vision, RANSAC, or graduated non-convexity algorithms for computer vision.
- Research Areas: Computer Vision, Computational Sensing, Robotics
- Host: Pedro Miraldo
- Apply Now
-
CV0079: Internship - Novel View Synthesis of Dynamic Scenes
MERL is looking for a highly motivated intern to work on an original research project in rendering dynamic scenes from novel views. A strong background in 3D computer vision and/or computer graphics is required. Experience with the latest advances in volumetric rendering, such as neural radiance fields (NeRFs) and Gaussian Splatting (GS), is desired. The successful candidate is expected to have published at least one paper in a top-tier computer vision/graphics or machine learning venue, such as CVPR, ECCV, ICCV, SIGGRAPH, 3DV, ICML, ICLR, NeurIPS, or AAAI, and to possess solid programming skills in Python and popular deep learning frameworks like PyTorch. The candidate will collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publications. The position is available for graduate students on a Ph.D. track or those who have recently graduated with a Ph.D. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
Required Specific Experience
- Prior publications in top computer vision/graphics and/or machine learning venues, such as CVPR, ECCV, ICCV, SIGGRAPH, 3DV, ICML, ICLR, NeurIPS or AAAI.
- Experience with the latest novel-view synthesis approaches, such as Neural Radiance Fields (NeRFs) or Gaussian Splatting (GS).
- Proficiency in coding (particularly scripting languages like Python) and familiarity with deep learning frameworks, such as PyTorch or TensorFlow.
- Research Areas: Computer Vision, Artificial Intelligence, Machine Learning
- Host: Moitreya Chatterjee
- Apply Now
-
CV0061: Internship - Open-Vocabulary Object Detection
MERL is looking for a highly motivated intern to work on an original research project in open-vocabulary object detection. A strong background in computer vision and deep learning is required. Experience with the latest advances in object detection and open-vocabulary object detection is a plus and will be valued. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, WACV, ICML, ICLR, NeurIPS, or AAAI, and to possess solid programming skills in Python and popular deep learning frameworks like PyTorch. The position is available for graduate students on a Ph.D. track. Duration and start date are flexible, but the internship is expected to last at least 3 months.
Required Specific Experience
- Graduate student currently in a Ph.D. program
- Publication in computer vision or machine learning conference/journal
- Experience with PyTorch
- Research Area: Computer Vision
- Host: Mike Jones
- Apply Now
-
CV0050: Internship - Anomaly Localization for Industrial Inspection
MERL is looking for a self-motivated intern to work on anomaly localization in industrial inspection settings using computer vision. Relevant topics in scope include (but are not limited to): cross-view image anomaly localization, training one model for multiple views and defect types, and incorporating large foundation models in image anomaly localization. Candidates with experience in image anomaly localization for industrial inspection (e.g., the MVTec-AD or VisA datasets) and in the use of large foundation models are strongly preferred. The ideal candidate would be a PhD student with a strong background in computer vision and machine learning, and is expected to have published at least one paper in a top-tier computer vision, machine learning, or artificial intelligence venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI. Proficiency in Python programming and familiarity with at least one deep learning framework are necessary. The intern will collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publications. The duration of the internship is ideally at least 3 months, with a flexible start date.
Required Specific Experience
- Experience with Python, PyTorch, and large foundation models (e.g., CLIP, ALIGN).
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Kuan-Chuan Peng
- Apply Now
-
CV0078: Internship - Audio-Visual Learning with Limited Labeled Data
MERL is looking for a highly motivated intern to work on an original research project on multimodal learning, such as audio-visual learning, using limited labeled data. A strong background in computer vision and deep learning is required. Experience in audio-visual (multimodal) learning, weakly/self-supervised learning, continual learning, and large (vision-) language models is a plus and will be valued. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI, and to possess solid programming skills in Python and popular deep learning frameworks such as PyTorch. The intern will collaborate with MERL researchers to develop and implement novel algorithms and prepare manuscripts for scientific publications. Successful applicants are typically graduate students on a Ph.D. track or recent Ph.D. graduates. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
Required Specific Experience
- Prior publications in top-tier computer vision and/or machine learning venues, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS or AAAI.
- Knowledge of the latest self-supervised and weakly-supervised learning techniques.
- Experience with Large (Vision-) Language Models.
- Proficiency in scripting languages, such as Python, and deep learning frameworks such as PyTorch or TensorFlow.
- Research Areas: Computer Vision, Machine Learning, Speech & Audio, Artificial Intelligence
- Host: Moitreya Chatterjee
- Apply Now
-
CV0056: Internship - "Small" Large Generative Models for Vision and Language
MERL is looking for research interns to conduct research into novel architectures for "small" large generative models. We are currently exploring 0.5 - 2 billion parameter language models, text-to-image models, and text-to-video models. Interesting research directions include (a) efficient learning for such models that improves the Pareto front of current scaling laws for these sizes, (b) enhancing current transformer-based architectures, and (c) new architectural paradigms beyond transformers, such as incorporating explicitly temporal designs. Prior experience with machine learning/computer vision/natural language processing research and proficiency in building and experimenting with machine learning models using a framework like PyTorch are required. Candidates well into their PhD program with publications in top-tier machine learning, natural language processing, or computer vision venues, ideally connected to building generative models, are strongly preferred. Candidates are also expected to collaborate with MERL researchers to prepare manuscripts for scientific publications based on the results obtained during the internship. Duration of the internship is 3 months with a flexible start date.
Required Specific Experience
- Research experience with recent vision and text generative models
- Deep understanding of neural network architectures
- Proficiency in machine learning frameworks like PyTorch
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Suhas Lohit
- Apply Now
-
CV0100: Internship - Simulation for Human-Robot Interaction
MERL is looking for a self-motivated intern to develop a simulation platform to train vision-and-language models for dynamic human-robot interaction. The ideal intern must have a strong background in computer graphics, computer vision, and machine learning, as well as experience in using the latest graphics simulation toolboxes and physics engines. Working knowledge of recent multimodal generative AI methods is desired. The intern is expected to collaborate with researchers in the computer vision team at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience in designing novel realistic 3D interactive scenes for robot learning
- Experience with extending vision-based embodied AI simulators
- Strong foundations in machine learning and programming
- Foundations in optimization, specifically scheduling algorithms, would be a strong plus.
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.)
- Must be enrolled in a graduate program, ideally towards a Ph.D.
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Anoop Cherian
- Apply Now
-
CV0075: Internship - Multimodal Embodied AI
MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate would be a PhD student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands-on experience with animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with researchers in computer vision at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience in designing 3D interactive scenes
- Experience with vision-based embodied AI using simulators (implementation on real robotic hardware would be a plus).
- Experience training large language models on multimodal data
- Experience with training reinforcement learning algorithms
- Strong foundations in machine learning and programming
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- Research Areas: Artificial Intelligence, Computer Vision, Speech & Audio, Robotics, Machine Learning
- Host: Anoop Cherian
- Apply Now
-
CV0060: Internship - Video Anomaly Detection
MERL is looking for a self-motivated intern to work on the problem of video anomaly detection. The intern will help develop new ideas for improving the state of the art in detecting anomalous activity in videos. The ideal candidate would be a Ph.D. student with a strong background in machine learning and computer vision, and some experience with video anomaly detection in particular. Proficiency in Python programming and PyTorch is necessary. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, WACV, ICML, ICLR, NeurIPS, or AAAI. The intern will collaborate with MERL researchers to develop and test algorithms and prepare manuscripts for scientific publications. The internship is for 3 months, and the start date is flexible.
Required Specific Experience
- Graduate student in a Ph.D. program
- Experience with PyTorch.
- Prior publication in computer vision or machine learning conference/journal.
- Research Area: Computer Vision
- Host: Mike Jones
- Apply Now
-
CA0095: Internship - Infrastructure monitoring using quadrotors
MERL seeks graduate students passionate about robotics to collaborate and develop a framework for infrastructure monitoring using quadrotors. The work will involve multi-domain research, including multi-agent planning and control, SLAM, and perception. The methods will be implemented and evaluated on an actual robotic platform (Crazyflies). The results of the internship are expected to be published in top-tier conferences and/or journals. The internship will take place during summer 2025 (exact dates are flexible) with an expected duration of 3-4 months.
Please use your cover letter to explain how you meet the following requirements, preferably with links to papers, code repositories, etc., indicating your proficiency.
Required Specific Experience
- Current enrollment in a PhD program in Mechanical Engineering, Electrical Engineering, Computer Science, or a related program, with a focus on Robotics and/or Control Systems
- Experience in some/all of these topics: multi-agent motion planning, constrained control, SLAM, computer vision
- Experience with ROS2 and validation of algorithms on robotic platforms, preferably quadrotors
- Strong programming skills in Python and/or C/C++
Desired Specific Experience
- Experience with Crazyflie quadrotors and the Crazyswarm library
- Experience with the SLAM toolbox in ROS2
- Experience in convex optimization and model predictive control
- Experience with computer vision
- Research Areas: Control, Computer Vision, Optimization, Robotics
- Host: Abraham Vinod
- Apply Now
-
CA0107: Internship - Perception-Aware Control and Planning
MERL is seeking a highly motivated and qualified intern to collaborate with the Control for Autonomy team on the development of visual perception-aware control. The overall objective is to optimize a control policy when the perception uncertainty is affected by the chosen policy. Application areas include mobile robotics, drones, autonomous vehicles, and spacecraft. The ideal candidate is expected to be working towards a PhD with a strong emphasis on stochastic optimal control/planning or visual odometry, and to have interest and background in as many of the following as possible: output-feedback optimal control, visual SLAM, POMDPs, information fields, motion planning, and machine learning. The expected start date is in the late Spring/early Summer 2025, for a duration of 3-6 months.
Required Specific Experience
- Current/Past enrollment in a PhD program in Mechanical, Aerospace, or Electrical Engineering, or a related field
- 2+ years of research in at least some of: optimal control, motion planning, computer vision, navigation, uncertainty quantification, stochastic planning/control
- Strong programming skills in Python and/or C++
- Research Areas: Machine Learning, Dynamical Systems, Control, Optimization, Robotics, Computer Vision
- Host: Kento Tomita
- Apply Now
-
CA0055: Internship - Human-Collaborative Loco-Manipulation Robots
MERL seeks graduate students passionate about robotics to contribute to the development of a framework for legged robots with manipulator arms to collaborate with humans in executing various tasks. The work will involve multi-domain research including planning and control, manipulation, and possibly vision/perception. The methods will be implemented and evaluated in high-performance simulators and (time permitting) on actual robotic platforms. The results of the internship are expected to be published in top-tier robotics conferences and/or journals.
The internship should start in January 2025 (exact date is flexible) with an expected duration of 3-6 months, depending on the agreed scope and intermediate progress.
Required Specific Experience
- Current/Past enrollment in a PhD program in Mechanical, Aerospace, or Electrical Engineering, with a concentration in Robotics
- 2+ years of research in at least some of: machine learning, optimization, control, path planning, computer vision
- Experience with design and simulation tools for robotics, such as ROS, MuJoCo, Gazebo, or Isaac Lab
- Strong programming skills in Python and/or C/C++
Additional Desired Experience
- Development of planning and control methods in robotic hardware platforms
- Acquisition and processing of multimodal sensor data, including force/torque and proprioceptive sensors
- Prior experience in human-robot interaction, legged locomotion, mobile manipulation
- Research Areas: Robotics, Control, Machine Learning, Optimization, Computer Vision, Artificial Intelligence
- Host: Stefano Di Cairano
- Apply Now
-
ST0096: Internship - Multimodal Tracking and Imaging
MERL is seeking a motivated intern to assist in developing hardware and algorithms for multimodal imaging applications. The project involves the integration of radar, camera, and depth sensors in a variety of sensing scenarios. The ideal candidate should have experience with FMCW radar and/or depth sensing and be fluent in Python and scripting methods. Familiarity with optical tracking of humans and experience with hardware prototyping are desired. Good knowledge of computational imaging and/or radar imaging methods is a plus.
Required Specific Experience
- Experience with Python and Python Deep Learning Frameworks.
- Experience with FMCW radar and/or Depth Sensors.
- Research Areas: Computer Vision, Machine Learning, Signal Processing, Computational Sensing
- Host: Petros Boufounos
- Apply Now
-
ST0068: Internship - Single-Photon Lidar Algorithms
The Computational Sensing Team at MERL is seeking an intern to work on estimation algorithms for single-photon lidar. The ideal candidate would be a PhD student with a strong background in statistical modeling, estimation theory, computational imaging, or inverse problems. The intern will collaborate with MERL researchers to design new lidar reconstruction algorithms, conduct simulations, and prepare results for publication. A detailed knowledge of single-photon detection, lidar, and Poisson processes is preferred. Hands-on optics experience is beneficial but not required. Strong programming skills in Python or MATLAB are essential. The duration is anticipated to be at least 3 months with a flexible start date.
- Research Areas: Computational Sensing, Computer Vision, Electronic and Photonic Devices, Signal Processing
- Host: Joshua Rapp
- Apply Now