Internship Openings

CV0267: Internship - Audio-Visual Learning for Spatial Audio Processing
- MERL is looking for a highly motivated intern to work on an original research project on audio-visual learning, with a focus on spatial audio and on training models with limited labeled data. A strong background in computer vision, audio processing, and deep learning is required. Experience in audio-visual (multimodal) learning, weakly/self-supervised learning, Room Impulse Response (RIR) estimation, and large (vision-) language models is a plus. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI, and to possess solid programming skills in Python and popular deep learning frameworks such as PyTorch. The intern will collaborate with MERL researchers to develop and implement novel algorithms and prepare manuscripts for scientific publications. Successful applicants are typically graduate students on a Ph.D. track or recent Ph.D. graduates. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
  Required Specific Experience
  - Prior publications in top-tier computer vision and/or machine learning venues, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI.
  - Knowledge of the latest self-supervised and weakly-supervised learning techniques.
  - Experience with large (vision-) language models and spatial audio processing techniques.
  - Proficiency in scripting languages, such as Python, and deep learning frameworks such as PyTorch or TensorFlow.
  The pay range for this internship position will be $6-8K per month.
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
- Host: Moitreya Chatterjee
- Apply Now
CI0197: Internship - Embodied AI & Humanoid Robotics
- If you are passionate about pushing the boundaries of embodied AI, join our cutting-edge research team as an intern and contribute to the development of generalist AI agents for humanoid robots. This is a unique opportunity to work on impactful projects aimed at publishing in top-tier AI and robotics venues.
  What We’re Looking For
  We’re seeking highly motivated individuals with:
  - Advanced research experience in robotic AI, edge AI, and agentic AI systems.
  - Hands-on expertise in Vision-Language-Action (VLA) models and foundation models.
  - Strong proficiency with Python, PyTorch/JAX, deep learning, and robotic agent frameworks.
  Internship Details
  - Duration: ~3 months
  - Start Date: Flexible
  - Goal: Publish research at leading AI/robotics conferences and journals
  If you’re excited about shaping the future of humanoid robotics and AI agents, we’d love to hear from you!
  The pay range for this internship position will be $6-8K per month.
- Research Areas: Applied Physics, Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics, Signal Processing, Speech & Audio, Optimization
- Host: Toshi Koike-Akino
- Apply Now
SA0188: Internship - Audio separation, generation, and analysis
- We are seeking graduate students interested in helping advance the fields of generative audio, source separation, speech enhancement, and robust ASR in challenging multi-source and far-field scenarios. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work.
  The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, probabilistic modeling, and deep generative modeling.
  Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
  The pay range for this internship position will be $6-8K per month.
- Research Areas: Speech & Audio, Machine Learning, Artificial Intelligence
- Host: Jonathan Le Roux
- Apply Now
SA0191: Internship - Human-Robot Interaction Based on Multimodal Scene Understanding
- We are looking for a graduate student interested in advancing the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring with a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
  Required Specific Experience
  - Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
  The pay range for this internship position will be $6-8K per month.
- Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
- Apply Now