TR2026-070

M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation

- Oller, M., Qian, Q., Corcodel, R., Jain, S., "M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation", 2026 IEEE International Conference on Robotics & Automation (ICRA), June 2026.
  @inproceedings{Oller2026jun,
    author = {Oller, Miquel and Qian, Qiyang and Corcodel, Radu and Jain, Siddarth},
    title = {{M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation}},
    booktitle = {2026 IEEE International Conference on Robotics \& Automation (ICRA)},
    year = 2026,
    month = jun,
    url = {https://www.merl.com/publications/TR2026-070}
  }
MERL Contacts:
- Radu Corcodel
- Siddarth Jain
Research Area:

Robotics

Abstract:

Accurate object pose estimation is essential for robotic manipulation, particularly in tasks involving small or geometrically intricate objects where high precision is required. Existing vision-based, tactile-based, and hybrid approaches struggle with occlusion, noise, and limited generalization, and often require extensive retraining or large annotated datasets. In this work, we present M-VTOP, a modular framework for in-hand object pose estimation that flexibly integrates vision, tactile, and contact sensing, making it robust to noisy or missing modalities. At the core of the framework is a belief-based particle filter that fuses heterogeneous sensor observations, maintains probabilistic estimates, and continuously refines them toward high-precision convergence within closed-loop robotic control that uses the pose estimate as feedback. A mask-based observation representation unifies visual and tactile signals into geometry-centric inputs, enhancing robustness to texture and lighting variations while supporting zero-shot generalization. The framework requires only an object’s CAD model and avoids task-specific retraining. Experiments show that M-VTOP achieves sub-millimeter accuracy under complex geometries, occlusions, and tight tolerances, demonstrating its promise for high-precision robotic manipulation.
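
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of a particle filter that fuses several sensing modalities and tolerates a missing one. It is not the authors' method. In place of the mask-based observation model it assumes direct noisy measurements of a planar pose (x, y, theta) with Gaussian likelihoods, and every name, noise level, and particle count in it is hypothetical.

```python
import math
import random


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def pf_step(particles, observations, sigmas, jitter, rng):
    """One predict / update / resample cycle over planar pose particles.

    observations maps a modality name to an (x, y, theta) measurement,
    or to None when that modality is unavailable at this step.
    sigmas maps a modality name to (position_sigma, angle_sigma).
    All of these structures are illustrative assumptions.
    """
    # Predict: diffuse each hypothesis slightly to keep particle diversity.
    moved = [
        (x + rng.gauss(0, jitter[0]),
         y + rng.gauss(0, jitter[0]),
         wrap(th + rng.gauss(0, jitter[1])))
        for x, y, th in particles
    ]
    # Update: sum Gaussian log-likelihoods over the modalities that are present.
    log_w = []
    for x, y, th in moved:
        ll = 0.0
        for name, obs in observations.items():
            if obs is None:  # missing modality contributes nothing
                continue
            s_pos, s_ang = sigmas[name]
            ll -= 0.5 * (((obs[0] - x) / s_pos) ** 2
                         + ((obs[1] - y) / s_pos) ** 2
                         + (wrap(obs[2] - th) / s_ang) ** 2)
        log_w.append(ll)
    peak = max(log_w)  # subtract the peak for numerical stability
    weights = [math.exp(l - peak) for l in log_w]
    # Resample in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Mean position and circular mean angle of the particle set."""
    n = len(particles)
    x = sum(p[0] for p in particles) / n
    y = sum(p[1] for p in particles) / n
    th = math.atan2(sum(math.sin(p[2]) for p in particles),
                    sum(math.cos(p[2]) for p in particles))
    return x, y, th


def run_demo(seed=0, n_particles=3000, n_steps=30):
    rng = random.Random(seed)
    true_pose = (0.8, -0.5, 0.2)  # hypothetical ground truth (mm, mm, rad)
    sigmas = {"vision": (0.5, 0.1), "tactile": (0.2, 0.05)}  # assumed noise
    particles = [(rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(-0.5, 0.5))
                 for _ in range(n_particles)]
    for step in range(n_steps):
        observations = {}
        for name, (s_pos, s_ang) in sigmas.items():
            # Simulate vision dropping out (e.g. occlusion) every third step.
            if name == "vision" and step % 3 == 2:
                observations[name] = None
                continue
            observations[name] = (true_pose[0] + rng.gauss(0, s_pos),
                                  true_pose[1] + rng.gauss(0, s_pos),
                                  true_pose[2] + rng.gauss(0, s_ang))
        particles = pf_step(particles, observations, sigmas, (0.05, 0.01), rng)
    return true_pose, estimate(particles)


true_pose, est = run_demo()
print("true pose:", true_pose)
print("estimate: ", est)
```

Skipping a modality whose observation is `None` leaves the likelihood defined by whatever sensors remain, which is one simple way to obtain the kind of robustness to missing modalities that the abstract describes.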

Related News & Events

NEWS MERL researchers present 9 papers at IEEE ICRA 2026
Date: June 1, 2026 - June 5, 2026
Where: Vienna, Austria
MERL Contacts: Radu Corcodel; Stefano Di Cairano; Purnanand Elango; Siddarth Jain; Alexander Schperberg; Kento Tomita
Research Areas: Artificial Intelligence, Computer Vision, Control, Dynamical Systems, Machine Learning, Optimization, Robotics
Brief
- MERL researchers presented nine papers at the recently concluded IEEE International Conference on Robotics and Automation (ICRA) 2026 in Vienna, Austria. The papers covered a broad set of topics in robotics, including robot perception, visuo-tactile sensing, contact and pose estimation, manipulation, reinforcement learning, diffusion policies, loco-manipulation, contact-implicit trajectory optimization, legged locomotion, localization, and perception-aware planning.
  
  IEEE ICRA is the flagship conference of the IEEE Robotics and Automation Society and the world’s largest and most comprehensive technical conference focused on research advances and the latest technological developments in robotics. The event attracts nearly 8,000 participants and receives more than 5,000 paper submissions.
