TR2026-070
M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation
- Oller, M., Qian, Q., Corcodel, R., Jain, S., "M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation", 2026 IEEE International Conference on Robotics & Automation (ICRA), June 2026.
@inproceedings{Oller2026jun,
  author = {Oller, Miquel and Qian, Qiyang and Corcodel, Radu and Jain, Siddarth},
  title = {{M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation}},
  booktitle = {2026 IEEE International Conference on Robotics \& Automation (ICRA)},
  year = 2026,
  month = jun,
  url = {https://www.merl.com/publications/TR2026-070}
}
Abstract:
Accurate object pose estimation is essential for robotic manipulation, particularly in tasks involving small or geometrically intricate objects where high precision is required. Existing vision-based, tactile, and hybrid approaches struggle with occlusion, noise, and limited generalization, often requiring extensive retraining or large annotated datasets. In this work, we present M-VTOP, a modular framework for in-hand object pose estimation that flexibly integrates vision, tactile, and contact sensing, so that it remains robust to noisy or missing modalities. At the core of the framework is a belief-based particle filter that fuses heterogeneous sensor observations, maintains probabilistic estimates, and continuously refines them toward high-precision convergence, with the pose estimates serving as feedback for closed-loop robotic control. A mask-based observation representation unifies visual and tactile signals into geometry-centric inputs, enhancing robustness to texture and lighting variations while supporting zero-shot generalization. The framework requires only an object’s CAD model and avoids task-specific retraining. Experiments show that M-VTOP achieves sub-millimeter accuracy under complex geometries, occlusions, and tight tolerances, demonstrating its promise for high-precision robotic manipulation.
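
To make the fusion idea in the abstract concrete, the following is a minimal illustrative sketch of a particle filter that combines several sensing modalities and simply skips any modality that is missing at a given step. It is not the M-VTOP implementation: the planar (x, y) pose, the Gaussian likelihoods (a stand-in for the mask-based observation model), the noise levels in `sigmas`, the vision dropout schedule, and every function name are assumptions chosen for illustration.

```python
import math
import random


def noisy(pose, sigma, rng):
    """Simulate one sensor reading of the (x, y) pose with Gaussian noise."""
    return (pose[0] + rng.gauss(0, sigma), pose[1] + rng.gauss(0, sigma))


def update_weights(particles, log_weights, observations, sigmas):
    """Add the log-likelihood of every available modality to each particle."""
    updated = []
    for (x, y), lw in zip(particles, log_weights):
        for name, obs in observations.items():
            if obs is None:  # modality missing this step: contributes nothing
                continue
            s = sigmas[name]
            lw += -0.5 * (((obs[0] - x) / s) ** 2 + ((obs[1] - y) / s) ** 2)
        updated.append(lw)
    return updated


def normalize(log_weights):
    """Convert log-weights to weights that sum to 1 (max-shift for stability)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples in proportion to the weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step
    out, cumulative, i = [], weights[0], 0
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
        u += step
    return out


def estimate(particles, weights):
    """Weighted mean of the particle set."""
    x = sum(w * p[0] for p, w in zip(particles, weights))
    y = sum(w * p[1] for p, w in zip(particles, weights))
    return x, y


def run_filter(true_pose=(1.5, -0.8), steps=30, n=500, seed=0):
    rng = random.Random(seed)
    sigmas = {"vision": 1.0, "tactile": 0.1}  # assumed noise levels, in mm
    particles = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(n)]
    est = None
    for t in range(steps):
        observations = {
            # Assumed dropout: vision is unavailable on every third step.
            "vision": None if t % 3 == 0 else noisy(true_pose, sigmas["vision"], rng),
            "tactile": noisy(true_pose, sigmas["tactile"], rng),
        }
        log_w = update_weights(particles, [0.0] * n, observations, sigmas)
        weights = normalize(log_w)
        est = estimate(particles, weights)
        particles = systematic_resample(particles, weights, rng)
        # Small jitter keeps the particle set diverse after resampling.
        particles = [(x + rng.gauss(0, 0.05), y + rng.gauss(0, 0.05))
                     for x, y in particles]
    return est


if __name__ == "__main__":
    print(run_filter())
```

Working in log-weights lets each modality contribute an additive term, so adding or dropping a sensor changes only which terms are summed. In this toy setup the lower-noise modality dominates the estimate whenever it is present, while the coarser one mainly narrows the initial search region.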

