TR2026-070

M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation


    •  Oller, M., Qian, Q., Corcodel, R., Jain, S., "M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation", 2026 IEEE International Conference on Robotics & Automation (ICRA), June 2026.
      @inproceedings{Oller2026jun,
        author = {Oller, Miquel and Qian, Qiyang and Corcodel, Radu and Jain, Siddarth},
        title = {{M-VTOP: Modular Visuo-Tactile Object Pose Estimation for High-Precision Robotic Manipulation}},
        booktitle = {2026 IEEE International Conference on Robotics \& Automation (ICRA)},
        year = 2026,
        month = jun,
        url = {https://www.merl.com/publications/TR2026-070}
      }
Research Area:

    Robotics

Abstract:

Accurate object pose estimation is essential for robotic manipulation, particularly in tasks involving small or geometrically intricate objects where high precision is required. Existing vision-based, tactile-based, and hybrid approaches struggle with occlusion, noise, and limited generalization, and often require extensive retraining or large annotated datasets. In this work, we present M-VTOP, a modular framework for in-hand object pose estimation that flexibly integrates vision, tactile, and contact sensing, making it robust to noisy or missing modalities. At the core of the framework is a belief-based particle filter that fuses heterogeneous sensor observations, maintains probabilistic pose estimates, and continuously refines them toward high-precision convergence, with the pose estimate serving as feedback for closed-loop robot control. A mask-based observation representation unifies visual and tactile signals into geometry-centric inputs, enhancing robustness to texture and lighting variations while supporting zero-shot generalization. The framework requires only an object’s CAD model and avoids task-specific retraining. Experiments show that M-VTOP achieves sub-millimeter accuracy under complex geometries, occlusions, and tight tolerances, demonstrating its promise for high-precision robotic manipulation.
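
To make the idea of a particle filter over mask-based observations concrete, the following is a minimal toy sketch and not the M-VTOP implementation. It assumes a 2D translation-only pose, a hypothetical `render_mask` function standing in for rendering a silhouette from a CAD model, and an intersection-over-union score as the observation likelihood; the names `filter_step`, `temperature`, and `jitter` are likewise illustrative assumptions. A modality whose observation is `None` is simply skipped, which is one simple way to tolerate missing sensors.

```python
import numpy as np

def render_mask(pose, size=32, half=4):
    """Hypothetical stand-in for a CAD renderer: a square silhouette at (x, y)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return (np.abs(xs - pose[0]) <= half) & (np.abs(ys - pose[1]) <= half)

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def filter_step(particles, observations, rng, temperature=0.1, jitter=0.3):
    """One update: weight by mask agreement, resample, then diffuse."""
    log_w = np.zeros(len(particles))
    for mask in observations.values():
        if mask is None:  # missing modality: contributes no evidence
            continue
        scores = np.array([iou(render_mask(p), mask) for p in particles])
        log_w += scores / temperature  # assumed likelihood: exp(IoU / temperature)
    w = np.exp(log_w - log_w.max())    # subtract the max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx] + rng.normal(0.0, jitter, particles.shape)

rng = np.random.default_rng(0)
true_pose = np.array([18.0, 13.0])               # made-up ground truth, in pixels
observed = render_mask(true_pose)                # noise-free toy observation
particles = rng.uniform(8.0, 24.0, size=(300, 2))

for step in range(15):
    # The "tactile" mask is dropped on odd steps to mimic a missing modality.
    obs = {"vision": observed, "tactile": observed if step % 2 == 0 else None}
    particles = filter_step(particles, obs, rng)

estimate = particles.mean(axis=0)                # belief mean as the pose estimate
```

In this sketch the belief is the particle set itself, and its mean could be fed back to a controller at every step. A real system would estimate a full 6-DoF pose and use separate camera and tactile-sensor models rather than one shared toy mask.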