TR2023-009

H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions


    •  Ota, K., Tung, H.-Y., Smith, K., Cherian, A., Marks, T.K., Sullivan, A., Kanezaki, A., Tenenbaum, J.B., "H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions", IEEE International Conference on Robotics and Automation (ICRA), DOI: 10.1109/ICRA48891.2023.10160575, May 2023, pp. 7272-7278.
      @inproceedings{Ota2023may,
        author = {Ota, Kei and Tung, Hsiao-Yu and Smith, Kevin and Cherian, Anoop and Marks, Tim K. and Sullivan, Alan and Kanezaki, Asako and Tenenbaum, Joshua B.},
        title = {H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions},
        booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
        year = 2023,
        pages = {7272--7278},
        month = may,
        publisher = {IEEE},
        doi = {10.1109/ICRA48891.2023.10160575},
        url = {https://www.merl.com/publications/TR2023-009}
      }
  • Research Areas:

    Artificial Intelligence, Computer Vision, Robotics

Abstract:

The world is filled with articulated objects whose use is difficult to determine from vision alone; e.g., a door might open inwards or outwards. Humans handle these objects with strategic trial-and-error: first pushing a door, then pulling if that doesn't work. We enable these capabilities in autonomous agents by proposing "Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR), a probabilistic generative framework that generates a distribution of hypotheses about how objects articulate given input observations, updates its certainty over those hypotheses over time, and infers plausible actions for exploration and goal-conditioned manipulation. On the PartNet-Mobility dataset, we compare our model with existing work on manipulating objects after a handful of exploration actions. We further propose a novel PuzzleBoxes benchmark containing locked boxes that require multiple steps to solve. We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework, despite using zero training data. We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
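The hypothesize-simulate-act-update loop in the abstract can be illustrated with a toy sketch. This is not the authors' implementation; the `simulate`, `act`, and `h_saur` functions and the door scenario are hypothetical stand-ins showing the idea of maintaining a belief over articulation hypotheses, predicting outcomes in simulation, acting, and pruning hypotheses that disagree with the observed outcome.

```python
# Toy illustration (hypothetical, not the paper's code): a door that
# might open by "push" or "pull". The agent holds a belief over these
# hypotheses, acts, and keeps only hypotheses consistent with what it sees.

TRUE_MECHANISM = "pull"  # ground truth, unknown to the agent

def simulate(hypothesis, action):
    """Predicted outcome if `hypothesis` were true and `action` were taken."""
    return "opens" if hypothesis == action else "blocked"

def act(action):
    """Execute the action in the (toy) real world and observe the outcome."""
    return "opens" if TRUE_MECHANISM == action else "blocked"

def h_saur(actions=("push", "pull"), max_steps=5):
    # Hypothesize: uniform prior over candidate articulation mechanisms.
    belief = {h: 1.0 / len(actions) for h in actions}
    for _ in range(max_steps):
        # Act: try the currently most probable mechanism first
        # (strategic trial-and-error: push, then pull if that fails).
        action = max(belief, key=belief.get)
        outcome = act(action)
        # Update: zero out hypotheses whose simulated outcome disagrees
        # with the observation, then renormalize the belief.
        for h in belief:
            if simulate(h, action) != outcome:
                belief[h] = 0.0
        total = sum(belief.values())
        belief = {h: p / total for h, p in belief.items()}
        # Repeat until one hypothesis dominates.
        if max(belief.values()) > 0.99:
            break
    return belief

print(h_saur())  # the belief concentrates on "pull" after the push fails
```

The real framework reasons over physical simulations of articulated parts rather than a two-option lookup, but the belief-update structure is the same: hypotheses inconsistent with interaction outcomes lose probability mass, and the next action is chosen under the updated belief.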

 
