TR2025-139

Manual-PA: Learning 3D Part Assembly from Instruction Diagrams


    •  Zhang, J., Cherian, A., Rodriguez, C., Deng, W., Gould, S., "Manual-PA: Learning 3D Part Assembly from Instruction Diagrams", IEEE International Conference on Computer Vision (ICCV), September 2025.
      @inproceedings{Zhang2025sep,
        author = {Zhang, Jiahao and Cherian, Anoop and Rodriguez, Cristian and Deng, Weijian and Gould, Stephen},
        title = {{Manual-PA: Learning 3D Part Assembly from Instruction Diagrams}},
        booktitle = {IEEE International Conference on Computer Vision (ICCV)},
        year = 2025,
        month = sep,
        url = {https://www.merl.com/publications/TR2025-139}
      }
  • Research Areas: Artificial Intelligence, Computer Vision, Machine Learning

Abstract:

Assembling furniture amounts to solving a discrete-continuous optimization task: selecting the furniture parts to assemble and estimating their connecting poses in a physically realistic manner. The problem is hampered by its combinatorially large yet sparse solution space, which makes learning to assemble a challenging task for current machine learning models. In this paper, we attempt to solve this task by leveraging the assembly instructions provided in the diagrammatic manuals that typically accompany furniture parts. Our key insight is to use the cues in these diagrams to split the problem into discrete and continuous phases. Specifically, we present Manual-PA, a transformer-based instruction Manual-guided 3D Part Assembly framework that learns to semantically align 3D parts with their illustrations in the manuals using a contrastive learning backbone, uses this alignment to predict the assembly order, and infers the 6D pose of each part by relating it to the final furniture depicted in the manual. To validate the efficacy of our method, we conduct experiments on the benchmark PartNet dataset. Our results show that using the diagrams and the order of the parts leads to significant improvements in assembly performance over the state of the art. Further, Manual-PA demonstrates strong generalization to real-world IKEA furniture assembly on the IKEA-Manual dataset.
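The abstract describes aligning 3D parts with their manual illustrations through contrastive learning and then using that alignment to predict the assembly order. The sketch below illustrates the general idea with a generic symmetric InfoNCE (CLIP-style) loss over paired part and diagram embeddings, plus a simple order readout that assigns each part to its most similar step illustration. It is not the Manual-PA implementation: the embedding vectors, the temperature `tau`, and the helper names `cosine`, `info_nce`, and `order_by_steps` are hypothetical, and the paper's actual transformer backbone and order-prediction procedure may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (na * nb)


def _mean_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(parts: List[Vector], diagrams: List[Vector], tau: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE loss; assumes parts[i] matches diagrams[i]."""
    if len(parts) != len(diagrams):
        raise ValueError("parts and diagrams must be paired one-to-one")
    # Similarity logits: rows are parts, columns are diagram crops.
    logits = [[cosine(p, d) / tau for d in diagrams] for p in parts]
    # Transpose to score diagram -> part as well as part -> diagram.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(logits_t))


def order_by_steps(parts: List[Vector], step_diagrams: List[Vector]) -> List[int]:
    """Hypothetical readout: assign each part to its most similar manual step,
    then list part indices by step (ties broken by part index)."""
    best_step = []
    for p in parts:
        scores = [cosine(p, s) for s in step_diagrams]
        best_step.append(max(range(len(scores)), key=scores.__getitem__))
    return sorted(range(len(parts)), key=lambda i: (best_step[i], i))
```

For example, `order_by_steps([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]], [[1.0, 0.0], [0.0, 1.0]])` returns `[1, 2, 0]`: parts 1 and 2 best match the first step and part 0 the second. Averaging the part-to-diagram and diagram-to-part terms keeps the loss symmetric, so neither modality is favored during alignment; correctly paired embeddings yield a loss near zero, while swapped pairings yield a large one.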

 

  • Related News & Events

    • NEWS: MERL Papers, Workshops, and Talks at ICCV 2025
      Date: October 19, 2025 - October 23, 2025
      Where: Honolulu, HI, USA
      MERL Contacts: Petros T. Boufounos; Anoop Cherian; Toshiaki Koike-Akino; Hassan Mansour; Tim K. Marks; Pedro Miraldo; Kuan-Chuan Peng; Pu (Perry) Wang
      Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing
      Brief
      • MERL researchers presented 3 conference papers and 3 workshop papers, co-organized 2 workshops, and delivered 2 invited talks at the IEEE International Conference on Computer Vision (ICCV) 2025, which was held in Honolulu, HI, USA from October 19-23, 2025. ICCV is one of the most prestigious and competitive international conferences in the area of computer vision. Details of MERL contributions are provided below:


        Main Conference Papers:

        1. "SAC-GNC: SAmple Consensus for adaptive Graduated Non-Convexity" by V. Piedade, C. Sidhartha, J. Gaspar, V. M. Govindu, and P. Miraldo. (Highlight Paper)
        Paper: https://www.merl.com/publications/TR2025-146

        2. "Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts" by C.-A. Yang, K.-C. Peng, and R. A. Yeh.
        Paper: https://www.merl.com/publications/TR2025-124

        3. "Manual-PA: Learning 3D Part Assembly from Instruction Diagrams" by J. Zhang, A. Cherian, C. Rodriguez-Opazo, W. Deng, and S. Gould.
        Paper: https://www.merl.com/publications/TR2025-139


        MERL Co-Organized Workshops:

        1. "The Workshop on Anomaly Detection with Foundation Models (ADFM)" by K.-C. Peng, Y. Zhao, and A. Aich.
        Workshop link: https://adfmw.github.io/iccv25/

        2. "The 8th International Workshop on Computer Vision for Physiological Measurement (CVPM)" by D. McDuff, W. Wang, S. Stuijk, T. Marks, H. Mansour, and V. R. Shenoy.
        Workshop link: https://sstuijk.estue.nl/cvpm/cvpm25/


        MERL Keynote Talks at Workshops:

        1. Tim K. Marks, Keynote Speaker at the Workshop on Computer Vision for Physiological Measurement (CVPM).
        Workshop website: https://vineetrshenoy.github.io/cvpmSeptember2025/

        2. Tim K. Marks, Keynote Speaker at the Workshop on Analysis and Modeling of Faces and Gestures (AMFG).
        Workshop website: https://fulab.sites.northeastern.edu/amfg2025/


        Workshop Papers:

        1. "Joint Training of Image Generator and Detector for Road Defect Detection" by K.-C. Peng.
        Paper: https://www.merl.com/publications/TR2025-149

        2. "Radar-Conditioned 3D Bounding Box Diffusion for Indoor Human Perception" by R. Yataka, P. Wang, P.T. Boufounos, and R. Takahashi.
        Paper: https://www.merl.com/publications/TR2025-154

        3. "L-GGSC: Learnable Graph-based Gaussian Splatting Compression" by S. Kato, T. Koike-Akino, and T. Fujihashi.
        Paper: https://www.merl.com/publications/TR2025-148