TR2026-045

SPINBENCH: PERSPECTIVE AND ROTATION AS A LENS ON SPATIAL REASONING IN VLMS

Zhang, Y., Corcodel, R., Hori, C., Cherian, A., Zhao, D., "SpinBench: 3D Rotation as a Lens on Spatial Reasoning in VLMs", International Conference on Learning Representations (ICLR) 2026, April 2026.

@inproceedings{Zhang2026apr2,
author = {Zhang, Yuyou and Corcodel, Radu and Hori, Chiori and Cherian, Anoop and Zhao, Ding},
title = {{SpinBench: 3D Rotation as a Lens on Spatial Reasoning in VLMs}},
booktitle = {International Conference on Learning Representations (ICLR) 2026},
year = 2026,
month = apr,
url = {https://www.merl.com/publications/TR2026-045}
}

Research Area:

Robotics

Abstract:

We present SPINBENCH, a cognitively grounded diagnostic benchmark for evaluating spatial reasoning in vision language models (VLMs). SPINBENCH is designed around the core challenge of spatial reasoning: perspective taking, the ability to reason about how scenes and object relations change under viewpoint transformation. Since perspective taking requires multiple cognitive capabilities, such as recognizing objects across views, grounding relative positions, and mentally simulating transformations, SPINBENCH introduces a set of fine-grained diagnostic categories. Our categories target translation, rotation, object relative pose, and viewpoint change, and are progressively structured so that simpler single-object tasks scaffold toward the most demanding multi-object perspective-taking setting. We evaluate 43 state-of-the-art VLMs, both proprietary and open source. Results reveal systematic weaknesses: strong egocentric bias, poor rotational understanding, and inconsistencies under symmetrical and syntactic reformulations. Scaling analysis shows both smooth improvements and emergent capabilities. While human subjects achieve high accuracy (91.2%), task difficulty as measured by human response time shows strong correlation with VLM accuracy, indicating that SPINBENCH captures spatial reasoning challenges shared across humans and VLMs. Together, our findings highlight the need for structured, cognitively inspired diagnostic tools to advance spatial reasoning in multimodal foundation models.

Related Publication

Zhang, Y., Corcodel, R., Hori, C., Cherian, A., Zhao, D., "SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs", arXiv, September 2025.


@article{Zhang2025sep3,
author = {Zhang, Yuyou and Corcodel, Radu and Hori, Chiori and Cherian, Anoop and Zhao, Ding},
title = {{SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs}},
journal = {arXiv},
year = 2025,
month = sep,
url = {https://arxiv.org/abs/2509.25390}
}

MERL Contacts:

Radu Corcodel
Chiori Hori
Anoop Cherian
