Software & Data Downloads — LLMPhy
Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines
For parameter-identifiable physical reasoning combining multimodal large language models and physics engines for simulation-grounded world modeling.
Most learning-based approaches to complex physical reasoning overlook the crucial challenge of parameter identification, that is, estimating quantities such as mass and friction that govern scene dynamics, despite its importance in real-world applications including collision avoidance and robotic manipulation. We present LLMPhy, a black-box optimization framework that integrates large language models (LLMs) with physics simulators for physical reasoning. LLMPhy bridges the textbook physical knowledge embedded in LLMs with world models implemented in modern physics engines, enabling the construction of digital twins of input scenes through the estimation of latent physical parameters. We are publicly releasing our implementation of the core functionalities of LLMPhy, including Python API interfaces between the LLM and MuJoCo, the prompts used in both phases, and code for evaluating generated solutions against ground truth.
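The black-box loop described above (propose parameters, simulate, compare with the observed outcome, refine) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the released code: `simulate` is a 1-D sliding block standing in for MuJoCo, `propose` is a random perturbation standing in for the LLM, and all function names are hypothetical.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2

def simulate(friction, v0=2.0, dt=0.001, steps=2000):
    """Toy stand-in for a physics engine (NOT MuJoCo): a block sliding on a
    flat surface under Coulomb friction. Returns positions sampled every 100 steps."""
    x, v = 0.0, v0
    trajectory = []
    for i in range(steps):
        if v > 0.0:
            v = max(0.0, v - friction * G * dt)  # friction decelerates the block
        x += v * dt
        if i % 100 == 0:
            trajectory.append(x)
    return trajectory

def loss(predicted, observed):
    """Mean squared error between two sampled trajectories."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def propose(current, scale, rng):
    """Hypothetical proposer: a Gaussian perturbation used here in place of an LLM."""
    return min(1.0, max(0.01, current + rng.gauss(0.0, scale)))

def identify(observed, iters=200, seed=0):
    """Black-box search: only simulator outputs are used, never gradients."""
    rng = random.Random(seed)
    best, scale = 0.5, 0.2
    best_loss = loss(simulate(best), observed)
    for _ in range(iters):
        candidate = propose(best, scale, rng)
        candidate_loss = loss(simulate(candidate), observed)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
        else:
            scale = max(0.005, scale * 0.97)  # narrow the search after a rejection
    return best, best_loss

# The "observed" trajectory is generated with a hidden friction coefficient of 0.3.
observed = simulate(0.3)
estimate, error = identify(observed)
print(f"estimated friction: {estimate:.3f}, loss: {error:.2e}")
```

In this sketch the proposer sees only the current best guess. An LLM-driven proposer would presumably also receive the simulation feedback as text, which is the part this toy version leaves out.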
As existing physical reasoning benchmarks rarely account for parameter identifiability, we introduce LLMPhy-TraySim, a synthetic benchmark for parameter-identifiable physical reasoning designed to rigorously evaluate whether modern LLMs and vision-language models (VLMs) can move beyond pattern recognition to recover physical parameters and generalize them across dynamical settings. The central objective of LLMPhy-TraySim is to assess a model’s ability to: (i) infer intrinsic physical properties of a dynamical system, and (ii) predict event outcomes under novel configurations where the underlying physical parameters remain invariant but the scene layout and external perturbations change. Unlike traditional perception benchmarks, this formulation explicitly tests causal and transferable reasoning rather than memorization of visual patterns. We are publicly releasing LLMPhy-TraySim to foster research on this emerging topic and to support benchmarking of novel physical reasoning methods.
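The two-part objective above (identify parameters, then predict outcomes in new configurations that share those parameters) can be made concrete with a small sketch. It is a hypothetical illustration, not the LLMPhy-TraySim protocol: the sliding-block model, the grid search, and the "falls off the edge" event are all assumptions chosen so that the example stays self-contained.

```python
G = 9.81  # gravitational acceleration in m/s^2

def stop_distance(friction, v0):
    """Sliding distance under Coulomb friction: v0^2 / (2 * mu * g)."""
    return v0 ** 2 / (2.0 * friction * G)

def identify_friction(observed_distance, v0, candidates):
    """Step (i): pick the candidate whose predicted distance best matches the observation."""
    return min(candidates, key=lambda mu: abs(stop_distance(mu, v0) - observed_distance))

def falls_off(friction, v0, edge):
    """Event outcome: does the block slide past an edge at distance `edge`?"""
    return stop_distance(friction, v0) > edge

TRUE_MU = 0.3  # hidden ground-truth parameter (assumed for this example)
candidates = [i / 100 for i in range(5, 101)]  # grid over 0.05 .. 1.00

# Step (i): infer the parameter from one observed scene.
observed = stop_distance(TRUE_MU, v0=2.0)
mu_hat = identify_friction(observed, 2.0, candidates)

# Step (ii): same parameter, new layouts and perturbations as (initial speed, edge distance).
novel_scenes = [(1.0, 0.5), (3.0, 1.0), (2.5, 1.2), (1.5, 0.3)]
predicted = [falls_off(mu_hat, v0, edge) for v0, edge in novel_scenes]
actual = [falls_off(TRUE_MU, v0, edge) for v0, edge in novel_scenes]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(novel_scenes)
print(f"mu_hat={mu_hat:.2f}, transfer accuracy={accuracy:.2f}")
```

The point of the split is that accuracy in step (ii) depends on having recovered the parameter in step (i); a model that only memorized the first scene has nothing to carry over to the new layouts.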
Software & Data Downloads
Access software at https://github.com/merlresearch/llmphy.
Access data at https://doi.org/10.5281/zenodo.19740957.


