SpatialTree: How Spatial Abilities Branch Out in MLLMs

SpatialTree is a cognitive-science-inspired hierarchy and benchmark for evaluating spatial abilities in Multimodal Large Language Models (MLLMs). It organizes spatial capability into four levels: perception (L1), mental mapping (L2), simulation (L3), and agentic competence (L4). Together, the four levels span 27 sub-abilities.

🌳 SpatialTree Hierarchy

Unlike previous benchmarks, our hierarchy progresses from basic perception to complex agentic interaction.

[Video: capability-tree.mp4]

💡 Key Findings

Hierarchy Matters: Higher-level skills (L2-L4) are strongly correlated, while L1 skills are largely orthogonal.
Transfer Dynamics: Strong cross-level transfer exists from low-level to high-level abilities, whereas transfer within L1 can be negative.
Auto-Think: Naive chain-of-thought/reasoning helps complex tasks but hurts intuitive perception. We propose an auto-think strategy that suppresses unnecessary deliberation, enabling RL to consistently improve performance across levels.

📊 Evaluation

We provide evaluation scripts integrated with lmms-eval for testing MLLMs on SpatialTreeBench.

Installation

First, install lmms-eval and the necessary dependencies:

git clone https://github.com/EvolvingLMMs-Lab/lmms-eval.git
cd lmms-eval
pip install -e .

How to Run

To evaluate a model on SpatialTreeBench via an OpenAI-compatible interface, use the following commands. The task is registered as spatialtreebench.

Evaluate GPT-4o (via OpenAI API)

Configure the judge model settings for LLM-as-a-Judge:

export MODEL_VERSION="gpt-4o-2024-05-13"
export API_TYPE= # openai, azure, async_openai, or async_azure
export OPENAI_API_KEY=xxxx
export OPENAI_API_URL= # Default: https://api.openai.com/v1/chat/completions
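
Before launching a long run, it can help to confirm that the judge settings are actually populated. The sketch below is a hypothetical helper, not part of SpatialTree or lmms-eval: `missing_judge_vars` and `valid_api_type` are illustrative names, and treating MODEL_VERSION, API_TYPE, and OPENAI_API_KEY as mandatory is an assumption (OPENAI_API_URL is skipped because it has a default).

```shell
# Hypothetical pre-flight helpers; names are illustrative, not from lmms-eval.

# Print the names of judge variables that are unset or empty,
# separated by spaces (prints an empty line when all are set).
missing_judge_vars() {
    missing=""
    for name in MODEL_VERSION API_TYPE OPENAI_API_KEY; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space added by the loop.
    echo "${missing# }"
}

# Succeed only for the four API_TYPE values listed above.
valid_api_type() {
    case "$1" in
        openai|azure|async_openai|async_azure) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `[ -z "$(missing_judge_vars)" ] && valid_api_type "$API_TYPE"` succeeds only when all three variables are set and API_TYPE is one of the listed values.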

You can configure these environment variables directly or modify them in the script. Then, run the evaluation:

bash examples/models/openai_compatible.sh
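
If you prefer to call lmms-eval directly rather than through the example script, the invocation might look roughly like the one assembled below. This is a sketch under assumptions: only the task name spatialtreebench and the --output_path flag come from this README. The `openai_compatible` model name, the `model_version` argument, and the `--batch_size` and `--log_samples` flags are assumed from common lmms-eval usage, so compare against examples/models/openai_compatible.sh before relying on it. The function `spatialtree_eval_cmd` is a hypothetical helper that only prints the command.

```shell
# Hypothetical helper: print (do not run) an lmms-eval command for SpatialTreeBench.
# Assumed, not confirmed by this README: the openai_compatible model name,
# the model_version argument, and the --batch_size / --log_samples flags.
spatialtree_eval_cmd() {
    model_version="$1"   # model to evaluate, e.g. gpt-4o-2024-05-13
    output_path="$2"     # directory that will receive the result files
    printf 'python3 -m lmms_eval --model openai_compatible --model_args model_version=%s --tasks spatialtreebench --batch_size 1 --log_samples --output_path %s\n' \
        "$model_version" "$output_path"
}
```

Calling `spatialtree_eval_cmd gpt-4o-2024-05-13 ./logs` prints the command so it can be reviewed and then pasted into the shell.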

Metrics & Results

After the evaluation finishes, the results will be saved in the directory specified by --output_path.

The hierarchical scores will also be displayed in the terminal as follows:

--- SpaTreeBench Hierarchical Scores ---
SpaTree
├── L1
├── ├── Geometry
├── ├── ├── Distance
├── ├── ├── Shape
├── ├── └── Size
├── ├── Localization
├── ├── ├── 3D Detection
├── ├── └── 3D Grounding
├── ├── Motion
├── ├── ├── Allo
├── ├── └── Ego
├── ├── Orientation
├── ├── ├── Gravity
├── ├── └── Object Orientation
├── └── Relation
├── └── ├── Correspondence
├── └── └── Relative Direction
├── L2
├── ├── Memory
├── ├── ├── Cognitive Map
├── ├── └── Memory Retrieval
├── └── Underst.
├── └── ├── Affordance
├── └── ├── Motion Understanding
├── └── ├── Perspective Taking
├── └── ├── Relation Understanding
├── └── └── Spatial Caption
├── L3
├── ├── Caus. Reas.
├── ├── ├── Dynamics
├── ├── ├── Geometry Puzzles
├── ├── └── Relation
├── └── Seq. Plan.
├── └── ├── Operation
├── └── └── Route
└── L4
└── ├── Goal-Driven Execution
└── ├── ├── Agentic Navigation
└── ├── └── Robotic Arm
└── └── Open-world Exploration
└── └── ├── Knowledge Acquisition
└── └── └── Self-Goaling
----------------------------------------

You can check results.json for aggregated scores and samples.json for detailed model inputs and predictions.
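
To locate these files after a run, a small search over the output directory is usually enough. The helper below is a minimal sketch: `list_eval_outputs` is a hypothetical name, and the glob patterns are an assumption chosen to tolerate prefixes, subdirectories, or a .jsonl extension that some lmms-eval versions may use.

```shell
# Hypothetical helper: list result and sample files under an output directory.
# $1 is the directory that was passed to --output_path.
list_eval_outputs() {
    # Match names containing "results" or "samples" with a .json-like extension,
    # at any depth, and print them in sorted order.
    find "$1" -type f \( -name '*results*.json' -o -name '*samples*.json*' \) | sort
}
```

For example, `list_eval_outputs ./logs` prints one path per line, which can then be opened in any JSON viewer.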

Dataset Loading

The evaluation script downloads the SpatialTree-Bench dataset from Hugging Face automatically.

To run the evaluation offline, download the dataset beforehand and set dataset_path in lmms_eval/tasks/spatialtreebench/spatialtreebench.yaml to the local copy.
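
The dataset_path edit can also be scripted. The sketch below assumes dataset_path appears as a top-level key at the start of a line in the YAML file, and that the new path contains no `|` or `&` characters (both are special to the sed expression used here). `set_dataset_path` is a hypothetical helper, not something shipped with lmms-eval.

```shell
# Hypothetical helper: point the task config at a local dataset copy.
# $1: path to the task YAML file, $2: local dataset directory.
set_dataset_path() {
    yaml_file="$1"
    local_path="$2"
    tmp_file="$yaml_file.tmp"
    # Rewrite only the line that starts with "dataset_path:"; every other
    # line is copied through unchanged. A temp file avoids sed -i, whose
    # syntax differs between GNU and BSD sed.
    sed "s|^dataset_path:.*|dataset_path: $local_path|" "$yaml_file" > "$tmp_file" \
        && mv "$tmp_file" "$yaml_file"
}
```

One way to fetch the data beforehand is `huggingface-cli download <repo-id> --repo-type dataset --local-dir <dir>`, where `<repo-id>` is the SpatialTree-Bench repository on Hugging Face (its exact ID is not given in this README). After that, run `set_dataset_path lmms_eval/tasks/spatialtreebench/spatialtreebench.yaml <dir>`.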

🖊️ Citation

If you find this work helpful, please cite our paper:

@article{xiao2025spatialtree,
  title={SpatialTree: How Spatial Abilities Branch Out in MLLMs},
  author={Xiao, Yuxi and Li, Longfei and Yan, Shen and Liu, Xinhang and Peng, Sida and Wei, Yunchao and Zhou, Xiaowei and Kang, Bingyi},
  journal={arXiv preprint arXiv:2512.20617},
  year={2025}
}
