Abstract

The paper proposes the SPACE benchmark
- A benchmark to evaluate the spatial cognition abilities of frontier models
It shows that contemporary frontier models fall far short of the spatial cognition abilities of animals

Motivation

Spatial cognition refers to the ability of animals to perceive and interact with what they see
- They build internal representations of the world in order to navigate and manipulate it
- This idea is also discussed in World Models
Spatial cognition is recognized to be linked to embodiment
- However, current frontier models are trained on disembodied modalities such as text, images and video
Therefore, the paper asks whether spatial cognition abilities have emerged in such models

fig1

The authors divide spatial cognition tasks into two sub-categories
- Large-scale spatial cognition
  - In these tasks, the model is familiarized with an environment and is then asked to estimate quantitative values
- Small-scale spatial cognition
  - These tasks test the model’s ability to perceive, imagine, and mentally transform objects in 2D or 3D
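
To make "mentally transform objects in 2D" concrete, here is a minimal sketch of a mental-rotation style check. It is only an illustration, not the benchmark's actual task format or code: the grid encoding and the names `rotate_cw`, `mirror`, and `is_rotation_of` are assumptions made for this example. The question it answers is whether a candidate shape is a rotated copy of a reference shape, as opposed to a mirrored one.

```python
# Hypothetical illustration of a 2D mental-rotation question.
# Shapes are small binary grids stored as tuples of tuples (assumed encoding).
Grid = tuple[tuple[int, ...], ...]


def rotate_cw(grid: Grid) -> Grid:
    """Rotate a 2D grid by 90 degrees clockwise."""
    return tuple(zip(*grid[::-1]))


def mirror(grid: Grid) -> Grid:
    """Flip a 2D grid left to right."""
    return tuple(row[::-1] for row in grid)


def is_rotation_of(reference: Grid, candidate: Grid) -> bool:
    """Return True if candidate equals reference rotated by 0, 90, 180, or 270 degrees."""
    current = reference
    for _ in range(4):
        if current == candidate:
            return True
        current = rotate_cw(current)
    return False


# An L-shaped figure; its mirror image cannot be reached by rotation alone.
reference: Grid = ((1, 0), (1, 0), (1, 1))
rotated = rotate_cw(rotate_cw(reference))
mirrored = mirror(reference)

print(is_rotation_of(reference, rotated))   # True
print(is_rotation_of(reference, mirrored))  # False
```

A distractor built with `mirror` is the interesting case: it contains the same cells as the reference, so telling it apart requires actually tracking the transformation rather than matching local features.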

fig2

fig3

These tasks evaluate models’ ability to perceive, imagine, and mentally transform objects and shapes
- TL;DR: please refer to fig. 2

tab1 tab2

The authors conclude that contemporary models do not possess the same kind of intelligence as humans and animals
Models still fail to understand visual inputs
- Models tend to fail on the more vision-centric tasks, such as MRT and MCT
- They tend to perform better on text-centric vision tasks, such as SAtt and SAdd