====== World Labs / Spatial Intelligence ======

**World Labs** is an AI company founded in 2024 by **Fei-Fei Li**, the Stanford professor widely regarded as a pioneer of modern computer vision and co-creator of ImageNet. The company focuses on developing **spatial intelligence** — the ability for AI systems to perceive, reason about, and interact within three-dimensional environments through internal world models.((Fei-Fei Li, "From Words to Worlds: Spatial Intelligence," Substack. [[https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence|drfeifei.substack.com]]))

World Labs builds **Large World Models (LWMs)** that represent 3D space, track object positions and relationships, and enable prediction, planning, and action in real or simulated environments.((University of Virginia Data Science, "Spatial Intelligence: The Future of AI." [[https://datascience.virginia.edu/news/spatial-intelligence-future-ai|virginia.edu]]))

===== What Is Spatial Intelligence? =====

Spatial intelligence refers to AI's capacity to build and maintain internal representations of 3D space that integrate:

* **Perception** — recovering 3D structure from 2D images and sensor data (depth estimation, object detection, segmentation)
* **Geometry** — understanding shapes, surfaces, volumes, and spatial relationships between objects
* **Memory** — tracking how objects, agents, and environments change over time
* **Action** — using spatial understanding to plan and execute physical interactions

Humans naturally think spatially — we predict collisions, navigate cluttered rooms, and imagine how objects fit together. Current AI systems, largely trained on 2D images and text, lack this fundamental capability.((Roboflow, "Spatial Intelligence."
[[https://blog.roboflow.com/spatial-intelligence/|roboflow.com]]))

===== Large World Models (LWMs) =====

LWMs are dynamic internal maps of 3D environments that:

* Track objects and their 3D positions, orientations, and relationships
* Maintain consistency across viewpoints and over time
* Predict how scenes will change in response to actions or physical forces
* Enable planning by simulating future states before committing to actions

Unlike label-based 2D systems that classify images, LWMs create coherent, physics-aware representations suitable for imagination, prediction, and interaction.((University of Virginia Data Science, "Spatial Intelligence: The Future of AI." [[https://datascience.virginia.edu/news/spatial-intelligence-future-ai|virginia.edu]]))

===== Key Technologies =====

* **Neural Radiance Fields (NeRF)** — representing 3D scenes as continuous neural functions that can synthesize novel viewpoints
* **3D Scene Graphs** — structured representations of objects, their attributes, and spatial relationships((LearnGeoData, "3D Scene Graphs for Spatial AI with NetworkX and OpenUSD." [[https://learngeodata.eu/3d-scene-graphs-for-spatial-ai-with-networkx-and-openusd/|learngeodata.eu]]))
* **Depth Estimation** — inferring 3D depth from monocular or stereo images
* **Vision-Language Models** — combining visual perception with language understanding for spatial reasoning about layouts, navigability, and functional spaces
* **Gaussian Splatting** — efficient 3D representation for real-time rendering and scene reconstruction

===== Applications =====

==== Robotics ====

Spatial intelligence enables robots to navigate cluttered environments, manipulate objects, collaborate with humans, and understand functional spaces (e.g., where a cup can be placed vs. where it will fall).((NVIDIA Developer Blog, "Building Spatial Intelligence from Real-World 3D Data."
[[https://developer.nvidia.com/blog/building-spatial-intelligence-from-real-world-3d-data-using-deep-learning-framework-fvdb/|developer.nvidia.com]]))

==== Augmented and Virtual Reality ====

LWMs support object placement in AR scenes, virtual tours with navigable 3D environments, and immersive content creation that respects real-world physics and spatial constraints.

==== Autonomous Systems ====

Self-driving vehicles, delivery drones, and warehouse robots require real-time spatial reasoning to navigate safely in dynamic environments.

==== Simulation and Digital Twins ====

Digital twins are accurate 3D replicas of physical spaces, used for training AI systems, testing scenarios, and optimizing real-world operations.

===== Founding and Team =====

* **Fei-Fei Li** (Founder) — Sequoia Professor of Computer Science at Stanford, co-director of the Stanford Institute for Human-Centered AI (HAI), known for ImageNet and for advancing computer vision
* The team draws researchers from Stanford and other leading institutions working on 3D world models, embodied AI, and neural scene representations
* World Labs raised significant early-stage funding in 2024, positioning itself at the frontier of spatial AI research and commercialization

===== Challenges =====

Current AI models still struggle with complex 3D spatial reasoning tasks, including understanding occlusion, physical plausibility, and multi-object interactions in novel scenes.((MBZUAI, "Why 3D Spatial Reasoning Still Trips Up Today's AI Systems." [[https://mbzuai.ac.ae/news/why-3d-spatial-reasoning-still-trips-up-todays-ai-systems/|mbzuai.ac.ae]])) Building LWMs that generalize across diverse environments remains an open research problem.

===== See Also =====

* [[computer_vision|Computer Vision]]
* [[embodied_ai|Embodied AI]]
* [[neural_radiance_fields|Neural Radiance Fields (NeRF)]]

===== References =====
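===== Illustrative Sketch: A Toy Scene Graph =====

The 3D scene graphs listed under Key Technologies can be made concrete with a small sketch. Everything below (the class names, the ''on'' relation, the support-chain query) is a hypothetical illustration in plain Python, not World Labs' API and not the cited tutorial's code; it shows only the kind of object-and-relation bookkeeping an LWM-style system must maintain.

```python
from dataclasses import dataclass, field

# Hypothetical toy 3D scene graph: objects carry positions, directed
# triples carry spatial relations, and a query walks the support chain.
# All names are illustrative, not any real spatial-AI API.

@dataclass
class SceneObject:
    name: str
    position: tuple  # (x, y, z) in metres, in a shared world frame

@dataclass
class SceneGraph:
    objects: dict = field(default_factory=dict)    # name -> SceneObject
    relations: set = field(default_factory=set)    # (subject, relation, object)

    def add(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def relate(self, subj: str, rel: str, obj: str) -> None:
        self.relations.add((subj, rel, obj))

    def supported_by(self, name: str) -> list:
        """Follow 'on' relations down to whatever ultimately supports `name`."""
        chain, current = [], name
        while True:
            supports = [o for (s, r, o) in self.relations
                        if s == current and r == "on"]
            if not supports:
                return chain
            current = supports[0]
            chain.append(current)

# Build a small kitchen scene: a cup on a table on the floor.
g = SceneGraph()
g.add(SceneObject("floor", (0.0, 0.0, 0.0)))
g.add(SceneObject("table", (1.0, 0.0, 0.0)))
g.add(SceneObject("cup",   (1.0, 0.9, 0.0)))
g.relate("table", "on", "floor")
g.relate("cup", "on", "table")

print(g.supported_by("cup"))  # ['table', 'floor']
```

A real system would attach richer geometry to each node (meshes, 6-DoF poses) and would derive relations such as ''on'' from perception rather than asserting them by hand, but the graph-of-relations structure is the same.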