====== World Labs / Spatial Intelligence ======

**World Labs** is an AI company founded in 2024 by **Fei-Fei Li**, the Stanford professor widely regarded as a pioneer of modern computer vision and co-creator of ImageNet. The company focuses on developing **spatial intelligence** — the ability for AI systems to perceive, reason about, and interact within three-dimensional environments through internal world models.((Fei-Fei Li, "From Words to Worlds: Spatial Intelligence," Substack. [[https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence|drfeifei.substack.com]]))

World Labs builds **Large World Models (LWMs)** that represent 3D space, track object positions and relationships, and enable prediction, planning, and action in real or simulated environments.((University of Virginia Data Science, "Spatial Intelligence: The Future of AI." [[https://datascience.virginia.edu/news/spatial-intelligence-future-ai|virginia.edu]]))

===== What Is Spatial Intelligence? =====

Spatial intelligence refers to AI's capacity to build and maintain internal representations of 3D space that integrate:

* **Perception** — recovering 3D structure from 2D images and sensor data (depth estimation, object detection, segmentation)
* **Geometry** — understanding shapes, surfaces, volumes, and spatial relationships between objects
* **Memory** — tracking how objects, agents, and environments change over time
* **Action** — using spatial understanding to plan and execute physical interactions

Humans naturally think spatially — we predict collisions, navigate cluttered rooms, and imagine how objects fit together. Current AI systems, largely trained on 2D images and text, lack this fundamental capability.((Roboflow, "Spatial Intelligence."
[[https://blog.roboflow.com/spatial-intelligence/|roboflow.com]]))

===== Large World Models (LWMs) =====

LWMs are dynamic internal maps of 3D environments that:

* Track objects and their 3D positions, orientations, and relationships
* Maintain consistency across viewpoints and over time
* Predict how scenes will change in response to actions or physical forces
* Enable planning by simulating future states before committing to actions

Unlike label-based 2D systems that classify images, LWMs create coherent, physics-aware representations suitable for imagination, prediction, and interaction.((University of Virginia Data Science, "Spatial Intelligence: The Future of AI." [[https://datascience.virginia.edu/news/spatial-intelligence-future-ai|virginia.edu]]))

===== Key Technologies =====

* **Neural Radiance Fields (NeRF)** — representing 3D scenes as continuous neural functions that can synthesize novel viewpoints
* **3D Scene Graphs** — structured representations of objects, their attributes, and spatial relationships((LearnGeoData, "3D Scene Graphs for Spatial AI with NetworkX and OpenUSD." [[https://learngeodata.eu/3d-scene-graphs-for-spatial-ai-with-networkx-and-openusd/|learngeodata.eu]]))
* **Depth Estimation** — inferring 3D depth from monocular or stereo images
* **Vision-Language Models** — combining visual perception with language understanding for spatial reasoning about layouts, navigability, and functional spaces
* **Gaussian Splatting** — efficient 3D representation for real-time rendering and scene reconstruction

===== Applications =====

==== Robotics ====

Spatial intelligence enables robots to navigate cluttered environments, manipulate objects, collaborate with humans, and understand functional spaces (e.g., where a cup can be placed vs. where it will fall).((NVIDIA Developer Blog, "Building Spatial Intelligence from Real-World 3D Data."
[[https://developer.nvidia.com/blog/building-spatial-intelligence-from-real-world-3d-data-using-deep-learning-framework-fvdb/|developer.nvidia.com]]))

==== Augmented and Virtual Reality ====

LWMs support object placement in AR scenes, virtual tours with navigable 3D environments, and immersive content creation that respects real-world physics and spatial constraints.

==== Autonomous Systems ====

Self-driving vehicles, delivery drones, and warehouse robots require real-time spatial reasoning to navigate safely in dynamic environments.

==== Simulation and Digital Twins ====

Digital twins are accurate 3D replicas of physical spaces, used for training AI systems, testing scenarios, and optimizing real-world operations.

===== Founding and Team =====

* **Fei-Fei Li** (Founder) — Sequoia Professor of Computer Science at Stanford, co-director of the Stanford Institute for Human-Centered AI (HAI), known for ImageNet and for advancing computer vision
* The team draws researchers from Stanford and other leading institutions working on 3D world models, embodied AI, and neural scene representations
* World Labs raised significant early-stage funding in 2024, positioning itself at the frontier of spatial AI research and commercialization

===== Challenges =====

Current AI models still struggle with complex 3D spatial reasoning tasks, including understanding occlusion, physical plausibility, and multi-object interactions in novel scenes.((MBZUAI, "Why 3D Spatial Reasoning Still Trips Up Today's AI Systems." [[https://mbzuai.ac.ae/news/why-3d-spatial-reasoning-still-trips-up-todays-ai-systems/|mbzuai.ac.ae]])) Building LWMs that generalize across diverse environments remains an open research problem.

===== See Also =====

* [[computer_vision|Computer Vision]]
* [[embodied_ai|Embodied AI]]
* [[neural_radiance_fields|Neural Radiance Fields (NeRF)]]

===== References =====
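===== Illustrative Sketch: A Toy Scene Graph =====

The 3D scene graphs listed under Key Technologies can be made concrete with a small sketch. Everything below (the class names, the ''on'' relation, the support-chain query) is a hypothetical illustration in plain Python, not World Labs' API and not the cited tutorial's code; it shows only the kind of object-and-relation bookkeeping an LWM-style system must maintain.

```python
from dataclasses import dataclass, field

# Hypothetical toy 3D scene graph: objects carry positions, directed
# triples carry spatial relations, and a query walks the support chain.
# All names are illustrative, not any real spatial-AI API.

@dataclass
class SceneObject:
    name: str
    position: tuple  # (x, y, z) in metres, in a shared world frame

@dataclass
class SceneGraph:
    objects: dict = field(default_factory=dict)    # name -> SceneObject
    relations: set = field(default_factory=set)    # (subject, relation, object)

    def add(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def relate(self, subj: str, rel: str, obj: str) -> None:
        self.relations.add((subj, rel, obj))

    def supported_by(self, name: str) -> list:
        """Follow 'on' relations down to whatever ultimately supports `name`."""
        chain, current = [], name
        while True:
            supports = [o for (s, r, o) in self.relations
                        if s == current and r == "on"]
            if not supports:
                return chain
            current = supports[0]
            chain.append(current)

# Build a small kitchen scene: a cup on a table on the floor.
g = SceneGraph()
g.add(SceneObject("floor", (0.0, 0.0, 0.0)))
g.add(SceneObject("table", (1.0, 0.0, 0.0)))
g.add(SceneObject("cup",   (1.0, 0.9, 0.0)))
g.relate("table", "on", "floor")
g.relate("cup", "on", "table")

print(g.supported_by("cup"))  # ['table', 'floor']
```

A real system would attach richer geometry to each node (meshes, 6-DoF poses) and would derive relations such as ''on'' from perception rather than asserting them by hand, but the graph-of-relations structure is the same.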