Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs
An asynchronous architecture that decouples lightweight online mapping from heavyweight VLM reasoning, keeping the 3D scene graph queryable from the first frame while background agents progressively enrich it.
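
The decoupling described above can be illustrated with a minimal sketch. It is not the authors' implementation: `SceneGraph`, `Node`, `mapper`, `enricher`, and `fake_vlm_describe` are hypothetical names introduced here, and the VLM call is simulated with a short sleep rather than a real model. The sketch assumes a fast path that inserts coarsely labelled nodes per frame and a background task that later attaches richer descriptions, so the graph can be queried before any enrichment has finished.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: int
    label: str                         # coarse label from the fast mapping path
    description: Optional[str] = None  # filled in later by the slow path


@dataclass
class SceneGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)

    def query(self, label: str) -> List[Node]:
        # Answerable at any time, even before enrichment has run.
        return [n for n in self.nodes.values() if n.label == label]


async def fake_vlm_describe(node: Node) -> str:
    # Stand-in for a heavyweight VLM call (hypothetical; no real API is used).
    await asyncio.sleep(0.01)
    return f"detailed description of {node.label} #{node.node_id}"


async def mapper(graph: SceneGraph, pending: asyncio.Queue, frames) -> None:
    # Fast path: insert nodes immediately and hand them off for enrichment.
    for labels in frames:
        for label in labels:
            node = Node(node_id=len(graph.nodes), label=label)
            graph.nodes[node.node_id] = node
            pending.put_nowait(node.node_id)
        await asyncio.sleep(0)  # yield so background work can interleave


async def enricher(graph: SceneGraph, pending: asyncio.Queue) -> None:
    # Slow path: runs in the background and never blocks the mapper.
    while True:
        node_id = await pending.get()
        node = graph.nodes[node_id]
        node.description = await fake_vlm_describe(node)
        pending.task_done()


async def run(frames):
    graph = SceneGraph()
    pending: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(enricher(graph, pending))
    await mapper(graph, pending, frames)
    # The graph is queryable here, before enrichment has drained.
    early_hits = len(graph.query("chair"))
    await pending.join()  # wait for the background enrichment to finish
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return graph, early_hits
```

Calling `asyncio.run(run([["chair", "table"], ["chair"]]))` returns the graph together with the number of `"chair"` nodes found immediately after mapping. A single queue and a single worker keep the sketch short; a real system would presumably run several agents and handle node merges or deletions that occur while a request is in flight.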