SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation
AI-generated summary: Vision-Language Models (VLMs) have advanced rapidly in multimodal perception and language understanding, yet it remains unclear whether they can reliably ground language into spatially coherent, plausibly executable actions…
