Search: real-time coding

Paper page - τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation

…The model is trained on approximately 27,300 hours of real-robot teleoperation, UMI-style interaction, egocentric human videos, and rollout or failure trajectories using modality-specific supervision masks. At inference time…

Jun 2, 2026

Paper page - SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

…A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling (2026); Lighting-grounded Video Generation with Renderer-based Agent Reasoning (2026); Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE (2026); From…

May 15, 2026

Paper page - DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

…Robot manipulation critically depends on perception that preserves the action-relevant aspects of a scene. Yet most robot learning pipelines are built upon visual…

May 29, 2026

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

…640, "video.codec": "av1", "video.pix_fmt": "yuv420p", "video.is_depth_map": false, "video.fps": 30, "video.channels": 3, "has_audio": false } }, "timestamp": { "dtype": "float32", "shape": [ 1 ], "names": null }, "frame_index": { "dtype…

Sep 17, 2025

Paper page - Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

…object bridge tools, an injectable Resource Provider Substrate, deterministic demos, real-model smoke scripts, and 123 regression tests at the time of writing. Rather than improving planner accuracy, Agent libOS demonstrates a…

Jun 4, 2026

FastRTC: The Real-Time Communication Library for Python

Jan 12, 2025 · Freddy Boulton

Paper page - Personal AI Agent for Camera Roll VQA

…We study the personal camera roll visual question answering setting. In this setting, a conversational AI assistant can access a user's personal camera…

Jun 5, 2026

Paper page - Robotic Policy Adaptation via Weight-Space Meta-Learning

…Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations…

Jun 9, 2026

Paper page - Towards One-to-Many Temporal Grounding

…Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios…

Jun 5, 2026

Paper page - EarlyTom: Early Token Compression Completes Fast Video Understanding

…early in the vision encoder to reduce time-to-first-token and computational costs while maintaining model accuracy. Video large language models (Video-LLMs…

May 29, 2026
