Search: containment and access

Paper page - WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

… Each task averages roughly 8 minutes of wall-clock time and over 20 tool calls, and runs inside a reproducible Docker container hosting an actual CLI agent harness (OpenClaw, Claude Code, Codex, or Hermes Agent) with access to real tools rather than mock services. …

May 15, 2026

Accelerate a World of LLMs on Hugging Face with NVIDIA NIM

… Login Succeeded $ export NIM_IMAGE=llm-nim export HF_TOKEN=hf_... export MODEL=hf://microsoft/Phi-3-mini-4k-instruct-gguf export NGC_API_KEY=nv... export LOCAL_NIM_CACHE=~/.cache/nim docker run --rm --gpus all \ --shm-size=16GB \ --network=host \ -u $(id -u) \ -v $(pwd)/nim_cache:/opt/nim/.cache \ -… …

Jul 24, 2025

Paper page - HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers

… To address this, we leverage governmental white papers as a scalable source for benchmark construction beyond English, as they contain naturally occurring charts and tables across diverse formats and domains and are freely accessible in many countries. …

Jun 2, 2026

Paper page - Personal AI Agent for Camera Roll VQA

… In this setting, a conversational AI assistant can access a user's personal camera roll and retrieve relevant photos to answer queries, ranging from simple factual questions (e.g., "Name of the food I tried yesterday?") to more open-ended ones (e.g., "Recommend some dishes I have never eaten before"). …

Jun 5, 2026

Paper page - Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

… Generated by Qwen/Qwen2.5-Coder-32B-Instruct · Progress in legal AI increasingly depends on access to authoritative legal text at scale. …

Jun 19, 2026

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

… Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward. · Hi, have you solved the problem? I'm encountering the same issue. …

Feb 29, 2024 · Sanchit Gandhi

Paper page - ABot-Earth 0.5: Generative 3D Earth Model

… The framework is designed for accessibility, with integrated hierarchical level-of-detail (LOD) structures that permit real-time, interactive visualization on web-based map engines. …

Jun 10, 2026

Paper page - ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

… When an agent loads a skill, it may gain new ways to invoke tools, access context, issue subtasks, install dependencies, or interact with external services. …

Jun 3, 2026

Paper page - HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

… But with messy specifications, a few details removed or obscured — just like what is often found in real-world scenarios — the best model's performance drops to 24% — even with access to a tool it can use to ask for help. …

May 5, 2026

Paper page - Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

… Building on the principle of on-policy distillation (OPD), PRISM casts alignment as a black-box, response-level adversarial game between the policy and a Mixture-of-Experts (MoE) discriminator with dedicated perception and reasoning experts, providing disentangled corrective signals that steer the pol… …

May 6, 2026
