Search: AI training data

Paper page - Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

…Our data-attribution study reveals that the hardest examples are the most informative: those whose answers never appear in 128 pre-RL samples (only ~18% of training data) drive ~83% of the…

May 13, 2026

Paper page - Orchard: An Open-Source Agentic Modeling Framework

…Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.

May 15, 2026

Paper page - Efficient Training on Multiple Consumer GPUs with RoundPipe

…Efficient Multi-Task LLM Fine-Tuning in Multi-Tenant Datacenters via Spatial-Temporal Backbone Multiplexing (2026) FEPLB: Exploiting Copy Engines for Nearly Free MoE Load Balancing in Distributed Training (2026) Tessera: Unlocking…

May 1, 2026

Paper page - How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

…AI-generated summary: Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR) when the initial success probability p_0…

May 6, 2026

Paper page - Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

…AI Authors: Ciara Rowles, … Abstract: Stable-Layers uses reinforcement learning with vision-language model feedback to improve layer decomposition without paired data, employing Flow-GRPO and LoRA adaptation for optimized policy training…

Jun 4, 2026

Paper page - APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

…We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio, that jointly predicts engagement…

May 7, 2026

Open R1: Update #2

…From what I understand, the training uses the default dataset. Since this dataset contains multiple responses per question, I'm curious how the final SFT training data was constructed. Were all the…

Feb 6, 2025

Paper page - AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive

…arxiv.org/abs/2605.11518…

May 13, 2026

Paper page - Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

…Hayate Iso, … Abstract: Speculative decoding accelerates RL post-training by preserving output distributions while improving rollout throughput, with projected 2.5x speedup at large scales. AI-generated summary: RL post-training of…

Apr 30, 2026

Paper page - MindZero: Learning Online Mental Reasoning With Zero Annotations

…After training, MindZero internalizes model-based reasoning into fast single-pass inference. We evaluate MindZero against baselines across challenging mental reasoning and AI assistance tasks in gridworld and household domains. We found…

Jun 2, 2026
