Search: AI training data

Paper page - Dynamic Latent Routing

…DLR searches for useful codes, trains the model to reuse them, and lets codes compose into longer thoughts. Across low-data fine-tuning settings, DLR matches or outperforms SFT, with learned codes…

May 15, 2026

Paper page - Key-Value Means

May 12, 2026

Paper page - StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing

…AI-generated summary: We present StateSMix, a fully self-contained lossless compressor that couples an online-trained Mamba-style State Space Model (SSM) with sparse n-gram context mixing and arithmetic coding…

May 6, 2026

Paper page - Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

…In heterogeneous training systems, the total importance ratio should ideally be decomposed into two semantically distinct factors: a training-inference discrepancy term that aligns inference-side and training-side distributions at the…

May 13, 2026

Train 400x faster Static Embedding Models with Sentence Transformers

…writing! I tried this training on a subset of data (AllNLI, GooAQ, MS MARCO, PAQ, S2ORC) with batch size 16384. Took 5 hours. w&b: https://api.wandb.ai/links/arunarumugam411-sui/dkcwm6gs…

Aug 9, 2024 · Tom Aarsen

Paper page - Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

…Self-improving language models construct environments for training rather than generating data, utilizing stable solve-verify asymmetry to maintain informative rewards during learning. AI-generated summary: We pursue a vision for self…

May 18, 2026

Paper page - D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

…training approach called D-OPSD enables efficient supervised fine-tuning for diffusion models by leveraging on-policy self-distillation with text and multimodal features while preserving few-step inference capabilities. AI-generated…

May 7, 2026

Paper page - Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

…However, existing work remains limited on both evaluation and training: benchmarks such as BRIGHT provide narrow gold sets and evaluate retrievers in isolation, while synthetic training corpora often optimize single-passage relevance…

May 7, 2026

Paper page - HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

…It represents memory across four orthogonal relational graphs — Semantic, Temporal, Causal, and Entity — and introduces a co-evolutionary training framework that jointly optimizes trainable edge features and a query-conditioned QueryRouter MLP…

May 14, 2026

Paper page - Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

…TS-DFM achieves the best perplexity of any discrete-generation baseline we compare against, including methods trained on 6x more data or using 5x larger models.

May 11, 2026
