Search: Model release

Paper page - Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

…On this subset, no model exceeds 50%, identifying refusal as a new optimization target that current models do not directly address. To prevent contamination, the dataset will be publicly released in late…

May 12, 2026

Paper page - Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Environments

…Extensive experiments on chest X-ray interpretation demonstrate that our 7B model achieves superior robustness, outperforming even proprietary source models in average accuracy. Furthermore, we release CXR-MAX, a large-scale benchmark…

May 7, 2026

Paper page - TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

…🧩 20 encoders · 16 tasks · 87 datasets across 3 suites. 🔍 Built to make heterogeneous tabular models directly comparable and reusable as embedding models. Tabular encoders come in every shape: different input formats…

Jun 11, 2026

Introducing the Ettin Reranker Family

…What a clean release, congrats @tomaarsen! · Thanks Maxime! You know, I bet the training script would apply pretty cleanly on LFM models as well 👀 Although maybe the model prefers a generative architecture…

May 19, 2026 · Tom Aarsen

Paper page - When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

…Jiaqi Wei, …, Qingyun Wang · Abstract: Side-by-Side Interleaved Reasoning enables controlled disclosure timing in autoregressive models, improving accuracy and efficiency through interleaved private reasoning and delayed content release. AI-generated summary…

May 7, 2026

Paper page - Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

…Our findings suggest that for non-English LLMs, semantic concentration through quality filtering offers a more viable path to efficient language modeling than simply maximizing unique data volume. We release our German…

May 5, 2026

Paper page - Fast Byte Latent Transformer

arXiv:2605.08044 · Published on May 8 · Submitted by taesiri on May 11 · Abstract: Byte-level language models overcome slow autoregressive generation through diffusion-based parallel…

May 12, 2026

Paper page - Advancing Creative Physical Intelligence in Large Multimodal Models

arXiv:2605.26396 · Published on May 25 · Submitted by Cheng Qian on May 28 · University of Illinois at Urbana-Champaign · Abstract: Large…

May 28, 2026

SmolLM3: smol, multilingual, long-context reasoner

…where hybrid models might combine the best of both worlds! Here's the paper for anyone interested: · Nice thoughts. Have the synthetic datasets created by Qwen3-32B been released or posted anywhere…

Sep 10, 2025 · Elie Bakouch

Paper page - MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

…Published on May 27 · Submitted by Ningyu Zhang on May 28 · alibaba-inc · Authors: …, Ningyu Zhang · Abstract: Memory systems in large language…