Search: community feedback

Paper page - LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

…the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width–depth TTS as controller synthesis over pre-collected…

May 11, 2026

Paper page - Leveraging Verifier-Based Reinforcement Learning in Image Editing

…AI-generated summary: While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is…

May 1, 2026

Paper page - InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

…We develop an interactive execution environment for agents, featuring a unified action space comprising Clarify, Implement, Verify, and Submit, enabling iterative intent refinement, code synthesis, and visual feedback-based validation. Extensive experiments…

May 1, 2026

Paper page - RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

…In this work, we argue that rubrics should serve not merely as final-answer evaluators, but as the shared interface that structures policy execution, judge feedback, and agent memory. Based on this…

May 13, 2026

Paper page - Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

…Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choice constrains post-training to optimizing functional correctness over…

May 4, 2026

Paper page - Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization

…AI-generated summary: Recent advancements in agentic test-time scaling allow models to gather environmental feedback before committing to final actions. A key limitation of existing methods is that they typically employ…

May 14, 2026

Paper page - AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

…https://huangrh99.github.io/AlphaGRPO/ AlphaGRPO enables multimodal generation RL training across text and image generation for AR-Diffusion…

May 13, 2026

Paper page - Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization

…AI-generated summary: Biomolecular generators are often adapted with reward feedback to improve task-specific utility, but pushing utility alone can concentrate generation on a narrow family of candidates. Maintaining diversity is…

May 12, 2026

Paper page - NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

…A label-free policy learning converts free-form feedback into persistent parameter updates of the planner, reshaping subsequent coordination. These three layers co-evolve: reliable skills produce richer memory, richer memory informs…

May 12, 2026

Paper page - Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

…Community comment: I think this is fascinating work, very timely in this field. Thank you for your contribution to the community! The…

May 1, 2026
