SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?