Code World Model Preparedness Report