Automated Alignment Researchers: Using large language models to scale scalable oversight
… AARs, by their nature, are designed to discover ideas that humans might not have considered. But we still need a way to verify whether their ideas and results are sound. …
