Automated Alignment Researchers: Using large language models to scale scalable oversight
… Here, we’re sharing the questions that drive our research agenda.
… Emergent automation patterns. As tasks migrate to the API, they may become more exposed to automation. …
… Nevertheless, the automation share remained elevated compared with nearly a year earlier, when we first began tracking this measure. This suggests that the underlying trend still points toward greater automation, even though the August spike overstated how quickly that shift was materializing. …
… Designs have organization-scoped sharing. …
… This mirrors the arrangements we have with safety institutes in the US, UK, and Japan, where early access and technical information sharing have helped governments build an independent view of where frontier AI is heading and helped AI developers increase the safety of their models. …
… Sharing reliable election resources. When people come to Claude for information, we want Claude to share the facts and, when needed, point people to reliable and up-to-date resources. …