XBOW tests Anthropic's Mythos Preview for offensive security
… Because of the way we harvest our web benchmark set, the vulnerabilities in that set can actually be found from the code alone. So it’s fair to ask: for these benchmarks, can Mythos Preview find an exploit without being allowed to interact with the live site? …