Project Vend: Can Claude run a small shop? (And why does that matter?)
…of Claude Sonnet 3.7, running for a long period of time. It had the following tools and abilities: A real web search tool for researching products to sell; An email tool…
…Claude Code has become a powerful accelerator for us at Schrödinger. For the projects where it fits best, Claude Code allows us to turn ideas into working code in minutes instead of…
…Related content Agentic coding and persistent returns to expertise Paving the way for agents in biology Measuring LLMs’ impact on N-day exploits In cybersecurity, a large fraction of real-world harm…
…Non-experts performed difficult robotics tasks in a limited time. But in AI, uplift often precedes autonomy. What models can help humans accomplish today, they can frequently do alone tomorrow. Coders no…
…To be clear, we intend to continue scanning open-source code for some time, so we expect this number to rise. One example of an open-source vulnerability that Mythos Preview detected…
…For example, they concentrate their use of AI on more operationally demanding techniques—those that require significant time, oversight, or real-time decision making to carry out—like account discovery, lateral movement…
…taken four hours was accomplished in half the time,” and a 2 to ones like, “Personally, I had AI help me fix code on a website. But it took multiple passes to…
…He had vibe-coded a small web app, but when he tried to make it real (authentication, payments, deployment), he lost a week clicking around in browser dashboards. As he summarized, “The…
…Consider:
- Number and complexity of human messages
- Time reading Claude's responses
- Time thinking and formulating questions
- Time reviewing outputs and iterating
- Realistic typing/reading speeds
- Time implementing suggestions or running code…