Every new LLM architecture comes with its own inference challenges, from transformer models to hybrid vision language models (VLMs) to state space models (SSMs). Turning a reference implementation into a high-performance inference engine typically requires adding KV cache management, sharding weights across GPUs, fusing operations, and tuning the execution graph for specific hardware. AutoDeploy shifts this workflow toward a compiler-driven approach. Instead of requiring model authors to manually reimplement inference logic, AutoDeploy automatically extracts a computation graph from an off-the