NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates | NVIDIA Technical Blog
…Across a suite of real kernels (vector add, softmax, Cholesky, attention, Black-Scholes, FFT, matmul), warm JIT compilation is ~1.4x faster by geometric mean and up to ~2x faster on individual…
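
For readers unfamiliar with how a "geomean" speedup across a kernel suite is usually derived, the following is a minimal sketch of the arithmetic. It is not NVIDIA's benchmarking code. The function name `geomean_speedup` and the timing values are hypothetical placeholders chosen for illustration, not measurements from the post.

```python
import math


def geomean_speedup(old_times, new_times):
    """Return the geometric mean of per-kernel speedups (old / new)."""
    if len(old_times) != len(new_times) or not old_times:
        raise ValueError("need two non-empty lists of equal length")
    ratios = [old / new for old, new in zip(old_times, new_times)]
    # Average in log space: exp(mean(log r_i)) equals (prod r_i) ** (1 / n).
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Placeholder compile times in milliseconds (illustrative only,
# NOT figures from the blog post).
old_ms = [100.0, 80.0, 200.0]
new_ms = [50.0, 80.0, 100.0]

overall = geomean_speedup(old_ms, new_ms)
best = max(old / new for old, new in zip(old_ms, new_ms))
print(f"geomean speedup: {overall:.2f}x, best single kernel: {best:.2f}x")
```

The geometric mean is the conventional choice for averaging ratios because a single large speedup cannot dominate the result the way it would in an arithmetic mean, which is why a suite-wide figure can sit well below the best individual kernel.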