CCCL Runtime: A Modern C++ Runtime for CUDA | NVIDIA Technical Blog
…It takes a stream as its first argument to signal that it operates in stream order. Each buffer submits three operations to that stream: allocation from the specified pool, initialization, and eventually deallocation when the…
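To make the three stream-ordered steps concrete, here is a minimal sketch that expresses them with the plain CUDA Runtime stream-ordered allocator (`cudaMallocFromPoolAsync`, `cudaMemsetAsync`, `cudaFreeAsync`, available since CUDA 11.2). It is not the CCCL Runtime buffer API, whose exact signature is not shown in this excerpt. The `stream_buffer` class and the `first_byte_after_init` helper are hypothetical names introduced only for illustration, and the sketch assumes device 0 supports memory pools.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Abort on any CUDA error so failures are not silently ignored.
#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                   cudaGetErrorString(err_));                        \
      std::exit(1);                                                  \
    }                                                                \
  } while (0)

// Hypothetical RAII wrapper (NOT the CCCL Runtime buffer type). It only
// mirrors the three stream-ordered operations described in the text.
class stream_buffer {
 public:
  // The stream comes first, following the convention described above.
  stream_buffer(cudaStream_t stream, cudaMemPool_t pool, std::size_t bytes,
                int byte_value)
      : stream_(stream), bytes_(bytes) {
    // 1. Allocation from the specified pool, ordered on `stream`.
    CHECK(cudaMallocFromPoolAsync(&ptr_, bytes_, pool, stream_));
    // 2. Initialization, ordered after the allocation on the same stream.
    CHECK(cudaMemsetAsync(ptr_, byte_value, bytes_, stream_));
  }

  // 3. Deallocation is enqueued on the same stream, so the memory returns to
  //    the pool only after earlier work on that stream has completed.
  //    Errors are ignored here because destructors should not throw or exit.
  ~stream_buffer() { cudaFreeAsync(ptr_, stream_); }

  stream_buffer(const stream_buffer&) = delete;
  stream_buffer& operator=(const stream_buffer&) = delete;

  void* data() const { return ptr_; }
  std::size_t size() const { return bytes_; }

 private:
  cudaStream_t stream_;
  std::size_t bytes_;
  void* ptr_ = nullptr;
};

// Hypothetical helper: creates a buffer, reads back its first byte on the
// host, and lets the buffer's deallocation be enqueued on the stream.
int first_byte_after_init(int value) {
  cudaStream_t stream;
  CHECK(cudaStreamCreate(&stream));
  cudaMemPool_t pool;
  CHECK(cudaDeviceGetDefaultMemPool(&pool, 0));

  unsigned char host = 0;
  {
    stream_buffer buf(stream, pool, 256, value);
    // Ordered after the memset because it is issued on the same stream.
    CHECK(cudaMemcpyAsync(&host, buf.data(), 1, cudaMemcpyDeviceToHost,
                          stream));
  }  // Destructor enqueues cudaFreeAsync on `stream` here.

  // Wait for allocation, initialization, copy, and deallocation to finish.
  CHECK(cudaStreamSynchronize(stream));
  CHECK(cudaStreamDestroy(stream));
  return host;
}
```

Calling `first_byte_after_init(42)` should return 42, because every step, including the host read-back, is ordered on one stream. The design point the sketch illustrates is that the host never blocks between allocation, initialization, and deallocation; only the final `cudaStreamSynchronize` waits for the whole sequence.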