Search: Dispatch

Win on TCO: How AMD Instinct™ MI355X Achieves Cost-Competitive Distributed Inference Through SGLang with MoRI

… Two-Batch Overlap (TBO) hides this latency by interleaving communication and compute across two micro-batches: MicroBatch A dispatch sends quantized tokens over the network on a dedicated communication stream. While the network transfer is in flight, MicroBatch B attention computes on the main compute str… …

May 27, 2026 · Bill He

AOCL-LibMem

… These dynamically dispatched binaries are built using Glibc version 2.34 or later. …

AOCL-Compression

… Highlights of AOCL-Compression 5.3: Refactored dynamic dispatch infrastructure to improve runtime robustness and correctness across all x86 platforms by enhancing CPU feature detection, dispatch selection logic, and fallback handling. Introduced a local thread control API enabling applications to man… …

AOCL-LibM (AMD Math Library)

… Highlights of AOCL-LibM 5.3: Added new statistical functions erfinv, erfcinv, cdfnorm, and cdfnorminv with scalar, vector (vrd2/vrd4/vrd8), and vector array (vrda) variants. Added round function support with full vector variant coverage. Performance improvements in log2f and round functions. Dynamic Dispa… …

AMD Optimizing CPU Libraries (AOCL)

… Release Highlights: Added new complex radix kernels (Radix-20 and Radix-48). Introduced new solvers: Complex: Buffered, Split Radix, Batched CT One-level Direct; Real: N-Dim, Size-one. Enhanced dynamic dispatcher functionality across x86 architectures. Performance optimizations in Complex Radix-4 & Radi… …

AOCL-Utils

… While each project may have its own mechanisms for CPU identification and dynamic dispatch, AOCL-Utils serves as a common foundation for updating and validating CPU-related information. …

AOCL-FFTZ

… Highlights of AOCL-FFTZ 5.3: Added new complex radix kernels: Radix-20 and Radix-48. Introduced new solvers: Complex: Buffered, Split Radix, Batched CT One-Level Direct; Real: N-Dim and Size-one. Enhanced dynamic dispatcher functionality across x86 architectures. Implemented performance optimizations in… …

AOCL-Cryptography

… It supports multiple cryptographic routines and enables provider path for the following routines: Advanced Encryption Standard (AES) block ciphers (CTR, CBC, CFB, GCM, OFB, XTS, SIV); ChaCha20 stream cipher, ChaCha20-Poly1305; Cipher algorithms Cipher, Hash, and Poly1305 based Message Authentication Cod… …

Reliable SHA-256 Through LLM-Aided HLS Dataflow Optimization

… 4: Vitis HLS Dataflow Viewer — optimized two-way parallel architecture with dispatch/collect. …

Apr 6, 2026 · Wen Chen

AMD PACE - High-Performance Platform Aware Compute Engine

… Attention Backends: PACE dispatches attention through pluggable backends. Backend support: Three backends are available: JIT OneDNN: Fused matmul-softmax-matmul pipelines with MHA and GQA. …

Apr 8, 2026 · Arjun Muraleedharan
