Showing top 143 results for "Qwen3"

Qwen3

Qwen3 is an AI model family developed by Alibaba, released as a set of large language models for natural-language tasks.

49 articles indexed · Last updated 4d ago

Videos

ZenDNN 5.2.1: Deepening Quantization and Expanding the AI Inference Frontier on AMD EPYC™ CPUs

…BF16) Llama-3.1-8B-Instruct GSM8K -2.06% -5.64% Qwen2.5-VL-7B-Instruct ChartQA -0.29% +9.18% Qwen3-14B-Instruct GSM8K +0.68% +0.85% phi-4 GSM8K…

May 12, 2026 · Chandra Kumar Ramasamy

Scaling Autonomous AI Agents and Workloads with NVIDIA DGX Spark | NVIDIA Technical Blog

…Several models that are popular in the context of OpenClaw—including Qwen3.5 397B, GLM 5, and MiniMax M2.5 230B—can benefit from stacking multiple DGX Spark units, increasing the available…

Mar 16, 2026 · Allen Bourgoyne

Next Gen Networking Transport for Large Scale AI Training

…We achieved up to 10× faster LLM initialization (from ~10s to ~1s) — as measured on Qwen3-4B running on AMD Ryzen™ AI — with zero impact on inference correctness. May 21, 2026 Agent…

May 21, 2026 · AMD Networking

My RTX 5090 can't keep up with Apple Silicon on the biggest local LLMs, and I hate to admit it

…Step up to something like Qwen3-Coder-Next at FP8, taking up 85GB of storage, and the 5090 isn't even in the same conversation anymore. However, that model is a mixture…

May 14, 2026 · Adam Conway

Discussions and forums

Hacker News · u/thc1006 · Apr 21, 2026

Qwen3.6-35B-A3B speculative decoding is net-negative on RTX 3090

5 2

Hacker News · u/GreenGames · Apr 20, 2026

We got 207 tok/s with Qwen3.5-27B on an RTX 3090

165 52

Hacker News · u/freakynit · Apr 17, 2026

Show HN: Open Access Qwen3.6-35B-A3B-UD-Q5_K_M with TurboQuant

https://w418ufqpha7gzj-80.proxy.runpod.net Started for myself, but since I'm not using it continuously, sharing it: Open Access Qwen3.6-35B-A3B-UD-Q5_K_M with TurboQuant (TheTom/llama-cpp-turboquant) on RTX 3090 (Runpod spo…

4 2

r/LocalLLaMA · u/Signal_Ad657 · 3w ago

Qwen3.6-27B vs Coder-Next

Burned about 20 hours of side-by-side compute on my two RTX PRO 6000 Blackwells trying to get a definitive answer on which of these two models was clearly better. As with many things in life, after many tokens and kWhs l…

Hacker News · u/lastdong · 2w ago

Club-3090 Recipes for serving QWEN3.6 27B locally on RTX 3090s

My homelab needed a lightweight dashboard, so I had Claude and a local LLM race to build it

…Related I finally found a local LLM I actually want to use for coding Qwen3-Coder-Next is a great model, and it's even better with Claude Code as a harness…

May 11, 2026 · Shekhar Vaidya

I tested Google’s upcoming Gemini Nano 4 — its faster, smarter AI isn’t what I expected

…Gemma competes on performance with other models like GLM5 and Qwen3.5, but its closed Gemini model remains the flagship to take on OpenAI and Anthropic. Still, the exciting news is that…

Apr 11, 2026 · Robert Triggs

Run OpenClaw Locally On AMD Ryzen™ AI Max+ Processors and Radeon™ GPUs

…Click on the "Model Search" icon, represented by a robot and a magnifying glass. Select "Qwen3.5 35B A3B" on the left-hand side and click download on the right-hand side…

Apr 13, 2026

NVIDIA DGX Spark Cluster Review: Distributed Inference on Dell, GIGABYTE, and HP

…throughout the test, while HP consistently trails slightly behind both systems at larger batch sizes. Qwen3 coder 30B A3B Base In Equal ISL/OSL, Dell scales from 59.05 tok/s to…

May 11, 2026

Larger-Than-Ever Single-GPU Quantum Simulation

…May 21, 2026 Deploying Hermes Agent for Free on AMD Developer Cloud with open models and vLLM Deploy Hermes Agent for free on AMD Developer Cloud with Qwen3.5, vLLM, and AMD…

May 12, 2026 · Pratik Mishra

How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale | NVIDIA Technical Blog

…Running the Qwen3-VL-30B-A3B-Instruct-FP8 multimodal model on NVIDIA GB200, Dynamo’s embedding cache accelerated time to first token (TTFT) by up to 30% and throughput by up to…

Mar 16, 2026 · Amr Elmeleegy
