Search

Showing top 143 results for "Qwen3"

Qwen3

Qwen3 is an AI model family developed by Alibaba, released as a set of large language models for natural-language tasks.

49 articles indexed Last updated 4d ago See topic hub

Videos

Latest Mobile Devices News, Updates & Coverage

…The specific model in question is Qwen3.5-397B-A17B (2-bit quantized), which is a 397B model with 17B active parameters. Per Daniel's published paper , only 5.5 GB of…

Ollama is still the easiest way to start local LLMs, but it's the worst way to keep running them

…Even now, "ollama run deepseek-r1" will launch the 8B Qwen3-derived distilled variant. Then there's the infrastructure itself. Ollama stores models using hashed filenames in its own registry format, which…

Apr 8, 2026 · Adam Conway

Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai | NVIDIA Technical Blog

…This range enables evaluating fractional allocation across workloads with different memory footprints. Notably, the largest model (Qwen3-14B) occupies only ~35% of one NVIDIA H100 NVL GPU 80 GB capacity, illustrating why…

Feb 18, 2026 · Boskey Savla

Discussions and forums

Hacker News · u/thc1006 · Apr 21, 2026

Qwen3.6-35B-A3B speculative decoding is net-negative on RTX 3090

5 2

Hacker News · u/GreenGames · Apr 20, 2026

We got 207 tok/s with Qwen3.5-27B on an RTX 3090

165 52

Hacker News · u/freakynit · Apr 17, 2026

Show HN: Open Access Qwen3.6-35B-A3B-UD-Q5_K_M with TurboQuant

https://w418ufqpha7gzj-80.proxy.runpod.netStarted for myself, but since Im not using it continuously, sharing it:Open Access Qwen3.6-35B-A3B-UD-Q5_K_M with TurboQuant (TheTom/llama-cpp-turboquant) on RTX 3090 (Runpod spo…

4 2

r/LocalLLaMA · u/Signal_Ad657 · 3w ago

Qwen3.6-27B vs Coder-Next

Burned about 20 hours of side-by-side compute on my two RTX PRO 6000 Blackwells trying to get a definitive answer on which of these two models was clearly better. As with many things in life, after many tokens and kWhs l…

Hacker News · u/lastdong · 2w ago

Followed topics