Showing top 86 results for "Kimi K2"

Kimi K2

Kimi K2 is a large language model service associated with the Kimi series, also referenced as kimi 2.6 or kimi k2.6.

NVIDIA Data Center Deep Learning Product Performance

…DGX GB300 nemo:26.02.01 4096 1 4 1 32 FP8 8192 NVIDIA GB300 Kimi K2 2.2 5,072 tokens/sec/gpu 256x GB300 NVIDIA DGX GB300 nemo:26.02…

I've been running some of the biggest open-weight LLMs for free on Nvidia's cloud

…Kimi K2 Thinking, Qwen3-Coder-480B, DeepSeek V3.2, Llama 4 Maverick, Mistral Large 3, Devstral 2, ByteDance's Seed-OSS, and Google's Gemma 3 family are all in there too…

Apr 30, 2026 · Adam Conway

Scaling Token Factory Revenue and AI Efficiency by Maximizing Performance per Watt | NVIDIA Technical Blog

…This end-to-end approach enables up to 10x higher inference throughput per megawatt and about 10x lower token cost versus Blackwell for AI factories for Kimi K2 (32K/8K). Paired with…

Mar 25, 2026 · Kibibi Moseley

How To: Migrate your Cloud Instance to AMD EPYC

…keep vLLM compatibility while enabling AMD-optimized attention, model execution, and multi-model support including Kimi-K2.5. May 06, 2026 AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes — ROCm Blogs…

May 11, 2026 · Noor fairoza Khan

Discussions and forums

Hacker News · u/heymax054 · May 15, 2026

DeepSeek V4 Pro and Flash vs. Claude Opus 4.7 and Kimi K2.6

Hacker News · u/nl · May 15, 2026

We Tested DeepSeek V4 Pro and Flash Against Claude Opus 4.7 and Kimi K2.6

r/LocalLLaMA · u/APFrisco · May 11, 2026

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and al…

Hacker News · u/ramonga · 4w ago

Show HN: Free open source coding models in Slack

Hey HN, We believe we have the easiest onboarding from signup to being able to spin up coding agents in Slack like Stripe, Ramp & Coinbase. Demo of the onboarding: https://www.tella.tv/video/connecting-cord-to-slack-1-19ep…

r/LocalLLaMA · u/Fragrant-Remove-9031 · May 16, 2026

Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs

Saw this post comparing Qwen 3.6 variants on coding primitives, so I wanted to see how local quants stack up against frontier models on a similar dense, single-file coding task. I ran the exact same prompt across local a…

Google's Gemini 3.5 Flash costs 3x the model it replaced, and the era of cheap AI is ending

…V4-Pro is commonly placed as the second-strongest open-weight reasoning model anywhere, behind only Kimi K2.6. That's frontier-adjacent quality made available at a fraction of frontier prices…

May 31, 2026 · Adam Conway

Introducing the Agent Readiness score. Check to see if your site is agent-ready

…faster and cheaper We pointed an agent (Kimi-k2.5 via OpenCode) at other large technical documentation sites' llms.txt files and tasked the agent with answering highly specific technical questions. On…

Apr 17, 2026 · André Jesus
