
Showing top 86 results for "Kimi K2"

Kimi K2

Kimi K2 is a large language model service in the Kimi series, also referred to as Kimi 2.6 or Kimi K2.6.

28 articles indexed

Videos

AMD Instinct MI350P: Enterprise PCIe AI Inference Returns to Standard Servers

…OpenAI’s gpt-oss release made the throughput uplift obvious, and frontier models like Kimi K2.6 are being natively quantization-aware-trained in INT4 from the start, rather than quantized after…

May 7, 2026

Agentic AI Brings New Attention to CPUs in the AI Data Center

…keep vLLM compatibility while enabling AMD-optimized attention, model execution, and multi-model support including Kimi-K2.5.…

May 11, 2026 · AMD News

NVIDIA DGX Spark and DGX Station Power the Latest Open-Source and Frontier Models From the Desktop

…This includes a variety of advanced AI models including Kimi-K2 Thinking, DeepSeek-V3.2, Mistral Large 3, Meta Llama 4 Maverick, Qwen3 and OpenAI gpt-oss-120b. “NVIDIA GB300 is typically…

Jan 5, 2026 · Chris Marriott

Your AI bill is out of control. Cloudflare can fix it now.

…You couldn't set a budget that said "engineering gets $5,000/month on frontier models, interns get $200/month on Kimi K2.6." That changes today. Spend limits: budgets for AI…

Jun 5, 2026 · Ming Lu

Discussions and forums

Hacker News · u/heymax054 · May 15, 2026

DeepSeek V4 Pro and Flash vs. Claude Opus 4.7 and Kimi K2.6


Hacker News · u/nl · May 15, 2026

We Tested DeepSeek V4 Pro and Flash Against Claude Opus 4.7 and Kimi K2.6

r/LocalLLaMA · u/APFrisco · May 11, 2026

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and al…

Hacker News · u/ramonga · 4w ago

Show HN: Free open source coding models in Slack

Hey HN, We believe we have the easiest onboarding from signup to being able to spin up coding agents in Slack like Stripe, Ramp & Coinbase. Demo of the onboarding: https://www.tella.tv/video/connecting-cord-to-slack-1-19ep…

r/LocalLLaMA · u/Fragrant-Remove-9031 · May 16, 2026

Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs

Saw this post comparing Qwen 3.6 variants on coding primitives, so I wanted to see how local quants stack up against frontier models on a similar dense, single-file coding task. I ran the exact same prompt across local a…

7 Questions to Ask Before Building Your AI Infrastructure

May 11, 2026 · AMD Networking

NVIDIA CEO Jensen Huang at Dell Technologies World: ‘Demand Is Going Parabolic, Utterly Parabolic’

…entities. Additional open models — MiniMax-M2.7, DeepSeek Pro, DeepSeek-V4, GLM 5.1 and Kimi K2.6 with NVIDIA NVFP4 optimization — are available on the Dell Enterprise Hub on Hugging Face…

May 18, 2026 · NVIDIA Writers

From Silicon to Cloud: AMD on AWS Essentials for IT Leaders

May 11, 2026 · Jeremy Girven

Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding | NVIDIA Technical Blog

…model checkpoints on Hugging Face with Blackwell and Hopper recipes, covering model families including Qwen, Kimi K2.6, Llama, Gemma, and gpt-oss. The recipes include support for popular inference frameworks such…

Jun 23, 2026 · Amr Elmeleegy

A Look Ahead: Extending Server Energy Efficiency with LPDDR5X Memory

Apr 6, 2026 · Madhu Rangarajan

Orchestrating AI Code Review at scale

…Kimi K2.5: Used for lightweight, text-heavy tasks like the Documentation Reviewer, Release Reviewer, and the AGENTS.md Reviewer. These are the defaults, but every single model assignment can be overridden…

Apr 20, 2026 · Ryan Skidmore
