Search

Showing top 16 results for "Local LLM testing"

Filtered by topic: LLMs Clear ✕

Discussions and forums

r/LocalLLaMA · u/The_Paradoxy · 1w ago

The Qwen 3.6 35B A3B hype is real!!!

My personal test for small local LLM intelligence is to check whether a model has any ability to understand the code that I write for my own academic research. My research is on some pretty niche topics and I doubt that …

r/LocalLLaMA · u/spencer_kw · 2w ago

DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid.

That foodtruck bench post showing deepseek v4 matching gpt-5.2 at 17x cheaper got me thinking. if frontier cloud models are that overpriced for equivalent quality, how much of my daily work even needs cloud at all? Ran m…

r/LocalLLaMA · u/APFrisco · 1w ago

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and al…

r/LocalLLaMA · u/gladkos · 2w ago

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40%

Implemented Multi-Token Prediction for LLaMA.cpp. Quantized Gemma 4 assistant models into GGUF format. Ran tests on a MacBook Pro M5Max. Gemma 26B with MTP drafts tokens 40% faster. Prompt: Write a Python program to find…

r/LocalLLaMA · u/Fragrant-Remove-9031 · 1w ago

Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs

Saw this post comparing Qwen 3.6 variants on coding primitives, so I wanted to see how local quants stack up against frontier models on a similar dense, single-file coding task. I ran the exact same prompt across local a…