Showing top 16 results for "Local LLM testing"

I tested 3 local LLMs on my actual work — and each model won at something different

… I’m running an RTX 3070 with 8GB VRAM, which gets the job done, and I run my models in LM Studio. …

Apr 21, 2026 · Nolen Jonker

You don't need an expensive GPU to run a local LLM that actually works

… But we won't need to be running 32B models to make the most of LLM capabilities. …

Apr 29, 2026 · Rich Edmonds

Home Assistant's local LLM support outperforms Gemini for Home, and Google knows it

… Home Assistant with a local LLM is already doing what Gemini for Home promises. Run your stack with your rules. You … …

Apr 28, 2026 · Samir Makwana

After a year of self-hosting LLMs, I realized the real bottleneck isn’t the GPU

May 6, 2026 · Yash Patel

Discussions and forums

r/LocalLLaMA · u/The_Paradoxy · 1w ago

The Qwen 3.6 35B A3B hype is real!!!

My personal test for small local LLM intelligence is to check whether a model has any ability to understand the code that I write for my own academic research. My research is on some pretty niche topics and I doubt that …

r/LocalLLaMA · u/spencer_kw · 2w ago

DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid.

That foodtruck bench post showing deepseek v4 matching gpt-5.2 at 17x cheaper got me thinking. if frontier cloud models are that overpriced for equivalent quality, how much of my daily work even needs cloud at all? Ran m…

r/LocalLLaMA · u/APFrisco · 1w ago

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and al…

r/LocalLLaMA · u/gladkos · 2w ago

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40%

Implemented Multi-Token Prediction for LLaMA.cpp. Quantized Gemma 4 assistant models into GGUF format. Ran tests on a MacBook Pro M5Max. Gemma 26B with MTP drafts tokens 40% faster. Prompt: Write a Python program to find…

r/LocalLLaMA · u/Fragrant-Remove-9031 · 1w ago

Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs

Saw this post comparing Qwen 3.6 variants on coding primitives, so I wanted to see how local quants stack up against frontier models on a similar dense, single-file coding task. I ran the exact same prompt across local a…

Your old GPU can still run big LLMs – you just need the right tweaks

May 6, 2026 · Ayush Pande

Claude Code with a local LLM running offline is the hybrid setup I didn't know I needed

… And that's also why the local LLM helps. Claude Code already has a switch between Opus, Sonnet, and Haiku, and setting up an Anthropic API-capable local LLM slots in as a fourth. …

May 3, 2026 · Joe Rice-Jones

Speculative decoding made my local LLM actually usable

… Running a local LLM is easy until you actually try to use it every day. Five minutes to set up, five hours to realize you don't actually want to use it. Getting a local LLM running is the easy part. …

Apr 6, 2026 · Marshall Gunnell

LM Studio's frontend was slowing me down, so I switched to this instead

… It gives you local network access at a switch, and now you can serve over Tailscale as well. It's fast enough for one user, easy to switch between multiple local models, and serves local LLM models via an OpenAI-compatible API that most tools can use. …

Apr 22, 2026 · Joe Rice-Jones

Google's Gemma 4 finally made me care about running local LLMs

… Our local LLM expert, Adam Conway, talked all about the nitty-gritty details in a separate article. But I do want to briefly explain how local LLMs actually work, because it makes everything else in this piece make a lot more sense. …

Apr 18, 2026 · Mahnoor Faisal

I’d do these 5 things differently if I started self-hosting LLMs today

… For a long time, my local LLM was just a destination, a tab I’d visit when I had a specific question. …

Apr 21, 2026 · Yash Patel
