Search

Showing top 34 results for "local LLM performance" · filtered from 38 indexed

Filtered by topic: LLMs

Sources: xda-developers.com (25), developer.nvidia.com (6), amd.com (2), intel.com (2), phoronix.com (1), press.asus.com (1), research.google (1)

I tried caring about ChatGPT and Claude, but I still haven't found a real need for LLMs

…Your local LLM is great, but it'll never compare to a cloud model. I've used…

May 6, 2026 · Tanveer Singh

Speculative decoding made my local LLM actually usable

…Local LLM performance is about how you run it, not just about what you run. You don't always need a better model. Sometimes you just need to stop making the one…

Apr 6, 2026 · Marshall Gunnell

I started using my local LLMs and an MCP server to manage my NAS – it's surprisingly powerful (and safe)

…from my local LLM clients into API calls that my TrueNAS rigs can understand. So, I can issue typical commands to my LLMs, and they’ll have no trouble performing the specified…

Apr 15, 2026 · Ayush Pande

I thought I needed a GPU for local LLMs until I tried this lean model

…When it comes to local LLMs, we have been told that if you aren’t packing a high-end GPU with a massive pool of VRAM…

Apr 5, 2026 · Parth Shah

Discussions and forums

r/LocalLLaMA · u/The_Paradoxy · 2w ago

The Qwen 3.6 35B A3B hype is real!!!

My personal test for small local LLM intelligence is to check whether a model has any ability to understand the code that I write for my own academic research. My research is on some pretty niche topics and I doubt that …

r/LocalLLaMA · u/MikeNonect · 2w ago

Getting a feel for how fast X tokens/second really is.

I love following all your adventures with local LLM setups. Quality and size of the models are important, but so is performance. Numbers don't really convey the experienced speed well, however. If someone claims they run…

r/LocalLLaMA · u/gladkos · 3w ago

Qwen 3.6 27B vs Gemma 4 31B - making Packman game!

Gemma just crushed Qwen in a local LLM gamedev contest! Device: MacBook Pro M5 Max, 64GB RAM. Qwen 3.6 27B: 32 tokens/sec · 18m 04s · 33,946 tokens. Gemma 4 31B: 27 tokens/sec · 3m 51s · 6,209 tokens. So what is more impo…
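The numbers in this thread show why tokens/sec alone does not decide which model finishes first. The sketch below is a back-of-the-envelope check that is not taken from the post: it assumes a constant generation rate and ignores prompt processing and other overhead, so its estimates should slightly undershoot the reported wall-clock times. The helper names are hypothetical.

```python
# Back-of-the-envelope decode-time check using the figures quoted in the thread.
# Assumption: output is generated at a constant tokens/sec; prompt processing
# and other overhead are ignored, so estimates undershoot real wall-clock time.

def decode_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to emit `tokens` at a steady `tokens_per_sec`."""
    return tokens / tokens_per_sec


def fmt_min_sec(seconds: float) -> str:
    """Format seconds as 'Xm YYs', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"


runs = {
    # model: (tokens/sec, output tokens, reported wall time in seconds)
    "Qwen 3.6 27B": (32, 33_946, 18 * 60 + 4),
    "Gemma 4 31B": (27, 6_209, 3 * 60 + 51),
}

for model, (rate, tokens, reported) in runs.items():
    est = decode_seconds(tokens, rate)
    print(f"{model}: estimated {fmt_min_sec(est)}, reported {fmt_min_sec(reported)}")

qwen_wall = runs["Qwen 3.6 27B"][2]
gemma_wall = runs["Gemma 4 31B"][2]
# Ratio of reported wall-clock times: how many times longer Qwen took.
print(f"Qwen took {qwen_wall / gemma_wall:.1f}x as long as Gemma despite the higher tokens/sec")
```

The estimates (17m 41s and 3m 50s) land within about 25 seconds of the reported times, which suggests both runs were dominated by output generation. Gemma was roughly 16% slower per token but wrote about 5.5x fewer tokens, so total output length outweighed raw speed here.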

r/LocalLLaMA · u/Signal_Ad657 · 1w ago

M5 vs DGX Spark vs Strix Halo vs RTX 6000

Hey guys, super simple. There have been a lot of online debates about the new M5 Macs vs DGX Sparks vs Strix Halo vs dedicated GPUs etc. So I put them all in a room with good power and cooling and ran everything in paral…

r/LocalLLaMA · u/Porespellar · 2w ago

Unpopular Opinion: The DGX Spark Forum community of devs is talented AF and will make the crippled hardware a success through their sheer force of will.

There is a lot of disdain for DGX Sparks here on the sub. And I get it. A lot of people say “It could have been great if it had been better memory bandwidth”, “SM-121 is a fake/second-class Blackwell chip” yadda, yadda.…

Top stories

My self-hosted LLMs are a lot more than just a chat replacement – here's how they boost my productivity

Local LLMs perform so much better when you teach them to ask before they answer

Trying to self-host LLMs made me realize local AI has a friction problem, not a quality problem

My local LLM can call Claude when it's stuck, and it changed everything about my local-first setup

I ran this bulky LLM on an SBC cluster, and it's the most unhinged setup I've ever built

I built a local LLM server I can access from anywhere, and it uses a Raspberry Pi

LM Studio's frontend was slowing me down, so I switched to this instead

Ollama is still the easiest way to start local LLMs, but it's the worst way to keep running them

I’d do these 5 things differently if I started self-hosting LLMs today

Claude Code with a local LLM running offline is the hybrid setup I didn't know I needed