Showing top 142 results for "Gemma 4 local use"

Google Gemma

Gemma is a family of open-weight language models released by Google for text generation and related NLP tasks.

119 articles indexed · Last updated 15h ago

Maker packs an opinionated, googly-eyed AI chatbot into a mobile suitcase, powered by an Nvidia Jetson — entirely local machine entity runs Gemma 4 E4B and can respond in 200ms

… Gemma 4 E4B, ~200ms cached TTFT, 30+ sensors, no WiFi/BT/cellular. He has opinions. In the r/LocalLLaMA subreddit, CreativelyBankrupt outlines the ‘recipe’ for this characterful digital companion. “Sparky runs entirely on the Jetson. …

May 17, 2026 · Mark Tyson

From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI

… To use Gemma 4 locally, users can download Ollama to run Gemma 4 models or install llama.cpp and pair it with the Gemma 4 GGUF Hugging Face checkpoint. …

Apr 2, 2026 · Michael Fukuyama

Gemma 4: The new standard for local agentic intelligence on Android

… Coding with Gemma 4 in Android Studio When building Android apps, Android Studio can use Gemma 4 to leverage its state-of-the-art reasoning power and native support for tool use, while keeping the model and inference contained entirely on your local machine. …

Android Studio supports Gemma 4: our most capable local model for agentic coding

… In Agent Mode, select Gemma 4 as your active model. For a detailed walkthrough on configuration, check out the official documentation on how to use a local model. We are excited to see how Gemma 4 enables more private, secure, and powerful development workflows. …

Want to make the most of the new Gemma 4 AI models? RTX GPUs and PCs accelerate local AI like never before

… Fully compatible with OpenClaw, Gemma 4 models allow users to build fast and capable local agents that leverage local-files to action user requests within local applications and automated workloads. …

Apr 2, 2026 · NVIDIA

Google's Gemma 4 Model Can Now Be Deployed on NVIDIA's RTX GPUs, Delivering Optimized Performance for a 'Personalized' Agentic AI Environment

… To use Gemma 4 locally, users can download Ollama to run Gemma 4 models or install llama.cpp and pair it with the Gemma 4 GGUF Hugging Face checkpoint. …

Apr 2, 2026 · Muhammad Zuhair

Run Google's Gemini LLMs right on your Mac with the new AI Edge Gallery

… Heavy LLM users have long been able to run Google's LLMs locally on their devices, including the Mac. But this is the first time that Google's own tool has been available for Mac owners. Hey, Gemma Those downloading the new AI Edge Gallery tool will be able to run the Gemma 4 12B model. …

Jun 4, 2026 · Oliver Haslam

Discussions and forums

r/docker · u/CreativeCollege2815 · 2w ago

Using a Gemma4 Safetensor Already Downloaded Locally

Hi everyone. I need some help or advice. I’m learning how to use N8N, so I downloaded Docker and installed N8N locally. I also wanted to install Gemma4, which I use in ComfyUI to help with image generation prompts. Is it…

r/LocalLLaMA · u/gladkos · May 1, 2026

Qwen 3.6 27B vs Gemma 4 31B - making Packman game!

Gemma just crushed Qwen in a local LLM gamedev contest! Device: MacBook Pro M5 Max, 64GB RAM Qwen 3.6 27B: 32 tokens/sec · 18m 04s · 33,946 tokens. Gemma 4 31B: 27 tokens/sec · 3m 51s · 6,209 tokens. So what is more impo…

Hacker News · u/lostathome · 1w ago

Show HN: Hitoku Draft – Context aware local assistant

Hi guys. I have been working on Hitoku Draft, an open-source, voice-first AI assistant that runs entirely locally. I posted about it already, and now it also has transcription with voice editing. Looking for feedback, as …

Hacker News · u/limondas · 15h ago

Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI?

Claude Code like agentic workflow ai too costly for me. Any LLM can I run with VSCode at the below setup? 16ram Intel core i7 h processor 13gen 512gb NVMe SSD I want to run the ai as local agentic workflow with Vscode. I w…

r/LocalLLaMA · u/gladkos · May 8, 2026

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40%

Implemented Multi-Token Prediction for LLaMA.cpp. Quantized Gemma 4 assistant models into GGUF format. Ran tests on a MacBook Pro M5 Max. Gemma 26B with MTP drafts tokens 40% faster. Prompt: Write a Python program to find…

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

… Faster local inference right now Google has released new versions of Gemma 4 models with MTP that you can try today. Google says the MTP drafter can make Gemma models up to three times faster, but the actual gain varies based on the hardware you use. …

May 6, 2026 · Ryan Whitwam

Google AI Edge Gallery launches to macOS - 9to5Mac

Google AI Edge Gallery launches on macOS, letting Mac users run Gemini models locally. Marcus Mendes | Jun 3 2026 - 7:58 pm PT. In addition to Google AI Edge Gallery, which lets users run Gemma models locally on their Macs, the company also released the Gemma 4 12B model and… …

Jun 4, 2026 · Marcus Mendes

AMD Rolls Out Gemma 4 Model Support Across Full Range of GPUs & CPUs

… Point Lemonade to the ROCm build by setting the environment variable: export LEMONADE_LLAMACPP_ROCM_BIN=/path/to/llama-server Start Lemonade and load the Gemma 4 model via the API: lemonade-server serve curl http://localhost:8000/api/v1/pull \ -H "Content-Type: application/json" \ -d '{"model_name"… …

Apr 4, 2026 · Hassan Mujtaba
