Meet Gemma 4
A new generation of lightweight, open models built from the same research and technology behind Gemini. Fast, capable, and designed to run anywhere — from laptops to the cloud.
Open intelligence, powered by Gemini research.
Gemma 4 is the fourth generation of Google DeepMind's family of open models. Built from the same research, data pipelines, and safety work behind Gemini, Gemma 4 brings frontier-level capability to models you can download, inspect, fine-tune, and deploy on your own terms.
Where earlier Gemma generations focused on raw language modeling, Gemma 4 introduces Thinking variants (models trained to reason step-by-step before answering) alongside a new Mixture-of-Experts architecture that approaches the quality of the 31B dense model at a fraction of the inference cost.
Whether you're a researcher pushing the state of the art, a startup shipping on-device features, or an enterprise running regulated workloads, Gemma 4 is designed to meet you where you are.
Trained on 14 trillion tokens
A carefully curated mixture of web text, code, scientific papers, and licensed data — filtered for quality, safety, and multilingual coverage.
New Mixture-of-Experts routing
Gemma 4 26B A4B activates just 4B parameters per token, delivering 27B-class quality with the latency of a small model.
Natively multimodal
Every variant accepts interleaved text and image input out of the box, with a shared vision encoder distilled from Gemini.
Open weights, open tools
Shipped with reference implementations in JAX, PyTorch, and Keras — plus Colab notebooks, fine-tuning recipes, and evaluation harnesses.
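To make the Mixture-of-Experts idea above concrete, here is a toy sketch of top-k expert routing in plain Python. It is a generic illustration, not Gemma 4's actual router: the expert functions, the router logits, and the names `route_token` and `moe_forward` are all assumptions made up for this example. The point it shows is that only the selected experts run for a given token, which is why the active parameter count can be much smaller than the total.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum over the selected experts only; the rest never execute."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits, k))


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 1.0]  # made-up scores for one token

selected = route_token(router_logits, k=2)
output = moe_forward(1.0, experts, router_logits, k=2)
print("active experts:", [i for i, _ in selected])
print("output:", output)
```

Read against the 26B A4B naming, roughly 4B of 26B parameters are active per token, or about 15% of the total.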
Built for builders
State-of-the-art performance in a compact footprint.
Open weights
Download, fine-tune, and deploy Gemma 4 on your own infrastructure with a permissive license for research and commercial use.
Multimodal reasoning
Understand text, images, and structured data with strong performance on coding, math, and multi-step reasoning tasks.
Runs anywhere
Optimized for NVIDIA GPUs, Google TPUs, Apple Silicon, and even consumer laptops — with quantized variants for on-device use.
128K context
Process long documents, codebases, and conversations with an expanded context window across all model sizes.
Responsible by design
Built with Google's safety principles, extensive red-teaming, and transparent model cards for every release.
140+ languages
Trained on a diverse, high-quality multilingual corpus to serve developers and users worldwide.
One family. Four sizes.
Pick the model that fits your use case and hardware.
Gemma 4 Nano
On-device, mobile, and edge deployments.
Gemma 4 Small
Balanced speed and quality for everyday apps.
Gemma 4 Pro
Flagship open model for demanding workloads.
Gemma 4 Ultra (most popular)
Frontier reasoning and enterprise performance.
Benchmark performance
Evaluated against a large collection of datasets across text, reasoning, and agentic tool use.
| Benchmark | Gemma 4 31B IT Thinking | Gemma 4 26B A4B IT Thinking | Gemma 4 E4B IT Thinking | Gemma 4 E2B IT Thinking | Gemma 3 27B IT |
|---|---|---|---|---|---|
| Arena AI (text), as of 4/2/26 | 1452 | 1441 | n/a | n/a | 1365 |
| MMMLU (multilingual Q&A, no tools) | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% |
| MMMU Pro (multimodal reasoning) | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% |
| AIME 2026 (mathematics, no tools) | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% |
| LiveCodeBench v6 (competitive coding problems) | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% |
| GPQA Diamond (scientific knowledge, no tools) | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% |
| τ2-bench (agentic tool use, Retail) | 86.4% | 85.5% | 57.5% | 29.4% | 6.6% |
These models were evaluated against a large collection of datasets and metrics to cover different aspects of text generation. See the model card for additional benchmarks.
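If you want to check the generational gains yourself, the short script below recomputes them from four rows of the benchmark table. The scores are copied from the Gemma 4 31B IT Thinking and Gemma 3 27B IT columns; the `compare` helper is just an illustrative name, not part of any Gemma tooling.

```python
# (Gemma 4 31B IT Thinking, Gemma 3 27B IT), in percent, from the table above.
SCORES = {
    "AIME 2026": (89.2, 20.8),
    "LiveCodeBench v6": (80.0, 29.1),
    "GPQA Diamond": (84.3, 42.4),
    "tau2-bench (Retail)": (86.4, 6.6),
}


def compare(new, old):
    """Return (absolute gain in points, ratio of new to old)."""
    return round(new - old, 1), round(new / old, 2)


for name, (new, old) in SCORES.items():
    gain, ratio = compare(new, old)
    print(f"{name:<22} +{gain} points, {ratio}x")
```

Keep in mind this compares a Thinking variant against a non-Thinking Gemma 3 model, so some of the gap reflects the reasoning mode as well as the new generation.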
How does Gemma 4 stack up?
Side-by-side comparisons against the models you're already considering.
Two different approaches to AI. One is an open model you can run anywhere. The other is a closed product from OpenAI. He...
Two open-weights families with very different priorities. Gemma 4 is built on Gemini's research lineage; Qwen-3.5 is Ali...
Gemma 4 is open weights you can host. Claude is a closed frontier assistant from Anthropic. Both invest heavily in reaso...
Same research lab, two different products. Gemma 4 is the open-weights sibling of Gemini — built from the same data, tra...
Frequently asked questions
Everything you need to know about Gemma 4.
What is Gemma 4?
Gemma 4 is Google DeepMind's latest family of lightweight, state-of-the-art open models. It's built from the same research and technology used to create Gemini, but released with open weights so you can download, inspect, fine-tune, and deploy the models on your own infrastructure.
How is Gemma 4 different from Gemini?
Gemini is Google's closed, hosted flagship model, available through Google's APIs. Gemma 4 shares much of the underlying research but is released as open weights under a permissive license, so you can run it locally, fine-tune it for your own data, and deploy it without sending requests to Google.
Is Gemma 4 free to use commercially?
Yes. Gemma 4 is released under the Gemma license, which permits both research and commercial use. You are free to build products, services, and businesses on top of Gemma 4 — including fine-tuned derivatives — subject to the license's responsible use policy.
What hardware do I need to run it?
It depends on the size. Gemma 4 Nano (2B) runs comfortably on a modern laptop or phone. The 9B and 27B models run on a single high-end GPU such as an NVIDIA RTX 4090 or H100. The 70B Ultra model is best suited for multi-GPU servers or TPU pods. Quantized variants (GGUF, AWQ) reduce requirements further for on-device use.
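For a rough sense of how size and quantization interact, here is a back-of-the-envelope estimate of weight memory alone. It assumes memory is simply parameter count times bits per parameter, and it ignores the KV cache, activations, and runtime overhead, so real requirements will be higher. The parameter counts are the ones quoted in the answer above; the bit widths and the helper name `weight_memory_gib` are illustrative choices.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Sizes as quoted in the answer above.
SIZES = {"2B": 2e9, "9B": 9e9, "27B": 27e9, "70B": 70e9}

for name, params in SIZES.items():
    row = ", ".join(
        f"{bits}-bit: {weight_memory_gib(params, bits):.1f} GiB"
        for bits in (16, 8, 4)
    )
    print(f"{name:>4}  {row}")
```

By this estimate, a 27B model needs about 50 GiB for weights at 16 bits but under 13 GiB at 4 bits, which is why a quantized variant is the realistic option on a 24 GB card such as the RTX 4090.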
What are the "Thinking" variants?
Thinking variants are Gemma 4 models trained to reason step-by-step before producing a final answer. They trade a small amount of latency for substantially better performance on math, science, coding, and multi-step reasoning benchmarks — as seen in our AIME 2026 and GPQA Diamond results.
Where can I download Gemma 4?
Gemma 4 is available through Google AI Studio, Vertex AI, Kaggle, Hugging Face, and Ollama. You can also run it directly with popular inference frameworks including llama.cpp, vLLM, and MLX for Apple Silicon.
Can I fine-tune Gemma 4 on my own data?
Absolutely. Gemma 4 supports the full range of fine-tuning techniques: full supervised fine-tuning, LoRA, QLoRA, DPO, and RLHF. We provide reference training recipes and notebooks for all four model sizes to help you get started.
How does Gemma 4 handle safety?
Every Gemma 4 release goes through extensive safety evaluation including red-teaming, bias testing, and responsible AI reviews. We publish detailed model cards for each variant and ship with a built-in responsible use policy. Because the weights are open, the broader research community can audit and improve safety as well.
What's new compared to Gemma 3?
Gemma 4 introduces Thinking variants for step-by-step reasoning, a new sparse Mixture-of-Experts architecture (26B A4B), native multimodal input across the entire family, a 128K context window, and substantial gains on math, coding, and agentic benchmarks. Gemma 4 31B IT Thinking beats Gemma 3 27B IT by more than 68 points on AIME 2026 (89.2% vs. 20.8%) and nearly triples its score on LiveCodeBench v6 (80.0% vs. 29.1%).
Does Gemma 4 support tool use and function calling?
Yes. All instruction-tuned variants were trained with structured tool-use data and support function calling via a standard JSON schema. They can plan multi-step workflows, invoke external APIs, and recover from tool errors — as reflected in our τ2-bench retail scores.
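As a rough picture of what function calling via a JSON schema can look like on the application side, the sketch below declares one tool and dispatches a model-emitted call to it. Everything here is an assumption for illustration: the tool `get_order_status`, the `{"name": ..., "arguments": ...}` call shape, and the `dispatch` helper are hypothetical, and the exact format Gemma 4 emits should be taken from the model card and reference implementations.

```python
import json

# Hypothetical tool declaration in JSON Schema style.
GET_ORDER_STATUS = {
    "name": "get_order_status",
    "description": "Look up the shipping status of a retail order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def get_order_status(order_id):
    # Stand-in for a real backend call.
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping tool names to (schema, implementation).
TOOLS = {"get_order_status": (GET_ORDER_STATUS, get_order_status)}


def dispatch(model_output):
    """Parse a JSON tool call and run it; return an error payload on failure."""
    try:
        call = json.loads(model_output)
        schema, fn = TOOLS[call["name"]]
        args = call.get("arguments", {})
        missing = [k for k in schema["parameters"]["required"] if k not in args]
        if missing:
            return {"error": f"missing arguments: {missing}"}
        return fn(**args)
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"error": str(exc)}


print(dispatch('{"name": "get_order_status", "arguments": {"order_id": "A1"}}'))
print(dispatch('{"name": "get_order_status", "arguments": {}}'))
```

Returning a structured error instead of raising gives the application something to pass back to the model, which is one way to support the error recovery described above.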
Which languages are supported?
Gemma 4 was trained on more than 140 languages with balanced representation across European, Asian, African, and Indic language families. Instruction tuning covers the top 40 languages with human-verified evaluations; the remaining languages benefit from strong transfer learning.
How do I report a bug or request a feature?
The Gemma community lives on GitHub, the Google Developer forums, and the Hugging Face discussions board. Security-sensitive reports can be sent privately to the DeepMind responsible disclosure address listed in every model card.
Start building with Gemma 4
Available today on Google AI Studio, Vertex AI, Kaggle, Hugging Face, and Ollama.