Google DeepMind

Meet Gemma 4

A new generation of lightweight, open models built from the same research and technology behind Gemini. Fast, capable, and designed to run anywhere — from laptops to the cloud.

What is Gemma 4?

Open intelligence, powered by Gemini research.

Gemma 4 is the fourth generation of Google DeepMind's family of open models. Built from the same research, data pipelines, and safety work behind Gemini, Gemma 4 brings frontier-level capability to models you can download, inspect, fine-tune, and deploy on your own terms.

Where earlier Gemma generations focused on raw language modeling, Gemma 4 introduces Thinking variants, models trained to reason step by step before answering, alongside a new Mixture-of-Experts architecture that approaches the quality of the 31B dense model at a fraction of the inference cost.

Whether you're a researcher pushing the state of the art, a startup shipping on-device features, or an enterprise running regulated workloads, Gemma 4 is designed to meet you where you are.

01

Trained on 14 trillion tokens

A carefully curated mixture of web text, code, scientific papers, and licensed data — filtered for quality, safety, and multilingual coverage.

02

New Mixture-of-Experts routing

Gemma 4 26B A4B activates just 4B parameters per token, delivering quality close to the 31B dense model with the latency of a much smaller model.
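To make the idea of activating only a few experts per token concrete, here is a toy sketch of top-k routing in plain Python. It is not Gemma 4's router: the number of experts, the choice of k=2, and the softmax-then-renormalize weighting are all assumptions made for illustration, and the "experts" are simple scalar functions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of router scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalize so the selected experts' weights sum to 1.
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]

selected = route_top_k(logits, k=2)
output = moe_forward(1.0, experts, logits, k=2)

# Share of parameters active per token implied by the "26B A4B" name,
# assuming 26B is the total parameter count.
active_fraction = 4e9 / 26e9
print(selected, output, round(active_fraction, 3))
```

Only the two selected experts are evaluated, which is why compute per token tracks the active parameter count rather than the total.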

03

Natively multimodal

Every variant accepts interleaved text and image input out of the box, with a shared vision encoder distilled from Gemini.

04

Open weights, open tools

Shipped with reference implementations in JAX, PyTorch, and Keras — plus Colab notebooks, fine-tuning recipes, and evaluation harnesses.

4 model sizes
140+ languages
128K context window
100M+ downloads

Built for builders

State-of-the-art performance in a compact footprint.

Open weights

Download, fine-tune, and deploy Gemma 4 on your own infrastructure with a permissive license for research and commercial use.

Multimodal reasoning

Understand text, images, and structured data with strong performance on coding, math, and multi-step reasoning tasks.

Runs anywhere

Optimized for NVIDIA GPUs, Google TPUs, Apple Silicon, and even consumer laptops — with quantized variants for on-device use.

128K context

Process long documents, codebases, and conversations with an expanded context window across all model sizes.

Responsible by design

Built with Google's safety principles, extensive red-teaming, and transparent model cards for every release.

140+ languages

Trained on a diverse, high-quality multilingual corpus to serve developers and users worldwide.

One family. Four sizes.

Pick the model that fits your use case and hardware.

2B

Gemma 4 Nano

On-device, mobile, and edge deployments.

9B

Gemma 4 Small

Balanced speed and quality for everyday apps.

70B

Gemma 4 Ultra

Frontier reasoning and enterprise performance.

Benchmark performance

Evaluated against a large collection of datasets across text, reasoning, and agentic tool use.

Benchmark                                       Gemma 4 31B IT   Gemma 4 26B A4B IT   Gemma 4 E4B IT   Gemma 4 E2B IT   Gemma 3 27B IT
                                                Thinking         Thinking             Thinking         Thinking
MMMLU (multilingual Q&A, no tools)              85.2%            82.6%                69.4%            60.0%            67.6%
MMMU Pro (multimodal reasoning)                 76.9%            73.8%                52.6%            44.2%            49.7%
AIME 2026 (mathematics, no tools)               89.2%            88.3%                42.5%            37.5%            20.8%
LiveCodeBench v6 (competitive coding problems)  80.0%            77.1%                52.0%            44.0%            29.1%
GPQA Diamond (scientific knowledge, no tools)   84.3%            82.3%                58.6%            43.4%            42.4%
τ2-bench (agentic tool use, Retail)             86.4%            85.5%                57.5%            29.4%            6.6%

Arena AI (text), as of 4/2/26: 1452, 1441, 1365

These models were evaluated against a large collection of datasets and metrics covering different aspects of text generation. See the model card for additional benchmarks.
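As a quick way to read the table, the sketch below computes the point gain and the score ratio between the Gemma 4 31B IT Thinking and Gemma 3 27B IT columns. The scores are copied from the table; the dictionary layout and the `compare` helper are illustrative choices, not part of any Gemma tooling.

```python
# Scores (percent) copied from the benchmark table.
scores = {
    "AIME 2026": {"gemma4_31b": 89.2, "gemma3_27b": 20.8},
    "LiveCodeBench v6": {"gemma4_31b": 80.0, "gemma3_27b": 29.1},
    "GPQA Diamond": {"gemma4_31b": 84.3, "gemma3_27b": 42.4},
    "tau2-bench (Retail)": {"gemma4_31b": 86.4, "gemma3_27b": 6.6},
}

def compare(new, old):
    """Return (gain in percentage points, ratio new/old), both rounded."""
    return round(new - old, 1), round(new / old, 2)

summary = {
    name: compare(s["gemma4_31b"], s["gemma3_27b"])
    for name, s in scores.items()
}

for name, (gain, ratio) in summary.items():
    print(f"{name}: +{gain} points, {ratio}x")
```

Point gains and ratios tell different stories: a low baseline, as on τ2-bench, inflates the ratio, so it helps to report both.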

Frequently asked questions

Everything you need to know about Gemma 4.

What is Gemma 4?

Gemma 4 is Google DeepMind's latest family of lightweight, state-of-the-art open models. It's built from the same research and technology used to create Gemini, but released with open weights so you can download, inspect, fine-tune, and deploy the models on your own infrastructure.

How is Gemma 4 different from Gemini?

Gemini is Google's closed, hosted flagship model, available through Google's APIs. Gemma 4 shares much of the underlying research but is released as open weights under a permissive license, so you can run it locally, fine-tune it for your own data, and deploy it without sending requests to Google.

Is Gemma 4 free to use commercially?

Yes. Gemma 4 is released under the Gemma license, which permits both research and commercial use. You are free to build products, services, and businesses on top of Gemma 4 — including fine-tuned derivatives — subject to the license's responsible use policy.

What hardware do I need to run it?

It depends on the size. Gemma 4 Nano (2B) runs comfortably on a modern laptop or phone. The 9B and 27B models run on a single high-end GPU such as an NVIDIA RTX 4090 or H100. The 70B Ultra model is best suited for multi-GPU servers or TPU pods. Quantized variants (GGUF, AWQ) reduce requirements further for on-device use.
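For a rough sense of why size and quantization matter, the sketch below estimates the memory needed for the weights alone as parameters times bits per parameter. It deliberately ignores the KV cache, activations, and runtime overhead, and it treats the nominal sizes (2B, 9B, 70B) as exact parameter counts, which is an assumption.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Nominal sizes from this page, treated as exact parameter counts (assumption).
sizes = {"2B": 2e9, "9B": 9e9, "70B": 70e9}

estimates = {
    name: (weight_memory_gb(n, 16), weight_memory_gb(n, 4))
    for name, n in sizes.items()
}

for name, (bits16, bits4) in estimates.items():
    print(f"{name}: ~{bits16:.1f} GB at 16-bit, ~{bits4:.1f} GB at 4-bit")
```

Real deployments need headroom beyond these figures, especially at long context lengths where the KV cache grows with the number of tokens.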

What are the "Thinking" variants?

Thinking variants are Gemma 4 models trained to reason step-by-step before producing a final answer. They trade a small amount of latency for substantially better performance on math, science, coding, and multi-step reasoning benchmarks — as seen in our AIME 2026 and GPQA Diamond results.

Where can I download Gemma 4?

Gemma 4 is available through Google AI Studio, Vertex AI, Kaggle, Hugging Face, and Ollama. You can also run it directly with popular inference frameworks including llama.cpp, vLLM, and MLX for Apple Silicon.

Can I fine-tune Gemma 4 on my own data?

Yes. Gemma 4 supports common fine-tuning techniques, including full supervised fine-tuning, LoRA, QLoRA, DPO, and RLHF. We provide reference training recipes and notebooks for all four model sizes to help you get started.
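To illustrate why LoRA is so much cheaper than full fine-tuning, the sketch below counts trainable parameters for a single weight matrix. LoRA learns a low-rank update B·A with B of shape d_out × r and A of shape r × d_in, so it trains r·(d_out + d_in) values instead of d_out·d_in. The hidden size of 4096 and rank of 8 are hypothetical values chosen for the arithmetic, not Gemma 4 specifications.

```python
def full_params(d_out, d_in):
    """Parameters updated when fine-tuning a d_out x d_in weight directly."""
    return d_out * d_in

def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

d = 4096   # hypothetical hidden size, not a Gemma 4 spec
r = 8      # hypothetical LoRA rank

trainable = lora_trainable_params(d, d, r)
fraction = trainable / full_params(d, d)

print(f"LoRA trains {trainable:,} of {full_params(d, d):,} parameters "
      f"({fraction:.2%}) for this matrix")
```

The fraction shrinks further as the matrix grows, since the LoRA count scales linearly with the dimensions while the full count scales quadratically.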

How does Gemma 4 handle safety?

Every Gemma 4 release goes through extensive safety evaluation including red-teaming, bias testing, and responsible AI reviews. We publish detailed model cards for each variant and ship with a built-in responsible use policy. Because the weights are open, the broader research community can audit and improve safety as well.

What's new compared to Gemma 3?

Gemma 4 introduces Thinking variants for step-by-step reasoning, a new sparse Mixture-of-Experts architecture (26B A4B), native multimodal input across the entire family, a 128K context window, and substantial gains on math, coding, and agentic benchmarks. Gemma 4 31B IT Thinking beats Gemma 3 27B IT by 68.4 points on AIME 2026 (89.2% vs. 20.8%) and nearly triples its score on LiveCodeBench v6 (80.0% vs. 29.1%).

Does Gemma 4 support tool use and function calling?

Yes. All instruction-tuned variants were trained with structured tool-use data and support function calling via a standard JSON schema. They can plan multi-step workflows, invoke external APIs, and recover from tool errors — as reflected in our τ2-bench retail scores.
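As a rough picture of what function calling looks like on the application side, the sketch below describes a tool with a JSON Schema, parses a JSON tool call, and returns errors as data so they can be passed back to the model. The exact call format Gemma 4 emits is not specified here; the `{"name": ..., "arguments": ...}` shape, the `get_order_status` tool, and the `dispatch` helper are all hypothetical.

```python
import json

# Hypothetical tool description in JSON Schema style.
GET_ORDER_STATUS_SCHEMA = {
    "name": "get_order_status",
    "description": "Look up the shipping status of a retail order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

def get_order_status(order_id):
    # Stand-in for a real API call.
    orders = {"A123": "shipped"}
    if order_id not in orders:
        raise KeyError(f"unknown order {order_id}")
    return {"order_id": order_id, "status": orders[order_id]}

TOOLS = {"get_order_status": (GET_ORDER_STATUS_SCHEMA, get_order_status)}

def dispatch(model_output):
    """Parse a JSON tool call, check required arguments, and run the tool.

    Failures are returned as {"error": ...} rather than raised, so the
    caller can feed them back to the model for another attempt.
    """
    try:
        call = json.loads(model_output)
        schema, fn = TOOLS[call["name"]]
        args = call.get("arguments", {})
        missing = [k for k in schema["parameters"]["required"] if k not in args]
        if missing:
            return {"error": f"missing arguments: {missing}"}
        return {"result": fn(**args)}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"error": str(exc)}

ok = dispatch('{"name": "get_order_status", "arguments": {"order_id": "A123"}}')
bad = dispatch('{"name": "get_order_status", "arguments": {}}')
print(ok)
print(bad)
```

Returning errors as structured data, instead of raising, is one simple way to let a model retry after a malformed call or a failed lookup.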

Which languages are supported?

Gemma 4 was trained on more than 140 languages with balanced representation across European, Asian, African, and Indic language families. Instruction tuning covers the top 40 languages with human-verified evaluations; the remaining languages benefit from strong transfer learning.

How do I report a bug or request a feature?

The Gemma community lives on GitHub, the Google Developer forums, and the Hugging Face discussions board. Security-sensitive reports can be sent privately to the DeepMind responsible disclosure address listed in every model card.

Start building with Gemma 4

Available today on Google AI Studio, Vertex AI, Kaggle, Hugging Face, and Ollama.