
Gemma 4 Is Apache 2.0 Now. Here's Why That Actually Matters.

Google just dropped Gemma 4 with Apache 2.0 licensing — and for builders, that changes everything about what you can actually ship.

Clord
· 5 min read

The licence was always the problem with Gemma.

Google’s open-weight models were technically impressive, but Gemma 3’s custom licence had teeth — restrictions that made it genuinely risky to ship in commercial products without legal sign-off. Developers noticed. The community complained. And for a lot of builders, Llama just became the easier answer by default.

That’s over. Gemma 4, released April 2nd, ships under Apache 2.0. Full stop.

This is the headline. Everything else — and there’s a lot of everything else — is secondary.


Why the Licence Change Is the Real Story

Apache 2.0 is the gold standard of open-source permissiveness. You can use it commercially, modify it, redistribute it, and ship products built on top of it without a legal department’s blessing. It’s the same licence Google uses for Android.

The custom Gemma licence that shipped with Gemma 3 was criticised for being too restrictive — usage caps, service restrictions, ambiguous commercial terms. For indie builders and startups especially, that’s a dealbreaker. You don’t want your infrastructure built on something that might require renegotiation at scale.

With Apache 2.0, that friction is gone. Gemma 4 is now a legitimate foundation for commercial products. That's not a minor footnote; it reshapes the model's entire practical value proposition.

Code terminal with open-source development


What You’re Actually Getting: The Model Breakdown

Gemma 4 ships in four sizes, each targeting a different hardware tier:

E2B and E4B (Edge/Mobile): Built for on-device use. The Pixel team worked directly with Qualcomm and MediaTek to optimise these. Google claims near-zero latency on smartphones, Raspberry Pi, and Jetson Nano. Native audio input is included, which is useful for speech recognition at the edge. Context window: 128K.

26B Mixture of Experts: The interesting one for most builders. Despite having 26 billion parameters, it activates only 3.8B during inference, so you get large-model quality at small-model speeds. It's currently ranked #6 on the Arena AI open model leaderboard, competing with models 20x its size. Context window: 256K.

31B Dense: The quality flagship. Ranked #3 on Arena AI's open model leaderboard, behind GLM-5 and Kimi 2.5, the two models currently leading the open-weight space. Runs unquantised on a single 80GB H100; quantised versions drop to consumer GPUs. Context window: 256K.

The 26B MoE is the one to pay attention to for most use cases. High quality, fast inference, and it fits in hardware you probably already have access to.
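To see why the MoE design matters, here's a back-of-envelope comparison using the parameter counts above and the standard rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token (the counts come from the article; the 2x rule is a common approximation, not a measured figure):

```python
# Rough per-token compute comparison: Gemma 4's 26B MoE (3.8B active
# parameters per token) vs the 31B dense model. Uses the common
# approximation of ~2 FLOPs per active parameter per generated token.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one generated token."""
    return 2 * active_params

moe_active = 3.8e9    # active params per token (26B MoE)
dense_params = 31e9   # 31B dense model

moe_flops = flops_per_token(moe_active)
dense_flops = flops_per_token(dense_params)

print(f"26B MoE  : {moe_flops:.1e} FLOPs/token")
print(f"31B dense: {dense_flops:.1e} FLOPs/token")
print(f"Dense does ~{dense_flops / moe_flops:.1f}x more compute per token")
```

That ~8x compute gap per token is where the "large-model quality at small-model speeds" claim comes from: you still pay for 26B parameters in memory, but only 3.8B of them do work on each token.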


Agentic Workflows Built In

Gemma 4 isn’t just a chat model with better benchmarks. Google has shipped native support for:

  • Function calling
  • Structured JSON output
  • Native system instructions for tools and APIs

This matters because agentic AI is where the actual builder value is in 2026. If you’re building agents that need to call APIs, parse structured data, and follow multi-step plans — Gemma 4 is designed for that workflow out of the box, locally, with no cloud dependency.

That’s a significant shift. Strong local reasoning + native function calling + Apache 2.0 = a genuinely viable foundation for local agent infrastructure.

Developer working on AI infrastructure


Code Generation: The Offline Angle

Google is positioning Gemma 4 as an offline alternative to cloud code assistants like Gemini Pro and Claude Code.

That’s a bold claim, and we’ll need to run our own benchmarks before we fully back it. But the underlying tech is the same as Gemini 3, and the Arena AI rankings suggest the 31B model is legitimately competitive. For teams with privacy requirements, air-gapped environments, or who just want to stop paying per-token for code completion — this is worth testing immediately.

The 26B MoE’s high tokens-per-second makes it particularly interesting here. Fast inference on a local GPU is a very different experience to waiting for cloud roundtrips.


What This Doesn’t Fix

Let’s be honest about the gaps.

The big models still need serious hardware. Unquantised 31B on a single H100 sounds accessible until you remember that’s a $20,000 GPU. Quantised versions on consumer hardware will get you somewhere, but you’ll take a quality hit. For teams without ML infrastructure, you’re still looking at renting compute.
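The arithmetic behind those hardware claims is easy to check. This is a weights-only estimate; KV cache and activations add several more gigabytes on top, so treat these as floors, not totals:

```python
# Back-of-envelope weight memory for the 31B dense model at different
# precisions. Weights only -- KV cache and activations come on top.

def weight_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB."""
    return params * bytes_per_param / 1e9

params = 31e9
print(f"fp16: {weight_gb(params, 2):.1f} GB")    # ~62 GB  -> fits an 80GB H100
print(f"int8: {weight_gb(params, 1):.1f} GB")    # ~31 GB
print(f"int4: {weight_gb(params, 0.5):.1f} GB")  # ~15.5 GB -> fits a 24GB consumer GPU
```

So the "single 80GB H100" figure checks out at fp16, and 4-bit quantisation is what brings the 31B model within reach of a 24GB consumer card, at the quality cost mentioned above.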

Gemma 4 also doesn’t displace Llama for builders who’ve already committed to that ecosystem. Meta’s models have a massive fine-tuning community, and Llama’s own community licence, while not Apache 2.0, has been permissive enough to be the default for a while. Gemma is now a peer, not an automatic replacement.

And at the top of the open-weight leaderboard, GLM-5 and Kimi 2.5 are still ahead of Gemma 31B. If raw performance is your only metric, those are still worth evaluating.


The Verdict

Gemma 4 just became the most compelling open-weight model family for commercial builders.

The Apache 2.0 switch removes the legal ambiguity that made Gemma 3 a liability risk. The 26B MoE gives you frontier-adjacent quality at real-world inference speeds. The native agentic tooling means you’re not bolting on function-calling as an afterthought. And the mobile-first E2B/E4B models open up on-device use cases that were genuinely hard before.

Is it perfect? No. Big hardware requirements are still real. Llama’s ecosystem is still massive. But as a foundation for commercial local AI — Gemma 4 is the one to build on now.

Use Gemma 4 if: You’re shipping a commercial product, need local inference, or are building agentic workflows and want to avoid cloud API costs and lock-in.

Stick with Llama if: You’re deep in the Llama fine-tuning ecosystem and there’s no compelling reason to switch.

Try both if: You’re at the research/prototyping phase and want to see which performs better on your specific task.

The licence was always the problem. Now it’s not.


Gemma 4 is available now on Hugging Face and AI Studio. Apache 2.0 licence. No excuses.