Google has released Gemma 4, its most capable open model family to date, built on the same underlying research and technology as the proprietary Gemini 3 models.
Google positioned Gemma 4 as pound-for-pound the most capable open models yet, with a strong emphasis on advanced reasoning, agentic workflows, multi-step planning, tool use, autonomous agents, and efficiency for local and on-device deployment. The family includes four sizes optimized for different hardware:

Effective 2B (E2B) — Ultra-lightweight, for edge devices and smartphones.
Effective 4B (E4B) — Still very efficient for mobile and edge use.
26B Mixture of Experts (MoE, A4B variant) — Balances performance and lower latency.
31B Dense — Highest raw performance in the family, suited for workstations or servers.

Across the family, Gemma 4 offers:

Multimodal support — Native handling of text, images, and audio inputs.
Long context — Up to 256K tokens.
Advanced capabilities — Strong function calling, structured output, offline code generation, complex logic and reasoning, and a thinking mode that produces explicit reasoning steps before final answers.
Multilingual fluency — Over 140 languages.
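Structured output in practice means the model is prompted to emit machine-readable JSON that the caller validates before acting on it. Below is a minimal sketch of that validation step in Python; the payload format and field names are illustrative assumptions, not Gemma's actual API:

```python
import json

# Hypothetical tool-call payload a model might emit when asked for
# structured output. The schema ("tool" / "arguments") is illustrative.
raw_reply = '{"tool": "get_weather", "arguments": {"city": "Lagos", "unit": "celsius"}}'

def parse_tool_call(reply: str) -> dict:
    """Validate a JSON tool call before dispatching it to real code."""
    call = json.loads(reply)  # raises ValueError on malformed JSON
    if not isinstance(call.get("tool"), str):
        raise ValueError("missing or non-string 'tool' field")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("missing or non-dict 'arguments' field")
    return call

call = parse_tool_call(raw_reply)
print(call["tool"], call["arguments"]["city"])  # → get_weather Lagos
```

Validating before dispatch is what makes structured output safer than free-text parsing: a malformed reply fails loudly instead of triggering the wrong action.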
Google reports significant gains over Gemma 3, including better multimodal reasoning and stronger text-benchmark scores, and highlights high intelligence per parameter. Gemma 4 also switches to the fully permissive Apache 2.0 license; previous Gemma versions used a more restrictive custom Google license that many developers disliked due to its usage policies and potential complications with synthetic data.
Apache 2.0 allows unrestricted commercial use, fine-tuning, and deployment without the old limitations. You can access Gemma 4 right away:

Google AI Studio for the larger models.
AI Edge Gallery for the smaller E2B/E4B variants.
Weight downloads from Hugging Face, Kaggle, and Ollama.
Google Cloud (Vertex AI, Model Garden) for hosted deployment.
Gemma 4 integrates well with tools like Android Studio for local agentic coding assistance and is already seeing community support. This release continues Google's push to make powerful AI runnable anywhere, from phones to cloud, while addressing developer feedback on openness.

On licensing: previous Gemma models used a custom Google license with restrictive prohibited-use policies that Google could update unilaterally. This created legal uncertainty, especially around synthetic data, commercial redistribution, and derivative works. Gemma 4 switches to the fully permissive Apache 2.0 license, the industry standard used by models like Qwen and many others. Developers and companies can now fine-tune on proprietary data, embed the models in commercial products, and release derivatives without worrying about license termination or extra compliance burdens.
The licensing change matters in several ways. It removes a major barrier that previously pushed teams toward competitors. It boosts long-term adoption: enterprises gain true data sovereignty and control, since models run locally and on-premises without sending data to third parties. And it encourages a "Gemmaverse" explosion — more fine-tunes, agents, and ecosystem tools, similar to how Llama releases accelerated community innovation.
Gemma 4 delivers frontier-level reasoning and agentic skills in relatively small sizes, especially the 26B MoE and 31B dense variants, with Google claiming strong intelligence per parameter. Highlights include:

Native multimodal support.
Up to 256K context.
Built-in function calling, structured output, multi-step planning, and thinking/reasoning modes.
Strong coding, logic, and offline agent workflows.
Multilingual coverage for 140+ languages.

Agentic AI becomes practical on-device or on modest hardware: you can now run autonomous agents (planning, tool use, offline code generation) directly on phones, laptops, edge devices, or single GPUs, reducing latency and privacy risks.
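The plan-act loop behind such an on-device agent can be sketched in a few lines. Everything here is illustrative: the model is mocked with canned replies, and the tool registry is a plain dict rather than any real Gemma API:

```python
import json

# Mocked model output: first a tool request, then a final answer.
# A real deployment would query a locally hosted model instead.
scripted_replies = iter([
    '{"action": "tool", "name": "add", "args": {"a": 2, "b": 3}}',
    '{"action": "final", "answer": "2 + 3 = 5"}',
])

def mock_model(history):
    return next(scripted_replies)

TOOLS = {"add": lambda a, b: a + b}  # hypothetical tool registry

def run_agent(task, max_steps=5):
    """Loop: ask the model, run requested tools, stop on a final answer."""
    history = [task]
    for _ in range(max_steps):
        step = json.loads(mock_model(history))
        if step["action"] == "final":
            return step["answer"]
        result = TOOLS[step["name"]](**step["args"])
        history.append(f"tool result: {result}")
    raise RuntimeError("agent did not finish within max_steps")

answer = run_agent("What is 2 + 3?")
print(answer)  # → 2 + 3 = 5
```

The same loop structure works offline, which is why single-GPU and on-device deployment makes this pattern attractive for latency- and privacy-sensitive use.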
The release narrows the gap between open and closed models: the 31B variant ranks highly on human-preference leaderboards, sometimes competing with much larger models from Chinese labs or Meta's Llama family in specific tiers. It also pushes the entire open-source ecosystem forward; expect rapid community quantization (GGUF), fine-tunes, and agent frameworks in the coming weeks.
It lowers costs dramatically (no per-token API fees, reduced infrastructure needs), enables new use cases such as real-time multimodal agents on-device, and strengthens Google's Android ecosystem while benefiting the wider hardware stack. The release comes amid intense competition from Chinese open-weight models that have led in certain benchmarks and scale.
Google, alongside Meta's Llama series, counters the perception that China dominates open models. Gemma 4 is positioned as a high-quality, trusted alternative with rigorous safety protocols inherited from Gemini research. It also intensifies the "small but mighty" race: efficiency and on-device performance now matter as much as raw scale.
Gemma 4 often wins on English coding and agentic tasks at the 26B–31B scale, while competitors may still lead in extreme context lengths or specific multilingual/CJK scenarios. Self-hosted deployments become more attractive for compliance-heavy industries, and Google Cloud makes it easy to run the models in Vertex AI or private setups.
Startups and smaller teams gain access to Gemini-level research without vendor lock-in, and Apache 2.0 combined with Google's security auditing lowers legal and operational risks compared with the earlier custom licenses. Many see the release accelerating the shift from cloud-only APIs to hybrid, local-first AI.

A few caveats: benchmarks are early, and real-world performance varies by quantization and use case. Smaller edge variants trade some capability for efficiency, and rapid open-source iteration means the leaderboard will keep shifting as fine-tunes emerge.

Overall, Gemma 4 is a strategic acceleration for the open AI ecosystem. It makes advanced reasoning, multimodality, and agentic workflows more accessible, private, and deployable than ever, while signaling Google's commitment to a vibrant open-source community alongside its proprietary Gemini line.



