Privacy & Security
May 5, 2026
5 min read

We Added Gemma 4 to Private Mode in MeetingsAI

Gemma 4 E2B is now a download option in Private Mode. A capable on-device model that runs fully offline alongside our existing lineup, with deep thinking built in.

I. M.

Gemma 4 model card highlighted in the MeetingsAI Private Mode model picker on a phone screen
#Privacy #Private Mode #AI Tools #On-Device AI #New Feature

Introduction

Private Mode in MeetingsAI just got a new model: Gemma 4 E2B.

If you have used Private Mode, you know the deal. Everything runs on your phone. No audio, transcript, or summary ever leaves the device. That has always been the point. The only thing that's changed over time is how much you can do without sending anything to the cloud.

Gemma 4 is the latest step in that direction. It now sits in the Private Mode model picker as a one-tap download, alongside the lighter and the heavier options we already shipped.


What is Gemma 4 E2B and why we picked it

Gemma 4 E2B is a small open-source model from Google's Gemma family. The "E2B" in the name refers to its effective 2B-parameter footprint, which keeps it light enough to run on a phone but capable enough to handle nuanced questions about your meetings.

We package it as a quantized GGUF file so it fits cleanly on device. The download is around 2.5 GB. After that, it runs entirely offline through the same Llama-style engine we use for our other GGUF models.

We added it because it fills a real gap in the lineup. SmolLM is fast and tiny. Qwen 3 is heavier and very capable. Gemma 4 lands in the middle with a personality of its own. It tends to think things through carefully before answering, which fits well with how people actually use the chat tab on a meeting summary.

How it shows up in the app

Open Private Mode in MeetingsAI and the model picker now lists Gemma 4 E2B as one of the cards. You will see a small "Thinking" badge next to its name, the same badge we use for Qwen 3.

Tap the card, download it once, and it stays on your device. From then on you can pick Gemma 4 as your active engine the same way you would pick any of the others.

The carousel order is intentional:

  • Apple Intelligence (built in, no download)
  • SmolLM2 360M (small, fast)
  • Llama 3.2 1B via ExecuTorch (recommended default)
  • Gemma 4 E2B (new)
  • Qwen 3 4B (largest, deep thinking)

That ordering reflects the trade-off: smaller models are quicker, larger models are slower but better at hard questions. Gemma sits in the sweet spot for "I want a thoughtful answer without waiting forever."
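The carousel above is essentially a small ordered registry of model cards. Here is a minimal sketch of that data structure in Python; the field names are invented for illustration, and the download sizes other than Gemma 4's 2.5 GB are rough guesses, not published figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    name: str
    download_gb: float   # approximate on-device size; 0 = built in, no download
    thinking: bool       # whether the card shows the "Thinking" badge
    default: bool = False


# Carousel order mirrors the smaller-is-faster trade-off.
# Sizes are illustrative estimates except Gemma 4's stated 2.5 GB.
CAROUSEL = [
    ModelCard("Apple Intelligence", 0.0, thinking=False),
    ModelCard("SmolLM2 360M", 0.3, thinking=False),
    ModelCard("Llama 3.2 1B (ExecuTorch)", 1.0, thinking=False, default=True),
    ModelCard("Gemma 4 E2B", 2.5, thinking=True),
    ModelCard("Qwen 3 4B", 2.8, thinking=True),
]


def default_model(cards: list[ModelCard]) -> ModelCard:
    """Return the recommended card, falling back to the first entry."""
    return next((c for c in cards if c.default), cards[0])
```

Keeping the list sorted by size means the picker needs no extra sorting logic: the visual order and the speed order are the same thing.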

Use cases where Gemma 4 fits

A few scenarios where Gemma 4 is a good pick:

  • You are flying or in a building with no signal. You still want a real conversation with your meeting notes, not just keyword search.
  • You are dealing with sensitive material. Legal, healthcare, internal HR. Cloud is off the table, and you want a model that reasons before answering.
  • Your phone is mid-tier. Qwen 3 is too heavy, SmolLM's context window is too small. Gemma 4 fits.
  • You are using Private Mode because of a company policy. The fewer round trips your data takes, the better.

In all of these, the practical win is the same. You get a useful answer without anything leaving the device.

Deep thinking, on-device

Gemma 4 is a thinking-style model. It plans before it speaks, similar to how Qwen 3 with thinking mode behaves. We process that internally and only show you the final answer in the chat bubble, with the reasoning tucked away as faded "thinking" text above it.

This matters for meeting questions because the good answers tend to need at least a little reasoning. "What did we actually decide about the Q3 launch?" is not a one-line lookup. The model has to scan the transcript, weigh what was said, and form a real reply. Gemma 4 does that step well.

You do not need to toggle anything. Gemma 4 is wired so the reasoning is hidden from the visible reply automatically.
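Under the hood, separating the reasoning from the visible reply usually comes down to splitting on the model's thinking markers. This is a hedged sketch, not the app's real implementation: the `<think>` tag is one common convention, and the actual marker varies by model family.

```python
import re

# Thinking-style models emit their reasoning between special markers
# before the final answer. "<think>" is used here as an illustrative
# convention; the real tag depends on the model's chat template.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reply(raw: str) -> tuple[str, str]:
    """Separate hidden reasoning from the visible chat-bubble text."""
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(raw))
    answer = THINK_RE.sub("", raw).strip()
    return thinking, answer
```

The app would render `answer` in the chat bubble and `thinking` as the faded text above it, so the reply stays clean without the user toggling anything.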

Choosing the right model

We did not replace anything. The whole lineup is still there, and you can switch any time. Quick guide:

  • SmolLM2 360M. Tiny and fast. Good for short, casual replies.
  • Llama 3.2 1B (ExecuTorch). Solid default. Balanced speed and quality.
  • Gemma 4 E2B. New middle option. Reasons before it answers.
  • Qwen 3 4B. Heaviest. Best for the toughest questions, slower per reply.

Try a couple. The right one is whichever feels best for the way you ask things.

Privacy, unchanged

Adding a model never changes the privacy story. Once a Private Mode model is on your device, your audio, transcripts, and summaries stay there. Nothing gets uploaded, cached on a server, or analyzed off-device when you use it. Same commitment, more options.

If you want the full picture of how Private Mode works, the Private Mode page on our site covers it.

Real-world impact

The honest version of this update: most people who use Private Mode will not switch models often. They pick one that runs well on their phone and stick with it.

What Gemma 4 changes is the ceiling. If you have been on the lighter models because Qwen 3 was too slow on your device, Gemma 4 gives you a way up without paying that full cost. And if you have been waiting for a thinking-style model that did not come from the same family as the ones we already had, here is one.

Small change in the picker. Bigger change in what your phone can do offline.

Conclusion

Gemma 4 E2B is now in MeetingsAI Private Mode. It sits between SmolLM and Qwen 3 in size, but punches above its weight on reasoning. Everything runs on your device, nothing is uploaded, and the visible reply stays clean.

The download is one tap. After that, it is yours.
