On-Device AI for Meetings: Everyone’s Talking About Gemma 4. We’re Already There
Google Gemma 4 made on-device AI the hottest topic again. MeetingsAI has already been running fully private, on-device AI for months, purpose-built for the most sensitive workflows: your meetings.
I. M.

Introduction
Google’s on-device Gemma is the headline of the week — lighting up timelines, newsletters, and every “the future is local” hot take out there.
Here’s the quieter side of that story: at MeetingsAI, we’ve been running on-device AI in production for months. Not as a demo. Not as a toggle that secretly pings a cloud server. Real transcription, real summaries, real chat over your past meetings — all happening entirely on the device in your pocket.
That difference matters the moment your meeting involves anything you wouldn’t paste into ChatGPT.
Meetings are the hardest privacy case in AI
Most AI privacy debates are theoretical. Meetings make them painfully real.
In a single recording, you might capture:
- NDAs and unreleased roadmaps
- Salary numbers and performance reviews
- Legal strategy or settlement ranges
- M&A discussions and board-level decisions
- Patient details, therapy notes, or medical records
- Customer PII your legal team would rather you never send to a third party
Cloud transcription tools ask you to trust a maze of vendors, regional data centers, and retention policies. For many teams, that trust isn’t available — and “we promise to delete it” isn’t a compliance answer.
On-device AI changes the question.
The audio never leaves. The summary never leaves. The chat with your past meetings never leaves.
What Private Mode actually means in MeetingsAI
Open MeetingsAI, switch Private Mode on, put your phone in airplane mode, and record. You’ll still get a transcript, a summary, a translation, and even a to-do list — all without sending a single byte outside your device.
This isn’t a marketing line. It’s an engineering contract: in Private Mode, every AI call is routed to a local engine. If no local path exists, the app tells you — instead of quietly phoning home.
That’s what “on-device AI” should mean, and few apps actually ship it.
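The routing contract described above can be sketched as a small guard. The `Engine` shape and `resolveEngine` helper below are hypothetical illustrations, not MeetingsAI’s actual code: in Private Mode, only local engines are eligible, and a missing local path surfaces as an explicit error rather than a silent cloud fallback.

```typescript
// Hypothetical sketch of a Private Mode routing contract.
type Engine = {
  name: string;
  local: boolean; // true if inference never leaves the device
  supports: (task: string) => boolean;
};

function resolveEngine(task: string, engines: Engine[], privateMode: boolean): Engine {
  // In Private Mode, cloud engines are not even candidates.
  const candidates = privateMode ? engines.filter((e) => e.local) : engines;
  const engine = candidates.find((e) => e.supports(task));
  if (!engine) {
    // Tell the user, instead of quietly phoning home.
    throw new Error(`No ${privateMode ? "on-device " : ""}engine available for "${task}"`);
  }
  return engine;
}
```

The key design point is that the privacy check happens at candidate selection, before any capability matching, so no code path can “fall through” to a remote engine.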
The stack behind Private Mode
We don’t bet on one model. Devices vary, OS features vary, and privacy-conscious users deserve flexibility. Private Mode runs a fleet of compact engines, automatically selected based on your device and available models.
- Apple Intelligence Foundation Models. On iOS 26, we use Apple’s built-in foundation models for text generation, embeddings, and transcription in nine languages — all on-device, zero setup.
- ExecuTorch with Llama and Qwen. For rich summaries and long-form chat, MeetingsAI uses quantized open models via ExecuTorch. Llama 3.2 (1B SpinQuant) is the default; Qwen 3 (1.7B and 4B) are available for deeper work on capable hardware.
- llama.rn with SmolLM2. For lightweight tasks, llama.rn runs SmolLM2 (360M GGUF). It’s lightning fast and ideal for brief text generation on mid-range phones.
- Whisper on-device. English transcription runs fully offline via ExecuTorch Whisper. No uploads, no queues, no rate limits.
- ML Kit offline translation. Over 30 languages handled locally once their packs are installed.
- MiniLM + sqliteVec for local RAG. “Ask your past meetings” uses MiniLM-L6-v2 embeddings stored in a local SQLite vector index. Every word stays on your device, searchable by meaning.
- ExecuTorch OCR. PDFs and screenshots are processed locally, letting you include visual documents in your summaries and searches without any server round-trip.
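The retrieval step behind “ask your past meetings” can be illustrated with a minimal in-memory sketch: cosine similarity between a query embedding and stored chunk embeddings. The `Chunk` type and `topK` helper are assumptions for illustration; the shipping app stores MiniLM-L6-v2 vectors (384 dimensions) in a sqliteVec index rather than a plain array.

```typescript
// Minimal local-RAG retrieval sketch: rank stored meeting chunks by
// cosine similarity to a query embedding. Illustrative only.
type Chunk = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  // Sort a copy descending by similarity and keep the k best matches.
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

In practice the vector index does this ranking inside SQLite, so nothing is loaded into memory beyond the top matches, and the transcript text itself never leaves the local database.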
Why multiple engines instead of one
No single on-device model fits every situation.
A mid-range Android from 2019 can’t handle a 4B model. A modern iPhone shouldn’t be throttled by a legacy 360M model. Users in Japan need translation paths US users may not. Private Mode dynamically picks the best option: Apple Intelligence first where supported, then ExecuTorch, then llama.rn. If something’s missing, the next engine steps up — while privacy guarantees stay constant.
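As a rough sketch, that selection order might look like the following, where the OS check, memory threshold, and model names are illustrative assumptions rather than MeetingsAI’s actual logic:

```typescript
// Illustrative capability-based engine selection: prefer the platform
// foundation models where supported, fall back to heavier then lighter
// local engines. Thresholds and names are assumptions.
type Device = { os: "ios" | "android"; osMajor: number; ramGB: number };

function pickEngine(d: Device): string {
  if (d.os === "ios" && d.osMajor >= 26) return "apple-intelligence";
  if (d.ramGB >= 6) return "executorch-llama-3.2-1b";
  return "llama.rn-smollm2-360m";
}
```

Because every branch resolves to a local engine, the fallback chain degrades capability, never privacy.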
What Gemma changes for us
Gemma’s arrival is great news — not competition.
Every big player shipping capable on-device models makes this ecosystem stronger. Better runtimes, richer quantization, more pressure on Apple and Google to refine their frameworks — and far more users who stop assuming AI must live in the cloud.
We’ll evaluate Gemma as just another candidate behind the same Private Mode switch, alongside Llama, Qwen, and SmolLM2. If it performs well on quality, speed, and footprint, it joins the fleet. No migration needed — users simply get one more model to choose from.
The real-world impact
On-device AI isn’t about bragging rights; it’s about what changes in everyday work.
- A lawyer transcribes a client meeting mid-flight, in airplane mode, and walks off with a finished summary.
- A clinician records a patient session and searches for prior symptom mentions — without uploading a single file.
- An executive summarizes a board prep call from a hotel with bad Wi-Fi.
- A compliance-locked team finally uses an AI assistant that respects privacy by design, not by policy.
That’s the bar for meaningful AI in meetings — and we think we’re clearing it today.
Conclusion
Gemma didn’t invent on-device AI. It’s a strong model and a welcome milestone. But the relevant question isn’t which model wins headlines — it’s which products apply those models where privacy stakes are highest.
In meetings, the answer is simple:
The audio, the summary, and the search should live on your device by default.
Cloud should be a choice, not a requirement.