On-Device AI for Meetings: Everyone’s Talking About Gemma 4. We’re Already There
Google Gemma 4 made on-device AI the hottest topic again. MeetingsAI has already been running fully private, on-device AI for months, purpose-built for the most sensitive workflows: your meetings.
I. M.

Introduction
Google’s on-device Gemma is the headline of the week — lighting up timelines, newsletters, and every “the future is local” hot take out there.
Here’s the quieter side of that story: at MeetingsAI, we’ve been running on-device AI in production for months. Not as a demo. Not as a toggle that secretly pings a cloud server. Real transcription, real summaries, real chat over your past meetings — all happening entirely on the device in your pocket.
That difference matters the moment your meeting involves anything you wouldn’t paste into ChatGPT.
Meetings are the hardest privacy case in AI
Most AI privacy debates are theoretical. Meetings make them painfully real.
In a single recording, you might capture:
- NDAs and unreleased roadmaps
- Salary numbers and performance reviews
- Legal strategy or settlement ranges
- M&A discussions and board-level decisions
- Patient details, therapy notes, or medical records
- Customer PII your legal team would rather you never send to a third party
Cloud transcription tools ask you to trust a maze of vendors, regional data centers, and retention policies. For many teams, that trust isn’t available — and “we promise to delete it” isn’t a compliance answer.
On-device AI changes the question.
The audio never leaves. The summary never leaves. The chat with your past meetings never leaves.
What Private Mode actually means in MeetingsAI
Open MeetingsAI, switch Private Mode on, put your phone in airplane mode, and record. You’ll still get a transcript, a summary, a translation, and even a to-do list — all without sending a single byte outside your device.
This isn’t a marketing line. It’s an engineering contract: in Private Mode, every AI call is routed to a local engine. If no local path exists, the app tells you — instead of quietly phoning home.
That’s what “on-device AI” should mean, and few apps actually ship it.
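The routing contract described above can be sketched as a small guard. The `Engine` shape and `resolveEngine` helper below are hypothetical illustrations, not MeetingsAI’s actual code: in Private Mode, only local engines are eligible, and a missing local path surfaces as an explicit error rather than a silent cloud fallback.

```typescript
// Hypothetical sketch of a Private Mode routing contract.
type Engine = {
  name: string;
  local: boolean; // true if inference never leaves the device
  supports: (task: string) => boolean;
};

function resolveEngine(task: string, engines: Engine[], privateMode: boolean): Engine {
  // In Private Mode, cloud engines are not even candidates.
  const candidates = privateMode ? engines.filter((e) => e.local) : engines;
  const engine = candidates.find((e) => e.supports(task));
  if (!engine) {
    // Tell the user, instead of quietly phoning home.
    throw new Error(`No ${privateMode ? "on-device " : ""}engine available for "${task}"`);
  }
  return engine;
}
```

The key design point is that the privacy check happens at candidate selection, before any capability matching, so no code path can “fall through” to a remote engine.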
The stack behind Private Mode
We don’t bet on one model. Devices vary, OS features vary, and privacy-conscious users deserve flexibility. Private Mode runs a fleet of compact engines, automatically selected based on your device and available models.
- Apple Intelligence Foundation Models. On iOS 26, we use Apple’s built-in foundation models for text generation, embeddings, and transcription in nine languages — all on-device, zero setup.
- ExecuTorch with Llama and Qwen. For rich summaries and long-form chat, MeetingsAI uses quantized open models via ExecuTorch. Llama 3.2 (1B SpinQuant) is the default; Qwen 3 (1.7B and 4B) are available for deeper work on capable hardware.
- llama.rn with SmolLM2. For lightweight tasks, llama.rn runs SmolLM2 (360M GGUF). It’s lightning fast and ideal for brief text generation on mid-range phones.
- Whisper on-device. English transcription runs fully offline via ExecuTorch Whisper. No uploads, no queues, no rate limits.
- ML Kit offline translation. Over 30 languages handled locally once their packs are installed.
- MiniLM + sqliteVec for local RAG. “Ask your past meetings” uses MiniLM-L6-v2 embeddings stored in a local SQLite vector index. Every word stays on your device, searchable by meaning.
- ExecuTorch OCR. PDFs and screenshots are processed locally, letting you include visual documents in your summaries and searches without any server round-trip.
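The retrieval step behind “ask your past meetings” can be illustrated with a minimal in-memory sketch: cosine similarity between a query embedding and stored chunk embeddings. The `Chunk` type and `topK` helper are assumptions for illustration; the shipping app stores MiniLM-L6-v2 vectors (384 dimensions) in a sqliteVec index rather than a plain array.

```typescript
// Minimal local-RAG retrieval sketch: rank stored meeting chunks by
// cosine similarity to a query embedding. Illustrative only.
type Chunk = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  // Sort a copy descending by similarity and keep the k best matches.
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

In practice the vector index does this ranking inside SQLite, so nothing is loaded into memory beyond the top matches, and the transcript text itself never leaves the local database.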
Why multiple engines instead of one
No single on-device model fits every situation.
A mid-range Android from 2019 can’t handle a 4B model. A modern iPhone shouldn’t be throttled by a legacy 360M model. Users in Japan need translation paths US users may not. Private Mode dynamically picks the best option: Apple Intelligence first where supported, then ExecuTorch, then llama.rn. If something’s missing, the next engine steps up — while privacy guarantees stay constant.
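As a rough sketch, that selection order might look like the following, where the OS check, memory threshold, and model names are illustrative assumptions rather than MeetingsAI’s actual logic:

```typescript
// Illustrative capability-based engine selection: prefer the platform
// foundation models where supported, fall back to heavier then lighter
// local engines. Thresholds and names are assumptions.
type Device = { os: "ios" | "android"; osMajor: number; ramGB: number };

function pickEngine(d: Device): string {
  if (d.os === "ios" && d.osMajor >= 26) return "apple-intelligence";
  if (d.ramGB >= 6) return "executorch-llama-3.2-1b";
  return "llama.rn-smollm2-360m";
}
```

Because every branch resolves to a local engine, the fallback chain degrades capability, never privacy.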
What Gemma changes for us
Gemma’s arrival is great news — not competition.
Every big player shipping capable on-device models makes this ecosystem stronger. Better runtimes, richer quantization, more pressure on Apple and Google to refine their frameworks — and far more users who stop assuming AI must live in the cloud.
We’ll evaluate Gemma as just another candidate behind the same Private Mode switch, alongside Llama, Qwen, and SmolLM2. If it performs well on quality, speed, and footprint, it joins the fleet. No migration needed — users simply get one more model to choose from.
The real-world impact
On-device AI isn’t about bragging rights; it’s about what changes in everyday work.
- A lawyer transcribes a client meeting mid-flight, in airplane mode, and walks off with a finished summary.
- A clinician records a patient session and searches for prior symptom mentions — without uploading a single file.
- An executive summarizes a board prep call from a hotel with bad Wi-Fi.
- A compliance-locked team finally uses an AI assistant that respects privacy by design, not by policy.
That’s the bar for meaningful AI in meetings — and we think we’re clearing it today.
Conclusion
Gemma didn’t invent on-device AI. It’s a strong model and a welcome milestone. But the relevant question isn’t which model wins headlines — it’s which products apply those models where privacy stakes are highest.
In meetings, the answer is simple:
The audio, the summary, and the search should live on your device by default.
Cloud should be a choice, not a requirement.