Tags: On-Device AI, Whisper, Kotlin, Mobile ML

Building On-Device AI: How We Built a Real-Time Translator with Whisper

A deep dive into how we built Interpreter — a speech-to-speech translation app that runs entirely on-device using Whisper and Piper AI models, with sub-5-second latency.

Building AI that runs on a phone — without any server calls — is one of the most interesting engineering challenges in mobile development today. This is the story of how we built Interpreter, our real-time voice translation app.

The Problem

Most translation apps rely on cloud APIs. This creates three problems:

  1. Latency — every translation round-trips to a server
  2. Privacy — your voice data leaves your device
  3. Offline — no internet, no translation

We wanted to solve all three.

The Stack

We chose Kotlin Multiplatform (KMP) for Android and iOS code sharing, with the following AI components:

  • Whisper (exported to ONNX) via sherpa-onnx for speech-to-text
  • Google ML Kit Translate for on-device translation (40+ language pairs)
  • Piper TTS (also via sherpa-onnx) for text-to-speech

The key insight: all three components can run fully on-device.
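
To make the data flow between the three components concrete, here is a minimal Kotlin sketch of how such a pipeline could be composed in shared code. The interface and class names (SpeechToText, Translator, TextToSpeech, TranslationPipeline) are hypothetical and chosen for illustration; they are not the sherpa-onnx or ML Kit APIs, and this is not necessarily how Interpreter is structured.

```kotlin
// Hypothetical abstractions over the three on-device engines.
// Real implementations would wrap sherpa-onnx (STT, TTS) and ML Kit (translation).
fun interface SpeechToText {
    fun transcribe(pcm: FloatArray, sampleRate: Int): String
}

fun interface Translator {
    fun translate(text: String, source: String, target: String): String
}

fun interface TextToSpeech {
    fun synthesize(text: String): FloatArray
}

class TranslationPipeline(
    private val stt: SpeechToText,
    private val translator: Translator,
    private val tts: TextToSpeech,
) {
    /** Runs STT, translation, and TTS in order; returns null when no speech was recognized. */
    fun run(pcm: FloatArray, sampleRate: Int, source: String, target: String): FloatArray? {
        val transcript = stt.transcribe(pcm, sampleRate).trim()
        if (transcript.isEmpty()) return null // skip translation and TTS for silence
        val translated = translator.translate(transcript, source, target)
        return tts.synthesize(translated)
    }
}
```

Keeping each stage behind a small interface means the pipeline itself has no platform dependencies, so it can be exercised with fake engines in tests.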

Performance Results

After switching from whisper-small to a quantized whisper-base model, we measured the following:

Metric            Result
STT latency       ~2s
Translation       ~100ms
TTS generation    ~1.5s
End-to-end        ~4s

That is sub-5-second end-to-end latency on a mid-range Android device.
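
As a quick sanity check on the figures above, the three stage latencies add up to roughly 3.6s, which leaves about 0.4s of the ~4s end-to-end time for everything else. The sketch below does that arithmetic in Kotlin. The numbers are the approximate values from the results table, and the names are illustrative; what fills the remaining time (audio capture, buffering, playback start) is an assumption, not something we are reporting here.

```kotlin
// Approximate per-stage latencies from the results table, in milliseconds.
val stageLatenciesMs = mapOf(
    "STT" to 2000,
    "Translation" to 100,
    "TTS" to 1500,
)

// Approximate measured end-to-end latency, in milliseconds.
const val END_TO_END_MS = 4000

/** Sum of the three measured stages. */
fun stageTotalMs(): Int = stageLatenciesMs.values.sum()

/** Time not attributed to any of the three stages. */
fun unattributedMs(): Int = END_TO_END_MS - stageTotalMs()

/** A stage's share of the end-to-end time, as a percentage. */
fun shareOfEndToEnd(stage: String): Double =
    stageLatenciesMs.getValue(stage) * 100.0 / END_TO_END_MS
```

On these numbers, speech-to-text accounts for about half of the end-to-end time, so it is the first place to look when tuning latency.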

Lessons Learned

Model quantization matters. Moving from whisper-small to a quantized whisper-base.en reduced file size by 60% with minimal accuracy loss for supported languages.
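
For readers new to quantization, the sketch below shows the basic idea using symmetric per-tensor int8 quantization: each 32-bit float weight is stored as one signed byte plus a shared scale factor. This is a generic illustration under that assumption, not the exact scheme used by the quantized Whisper models we ship, and the names (QuantizedTensor, quantize, dequantize) are made up for the example.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

/** Int8 weights plus the scale needed to map them back to floats. */
class QuantizedTensor(val values: ByteArray, val scale: Float)

/** Symmetric per-tensor quantization: maps [-maxAbs, maxAbs] onto [-127, 127]. */
fun quantize(weights: FloatArray): QuantizedTensor {
    val maxAbs = weights.maxOfOrNull { abs(it) } ?: 0f
    // Avoid dividing by zero for empty or all-zero tensors.
    val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
    val quantized = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return QuantizedTensor(quantized, scale)
}

/** Recovers approximate float weights; the error per weight is about half a scale step at most. */
fun dequantize(tensor: QuantizedTensor): FloatArray =
    FloatArray(tensor.values.size) { i -> tensor.values[i] * tensor.scale }
```

Each quantized weight takes 1 byte instead of 4, at the cost of a small rounding error per weight.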

On-device ≠ slow. Modern mobile NPUs are surprisingly capable. The key is choosing the right model architecture for the task.

KMP is production-ready. Sharing business logic across Android and iOS via Kotlin Multiplatform significantly reduced duplicated code.


We’re continuing to improve Interpreter. If you’re building something similar or want to discuss on-device AI architectures, get in touch.