Building On-Device AI: How We Built a Real-Time Translator with Whisper

Building AI that runs on a phone — without any server calls — is one of the most interesting engineering challenges in mobile development today. This is the story of how we built Interpreter, our real-time voice translation app.

The Problem

Most translation apps rely on cloud APIs. This creates three problems:

Latency — every translation round-trips to a server
Privacy — your voice data leaves your device
Offline — no internet, no translation

We wanted to solve all three.

The Stack

We chose Kotlin Multiplatform (KMP) for Android and iOS code sharing, with the following AI components:

Whisper (ONNX export) via sherpa-onnx for speech-to-text
Google ML Kit Translate for on-device translation (40+ language pairs)
Piper TTS (also via sherpa-onnx) for text-to-speech

The key insight: all three components can run fully on-device.
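To make the three-stage flow concrete, here is a minimal Kotlin sketch of how speech-to-text, translation, and text-to-speech can be chained behind small interfaces in shared code. Every name in it (SpeechToText, Translator, TextToSpeech, InterpreterPipeline) is hypothetical and chosen for illustration; none of these are the sherpa-onnx or ML Kit APIs, and the engines in main are stand-ins so the sketch runs without model files.

```kotlin
// Hypothetical interfaces: illustrative only, not the sherpa-onnx or ML Kit APIs.
fun interface SpeechToText { fun transcribe(pcm: FloatArray): String }
fun interface Translator { fun translate(text: String, from: String, to: String): String }
fun interface TextToSpeech { fun synthesize(text: String): FloatArray }

// Everything one pass through the pipeline produces.
class TranslationResult(
    val sourceText: String,
    val translatedText: String,
    val audio: FloatArray,
)

// Chains the three on-device stages; no network call appears anywhere.
class InterpreterPipeline(
    private val stt: SpeechToText,
    private val translator: Translator,
    private val tts: TextToSpeech,
) {
    fun run(pcm: FloatArray, from: String, to: String): TranslationResult {
        val sourceText = stt.transcribe(pcm)                        // audio -> text
        val translated = translator.translate(sourceText, from, to) // text -> text
        val audio = tts.synthesize(translated)                      // text -> audio
        return TranslationResult(sourceText, translated, audio)
    }
}

fun main() {
    // Stand-in engines (assumptions) so the sketch runs without any models.
    val pipeline = InterpreterPipeline(
        stt = SpeechToText { "hello" },
        translator = Translator { text, _, to -> "[$to] $text" },
        tts = TextToSpeech { text -> FloatArray(text.length) },
    )
    val result = pipeline.run(FloatArray(16_000), from = "en", to = "es")
    println(result.translatedText) // prints "[es] hello"
}
```

Keeping each engine behind an interface like this is one way to let the platform-specific bindings live in Android and iOS source sets while the orchestration stays in common code.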

Performance Results

After switching from whisper-small to a quantized whisper-base model, we measured:

Stage	Latency
STT latency	~2s
Translation	~100ms
TTS generation	~1.5s
End-to-end	~4s

That keeps end-to-end latency under 5 seconds on a mid-range Android device.
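As a quick sanity check on those figures, the three per-stage numbers can be summed and compared with the end-to-end result. The Kotlin sketch below uses only the approximate values from the table; the map and function names are hypothetical.

```kotlin
// Approximate per-stage latencies from the table, in milliseconds.
val stageLatenciesMs = linkedMapOf(
    "STT" to 2000,
    "Translation" to 100,
    "TTS" to 1500,
)

// Sum of all stages, assuming they run strictly one after another.
fun totalLatencyMs(stages: Map<String, Int>): Int = stages.values.sum()

fun main() {
    val total = totalLatencyMs(stageLatenciesMs)
    println("Sum of stages: $total ms") // prints "Sum of stages: 3600 ms"

    // Rough share of the summed time per stage (integer percentages).
    for ((stage, ms) in stageLatenciesMs) {
        println("$stage: ${ms * 100 / total}%")
    }
}
```

The stages add up to about 3.6 s, which is consistent with the ~4 s end-to-end figure; the table does not itemize the remaining time. The sum also shows where optimization effort pays off: STT and TTS dominate, while translation is a small fraction.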

Lessons Learned

Model quantization matters. Moving from whisper-small to a quantized whisper-base reduced file size by 60% with minimal accuracy loss for the languages we support.
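For readers new to quantization, the Kotlin sketch below shows the basic idea with symmetric per-tensor int8 quantization: each 32-bit float weight is stored as one signed byte plus a shared scale. This is a generic illustration under our own assumptions, not necessarily the scheme used by the quantized Whisper models above, and the names QuantizedTensor, quantize, and dequantize are hypothetical.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// int8 values plus the single scale needed to map them back to floats.
class QuantizedTensor(val values: ByteArray, val scale: Float)

// Symmetric per-tensor quantization: map [-maxAbs, maxAbs] onto [-127, 127].
fun quantize(weights: FloatArray): QuantizedTensor {
    val maxAbs = weights.maxOfOrNull { abs(it) } ?: 0f
    val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
    val q = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return QuantizedTensor(q, scale)
}

// Approximate reconstruction of the original weights.
fun dequantize(t: QuantizedTensor): FloatArray =
    FloatArray(t.values.size) { i -> t.values[i] * t.scale }

fun main() {
    val weights = floatArrayOf(0.5f, -1.0f, 0.25f, 0.0f)
    val q = quantize(weights)
    val restored = dequantize(q)

    // 4 bytes per float32 weight versus 1 byte per int8 weight.
    println("float32: ${weights.size * 4} bytes, int8: ${q.values.size} bytes")
    for (i in weights.indices) {
        println("${weights[i]} -> ${q.values[i]} -> ${restored[i]}")
    }
}
```

The rounding error per weight is bounded by about half the scale, which is why accuracy loss can stay small when the weight range is well behaved.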

On-device ≠ slow. Modern mobile NPUs are surprisingly capable. The key is choosing the right model architecture for the task.

KMP is production-ready. Sharing business logic across Android and iOS via Kotlin Multiplatform significantly reduced duplicated code.

We’re continuing to improve Interpreter. If you’re building something similar or want to discuss on-device AI architectures, get in touch.