VoxSherpa-TTS

🎙️ VoxSherpa TTS: Offline Neural Text-to-Speech Engine for Android · ⚡ Sherpa-ONNX powered · 🔊 Natural voice synthesis · 📱 Fully offline processing · 🚀 No cloud, no limits


VoxSherpa TTS

Studio-quality offline neural text-to-speech for Android.
Hindi · English · British · Japanese · Chinese · and more — No cloud. No limits. No compromise.

VoxSherpa TTS is listed in the official README of k2-fsa/sherpa-onnx — the core inference library powering this app.

Why VoxSherpa?

Most TTS apps make you choose between quality and privacy. Cloud-based tools like ElevenLabs sound incredible, but they require an internet connection, send your text to remote servers, and charge per character.

VoxSherpa breaks that tradeoff.

It runs two professional-grade neural engines entirely on your device:

Engine	Quality	Speed	Best For
🧠 Kokoro-82M	Studio-grade · rivals ElevenLabs	Slower on budget hardware	Audiobooks, voiceovers, professional content
⚡ Piper / VITS	Natural · clear	Fast on any device	Daily use, quick synthesis

Screenshots

Screens shown: Generate · Models · Library · Settings

Features

🎙️ Dual Neural Engine

Kokoro-82M: an 82-million-parameter neural model with multilingual support, including Hindi, English, British English, French, Spanish, Chinese, Japanese, and more. It uses the same architecture as top-tier commercial TTS services.
Piper / VITS: fast, lightweight, natural. Generates speech in seconds on any Android device.

🔒 100% Offline & Private

All processing happens on your device
No internet required after model download
No account, no telemetry, no data collection
Your text never leaves your phone

📄 Document to Audio

PDF to Audio — listen to any document hands-free
TXT to Audio — convert plain text files instantly
Share any text directly to VoxSherpa from any app
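One common approach when converting a long PDF or TXT file is to synthesize it in pieces rather than all at once. The sketch below illustrates that idea under stated assumptions and is not VoxSherpa's code: the hypothetical `TextChunker.chunk` packs whole sentences into chunks of at most `maxChars` characters, and a single sentence longer than the limit becomes its own oversized chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: packs sentences into chunks of at most maxChars characters. */
final class TextChunker {
    private TextChunker() {}

    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be positive");
        // Split after '.', '!' or '?' when followed by whitespace, keeping the punctuation.
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            if (sentence.isEmpty()) continue;
            // Close the current chunk if adding a space plus this sentence would overflow it.
            if (current.length() > 0 && current.length() + 1 + sentence.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append(' ');
            current.append(sentence);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }
}
```

Breaking only at sentence boundaries keeps each chunk's prosody intact, at the cost of uneven chunk sizes.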

📦 Model Management

Download models directly from the app
Filter voice models by language or type
Sample voice preview before selecting a model
Import your own .onnx models from local storage
Keep multiple models installed at the same time
Smart storage tracking
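Filtering by language or type amounts to matching fields in each model's metadata. A sketch using a hypothetical `VoiceModel` record (its fields are illustrative, not VoxSherpa's actual schema) and a hypothetical `ModelFilter` helper that also sums model sizes for a storage readout:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model metadata; field names are assumptions for illustration. */
record VoiceModel(String name, String language, String engine, long sizeBytes) {}

final class ModelFilter {
    private ModelFilter() {}

    /** Returns models matching the language and engine; a null argument means "any". */
    static List<VoiceModel> filter(List<VoiceModel> models, String language, String engine) {
        return models.stream()
                .filter(m -> language == null || m.language().equalsIgnoreCase(language))
                .filter(m -> engine == null || m.engine().equalsIgnoreCase(engine))
                .collect(Collectors.toList());
    }

    /** Total on-disk size of the given models, e.g. for a storage-usage display. */
    static long totalBytes(List<VoiceModel> models) {
        return models.stream().mapToLong(VoiceModel::sizeBytes).sum();
    }
}
```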

🔊 System-Wide TTS

Set VoxSherpa as your default Android TTS engine
All downloaded models are exposed to System TTS, so you can use any voice in Chrome, WhatsApp, TalkBack, and more
Pitch & speed control in System TTS mode
Sample voice preview for all models
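Android's System TTS interface passes speech rate and pitch to an engine as integers in which 100 means normal (`SynthesisRequest.getSpeechRate()` and `getPitch()`). A sketch of turning those values into multipliers an engine can consume; the class name and the 0.5 to 2.0 clamp range are assumptions, and how VoxSherpa actually applies pitch is not described here.

```java
/** Hypothetical mapping from System TTS request values (100 = normal) to multipliers. */
final class SystemTtsParams {
    // Illustrative bounds, not VoxSherpa's documented limits.
    static final float MIN_MULTIPLIER = 0.5f;
    static final float MAX_MULTIPLIER = 2.0f;

    private SystemTtsParams() {}

    /** Converts an Android rate or pitch value to a clamped multiplier (100 -> 1.0). */
    static float toMultiplier(int androidValue) {
        float multiplier = androidValue / 100.0f;
        return Math.max(MIN_MULTIPLIER, Math.min(MAX_MULTIPLIER, multiplier));
    }
}
```

Inside a `TextToSpeechService.onSynthesizeText` override, `toMultiplier(request.getSpeechRate())` would yield the speed argument passed to the engine.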

🎧 Audio Controls

Real-time waveform visualization
Adjustable speed and pitch
Interactive audio seeking with mini player controls
MediaStyle notification with full playback controls
Export to WAV at each model's native sample rate
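Sample rates differ between models (Kokoro, for instance, outputs 24 kHz audio), so a WAV export has to write each model's own rate into the file header. A minimal sketch of a 16-bit mono PCM encoder, assuming the engine hands back mono `float` samples in [-1, 1] together with the model's sample rate; `WavWriter` is a hypothetical name, not VoxSherpa's actual class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical 16-bit PCM mono WAV encoder. */
final class WavWriter {
    private WavWriter() {}

    static byte[] encode(float[] samples, int sampleRate) {
        int channels = 1;
        int bitsPerSample = 16;
        int blockAlign = channels * bitsPerSample / 8;
        int byteRate = sampleRate * blockAlign;
        int dataSize = samples.length * blockAlign;
        ByteBuffer buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN);
        // RIFF header: total size excludes the first 8 bytes.
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataSize);
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        // fmt chunk: uncompressed PCM at the model's own sample rate.
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                    // fmt chunk size for PCM
        buf.putShort((short) 1);           // audio format 1 = PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        // data chunk: clamp each float to [-1, 1] and scale to signed 16-bit.
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataSize);
        for (float s : samples) {
            float clamped = Math.max(-1f, Math.min(1f, s));
            buf.putShort((short) Math.round(clamped * 32767f));
        }
        return buf.array();
    }
}
```

Writing the returned bytes with `java.nio.file.Files.write` produces a playable `.wav`; clamping before scaling keeps out-of-range samples from wrapping around.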

📚 Speech Library

Save all generated audio locally
Favorites system for quick access
View generation history with timestamps
Voice model attribution per recording
Regenerate audio on voice change

⚙️ Smart Settings

Smart Punctuation — natural pauses after sentence breaks
Emotion Tags — [whisper], [angry], [happy] support
Per-model voice selection (Kokoro supports 100+ speakers)
Theme-aware UI
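Natural pauses after sentence breaks can be approximated by appending silence after each synthesized segment, sized by the segment's closing punctuation. This sketches that idea and is not VoxSherpa's implementation: `SmartPauses` and both pause lengths are hypothetical values chosen for illustration.

```java
import java.util.List;

/** Hypothetical punctuation-aware pause insertion between synthesized segments. */
final class SmartPauses {
    // Illustrative pause lengths in milliseconds.
    static final int SENTENCE_PAUSE_MS = 400;
    static final int CLAUSE_PAUSE_MS = 150;

    private SmartPauses() {}

    /** Pause to insert after a segment, chosen from its final punctuation mark. */
    static int pauseMillisAfter(String segment) {
        String t = segment.strip();
        if (t.isEmpty()) return 0;
        switch (t.charAt(t.length() - 1)) {
            case '.': case '!': case '?': return SENTENCE_PAUSE_MS;
            case ',': case ';': case ':': return CLAUSE_PAUSE_MS;
            default: return 0;
        }
    }

    /** Concatenates per-segment audio, leaving a silent gap after each segment. */
    static float[] joinWithPauses(List<String> segments, List<float[]> audio, int sampleRate) {
        if (segments.size() != audio.size()) throw new IllegalArgumentException("size mismatch");
        int[] pauseSamples = new int[segments.size()];
        int total = 0;
        for (int i = 0; i < segments.size(); i++) {
            pauseSamples[i] = (int) ((long) sampleRate * pauseMillisAfter(segments.get(i)) / 1000);
            total += audio.get(i).length + pauseSamples[i];
        }
        float[] out = new float[total]; // zero-filled, so skipped regions are silence
        int pos = 0;
        for (int i = 0; i < segments.size(); i++) {
            float[] a = audio.get(i);
            System.arraycopy(a, 0, out, pos, a.length);
            pos += a.length + pauseSamples[i];
        }
        return out;
    }
}
```

Java zero-initializes new arrays, so the gaps left between copied segments need no explicit silence writes.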

Technical Architecture

User Text
    │
    ├─── Kokoro Engine (KokoroEngine.java)
    │         └── Sherpa-ONNX JNI → ONNX Runtime → CPU/NNAPI
    │                   └── kokoro-multi-lang-v1_0 (82M params, FP32)
    │
    └─── Piper / VITS Engine (VoiceEngine.java)
              └── Sherpa-ONNX JNI → ONNX Runtime → CPU
                        └── VITS model (language-specific)
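The diagram routes text to one of two engines depending on the selected model. A minimal sketch of that dispatch, assuming both engines sit behind a common interface; `SpeechEngine`, `ModelType`, and `EngineRouter` are hypothetical names, and the real `KokoroEngine.java` and `VoiceEngine.java` may expose entirely different methods.

```java
import java.util.Map;

/** Hypothetical common engine interface: text in, mono float samples out. */
@FunctionalInterface
interface SpeechEngine {
    float[] synthesize(String text, int speakerId, float speed);
}

/** The two branches shown in the diagram. */
enum ModelType { KOKORO, VITS }

final class EngineRouter {
    private final Map<ModelType, SpeechEngine> engines;

    EngineRouter(Map<ModelType, SpeechEngine> engines) {
        this.engines = Map.copyOf(engines);
    }

    /** Returns the engine registered for the selected model's type. */
    SpeechEngine forModel(ModelType type) {
        SpeechEngine engine = engines.get(type);
        if (engine == null) throw new IllegalStateException("No engine registered for " + type);
        return engine;
    }
}
```

With both engines behind one interface, playback, export, and the library would not need to know which engine produced the samples.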

Built with:

Sherpa-ONNX — on-device neural inference
Kokoro-82M — multilingual neural TTS model
Piper — fast local TTS
Android AudioTrack API — low-latency PCM playback

Performance

Generation speed depends on your device's processor. Figures below are approximate processing time per minute of generated audio:

Device Tier	Kokoro	Piper
🟢 Flagship (Snapdragon 8 Gen 3)	~20–40 sec/min audio	~5 sec/min audio
🟡 Mid-range (8-core)	~60–90 sec/min audio	~10 sec/min audio
🔴 Budget (6-core)	~2–3 min/min audio	~20 sec/min audio

Kokoro prioritizes quality over speed by design. It uses the same 82M-parameter architecture that powers premium commercial TTS, and running it entirely offline on a mobile CPU pushes the hardware to its limits: on budget devices, one minute of audio can take two to three minutes to generate.
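The performance figures above are easier to compare as a real-time factor (RTF): processing time divided by the duration of audio produced, where values below 1.0 mean faster than real time. For example, ~20 to 40 seconds per minute of audio is an RTF of roughly 0.33 to 0.67, and ~2 to 3 minutes per minute is an RTF of 2 to 3. A small helper for the arithmetic; `RealTimeFactor` is a hypothetical name.

```java
/** Hypothetical helper for real-time-factor arithmetic. */
final class RealTimeFactor {
    private RealTimeFactor() {}

    /** RTF = seconds spent generating / seconds of audio produced. */
    static double rtf(double generationSeconds, double audioSeconds) {
        if (audioSeconds <= 0) throw new IllegalArgumentException("audioSeconds must be positive");
        return generationSeconds / audioSeconds;
    }

    /** Estimated generation time for a target audio length at a measured RTF. */
    static double estimateSeconds(double rtf, double targetAudioSeconds) {
        return rtf * targetAudioSeconds;
    }
}
```

At the budget-tier Kokoro RTF of 2 to 3, a 10-minute chapter would take roughly 20 to 30 minutes to generate.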

Installation

Requirements: Android 11+ · ARM64 · ~500 MB free storage recommended (for models)

Model Import (Technical Users)

VoxSherpa supports importing custom .onnx models without any server:

Place your .onnx model + tokens.txt on device storage
Open Models tab → tap + → Import Local Model
Select your files

Works with any Sherpa-ONNX-compatible TTS model.
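Before importing, it can help to confirm that `tokens.txt` actually parses. A hedged sketch of such a check, assuming one `symbol id` pair per line separated by a space; a line with nothing but an id is read as the space token, which is an assumption worth checking against the `tokens.txt` shipped with real Sherpa-ONNX models. `TokensFile` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pre-import validation of a tokens.txt file. */
final class TokensFile {
    private TokensFile() {}

    /** Parses "symbol id" lines into an ordered map; throws on malformed ids or empty input. */
    static Map<String, Integer> parse(String contents) {
        Map<String, Integer> tokens = new LinkedHashMap<>();
        String[] lines = contents.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].stripTrailing();
            if (line.isEmpty()) continue;
            int cut = line.lastIndexOf(' ');
            // Assumption: no symbol before the id means the symbol is a space.
            String symbol = cut <= 0 ? " " : line.substring(0, cut);
            String idText = line.substring(cut + 1).trim();
            try {
                tokens.put(symbol, Integer.parseInt(idText));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "tokens.txt line " + (i + 1) + ": bad id '" + idText + "'", e);
            }
        }
        if (tokens.isEmpty()) throw new IllegalArgumentException("tokens.txt is empty");
        return tokens;
    }
}
```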

Contributing

VoxSherpa is open source. Contributions welcome:

🐛 Bug reports via Issues
💡 Feature requests via Discussions
🔧 Pull requests for fixes and improvements

License

Copyright (C) 2025 CodeBySonu95

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

https://www.gnu.org/licenses/gpl-3.0.html