VoxSherpa TTS
Studio-quality offline neural text-to-speech for Android.
Hindi · English · British · Japanese · Chinese · and more — No cloud. No limits. No compromise.
VoxSherpa TTS is listed in the official README of k2-fsa/sherpa-onnx — the core inference library powering this app.
Why VoxSherpa?
Most TTS apps make you choose between quality and privacy. Cloud-based tools like ElevenLabs sound incredible — but they require internet, send your text to remote servers, and charge per character.
VoxSherpa breaks that tradeoff.
It runs two professional-grade neural engines entirely on your device:
| Engine | Quality | Speed | Best For |
|---|---|---|---|
| 🧠 Kokoro-82M | Studio-grade · rivals ElevenLabs | Slower on budget hardware | Audiobooks, voiceovers, professional content |
| ⚡ Piper / VITS | Natural · clear | Fast on any device | Daily use, quick synthesis |
Screenshots
[Screenshots: Generate · Models · Library · Settings]
Features
🎙️ Dual Neural Engine
- Kokoro-82M: an 82-million-parameter neural model with multilingual support, including Hindi, English, British English, French, Spanish, Chinese, Japanese, and more.
- Piper / VITS: fast, lightweight, and natural. Generates speech in seconds on any Android device.
🔒 100% Offline & Private
- All processing happens on your device
- No internet required after model download
- No account, no telemetry, no data collection
- Your text never leaves your phone
📄 Document to Audio
- PDF to Audio — listen to any document hands-free
- TXT to Audio — convert plain text files instantly
- Share any text directly to VoxSherpa from any app
📦 Model Management
- Download models directly from the app
- Filter voice models by language or type
- Sample voice preview before selecting a model
- Import your own `.onnx` models from local storage
- Multiple models installed simultaneously
- Smart storage tracking
🔊 System-Wide TTS
- Set VoxSherpa as your default Android TTS engine
- All downloaded models exposed to System TTS — use any voice in Chrome, WhatsApp, TalkBack, and more
- Pitch & speed control in System TTS mode
- Sample voice preview for all models
🎧 Audio Controls
- Real-time waveform visualization
- Adjustable speed and pitch
- Interactive audio seeking with mini player controls
- MediaStyle notification with full playback controls
- Export as WAV with correct sample rate per model
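
Exporting with the correct sample rate matters because the rate is written into the WAV header; a mismatch makes playback too fast or too slow. The following is a minimal sketch of such a header for 16-bit mono PCM, not VoxSherpa's actual export code. The class and method names are hypothetical, and the 24000 Hz value in `main` is an assumption about a typical Kokoro model; other voices may use different rates.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not taken from the VoxSherpa source.
final class WavWriter {

    /** Wraps 16-bit mono PCM samples in a canonical 44-byte RIFF/WAVE header. */
    static byte[] toWav(short[] samples, int sampleRate) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8; // bytes per frame
        final int byteRate = sampleRate * blockAlign;        // bytes per second
        final int dataSize = samples.length * blockAlign;

        // WAV stores all multi-byte fields in little-endian order.
        ByteBuffer buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(ascii("RIFF"));
        buf.putInt(36 + dataSize);           // size of everything after this field
        buf.put(ascii("WAVE"));
        buf.put(ascii("fmt "));
        buf.putInt(16);                      // fmt chunk size for plain PCM
        buf.putShort((short) 1);             // audio format 1 = PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);              // must match the model's output rate
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put(ascii("data"));
        buf.putInt(dataSize);
        for (short s : samples) {
            buf.putShort(s);
        }
        return buf.array();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Assumed rate for illustration only.
        byte[] wav = toWav(new short[] {0, 1000, -1000}, 24000);
        System.out.println("WAV bytes: " + wav.length);
    }
}
```

Passing the rate in per call, rather than hard-coding it, lets the same writer serve models with different output rates.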
📚 Speech Library
- Save all generated audio locally
- Favorites system for quick access
- View generation history with timestamps
- Voice model attribution per recording
- Regenerate audio on voice change
⚙️ Smart Settings
- Smart Punctuation: natural pauses after sentence breaks
- Emotion Tags: `[whisper]`, `[angry]`, `[happy]` support
- Per-model voice selection (Kokoro supports 100+ speakers)
- Theme-aware UI
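
To illustrate how inline emotion tags could be handled, here is a rough sketch that splits input text into segments, each labeled with the most recent tag. It is an assumption about the behavior, not the app's actual parser: the class name, the `neutral` default, and the rule that a tag applies until the next tag are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration; VoxSherpa's real tag handling may differ.
final class EmotionTagParser {

    /** A run of text together with the emotion tag that precedes it. */
    static final class Segment {
        final String emotion;
        final String text;

        Segment(String emotion, String text) {
            this.emotion = emotion;
            this.text = text;
        }
    }

    // Only the three tags named in the feature list are recognized.
    private static final Pattern TAG = Pattern.compile("\\[(whisper|angry|happy)\\]");

    /** Splits input at each recognized tag; text before the first tag is "neutral". */
    static List<Segment> parse(String input) {
        List<Segment> out = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        String emotion = "neutral"; // assumed default
        int last = 0;
        while (m.find()) {
            addIfNotBlank(out, emotion, input.substring(last, m.start()));
            emotion = m.group(1);
            last = m.end();
        }
        addIfNotBlank(out, emotion, input.substring(last));
        return out;
    }

    private static void addIfNotBlank(List<Segment> out, String emotion, String text) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            out.add(new Segment(emotion, trimmed));
        }
    }

    public static void main(String[] args) {
        for (Segment s : parse("Hello there. [whisper] Come closer. [happy] Great!")) {
            System.out.println(s.emotion + ": " + s.text);
        }
    }
}
```

Unrecognized bracketed text, such as `[sad]`, is left in the segment untouched, so ordinary brackets in a document are not silently dropped.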
Technical Architecture

    User Text
        │
        ├─── Kokoro Engine (KokoroEngine.java)
        │        └── Sherpa-ONNX JNI → ONNX Runtime → CPU/NNAPI
        │                └── kokoro-multi-lang-v1_0 (82M params, FP32)
        │
        └─── Piper / VITS Engine (VoiceEngine.java)
                 └── Sherpa-ONNX JNI → ONNX Runtime → CPU
                         └── VITS model (language-specific)

Built with:
- Sherpa-ONNX — on-device neural inference
- Kokoro-82M — multilingual neural TTS model
- Piper — fast local TTS
- Android AudioTrack API — low-latency PCM playback
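
Between inference and playback, neural TTS output usually has to be converted to a PCM format the audio sink accepts. The sketch below assumes the engine returns normalized float samples in the range [-1, 1] and that playback uses 16-bit PCM; both are assumptions, since the app could equally use float PCM. The class name is hypothetical.

```java
// Hypothetical conversion step; not taken from the VoxSherpa source.
final class PcmConverter {

    /** Converts normalized float samples to signed 16-bit PCM, clamping out-of-range values. */
    static short[] floatToPcm16(float[] samples) {
        short[] pcm = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            // Clamp first so overshoots clip cleanly instead of wrapping around.
            float clamped = Math.max(-1.0f, Math.min(1.0f, samples[i]));
            pcm[i] = (short) Math.round(clamped * 32767.0f);
        }
        return pcm;
    }

    public static void main(String[] args) {
        short[] pcm = floatToPcm16(new float[] {0.0f, 1.0f, -1.0f, 2.0f});
        for (short s : pcm) {
            System.out.println(s);
        }
    }
}
```

Scaling by 32767 rather than 32768 keeps the output symmetric around zero and avoids overflow at +1.0.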
Performance
Generation speed depends entirely on your device's processor:
| Device Tier | Kokoro | Piper |
|---|---|---|
| 🟢 Flagship (Snapdragon 8 Gen 3) | ~20–40 sec/min audio | ~5 sec/min audio |
| 🟡 Mid-range (8-core) | ~60–90 sec/min audio | ~10 sec/min audio |
| 🔴 Budget (6-core) | ~2–3 min/min audio | ~20 sec/min audio |
Kokoro prioritizes quality over speed by design. It uses the same 82M-parameter architecture that powers premium commercial TTS, and running it entirely offline on a mobile CPU pushes the hardware to its limits.
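
The figures in the table can be read as a real-time factor (RTF): seconds of compute per second of audio. As a rough sketch of the arithmetic, the helper below converts a "seconds per minute of audio" figure into an RTF and estimates the wait for a longer recording. The class and method names are hypothetical, and the 30-second input is simply the midpoint of the flagship Kokoro range above.

```java
// Illustrative arithmetic only; actual times vary with device, model, and text.
final class GenerationEstimate {

    /** Real-time factor: seconds of compute needed per second of generated audio. */
    static double realTimeFactor(double computeSecondsPerAudioMinute) {
        return computeSecondsPerAudioMinute / 60.0;
    }

    /** Estimated compute time, in seconds, for a clip of the given length. */
    static double estimateSeconds(double audioSeconds, double rtf) {
        return audioSeconds * rtf;
    }

    public static void main(String[] args) {
        double rtf = realTimeFactor(30.0);            // ~30 s of compute per minute of audio
        double wait = estimateSeconds(10 * 60, rtf);  // a 10-minute recording
        System.out.println("RTF " + rtf + ", estimated wait " + wait + " s");
    }
}
```

An RTF below 1.0 means synthesis runs faster than playback; the budget-tier Kokoro figures correspond to an RTF of roughly 2 to 3, so generation takes longer than the audio itself.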
Installation
Requirements: Android 11+ · ARM64 · ~500 MB free storage recommended (for models)
Model Import (Technical Users)
VoxSherpa supports importing custom .onnx models without any server:
- Place your `.onnx` model + `tokens.txt` on device storage
- Open Models tab → tap + → Import Local Model
- Select your files
Works with any Sherpa-ONNX-compatible TTS model.
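
Before importing, it can help to confirm that a selection contains both required files. The check below is a minimal sketch of that idea, assuming a selection is valid when it has at least one `.onnx` file and a file named exactly `tokens.txt`. The class is hypothetical, and the app's real validation may be stricter; some models need extra files that are not covered here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical pre-import check; not taken from the VoxSherpa source.
final class ModelImportCheck {

    /** Returns true when the selection has an .onnx model and a tokens.txt file. */
    static boolean hasRequiredFiles(List<String> fileNames) {
        boolean hasModel = false;
        boolean hasTokens = false;
        for (String name : fileNames) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".onnx")) {
                hasModel = true;
            }
            if (lower.equals("tokens.txt")) {
                hasTokens = true;
            }
        }
        return hasModel && hasTokens;
    }

    public static void main(String[] args) {
        // File names here are made up for the example.
        List<String> picked = Arrays.asList("my-voice.onnx", "tokens.txt");
        System.out.println("Ready to import: " + hasRequiredFiles(picked));
    }
}
```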
Contributing
VoxSherpa is open source. Contributions welcome:
- 🐛 Bug reports via Issues
- 💡 Feature requests via Discussions
- 🔧 Pull requests for fixes and improvements
License
Copyright (C) 2025 CodeBySonu95
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
https://www.gnu.org/licenses/gpl-3.0.html
Acknowledgements
- k2-fsa/sherpa-onnx — the inference engine that makes this possible
- hexgrad/Kokoro-82M — the neural model behind studio-quality synthesis
- rhasspy/piper — fast local TTS engine
Built with obsession. Runs without internet.
VoxSherpa — Because your voice deserves to stay yours.




