VoxSherpa-TTS

🎙️ Offline neural text-to-speech engine for Android, powered by Sherpa-ONNX. Natural voice synthesis, fully offline processing. No cloud, no limits.

VoxSherpa TTS

Studio-quality offline neural text-to-speech for Android.
Hindi · English · British English · Japanese · Chinese · and more. No cloud. No limits. No compromise.


VoxSherpa TTS is listed in the official README of k2-fsa/sherpa-onnx — the core inference library powering this app.



Why VoxSherpa?

Most TTS apps make you choose between quality and privacy. Cloud-based tools like ElevenLabs sound incredible, but they require an internet connection, send your text to remote servers, and charge per character.

VoxSherpa breaks that tradeoff.

It runs two professional-grade neural engines entirely on your device:

| Engine | Quality | Speed | Best for |
|---|---|---|---|
| 🧠 Kokoro-82M | Studio-grade · rivals ElevenLabs | Slower on budget hardware | Audiobooks, voiceovers, professional content |
| Piper / VITS | Natural · clear | Fast on any device | Daily use, quick synthesis |

Screenshots

(Screenshots of the Generate, Models, Library, and Settings screens.)

Features

🎙️ Dual Neural Engine

  • Kokoro-82M: an 82-million-parameter neural model with multilingual support, including Hindi, English (American and British), French, Spanish, Chinese, Japanese, and more. It uses the same architecture as top-tier commercial TTS services.
  • Piper / VITS — Fast, lightweight, natural. Generates speech in seconds on any Android device.

🔒 100% Offline & Private

  • All processing happens on your device
  • No internet required after model download
  • No account, no telemetry, no data collection
  • Your text never leaves your phone

📄 Document to Audio

  • PDF to Audio — listen to any document hands-free
  • TXT to Audio — convert plain text files instantly
  • Share any text directly to VoxSherpa from any app
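Long PDF or TXT documents are usually synthesized piece by piece rather than in a single call. A minimal sketch of one way to split a document, assuming sentence-bounded chunks capped at a character limit; `TextChunker` and `maxChars` are hypothetical names, not the app's actual code, and the naive sentence pattern will also split on decimals and abbreviations such as "Dr.".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits long document text into synthesis-sized chunks. */
final class TextChunker {
    // A "sentence" is a run of non-terminator characters plus any trailing . ! ?
    private static final Pattern SENTENCE = Pattern.compile("[^.!?]+[.!?]*");

    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be positive");
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Collapse line breaks and repeated spaces (common in PDF text) into single spaces.
        Matcher m = SENTENCE.matcher(text.replaceAll("\\s+", " ").trim());
        while (m.find()) {
            String sentence = m.group().trim();
            if (sentence.isEmpty()) continue;
            if (sentence.length() > maxChars) {
                // An over-long sentence gets its own chunks, split at word boundaries.
                if (current.length() > 0) { chunks.add(current.toString()); current.setLength(0); }
                splitLongSentence(sentence, maxChars, chunks);
                continue;
            }
            // Flush the current chunk if this sentence would push it past the limit.
            if (current.length() > 0 && current.length() + 1 + sentence.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append(' ');
            current.append(sentence);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }

    private static void splitLongSentence(String sentence, int maxChars, List<String> out) {
        StringBuilder piece = new StringBuilder();
        for (String word : sentence.split(" ")) {
            // A single word longer than maxChars is cut into fixed-size slices.
            while (word.length() > maxChars) {
                if (piece.length() > 0) { out.add(piece.toString()); piece.setLength(0); }
                out.add(word.substring(0, maxChars));
                word = word.substring(maxChars);
            }
            if (piece.length() > 0 && piece.length() + 1 + word.length() > maxChars) {
                out.add(piece.toString());
                piece.setLength(0);
            }
            if (piece.length() > 0) piece.append(' ');
            piece.append(word);
        }
        if (piece.length() > 0) out.add(piece.toString());
    }
}
```

Chunking at sentence boundaries keeps each synthesis call short while avoiding cuts in the middle of a phrase, which would otherwise produce unnatural prosody at chunk edges.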

📦 Model Management

  • Download models directly from the app
  • Filter voice models by language or type
  • Sample voice preview before selecting a model
  • Import your own .onnx models from local storage
  • Multiple models installed simultaneously
  • Smart storage tracking
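One plausible way to implement per-model storage tracking is to sum the file sizes under each model's folder. The sketch below assumes each installed model lives in its own directory; `ModelStorage` is a hypothetical helper, not the app's code. `java.nio.file` is available on Android from API 26, below the app's Android 11 minimum.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical helper for reporting how much space installed models use. */
final class ModelStorage {
    /** Total size in bytes of all regular files under dir; 0 if dir does not exist. */
    static long directorySize(Path dir) throws IOException {
        if (!Files.exists(dir)) return 0L;
        try (Stream<Path> paths = Files.walk(dir)) {   // close the stream to release file handles
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                // Lambdas cannot throw checked exceptions, so wrap it.
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    /** Formats a byte count in mebibytes with one decimal place. */
    static String formatMb(long bytes) {
        return String.format(Locale.ROOT, "%.1f MB", bytes / (1024.0 * 1024.0));
    }
}
```

Walking the directory also counts auxiliary files such as `espeak-ng-data`, which can be a noticeable share of a model's footprint.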

🔊 System-Wide TTS

  • Set VoxSherpa as your default Android TTS engine
  • All downloaded models are exposed to System TTS, so any voice can be used in Chrome, WhatsApp, TalkBack, and more
  • Pitch & speed control in System TTS mode
  • Sample voice preview for all models
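When an app acts as the system TTS engine, Android hands it rate and pitch through `SynthesisRequest.getSpeechRate()` and `SynthesisRequest.getPitch()`, where 100 means normal. A minimal sketch of converting those integers into multipliers, assuming the engine takes a speed multiplier where 1.0 is normal; the clamp ranges and the `SystemTtsParams` name are hypothetical, and how VoxSherpa actually applies pitch to the generated audio is not described here.

```java
/** Hypothetical mapping from Android's system TTS parameters to engine multipliers. */
final class SystemTtsParams {
    // SynthesisRequest reports rate and pitch as integers where 100 is "normal".
    static final int ANDROID_NORMAL = 100;

    // Illustrative clamp limits, not values taken from the app.
    static final float MIN_SPEED = 0.5f;
    static final float MAX_SPEED = 3.0f;
    static final float MIN_PITCH = 0.5f;
    static final float MAX_PITCH = 2.0f;

    /** Converts an Android speech rate (100 = normal) to a speed multiplier (1.0 = normal). */
    static float toEngineSpeed(int androidRate) {
        return clamp(androidRate / (float) ANDROID_NORMAL, MIN_SPEED, MAX_SPEED);
    }

    /** Converts an Android pitch (100 = normal) to a pitch factor (1.0 = normal). */
    static float toPitchFactor(int androidPitch) {
        return clamp(androidPitch / (float) ANDROID_NORMAL, MIN_PITCH, MAX_PITCH);
    }

    private static float clamp(float v, float lo, float hi) {
        return Math.max(lo, Math.min(hi, v));
    }
}
```

Inside a `TextToSpeechService.onSynthesizeText` override, a service would call something like `SystemTtsParams.toEngineSpeed(request.getSpeechRate())` before synthesis. Clamping protects the engine from extreme values that accessibility settings can send.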

🎧 Audio Controls

  • Real-time waveform visualization
  • Adjustable speed and pitch
  • Interactive audio seeking with mini player controls
  • MediaStyle notification with full playback controls
  • Export as WAV with correct sample rate per model
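Models generate audio at different sample rates (Kokoro typically outputs 24 kHz audio, and many Piper voices use 22.05 kHz), so the WAV header has to carry the rate the selected model actually produced; otherwise playback comes out sped up or slowed down. A minimal sketch, assuming the engine returns mono float samples in [-1, 1] together with their sample rate and that exports use 16-bit PCM; `WavWriter` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: wraps mono float PCM in a 16-bit WAV container. */
final class WavWriter {
    static byte[] encodeMono16(float[] samples, int sampleRate) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8;   // bytes per sample frame
        final int byteRate = sampleRate * blockAlign;          // bytes per second
        final int dataSize = samples.length * blockAlign;

        // WAV is a little-endian RIFF container with a 44-byte header for plain PCM.
        ByteBuffer buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataSize);                 // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                            // fmt chunk size for plain PCM
        buf.putShort((short) 1);                   // format code 1 = integer PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);                    // must match the model's output rate
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataSize);
        for (float s : samples) {
            // Clamp before scaling so out-of-range peaks saturate instead of wrapping.
            float clamped = Math.max(-1f, Math.min(1f, s));
            buf.putShort((short) Math.round(clamped * 32767f));
        }
        return buf.array();
    }
}
```

The same float-to-16-bit conversion is what an `AudioTrack` configured with `AudioFormat.ENCODING_PCM_16BIT` expects, so playback and export can share it.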

📚 Speech Library

  • Save all generated audio locally
  • Favorites system for quick access
  • View generation history with timestamps
  • Voice model attribution per recording
  • Regenerate audio on voice change

⚙️ Smart Settings

  • Smart Punctuation — natural pauses after sentence breaks
  • Emotion Tags: support for [whisper], [angry], [happy]
  • Per-model voice selection (Kokoro supports 100+ speakers)
  • Theme-aware UI
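The README does not say how emotion tags are applied during synthesis. As a hedged illustration, here is one way to split input at `[whisper]`, `[angry]`, and `[happy]` markers so each run of text can be handled separately; the rule that a tag stays in effect until the next tag, and the `EmotionTagParser` name, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser that splits text at emotion tags such as [whisper]. */
final class EmotionTagParser {
    /** A run of text and the tag in effect for it (null before any tag appears). */
    static final class Segment {
        final String tag;
        final String text;

        Segment(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    // Tags named in the README; any other bracketed word is left as ordinary text.
    private static final Set<String> KNOWN_TAGS = Set.of("whisper", "angry", "happy");
    private static final Pattern TAG = Pattern.compile("\\[([A-Za-z]+)\\]");

    /** Each known tag applies from its position until the next known tag (an assumption). */
    static List<Segment> parse(String input) {
        List<Segment> segments = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        String currentTag = null;
        int copiedUpTo = 0;
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (!KNOWN_TAGS.contains(name)) continue; // unknown tag stays in the text
            pending.append(input, copiedUpTo, m.start());
            flush(segments, currentTag, pending);
            currentTag = name;
            copiedUpTo = m.end();
        }
        pending.append(input, copiedUpTo, input.length());
        flush(segments, currentTag, pending);
        return segments;
    }

    private static void flush(List<Segment> out, String tag, StringBuilder pending) {
        String text = pending.toString().trim();
        if (!text.isEmpty()) out.add(new Segment(tag, text));
        pending.setLength(0);
    }
}
```

Stripping recognized tags before synthesis also keeps the engine from reading "whisper" aloud as a word.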

Technical Architecture

User Text
    │
    ├─── Kokoro Engine (KokoroEngine.java)
    │         └── Sherpa-ONNX JNI → ONNX Runtime → CPU/NNAPI
    │                   └── kokoro-multi-lang-v1_0 (82M params, FP32)
    │
    └─── Piper / VITS Engine (VoiceEngine.java)
              └── Sherpa-ONNX JNI → ONNX Runtime → CPU
                        └── VITS model (language-specific)
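The diagram routes text to one of two engines. A hypothetical heuristic for deciding which engine a model folder needs: Sherpa-ONNX Kokoro packages ship a `voices.bin` voice file alongside the `.onnx` model and `tokens.txt`, while Piper / VITS packages do not. `EngineSelector` and this detection rule are illustrative assumptions, not the app's `KokoroEngine` / `VoiceEngine` code.

```java
import java.io.File;

/** Hypothetical selector that guesses which engine a model directory requires. */
final class EngineSelector {
    enum EngineType { KOKORO, VITS }

    static EngineType detect(File modelDir) {
        File[] files = modelDir.listFiles();
        if (files == null) throw new IllegalArgumentException("Not a directory: " + modelDir);
        boolean hasOnnx = false, hasTokens = false, hasVoices = false;
        for (File f : files) {
            String name = f.getName();
            if (name.endsWith(".onnx")) hasOnnx = true;
            else if (name.equals("tokens.txt")) hasTokens = true;
            else if (name.equals("voices.bin")) hasVoices = true;
        }
        // Both engines need an ONNX model and a token table.
        if (!hasOnnx || !hasTokens) {
            throw new IllegalArgumentException("Missing .onnx model or tokens.txt in " + modelDir);
        }
        // Heuristic: a voices.bin file indicates a Kokoro package; otherwise assume VITS/Piper.
        return hasVoices ? EngineType.KOKORO : EngineType.VITS;
    }
}
```

A file-based check like this keeps the routing decision out of the UI, although an explicit model-type field in downloaded metadata would be more robust than inspecting file names.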

Built with:

  • Sherpa-ONNX — on-device neural inference
  • Kokoro-82M — multilingual neural TTS model
  • Piper — fast local TTS
  • Android AudioTrack API — low-latency PCM playback

Performance

Generation speed depends mainly on your device's processor. Approximate time to generate one minute of audio:

| Device tier | Kokoro (per 1 min of audio) | Piper (per 1 min of audio) |
|---|---|---|
| 🟢 Flagship (Snapdragon 8 Gen 3) | ~20–40 s | ~5 s |
| 🟡 Mid-range (8-core) | ~60–90 s | ~10 s |
| 🔴 Budget (6-core) | ~2–3 min | ~20 s |
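These figures can be read as a real-time factor (RTF): generation time divided by audio duration, where values below 1.0 mean faster than real time. For example, flagship Kokoro at ~20–40 s per minute is an RTF of roughly 0.33–0.67, budget Kokoro at ~2–3 min per minute is an RTF of 2–3, and flagship Piper at ~5 s per minute is about 0.08. A small helper for this arithmetic; `RealTimeFactor` is a hypothetical name, not part of the app.

```java
/** Hypothetical helper for real-time-factor arithmetic. */
final class RealTimeFactor {
    /** RTF = generation time / audio duration; below 1.0 is faster than real time. */
    static double rtf(double generationSeconds, double audioSeconds) {
        if (audioSeconds <= 0) throw new IllegalArgumentException("audioSeconds must be positive");
        return generationSeconds / audioSeconds;
    }

    /** Estimated generation time in seconds for a clip of the given length at a given RTF. */
    static double estimateSeconds(double rtf, double audioSeconds) {
        return rtf * audioSeconds;
    }
}
```

At an RTF of 2–3, a 10-minute audiobook chapter would take roughly 20–30 minutes to generate with Kokoro on a budget device, which is why Piper is the better fit there for quick synthesis.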

Kokoro prioritizes quality over speed by design. Running an 82M-parameter model entirely offline on a mobile CPU pushes the hardware hard, so generation is noticeably slower on mid-range and budget devices.


Installation

Available on Google Play.

Requirements: Android 11+ · ARM64 · ~500 MB free storage recommended (for models)


Model Import (Technical Users)

VoxSherpa supports importing custom .onnx models without any server:

  1. Place your .onnx model and its tokens.txt file on device storage
  2. Open the Models tab and tap +Import Local Model
  3. Select your files

Compatible with any TTS model supported by Sherpa-ONNX.
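Before importing, a quick sanity check on `tokens.txt` can catch a wrong or corrupted file. A sketch assuming the Sherpa-ONNX layout of one `<token> <id>` pair per line, where the token may itself be a space (as in Piper token files); `TokensFile` is a hypothetical helper, and the app is not documented to perform this check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical validator for Sherpa-ONNX style tokens.txt files. */
final class TokensFile {
    static Map<String, Integer> parse(BufferedReader reader) throws IOException {
        Map<String, Integer> tokens = new HashMap<>();
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            if (line.isEmpty()) continue; // tolerate blank lines such as a trailing newline
            // The id follows the LAST space, so a token that is itself a space survives.
            int sep = line.lastIndexOf(' ');
            if (sep <= 0) {
                throw new IOException("line " + lineNo + ": expected '<token> <id>'");
            }
            String token = line.substring(0, sep);
            int id;
            try {
                id = Integer.parseInt(line.substring(sep + 1));
            } catch (NumberFormatException e) {
                throw new IOException("line " + lineNo + ": id is not an integer", e);
            }
            if (tokens.put(token, id) != null) {
                throw new IOException("line " + lineNo + ": duplicate token '" + token + "'");
            }
        }
        if (tokens.isEmpty()) throw new IOException("tokens file is empty");
        return tokens;
    }
}
```

Rejecting a malformed file at import time gives the user a clear error instead of a crash or garbled audio on the first synthesis.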


Contributing

VoxSherpa is open source. Contributions welcome:

  • 🐛 Bug reports via Issues
  • 💡 Feature requests via Discussions
  • 🔧 Pull requests for fixes and improvements

License

Copyright (C) 2025 CodeBySonu95

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

https://www.gnu.org/licenses/gpl-3.0.html

Acknowledgements


Built with obsession. Runs without internet.

VoxSherpa — Because your voice deserves to stay yours.
