sherpa-onnx
Supported functions
Speech recognition | Speech synthesis |
---|---|
✔️ | ✔️ |
Speaker identification | Speaker diarization | Speaker verification |
---|---|---|
✔️ | ✔️ | ✔️ |
Spoken Language identification | Audio tagging | Voice activity detection |
---|---|---|
✔️ | ✔️ | ✔️ |
Keyword spotting | Add punctuation | Speech enhancement |
---|---|---|
✔️ | ✔️ | ✔️ |
Supported platforms
Architecture | Android | iOS | Windows | macOS | Linux | HarmonyOS |
---|---|---|---|---|---|---|
x64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
x86 | ✔️ | ✔️ | ||||
arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
arm32 | ✔️ | ✔️ | ✔️ | |||
riscv64 | ✔️ |
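To find which architecture row of the table above applies to a given machine, one option is to normalize the name reported by Python's `platform.machine()`. The sketch below is only illustrative: the `_ARCH_ALIASES` table and the `normalize_arch` helper are assumptions made for this example, they are not part of sherpa-onnx, and the alias list is not exhaustive.

```python
import platform

# Hypothetical alias table: maps common machine names reported by the OS
# to the architecture labels used in the table above. Not exhaustive.
_ARCH_ALIASES = {
    "x86_64": "x64",
    "amd64": "x64",
    "i386": "x86",
    "i686": "x86",
    "x86": "x86",
    "aarch64": "arm64",
    "arm64": "arm64",
    "armv7l": "arm32",
    "armv6l": "arm32",
    "riscv64": "riscv64",
}


def normalize_arch(machine: str) -> str:
    """Return the table label for a machine name, or 'unknown'."""
    return _ARCH_ALIASES.get(machine.strip().lower(), "unknown")


if __name__ == "__main__":
    # Print the label for the machine this script runs on.
    print(normalize_arch(platform.machine()))
```

The lookup is case-insensitive because Windows typically reports `AMD64` while Linux reports `x86_64`.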
Supported programming languages
1. C++ | 2. C | 3. Python | 4. JavaScript |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
5. Java | 6. C# | 7. Kotlin | 8. Swift |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
9. Go | 10. Dart | 11. Rust | 12. Pascal |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see sherpa-rs
It also supports WebAssembly.
Introduction
This repository supports running the following functions locally:
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker diarization
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- HarmonyOS
- NodeJS
- WebAssembly
- NVIDIA Jetson Orin NX (Support running on both CPU and GPU)
- NVIDIA Jetson Nano B01 (Support running on both CPU and GPU)
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日 X3 派 (Horizon Sunrise X3 Pi)
- 爱芯派 (AXera-Pi)
- etc
with the following APIs:
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
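The difference between the streaming and non-streaming recognition listed in the introduction is mainly in how audio reaches the recognizer: a streaming recognizer consumes short chunks as they arrive, while a non-streaming one receives the whole utterance at once. The sketch below shows only the chunking side, using the Python standard library. The `iter_chunks` helper, the 16 kHz sample rate, and the 100 ms chunk length are illustrative assumptions, and no sherpa-onnx API is called.

```python
from typing import Iterator, List, Sequence


def iter_chunks(samples: Sequence[float], sample_rate: int = 16000,
                chunk_ms: int = 100) -> Iterator[List[float]]:
    """Yield consecutive chunks that each cover chunk_ms milliseconds.

    The last chunk may be shorter than the others.
    """
    chunk_size = sample_rate * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(samples), chunk_size):
        yield list(samples[start:start + chunk_size])


if __name__ == "__main__":
    # One second of silence stands in for real microphone input.
    audio = [0.0] * 16000

    # Streaming style: hand over one chunk at a time as it "arrives".
    for chunk in iter_chunks(audio):
        pass  # a streaming recognizer would be fed `chunk` here

    # Non-streaming style: the whole utterance is passed in one call.
    whole_utterance = audio
```

For the actual recognizer classes and their parameters, refer to the documentation linked under "Useful links".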
Links for Huggingface Spaces
You can visit the following Huggingface spaces to try sherpa-onnx without installing anything. All you need is a browser.
| Description | URL |
|---|---|
| Speaker diarization | Click me |
| Speech recognition | Click me |
| Speech recognition with Whisper | Click me |
| Speech synthesis | Click me |
| Generate subtitles | Click me |
| Audio tagging | Click me |
| Spoken language identification with Whisper | Click me |

We also have spaces built using WebAssembly. They are listed below:

| Description | Huggingface space | ModelScope space |
|---|---|---|
| Voice activity detection with silero-vad | Click me | Address |
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
| Real-time speech recognition (English) | Click me | Address |
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
| VAD + speech recognition (English) with Moonshine tiny | Click me | Address |
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
| VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
| VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-large | Click me | Address |
| VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-small | Click me | Address |
| Speech synthesis (English) | Click me | Address |
| Speech synthesis (German) | Click me | Address |
| Speaker diarization | Click me | Address |

Links for pre-built Android APKs
You can find pre-built Android APKs for this repository in the following table:

| Description | URL | For users in China |
|---|---|---|
| Speaker diarization | Address | Click here |
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |

Links for pre-built Flutter APPs
Links for pre-built Lazarus APPs
Links for pre-trained models
Some pre-trained ASR models (Streaming)
Some pre-trained ASR models (Non-Streaming)
Useful links
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
How to reach us
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.
Projects using sherpa-onnx
Open-LLM-VTuber
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms.
See also https://github.com/t41372/Open-LLM-VTuber/pull/50
voiceapi
Streaming ASR and TTS based on FastAPI
It shows how to use the ASR and TTS Python APIs with FastAPI.
腾讯会议摸鱼工具 TMSpeech
It uses streaming ASR in C# with a graphical user interface.
Video demo in Chinese: 【开源】Windows 实时字幕软件(网课/开会必备)
lol 互动助手
It uses the JavaScript API of sherpa-onnx along with Electron.
Video demo in Chinese: 爆了!炫神教你开打字挂!真正影响胜率的英雄联盟工具!英雄联盟的最后一块拼图!和游戏中的每个人无障碍沟通!
Sherpa-ONNX 语音识别服务器
A Node.js-based server that provides a RESTful API for speech recognition.
QSmartAssistant
A modular, low-footprint conversational robot/smart speaker whose whole pipeline can run offline.
It uses Qt. Both ASR and TTS are used.
Flutter-EasySpeechRecognition
It extends ./flutter-examples/streaming_asr by downloading models inside the app to reduce the size of the app.
Note: [Team B] Sherpa AI backend also uses sherpa-onnx in a Flutter APP.
sherpa-onnx-unity
sherpa-onnx in Unity. See also #1695, #1892, and #1859
xiaozhi-esp32-server
A backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.
See also
KaithemAutomation
Pure Python, GUI-focused home automation/consumer grade SCADA.
It uses TTS from sherpa-onnx. See also ✨ Speak command that uses the new globally configured TTS model.