Neiroha VoxCPM2
This page covers the Neiroha VoxCPM2 local backend. Extract the portable Release package, or place the source repository anywhere; <backend-root> refers to that directory.
It provides OpenAI-compatible routes, native /api/voxcpm routes, a voice registry, the Neiroha Gradio Admin, and an optional official WebUI. Startup behavior is controlled by configs/server.toml.

Capability Summary
| Dimension | Current Notes |
|---|---|
| Recommended version | VoxCPM2 is the current official deployment version, 2B parameters, 48 kHz output. |
| Languages | Official list includes 30 languages: Arabic, Burmese, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Khmer, Korean, Lao, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Vietnamese. |
| Dialects | Official list includes 9 Chinese dialects: Sichuanese, Cantonese, Wu, Northeastern, Henan, Shaanxi, Shandong, Tianjin, Minnan. |
| Cross-language output | Supports multilingual synthesis and cross-language reference-audio cloning. |
| Text voice design | voxcpm2-design needs no reference audio; put age, gender, tone, emotion, and speed in natural-language text. |
| Controllable clone | voxcpm2-clone uses reference_audio and does not need prompt text. |
| Ultimate clone | voxcpm2-ultimate-clone needs prompt_audio and exact prompt_text; do not rely on bracket style control in this mode. |
| Official speed reference | Official PyTorch reports RTF around 0.30 on RTX 4090; Nano-vLLM / vLLM-Omni acceleration is around 0.13. |
| Boundaries | Very short text may sound weak. Long text can speed up, produce noise, fail to stop, or OOM; split by sentence for production. |
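
The advice to split long text by sentence can be sketched on the client side as follows. This is a minimal illustration, not part of the backend: `split_sentences` and `max_chars` are hypothetical names, the 200-character budget is an arbitrary assumption, and the punctuation set covers only common Latin and CJK sentence endings. Short sentences are packed together so that very short fragments are not synthesized alone.

```python
import re

# Split after Latin punctuation followed by whitespace, or directly after
# CJK full-width punctuation (illustrative set, not exhaustive).
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+|(?<=[。!?])")


def split_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence ends, packing sentences into chunks of up to max_chars."""
    sentences = [s.strip() for s in _SENTENCE_BREAK.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate synthesis request and the audio concatenated afterwards. A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence.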
Default Addresses
| Service | Default Address | Purpose |
|---|---|---|
| FastAPI | http://127.0.0.1:8000 | Neiroha provider connects here. |
| Admin | http://127.0.0.1:7860 | Manage voices, model presets, tests, and logs. |
Install
Recommended portable flow. The current package is built for NVIDIA GPU / CUDA environments and mainly targets RTX 30 / 40 / 50 series users:
- Open Neiroha-VoxCPM V1.0.0 Release.
- Download all V1.0.0 split archives: Neiroha-VoxCPM-portable.7z.001, .002, .003, .004.
- If GitHub downloads are unstable, use the Baidu Netdisk mirror from the Release body.
- Put all four files in the same directory and extract from .001 with 7-Zip.
- Run start_portable.bat.
Source or development environment:
pixi install
pixi run install
Optional ASR model for automatic prompt text transcription in ultimate clone:
pixi run install-asr
ASR is disabled by default. ultimate_clone normally needs manual prompt text.
Start
Portable Release:
.\start_portable.bat
Source environment:
pixi run serve
Common tasks:
| Command | Purpose |
|---|---|
| pixi run serve | Start according to configs/server.toml [startup].surface, default API + Admin |
| pixi run api | Start FastAPI only |
| pixi run admin | Start Neiroha Admin and connect to an existing FastAPI |
| pixi run smoke | Check /health, /v1/models, /v1/audio/voices, and capabilities |
| pixi run test | Run backend tests |
| pixi run launcher-help | Show launcher arguments |
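
For environments without pixi, the endpoint checks that the smoke task is described as performing can be approximated by hand. The sketch below is not the project's smoke script: `http_status` and `smoke_check` are hypothetical helpers, the check treats only HTTP 200 as success (an assumption), and the capabilities check is omitted because its path is not listed on this page.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default FastAPI address from this page
SMOKE_PATHS = ["/health", "/v1/models", "/v1/audio/voices"]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def smoke_check(base_url: str = BASE_URL, fetch=http_status) -> dict[str, bool]:
    """Map each smoke path to True when it answers with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in SMOKE_PATHS}
```

Calling `smoke_check()` against a running backend returns a dictionary such as path to pass/fail flag; the `fetch` parameter exists only so the logic can be exercised without a live server.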
Connect Neiroha
- Open Providers.
- Create a provider, adapter type VoxCPM2 Native.
- Set Base URL to http://127.0.0.1:8000.
- Leave API Key empty if local auth is disabled.
- Click Fetch All.
- Confirm voxcpm2-design, voxcpm2-clone, and voxcpm2-ultimate-clone.
- Enable the provider and click Health Check.
Android emulator host URL:
http://10.0.2.2:8000
Character Setup
| Goal | Setting |
|---|---|
| Text voice design | Select voxcpm2-design; write natural-language voice description at the beginning of the text. |
| Reference clone | Select voxcpm2-clone; provide reference audio, no prompt text required. |
| High-fidelity clone | Select voxcpm2-ultimate-clone; provide prompt audio and prompt text. |
| Reusable local speaker | Register a voice in Admin or /api/voxcpm/voices, then select it in Neiroha. |
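
The per-mode requirements in the table above can be checked before a request is sent. The following is a client-side sketch only: `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names, and the keys `reference_audio`, `prompt_audio`, and `prompt_text` are taken from the capability summary; the exact field names the backend accepts in a request body are an assumption here.

```python
# Inputs each model needs, as described on this page.
REQUIRED_INPUTS = {
    "voxcpm2-design": (),
    "voxcpm2-clone": ("reference_audio",),
    "voxcpm2-ultimate-clone": ("prompt_audio", "prompt_text"),
}


def missing_inputs(model: str, inputs: dict) -> list[str]:
    """Return the names of required inputs that are absent or empty for the model."""
    if model not in REQUIRED_INPUTS:
        raise ValueError(f"unknown model: {model!r}")
    return [name for name in REQUIRED_INPUTS[model] if not inputs.get(name)]
```

For example, `missing_inputs("voxcpm2-ultimate-clone", {"prompt_audio": "ref.wav"})` reports that `prompt_text` is still needed, which matches the rule that ultimate clone requires both prompt audio and exact prompt text.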
VoxCPM2 recommends natural-language style prompts, for example:
(A young woman, gentle and sweet voice)Hello, welcome to VoxCPM2.
Do not depend on undocumented square-bracket tokens.
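
A small helper can keep the parenthesized description format consistent across characters. This is a sketch under the assumption that the description is simply prefixed to the text as in the example above; `with_voice_description` is a hypothetical name, not a function provided by the backend.

```python
def with_voice_description(description: str, text: str) -> str:
    """Prefix text with a natural-language voice description in parentheses."""
    # Tolerate descriptions that already carry their own parentheses.
    cleaned = description.strip().strip("()").strip()
    if not cleaned:
        return text
    return f"({cleaned}){text}"
```

Calling it with the description "A young woman, gentle and sweet voice" and the text "Hello, welcome to VoxCPM2." reproduces the example line shown above.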
API Prefix
OpenAI-compatible routes:
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Health check |
| GET | /v1/models | List voice sets |
| GET | /v1/audio/voices | List voice profiles |
| GET | /v1/audio/speakers | Speaker-list compatibility |
| POST | /v1/audio/speech | OpenAI-compatible synthesis |
Native prefix is /api/voxcpm. Legacy /voxcpm/* routes remain for compatibility; new integrations should use the standard prefix.
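
As an illustration of the OpenAI-compatible synthesis route, the sketch below builds a POST request for /v1/audio/speech. The body fields `model`, `input`, and `voice` follow the OpenAI speech API convention; which of them this backend honors, and whether a local key is sent as a Bearer token, are assumptions to verify against the running server. `build_speech_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    text: str,
    model: str = "voxcpm2-design",
    base_url: str = "http://127.0.0.1:8000",
    voice: Optional[str] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text}
    if voice:
        payload["voice"] = voice  # assumed to name a registered voice profile
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: local auth, when enabled, uses a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and write the response bytes to a file. The audio container of the response is not stated on this page, so inspect the Content-Type header before choosing a file extension.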