
Neiroha CosyVoice3

This page covers the Neiroha CosyVoice3 local backend. Extract the portable Release package, or place the source repository, in any directory; <backend-root> refers to that directory.

The backend provides a FastAPI server, a Gradio Admin UI, TOML model presets, TOML voice sets, an OpenAI-compatible TTS API, and native /api/cosyvoice routes. The current version is a standalone CosyVoice3 backend that includes the official FunAudioLLM/CosyVoice repository as a submodule.

Figure: Neiroha CosyVoice3 Admin home. The backend Admin shows API state, voice sets, model presets, clone configuration, and logs.

Capability Summary

| Dimension | Current Notes |
| --- | --- |
| Recommended version | The default is Fun-CosyVoice3-0.5B, with a 24 kHz output sample rate. |
| Languages | The official model card lists 9 common languages: Chinese, English, Japanese, Korean, German, Spanish, French, Italian, and Russian. |
| Dialects / accents | The official model card mentions 18+ Chinese dialects / accents, including Guangdong, Minnan, Sichuan, Northeast, Shanxi / Shaanxi, Shanghai, Tianjin, Shandong, Ningxia, and Gansu. |
| Cross-language output | Supports multilingual / cross-lingual zero-shot voice cloning. The target language should still be one of the 9 official languages. |
| prompt clone | prompt-clone / zero_shot requires reference audio plus prompt text that matches (transcribes) that audio. |
| cross-lingual clone | cross-lingual-clone requires reference audio only; the target text determines the output language. |
| instruct clone | instruct-clone requires reference audio plus an instruction controlling speed, emotion, dialect, volume, and similar attributes. |
| Official speed reference | The official model card emphasizes bi-streaming and latency as low as about 150 ms, but does not provide a unified PyTorch RTF table. |
| Boundaries | Rare words, tongue twisters, and specialized terms can be unstable; emotion control depends heavily on the semantics of the text. |
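Since the official card gives no unified RTF table, you may want to measure real-time factor on your own hardware. The following is a minimal Python sketch, assuming you have some callable that returns the generated samples for a text; `synthesize`, `measure_rtf`, and `real_time_factor` are hypothetical names, not part of the backend. RTF here is synthesis time divided by the duration of the output audio at 24 kHz.

```python
import time
from typing import Callable, Sequence

# Output sample rate stated for Fun-CosyVoice3-0.5B.
SAMPLE_RATE_HZ = 24_000


def real_time_factor(elapsed_s: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio.

    Values below 1.0 mean synthesis ran faster than real time.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    audio_seconds = num_samples / sample_rate
    return elapsed_s / audio_seconds


def measure_rtf(synthesize: Callable[[str], Sequence[float]], text: str,
                sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Time one call to a (hypothetical) synthesize function and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)
```

For example, taking 1.2 s to produce 48,000 samples (2 s of audio) gives an RTF of 0.6. A single call includes warm-up effects, so averaging several runs gives a more representative number.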

Default Addresses

| Service | Default Address | Purpose |
| --- | --- | --- |
| FastAPI | http://127.0.0.1:9880 | The Neiroha provider connects here. |
| Admin | http://127.0.0.1:7880 | Manage voice sets, clone config, model presets, downloads, and logs. |

CosyVoice3 and GPT-SoVITS both default to API port 9880, so running both on one machine causes a port conflict. Change the port of one backend in configs/server.toml, or use the random port the launcher selects.
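If you are unsure whether 9880 is already taken, a quick check before editing configs/server.toml can help. This Python sketch is an illustration only and is not how the Neiroha launcher picks its random port; `port_is_free` and `pick_api_port` are hypothetical helpers.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_api_port(preferred: int = 9880, host: str = "127.0.0.1") -> int:
    """Use the preferred port if free, otherwise ask the OS for an unused one."""
    if port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick an ephemeral port
        return sock.getsockname()[1]
```

The returned port is only known to be free at the moment of the check; another process could claim it before the backend binds, so treat the result as a suggestion for server.toml rather than a guarantee.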

Install

Download the Windows portable package from the Neiroha-Cosyvoice V1.0.0 Release. The current package is built for NVIDIA GPU / CUDA environments and mainly targets RTX 30 / 40 / 50 series users. It is split into six archive parts, neiroha-cosyvoice3-portable.7z.001 through .006; if GitHub downloads are unstable, use the Baidu Netdisk mirror linked in the Release description. Put all six parts in the same directory and extract starting from .001.
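Extraction from .001 fails if any part is missing, so it can be worth checking the download folder first. A minimal Python sketch, assuming the part names given for the current release; `expected_parts`, `missing_parts`, and `missing_parts_in` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable

ARCHIVE_STEM = "neiroha-cosyvoice3-portable.7z"
PART_COUNT = 6  # .001 through .006 for the current split archive


def expected_parts(stem: str = ARCHIVE_STEM, count: int = PART_COUNT) -> list[str]:
    """Names of all split-archive parts, zero-padded to three digits."""
    return [f"{stem}.{i:03d}" for i in range(1, count + 1)]


def missing_parts(present: Iterable[str], stem: str = ARCHIVE_STEM,
                  count: int = PART_COUNT) -> list[str]:
    """Return expected part names that are not in `present`."""
    have = set(present)
    return [name for name in expected_parts(stem, count) if name not in have]


def missing_parts_in(directory: Path) -> list[str]:
    """Check a download folder on disk."""
    return missing_parts(p.name for p in directory.iterdir() if p.is_file())
```

Calling `missing_parts_in(Path("."))` from the download folder returns an empty list when all six parts are present.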

Source or development environment:

pixi install
pixi run submodule-init
pixi run install

pixi run install downloads the CosyVoice3 model to:

models/Fun-CosyVoice3-0.5B
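To confirm the download finished before starting the server, you can check that directory. A rough Python sketch; `model_dir` and `model_downloaded` are hypothetical helpers, and the check only confirms that the directory exists and is non-empty, not that every file is intact.

```python
from pathlib import Path

MODEL_NAME = "Fun-CosyVoice3-0.5B"


def model_dir(backend_root: Path, model_name: str = MODEL_NAME) -> Path:
    """Location that `pixi run install` downloads the model to."""
    return backend_root / "models" / model_name


def model_downloaded(backend_root: Path, model_name: str = MODEL_NAME) -> bool:
    """Rough check: the model directory exists and is not empty.

    This does not verify individual files or checksums.
    """
    path = model_dir(backend_root, model_name)
    return path.is_dir() and any(path.iterdir())
```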

Optional resources:

pixi run install-wetext
pixi run install-ttsfrd

Start

Portable Release:

.\start_portable.bat

Source environment:

pixi run serve

Common tasks:

| Command | Purpose |
| --- | --- |
| pixi run serve | Start the services selected by [startup].surface in configs/server.toml (default: API + Admin) |
| pixi run api | Start FastAPI only |
| pixi run admin | Start Gradio Admin only, connecting to an already running FastAPI server |
| pixi run smoke | Check /health, /v1/models, /v1/audio/voices, and synthesis |
| pixi run clone-smoke | Check the clone flows |
| pixi run test | Run backend tests |
| pixi run launcher-help | Show launcher arguments |
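pixi run smoke is the supported check. For a quick manual probe from another script, the GET part of that check can be sketched in Python with only the standard library. This is an illustration, not the smoke task's implementation: it treats only HTTP 200 as success and skips the synthesis step; `smoke_urls` and `run_smoke` are hypothetical helpers.

```python
import http.client
import urllib.request

DEFAULT_BASE_URL = "http://127.0.0.1:9880"
# GET endpoints that `pixi run smoke` also covers; synthesis is omitted here.
SMOKE_PATHS = ("/health", "/v1/models", "/v1/audio/voices")


def smoke_urls(base_url: str = DEFAULT_BASE_URL) -> list[str]:
    return [base_url.rstrip("/") + path for path in SMOKE_PATHS]


def run_smoke(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> dict[str, bool]:
    """Map each URL to True if it answered HTTP 200, False otherwise."""
    results: dict[str, bool] = {}
    for url in smoke_urls(base_url):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status == 200
        # OSError covers URLError, HTTPError, refused connections, and timeouts.
        except (OSError, http.client.HTTPException):
            results[url] = False
    return results
```

Running `print(run_smoke())` against a started backend should show True for all three URLs.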

[startup].preload_model defaults to true, so the model is loaded during startup rather than on the first request.

Connect Neiroha

  1. Open Providers.
  2. Create a provider with adapter type CosyVoice Native.
  3. Set Base URL to http://127.0.0.1:9880, or to the address actually shown in the startup log.
  4. Leave API Key empty if local auth is disabled.
  5. Click Fetch All.
  6. Confirm that prompt-clone, cross-lingual-clone, and instruct-clone are listed.
  7. Enable the provider and click Health Check.

From an Android emulator, use the host machine's alias instead of 127.0.0.1:

http://10.0.2.2:9880
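Inside an Android emulator, 127.0.0.1 refers to the emulator itself, while 10.0.2.2 is the emulator's alias for the host loopback. A tiny Python sketch of choosing the Base URL; `provider_base_url` is a hypothetical helper, and the port should be replaced with the logged one if the launcher picked a random port.

```python
LOCAL_HOST = "127.0.0.1"
# Android emulator alias for the host machine's loopback interface.
ANDROID_EMULATOR_HOST = "10.0.2.2"
DEFAULT_API_PORT = 9880


def provider_base_url(from_android_emulator: bool = False,
                      port: int = DEFAULT_API_PORT) -> str:
    """Build the Base URL to enter in the Neiroha provider settings."""
    host = ANDROID_EMULATOR_HOST if from_android_emulator else LOCAL_HOST
    return f"http://{host}:{port}"
```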

Character Setup

| Goal | Setting |
| --- | --- |
| Zero-shot prompt clone | Select prompt-clone and provide reference audio and prompt text. |
| Cross-language clone | Select cross-lingual-clone and provide reference audio and target-language text. |
| Instruction control | Select instruct-clone and write the voice requirements in the instruction. |
| Custom reusable voice | Upload reference audio in the Admin clone config and save it as a new voice. |

Use short, clean reference audio without background music. prompt-clone requires prompt text; instruct-clone requires an instruction.
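The per-mode requirements can be checked client-side before sending a request. This Python sketch is hypothetical: the `CloneRequest` field names are illustrative and are not the backend's request schema; only the mode names and their required inputs come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneRequest:
    mode: str                              # "prompt-clone", "cross-lingual-clone", or "instruct-clone"
    text: str                              # target text to synthesize
    reference_audio: Optional[bytes] = None
    prompt_text: Optional[str] = None      # transcript of the reference audio
    instruction: Optional[str] = None      # e.g. speed, emotion, dialect, volume


# Inputs each mode needs, per the capability summary and setup notes.
REQUIRED_FIELDS = {
    "prompt-clone": ("reference_audio", "prompt_text"),
    "cross-lingual-clone": ("reference_audio",),
    "instruct-clone": ("reference_audio", "instruction"),
}


def validation_errors(req: CloneRequest) -> list[str]:
    """Return a list of problems; an empty list means the request looks complete."""
    if req.mode not in REQUIRED_FIELDS:
        return [f"unknown mode: {req.mode}"]
    errors = []
    if not req.text.strip():
        errors.append("text is empty")
    for field in REQUIRED_FIELDS[req.mode]:
        if not getattr(req, field):
            errors.append(f"{req.mode} requires {field}")
    return errors
```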

API Prefix

OpenAI-compatible routes:

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | Health check |
| GET | /v1/models | List voice sets |
| GET | /v1/audio/voices | List voice profiles |
| POST | /v1/audio/speech | Synthesize with a registered voice |
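Because the speech route is OpenAI-compatible, a request body can be sketched with the OpenAI field names `model`, `voice`, `input`, and `response_format`. Mapping `model` to a voice set and `voice` to a voice profile is an assumption based on what /v1/models and /v1/audio/voices list, and `wav` support is also assumed; check the backend before relying on either. `build_speech_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_speech_request(base_url: str, voice_set: str, voice: str, text: str,
                         response_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/audio/speech."""
    payload = {
        "model": voice_set,   # assumption: a voice set name from /v1/models
        "voice": voice,       # assumption: a profile name from /v1/audio/voices
        "input": text,
        "response_format": response_format,  # assumption: wav is accepted
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and write `resp.read()` to an audio file.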

The native prefix is /api/cosyvoice. The legacy /cosyvoice/* and /cosyvoice3/* routes remain for compatibility; new integrations should use /api/cosyvoice.
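When migrating an older client, a helper that rewrites legacy prefixes may be useful. This sketch assumes that paths under /cosyvoice/ and /cosyvoice3/ map one-to-one to the same subpaths under /api/cosyvoice; this page does not confirm that, so verify each route before switching. `to_native_path` is hypothetical.

```python
NATIVE_PREFIX = "/api/cosyvoice"
LEGACY_PREFIXES = ("/cosyvoice3", "/cosyvoice")


def to_native_path(path: str) -> str:
    """Rewrite a legacy route prefix to the native one (assumed 1:1 mapping)."""
    for legacy in LEGACY_PREFIXES:
        # Match the whole prefix segment so "/cosyvoicex" is left alone.
        if path == legacy or path.startswith(legacy + "/"):
            return NATIVE_PREFIX + path[len(legacy):]
    return path
```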

Sources