Connect Local Inference Backends
Local inference backends suit setups with a local GPU, a LAN inference server, or a workflow that keeps text local. Neiroha does not train models; it forwards UI, queue, project, and local API requests to TTS services that are already running.
Pre-Connection Checklist
- Start the TTS backend and confirm the real listening address in the terminal or logs.
- On the machine running Neiroha, open the backend `/health`, `/v1/models`, or voice list URL.
- If Neiroha runs in an Android emulator, use `10.0.2.2` for the host machine, not `127.0.0.1`.
- If Neiroha runs on an Android phone, use the computer's LAN IP and allow the port through Windows Firewall.
- Return to Providers in Neiroha and add or edit a provider.
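The URL check in the list above can also be scripted. The following is a minimal sketch, assuming the backend answers a plain GET on `/health` with a 2xx status; `join_url` and `probe` are hypothetical helper names for illustration, not part of Neiroha.

```python
import http.client
import urllib.error
import urllib.request


def join_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def probe(base_url: str, path: str = "/health", timeout: float = 3.0) -> bool:
    """Return True if a GET on base_url + path answers with an HTTP 2xx status."""
    # An empty ProxyHandler bypasses system proxy settings, which can
    # otherwise intercept requests to localhost or LAN addresses.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(join_url(base_url, path), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

For example, `probe("http://127.0.0.1:9880")` checks a backend on the same machine, and `probe("http://127.0.0.1:8880/v1", "models")` checks the model list of an OpenAI-compatible service. Run it on the machine that hosts Neiroha, since that is where the address has to resolve.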
Common Adapters
| Backend Type | Neiroha Adapter | Base URL Example | Character Setup |
|---|---|---|---|
| OpenAI-compatible TTS | OpenAI TTS API Compatible | http://127.0.0.1:8880/v1 | Model and preset voice |
| GPT-SoVITS | GPT-SoVITS | http://127.0.0.1:9880 | Trained voice or reference-audio clone |
| CosyVoice3 | CosyVoice Native | http://127.0.0.1:9880 | Prompt clone, cross-lingual clone, instruct |
| VoxCPM2 | VoxCPM2 Native | http://127.0.0.1:8000 | Registered voice, voice design, clone |
| Windows system voice | Windows System TTS | Empty | Enumerates local Windows SAPI voices |
CosyVoice3 and GPT-SoVITS both default to port 9880. When running both, change one backend's [api].port in configs/server.toml, or use the random port chosen by the launcher and copy the logged address into Neiroha.
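A quick way to spot this kind of collision before starting several backends is to compare the host and port of each planned base URL. This is a sketch only: `find_port_conflicts` is a hypothetical helper, and the provider names and URLs are the defaults from the table above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_port_conflicts(base_urls: dict[str, str]) -> dict[tuple[str, int], list[str]]:
    """Group provider names whose base URLs share the same host and port."""
    by_endpoint = defaultdict(list)
    for name, url in base_urls.items():
        parts = urlsplit(url)
        if not parts.hostname:
            continue  # e.g. Windows System TTS, which has no base URL
        # Fall back to the scheme's default port when none is written out.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        by_endpoint[(parts.hostname, port)].append(name)
    return {endpoint: names for endpoint, names in by_endpoint.items() if len(names) > 1}


# Default addresses from the adapter table.
defaults = {
    "OpenAI TTS API Compatible": "http://127.0.0.1:8880/v1",
    "GPT-SoVITS": "http://127.0.0.1:9880",
    "CosyVoice Native": "http://127.0.0.1:9880",
    "VoxCPM2 Native": "http://127.0.0.1:8000",
    "Windows System TTS": "",
}
conflicts = find_port_conflicts(defaults)
```

With the defaults, `conflicts` contains a single entry for `127.0.0.1` port 9880 listing GPT-SoVITS and CosyVoice Native, which is the pair that needs one port changed.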
Windows Portable Backend Packages
Local backends can be downloaded as portable Releases without a full development environment. The current Windows portable packages are built for NVIDIA GPU / CUDA environments and mainly target RTX 30 / 40 / 50 series users. Download all split archive parts into the same directory, then extract from .001 with 7-Zip.
| Backend | GitHub Release | Baidu Netdisk Mirror | Current Asset Pattern |
|---|---|---|---|
| GPT-SoVITS | V1.0.0 | Mirror | Neiroha-GPT-SoVITS-Portable.7z.001 through .003 |
| VoxCPM2 | V1.0.0 | Mirror | Neiroha-VoxCPM-portable.7z.001 through .004 |
| CosyVoice3 | V1.0.0 | Mirror | neiroha-cosyvoice3-portable.7z.001 through .006 |
Portable packages use runtime/ under the extracted directory for logs, outputs, temporary files, and voice registry. Do not move only one split part, and avoid long-term use from a system temporary directory.
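Before extracting, it can help to confirm that every split part is in the same directory. The sketch below assumes the parts follow the `.001`, `.002`, ... suffix pattern shown in the table; `missing_parts` is a hypothetical helper, and the part count must be taken from the release page.

```python
from pathlib import Path


def missing_parts(directory: str, archive_name: str, part_count: int) -> list[str]:
    """List the split-archive parts (.001, .002, ...) not found in directory."""
    folder = Path(directory)
    expected = [f"{archive_name}.{index:03d}" for index in range(1, part_count + 1)]
    return [name for name in expected if not (folder / name).is_file()]
```

For example, `missing_parts("D:/Downloads", "Neiroha-GPT-SoVITS-Portable.7z", 3)` returns an empty list when all three parts are present; the download path here is only a placeholder.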
Backend Selection Quick Reference
This table is a relative ranking for the current Neiroha Windows portable backends, not a universal hardware benchmark. More VRAM stars mean lower memory pressure; more speed stars mean faster synthesis. Actual results depend on GPU, driver, text length, reference audio, concurrency, and model preload state.
| Backend | VRAM Floor | VRAM Friendliness | Synthesis Speed | Good For | Notes |
|---|---|---|---|---|---|
| GPT-SoVITS v2ProPlus | 8 GB VRAM is safer | ★★★★★ | ★★★★★ | Trained voices, reference-audio cloning, batch generation | Lowest VRAM use and fastest among the three; clone mode needs reference text. |
| CosyVoice3 0.5B | 8 GB VRAM recommended | ★★★☆☆ | ★★★☆☆ | Cross-lingual cloning, instruction control, multilingual trials | Broader capability set with middle-ground speed and VRAM use. |
| VoxCPM2 | Official reference is about 8 GB VRAM | ★★☆☆☆ | ★★☆☆☆ | Voice design, multilingual and dialect coverage, high-fidelity cloning | Highest VRAM use and slowest among the three; 8 GB can run it, but start concurrency at 1. |
Source Environments and Multiple Backends
Neiroha local backend projects use Pixi to manage Python, Conda, PyPI dependencies, and common launch commands. When running multiple inference backends long-term on one machine, building each backend from source and downloading only the required model assets is usually easier to upgrade, debug, and keep under disk control than keeping several full portable packages.
Pixi builds on the rattler/Conda package cache and the uv/PyPI cache, and can reuse files through hard links when available, so dependencies repeated across backends usually do not take a full duplicate copy. Model weights, sample voices, and runtime outputs are not automatically shared across projects; organize them by backend and model version.
OpenAI-Compatible Services
OpenAI-compatible TTS is the lowest-friction local protocol for Kokoro, XTTS, Orpheus, KoboldCpp, or a custom /v1/audio/speech wrapper.
- Select OpenAI TTS API Compatible.
- Set `Base URL` to the API version layer, for example `http://127.0.0.1:8880/v1`.
- Leave `API Key` empty if the local service has no authentication.
- Click Fetch All. Neiroha tries common list endpoints such as `models`, `audio/voices`, and `speakers`.
- If the voice list is empty, manually fill the backend-supported voice name when creating a character.
- After Health Check passes, create a preset voice character and run Quick Test.
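To see why the base URL should stop at the version layer, it helps to look at how endpoint paths are appended to it. The sketch below is an illustration, not Neiroha's actual implementation: `list_urls` and `speech_request` are hypothetical helpers, and the JSON body uses the `model`, `voice`, and `input` fields of the OpenAI speech API, which a given local service may extend or interpret differently.

```python
import json
import urllib.request

# List endpoints named in the steps above, relative to the base URL.
LIST_PATHS = ("models", "audio/voices", "speakers")


def list_urls(base_url: str) -> list[str]:
    """Build the candidate list URLs under an OpenAI-compatible base URL."""
    base = base_url.rstrip("/")
    return [f"{base}/{path}" for path in LIST_PATHS]


def speech_request(base_url: str, model: str, voice: str, text: str,
                   api_key: str = "") -> urllib.request.Request:
    """Prepare (but do not send) a POST to <base_url>/audio/speech."""
    payload = {"model": model, "voice": voice, "input": text}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only sent when the local service requires authentication.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

With `http://127.0.0.1:8880/v1` as the base URL, the speech request goes to `http://127.0.0.1:8880/v1/audio/speech`. If the base URL omitted `/v1`, the same join would target `/audio/speech` at the service root, which most OpenAI-compatible servers do not serve.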
GPT-SoVITS
GPT-SoVITS is useful for trained speaker voices and reference-audio cloning.
- Start the backend: portable package uses `start_portable.bat serve`; source environment uses `pixi run serve`.
- Select the GPT-SoVITS adapter.
- Set `Base URL` to the service root, default `http://127.0.0.1:9880`.
- Click Fetch All. The backend provides `/v1/models`, `/v1/audio/voices`, and `/api/gpt-sovits/voices`.
- Create characters with either:
  - Registered voice: select a server voice such as `genshin-keqing`.
  - Clone: upload reference audio and fill reference text, prompt language, and target text language.
- Use it in Dialogue or Phase batches only after Quick Test succeeds.
CosyVoice Native
CosyVoice Native uses Neiroha's JSON / multipart adapter and does not need the backend to pretend to be a pure OpenAI service.
- Start the backend: portable package uses `start_portable.bat`; source environment uses `pixi run serve`.
- Select CosyVoice Native.
- Set `Base URL` to the service root, default `http://127.0.0.1:9880`.
- Health Check calls `/health`.
- Fetch All reads `/v1/models`, `/v1/audio/voices`, and `/api/cosyvoice/voices`.
- Fill character fields by mode: `prompt_clone` needs reference audio and prompt text; `cross_lingual` only needs reference audio; `instruct` needs reference audio and instruction.
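The per-mode requirements can be written down as a small lookup table, which makes it easy to check a character before running Quick Test. The mode names come from the list above; the field keys (`reference_audio`, `prompt_text`, `instruction`) and the `missing_fields` helper are hypothetical names chosen for this sketch, not Neiroha's or CosyVoice's actual schema.

```python
# Mode names are from the adapter; field keys are illustrative assumptions.
REQUIRED_FIELDS = {
    "prompt_clone": ("reference_audio", "prompt_text"),
    "cross_lingual": ("reference_audio",),
    "instruct": ("reference_audio", "instruction"),
}


def missing_fields(mode: str, character: dict) -> list[str]:
    """Return the required fields that are absent or empty for the given mode."""
    if mode not in REQUIRED_FIELDS:
        raise ValueError(f"unknown CosyVoice mode: {mode!r}")
    return [field for field in REQUIRED_FIELDS[mode] if not character.get(field)]
```

For example, a character with only reference audio passes `cross_lingual` but reports `prompt_text` as missing for `prompt_clone`.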
VoxCPM2 Native
VoxCPM2 Native supports registered voices, natural-language voice design, and reference-audio cloning.
- Start the backend: portable package uses `start_portable.bat`; source environment uses `pixi run serve`.
- Select VoxCPM2 Native.
- Set `Base URL` to `http://127.0.0.1:8000` or your actual service address.
- Fetch All reads `/v1/models`, `/v1/audio/voices`, and `/api/voxcpm/voices`.
- Create characters using registered voice, design, clone, or ultimate clone. `clone` needs reference audio but not reference text; `ultimate_clone` needs reference audio and matching prompt text.
Android Connecting to a Local Backend
| Neiroha Location | Backend Location | Base URL |
|---|---|---|
| Windows desktop Neiroha | Same Windows machine | http://127.0.0.1:port |
| Android emulator | Host Windows machine | http://10.0.2.2:port |
| Android phone | LAN computer | http://LAN-IP:port |
| Android phone | Public server | https://domain or public IP |
If a phone cannot access the service, open the same address in the phone browser first. If the browser also fails, check firewall, listening address, proxy, or LAN isolation.
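The first three rows of the table reduce to a simple rule for picking the host part of the base URL. The sketch below encodes that rule; the location keys (`desktop`, `emulator`, `phone`) and the `base_url_for` helper are hypothetical names for illustration, and the public-server row is left out because its address depends entirely on your deployment.

```python
def base_url_for(location: str, port: int, lan_ip: str = "") -> str:
    """Pick the base URL for a backend running on a Windows computer."""
    if location == "desktop":
        # Neiroha and the backend share one machine.
        return f"http://127.0.0.1:{port}"
    if location == "emulator":
        # Inside the Android emulator, 10.0.2.2 points at the host machine.
        return f"http://10.0.2.2:{port}"
    if location == "phone":
        if not lan_ip:
            raise ValueError("a phone needs the computer's LAN IP")
        return f"http://{lan_ip}:{port}"
    raise ValueError(f"unknown location: {location!r}")
```

For example, `base_url_for("phone", 9880, lan_ip="192.168.1.20")` gives `http://192.168.1.20:9880`, where the IP is a placeholder for the computer's real LAN address.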
Common Failures
| Symptom | Common Cause | Fix |
|---|---|---|
| Health Check fails | Wrong URL layer or unopened port | Check the URL layer: OpenAI-compatible adapters usually include /v1, while native adapters usually use the service root. Confirm the port is open. |
| Emulator cannot reach host | Used 127.0.0.1 | Use 10.0.2.2. |
| Phone cannot reach computer | Firewall block or backend only listens on localhost | Bind backend to 0.0.0.0 and allow the port. |
| Fetch All is empty | Backend lacks list APIs or the port points to the wrong service | Open /v1/models and voice list manually, then fill model and voice if needed. |
| Batch generation stalls | VRAM is exhausted or concurrency is too high | Start provider max concurrency at 1. |