
Neiroha — Audio API Reference

Neiroha exposes audio APIs in two places:

  1. Built-in API server: exposes the app's active voice banks as an OpenAI-compatible TTS service.
  2. Upstream provider adapters: the app calls local or cloud TTS backends through adapter-specific routes.

1. Built-In API Server

The built-in server defaults to 127.0.0.1:8976 and can be toggled from Settings -> API Server. Bind to 0.0.0.0 only when LAN access is intentional.

Security and Runtime Controls

| Setting | Default | Notes |
| --- | --- | --- |
| Bind host | 127.0.0.1 | Loopback only by default |
| Port | 8976 | Restart the server after changing it |
| API key | empty | When set, requests must send `Authorization: Bearer <key>` or `X-API-Key: <key>` |
| CORS origins | empty | Empty denies browser cross-origin access; `*` allows any origin |
| Rate limit | 60 req/min/IP | 0 disables the limit |
| Max body size | 1048576 bytes (1 MiB) | 0 disables checks against the declared Content-Length |
| API logging | off | Logs metadata only; request bodies and auth headers are never logged |

Every synthesis request goes through the shared TtsQueueService, so provider concurrency and rate limits apply to both the desktop UI and external API clients.

Voice Bank as Model

The built-in API uses voice banks as the model abstraction:

  • Active voice banks appear in /v1/models.
  • The bank name is used as the model value in API requests.
  • /v1/audio/voices and /speakers list voices from active banks only.
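Because banks double as models, a client can discover valid `model` values from /v1/models. The sketch below is a minimal illustration, assuming the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`) with the bank name in each `id`; the helper names `bank_names` and `fetch_bank_names` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Default bind host and port of the built-in server (Settings -> API Server).
BASE_URL = "http://127.0.0.1:8976"


def bank_names(models_response: dict) -> list[str]:
    """Return bank names from an OpenAI-style /v1/models body.

    Assumption: each entry under "data" carries the bank name in "id".
    """
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_bank_names(base_url: str = BASE_URL, api_key: str | None = None) -> list[str]:
    """GET /v1/models and return the active voice bank names."""
    req = urllib.request.Request(f"{base_url}/v1/models")
    if api_key:
        # Only required when an API key is configured on the server.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return bank_names(json.load(resp))
```

With the server running, each name returned by `fetch_bank_names()` can be passed as `model` to scope the voice lookup.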

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/audio/speech | Synthesize speech from text |
| GET | /v1/audio/voices | List voices from active voice banks |
| GET | /v1/models | List active voice banks as OpenAI-style models |
| GET | /speakers | SillyTavern-style speaker list |
| GET | /health | Health check |

POST /v1/audio/speech

```json
{
  "input": "Text to synthesize",
  "model": "My Bank",
  "voice": "character_name",
  "speed": 1.0,
  "response_format": "wav"
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| input | string | yes | Text to synthesize |
| voice | string | yes | Voice character name |
| model | string | no | Voice bank name; scopes voice lookup when provided |
| speed | number | no | Playback speed multiplier, default 1.0 |
| response_format | string | no | Output format hint passed to the upstream adapter |

The response is raw audio bytes with a format-specific Content-Type.

Common errors:

| Status | Meaning |
| --- | --- |
| 400 | Missing input or voice |
| 401 | Missing or invalid API key when auth is configured |
| 404 | Voice character not found |
| 413 | Request body exceeds the configured limit |
| 429 | Per-IP request budget exceeded |
| 500 | Provider not found or upstream synthesis failed |
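A minimal client for POST /v1/audio/speech, using only the Python standard library. It is a sketch, not an official SDK: the function names are hypothetical, and it assumes the default host and port. The request is built separately from sending it so the payload and headers can be inspected without a running server.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8976"  # default built-in server address


def build_speech_request(
    text: str,
    voice: str,
    model: str | None = None,
    speed: float = 1.0,
    response_format: str = "wav",
    api_key: str | None = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST /v1/audio/speech request for the built-in server."""
    payload = {"input": text, "voice": voice, "speed": speed, "response_format": response_format}
    if model is not None:
        payload["model"] = model  # voice bank name; scopes the voice lookup
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # X-API-Key: <key> is also accepted
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize(req: urllib.request.Request, out_path: str) -> str:
    """Send the request, write the raw audio bytes, and return the Content-Type."""
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            audio = resp.read()
            content_type = resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # 400, 401, 404, 413, 429, or 500 per the error table.
        raise RuntimeError(f"speech request failed: HTTP {err.code}") from err
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return content_type
```

For example, `synthesize(build_speech_request("Hello there.", "character_name", model="My Bank"), "hello.wav")` writes the returned audio to hello.wav.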

2. Upstream Provider Adapters

Provider base URL rules depend on the adapter. For OpenAI-compatible services the base URL usually ends in /v1, and the relative paths below are appended to it; Neiroha's native local backends usually use the service root.

OpenAI TTS API Compatible

For any service exposing the standard OpenAI TTS API.

| Operation | Method | Relative Path |
| --- | --- | --- |
| Synthesize | POST | /audio/speech |
| Health check / models | GET | /models |
| Voices | GET | /audio/voices, then /speakers as fallback |
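The voice lookup order can be sketched as a loop over the two relative paths, joined onto a base URL that usually ends in /v1. This is an illustration of the fallback rule, not Neiroha's adapter code; `fetch_voices` and `http_get_json` are hypothetical names, and the fetcher is injectable so the fallback can be exercised without a server.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request
from typing import Callable

Fetch = Callable[[str], object]


def http_get_json(url: str) -> object:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def join(base_url: str, path: str) -> str:
    # Relative paths are appended to the base URL (typically ".../v1").
    return base_url.rstrip("/") + path


def fetch_voices(base_url: str, fetch: Fetch = http_get_json) -> object:
    """Try /audio/voices first, then fall back to /speakers."""
    last_error: Exception | None = None
    for path in ("/audio/voices", "/speakers"):
        try:
            return fetch(join(base_url, path))
        except (urllib.error.URLError, ValueError) as err:  # HTTP errors or bad JSON
            last_error = err
    raise RuntimeError(f"no voice endpoint answered at {base_url}") from last_error
```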

Example payload:

```json
{
  "model": "tts-1",
  "input": "text",
  "voice": "alloy",
  "speed": 1.0,
  "response_format": "wav"
}
```

Chat Completions TTS

For TTS providers that return audio through Chat Completions, such as MiMo-style audio models.

| Operation | Method | Relative Path |
| --- | --- | --- |
| Synthesize | POST | /chat/completions |
| Health check / models | GET | /models |
| Voices | GET | /speakers |

Neiroha reads base64 audio from choices[0].message.audio.data. MiMo-style providers use the api-key header by default.
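The response handling described above can be sketched as follows. The request body for audio chat completions differs between providers, so this covers only the two documented pieces: decoding `choices[0].message.audio.data` and sending the key in an `api-key` header. The Bearer branch is an assumption for providers that do not use the MiMo-style header, and both function names are hypothetical.

```python
from __future__ import annotations

import base64


def extract_audio(chat_response: dict) -> bytes:
    """Decode the base64 audio at choices[0].message.audio.data."""
    try:
        encoded = chat_response["choices"][0]["message"]["audio"]["data"]
    except (KeyError, IndexError, TypeError) as err:
        raise ValueError("response carries no audio at choices[0].message.audio.data") from err
    return base64.b64decode(encoded)


def auth_headers(api_key: str, style: str = "api-key") -> dict[str, str]:
    """MiMo-style providers take the key in an api-key header by default."""
    if style == "api-key":
        return {"api-key": api_key}
    # Assumption: other providers accept a standard Bearer token.
    return {"Authorization": f"Bearer {api_key}"}
```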

CosyVoice Native

For the Neiroha CosyVoice3 local backend. The default service root is http://127.0.0.1:9880; if the launcher falls back to a random port, use the address printed in the backend log.

Stable OpenAI routes:

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check |
| GET | /v1/models | List voice sets |
| GET | /v1/audio/voices | List voice profiles |
| POST | /v1/audio/speech | Synthesize with a registered voice |

Standard native routes:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/cosyvoice/voices | List registered voices |
| GET | /api/cosyvoice/meta | Backend metadata |
| GET | /api/cosyvoice/capabilities | Modes, fields, and upload support |
| GET | /api/cosyvoice/logs | Runtime logs |
| POST | /api/cosyvoice/tts | JSON synthesis |
| POST | /api/cosyvoice/tts/upload | Upload reference audio and synthesize |

Legacy /cosyvoice/* and /cosyvoice3/* routes remain for compatibility. New integrations should prefer /api/cosyvoice/*.

JSON synthesis example:

```json
{
  "text": "Text to synthesize",
  "model": "default",
  "voice": "prompt-clone",
  "mode": "zero_shot",
  "speed": 1.0,
  "response_format": "wav",
  "prompt_audio_path": "/path/to/voices/demo.wav",
  "prompt_text": "reference text",
  "instruct_text": "Read in a gentle and calm tone"
}
```

| Mode | Description | Required Fields |
| --- | --- | --- |
| zero_shot / prompt_clone | Prompt clone | reference audio + prompt_text |
| cross_lingual | Cross-lingual clone | reference audio |
| instruct | Instruction control | reference audio + instruct_text |
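The mode table translates into a small client-side check before calling /api/cosyvoice/tts. This is a hypothetical helper, not part of the backend: it assumes the reference audio is passed as `prompt_audio_path` (a path the backend can read) rather than through the upload route, and it leaves `model` and `voice` to the `extra` keyword arguments because their defaults are not documented here.

```python
from __future__ import annotations

import json
import urllib.request

# Mode -> text field required alongside reference audio (from the mode table).
REQUIRED_TEXT_FIELD = {
    "zero_shot": "prompt_text",
    "prompt_clone": "prompt_text",
    "cross_lingual": None,
    "instruct": "instruct_text",
}


def build_cosyvoice_payload(text: str, mode: str, prompt_audio_path: str, **extra: object) -> dict:
    """Assemble a JSON synthesis body and reject mode/field mismatches early."""
    if mode not in REQUIRED_TEXT_FIELD:
        raise ValueError(f"unknown mode: {mode}")
    payload = {"text": text, "mode": mode, "prompt_audio_path": prompt_audio_path,
               "speed": 1.0, "response_format": "wav", **extra}
    needed = REQUIRED_TEXT_FIELD[mode]
    if needed and not payload.get(needed):
        raise ValueError(f"mode {mode!r} requires {needed}")
    return payload


def cosyvoice_request(payload: dict, base_url: str = "http://127.0.0.1:9880") -> urllib.request.Request:
    # Use the port printed in the backend log if the launcher picked another one.
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/cosyvoice/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```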

GPT-SoVITS

For the Neiroha GPT-SoVITS local backend. The default service root is http://127.0.0.1:9880; if it conflicts with another backend, use the actual port printed in the log.

Stable OpenAI routes:

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check |
| GET | /v1/models | List voice sets |
| GET | /v1/audio/voices | List voice profiles |
| POST | /v1/audio/speech | Synthesize with a registered voice |

Standard native routes:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/gpt-sovits/models | List model presets / low-level weights |
| GET | /api/gpt-sovits/voices | List voice profiles |
| GET | /api/gpt-sovits/capabilities | Clone and audio normalization support |
| GET | /api/gpt-sovits/logs | Runtime logs |
| POST | /api/gpt-sovits/tts | Native JSON synthesis |
| POST | /api/gpt-sovits/clone | JSON clone request |
| POST | /api/gpt-sovits/clone/upload | Upload reference audio and clone |
| POST | /api/gpt-sovits/load | Load a preset |
| POST | /api/gpt-sovits/unload | Unload the current model |
| POST | /api/gpt-sovits/reload | Reload the current model |

Legacy /gpt-sovits/* and /tts routes remain for compatibility. New integrations should prefer /api/gpt-sovits/*.

Clone example:

```json
{
  "input": "Text to synthesize",
  "speaker": "clone",
  "text_lang": "zh",
  "ref_audio_path": "/path/to/ref.wav",
  "prompt_text": "reference text",
  "prompt_lang": "zh",
  "speed": 1.0,
  "response_format": "wav"
}
```
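Sending the clone example to /api/gpt-sovits/clone can be sketched with the standard library. The helper name is hypothetical, and it assumes `ref_audio_path` is resolved on the machine running the backend; when the reference file lives only on the client, the /api/gpt-sovits/clone/upload route is the alternative.

```python
from __future__ import annotations

import json
import urllib.request


def gpt_sovits_clone_request(
    text: str,
    ref_audio_path: str,
    prompt_text: str,
    text_lang: str = "zh",
    prompt_lang: str = "zh",
    base_url: str = "http://127.0.0.1:9880",
) -> urllib.request.Request:
    """Build a POST /api/gpt-sovits/clone request from the clone example fields."""
    payload = {
        "input": text,
        "speaker": "clone",
        "text_lang": text_lang,
        "ref_audio_path": ref_audio_path,  # assumption: a path the backend process can read
        "prompt_text": prompt_text,        # transcript of the reference clip
        "prompt_lang": prompt_lang,
        "speed": 1.0,
        "response_format": "wav",
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/gpt-sovits/clone",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

If 9880 is already taken by another backend, pass the port printed in the GPT-SoVITS log as `base_url`.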

VoxCPM2 Native

For the Neiroha VoxCPM2 local backend. The default service root is http://127.0.0.1:8000.

Stable OpenAI routes:

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check |
| GET | /v1/models | List voice sets |
| GET | /v1/audio/voices | List voice profiles |
| GET | /v1/audio/speakers | Speaker-list compatibility |
| POST | /v1/audio/speech | OpenAI-compatible synthesis |

Standard native routes:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/voxcpm/models | List model presets / low-level models |
| GET | /api/voxcpm/capabilities | Modes, aliases, fields, and upload support |
| GET | /api/voxcpm/meta | Backend metadata |
| GET | /api/voxcpm/logs | Runtime logs |
| POST | /api/voxcpm/load | Load a preset |
| POST | /api/voxcpm/unload | Unload the current model |
| POST | /api/voxcpm/reload | Reload the current model |
| POST | /api/voxcpm/tts | Native JSON synthesis |
| POST | /api/voxcpm/tts/upload | Upload reference / prompt audio and synthesize |
| GET | /api/voxcpm/voices | List registered voices |
| POST | /api/voxcpm/voices | Create or update a voice |
| GET | /api/voxcpm/voices/{voice_id} | Fetch one voice |
| DELETE | /api/voxcpm/voices/{voice_id} | Delete one voice |

Legacy /voxcpm/* routes remain for compatibility. New integrations should prefer /api/voxcpm/*.

OpenAI extension fields:

| Field | Description |
| --- | --- |
| reference_audio / ref_audio | Reference audio: local path, file://, http(s)://, or data:audio/...;base64,... |
| prompt_audio | Prompt audio for ultimate clone |
| prompt_text / ref_text | Transcript for prompt audio |
| mode | design, clone, ultimate_clone, or compatible aliases |
| instruction | Natural-language voice description |
| auto_asr | Use optional ASR to transcribe prompt text |
| cfg_value, inference_timesteps, normalize, denoise | VoxCPM2 inference controls |
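When the reference clip is not on the backend's filesystem, it can be inlined as a `data:audio/...;base64,...` URL. The sketch below builds such a URL and an OpenAI-style request carrying the extension fields. Assumptions, not confirmed by this reference: the text goes in `input` as in the OpenAI schema, `voice` may be omitted for clone or design modes, and the MIME type defaults to audio/wav. Both function names are hypothetical.

```python
from __future__ import annotations

import base64
import json
import urllib.request


def audio_data_url(audio: bytes, mime: str = "audio/wav") -> str:
    """Encode audio bytes as a data:<mime>;base64,<payload> reference."""
    return f"data:{mime};base64,{base64.b64encode(audio).decode('ascii')}"


def voxcpm_speech_request(
    text: str,
    mode: str,
    voice: str | None = None,
    base_url: str = "http://127.0.0.1:8000",
    **extensions: object,
) -> urllib.request.Request:
    """POST /v1/audio/speech with VoxCPM2 extension fields.

    `extensions` carries fields from the extension table, e.g. reference_audio,
    prompt_audio, prompt_text, instruction, cfg_value, inference_timesteps.
    """
    payload: dict[str, object] = {"input": text, "mode": mode, "response_format": "wav"}
    if voice is not None:
        payload["voice"] = voice  # assumption: optional when reference audio is supplied
    payload.update(extensions)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A clone request might then pass `reference_audio=audio_data_url(open("ref.wav", "rb").read())` together with `prompt_text`.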

Azure Speech Service

| Operation | Method | Path |
| --- | --- | --- |
| Synthesize | POST | /cognitiveservices/v1 |
| Health check / voices | GET | /cognitiveservices/voices/list |

Base URL can be a region like eastus or an endpoint such as https://eastus.tts.speech.microsoft.com. Neiroha sends the API key as Ocp-Apim-Subscription-Key.
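The region-or-endpoint rule and the subscription-key header can be illustrated with a direct call to Microsoft's text-to-speech REST API. This follows Microsoft's documented request shape (SSML body, `X-Microsoft-OutputFormat`, `User-Agent`), not necessarily the exact SSML Neiroha generates; the voice name, output format, and user agent string are example values, and the helper names are hypothetical.

```python
from __future__ import annotations

import urllib.request
from xml.sax.saxutils import escape, quoteattr


def azure_tts_url(base: str) -> str:
    """Accept a bare region ("eastus") or a full endpoint URL."""
    base = base.strip().rstrip("/")
    if "://" not in base:
        base = f"https://{base}.tts.speech.microsoft.com"
    return f"{base}/cognitiveservices/v1"


def azure_tts_request(
    base: str,
    api_key: str,
    text: str,
    voice: str = "en-US-JennyNeural",               # example voice name
    lang: str = "en-US",
    output_format: str = "riff-24khz-16bit-mono-pcm",  # example output format
) -> urllib.request.Request:
    # Escape user text so characters like < and & cannot break the SSML.
    ssml = (
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice></speak>"
    )
    return urllib.request.Request(
        azure_tts_url(base),
        data=ssml.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": output_format,
            "User-Agent": "neiroha-example",
        },
        method="POST",
    )
```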

Google Gemini TTS

Gemini TTS uses a Google AI Studio API key. Set the provider URL to https://generativelanguage.googleapis.com, then choose a Gemini TTS model and voice.
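For reference, a Gemini TTS call against that provider URL can be sketched with Google's public `generateContent` REST API. This shows Google's documented request shape, not necessarily how Neiroha builds its request; the model name `gemini-2.5-flash-preview-tts` and voice `Kore` are example values that may change, so check the model and voice lists in Providers.

```python
from __future__ import annotations

import base64
import json
import urllib.request

GEMINI_BASE = "https://generativelanguage.googleapis.com"


def gemini_tts_request(
    api_key: str,
    text: str,
    model: str = "gemini-2.5-flash-preview-tts",  # example model name
    voice: str = "Kore",                          # example prebuilt voice
) -> urllib.request.Request:
    body = {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}},
        },
    }
    return urllib.request.Request(
        f"{GEMINI_BASE}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def gemini_pcm(response: dict) -> bytes:
    # Audio arrives as base64 inline data: raw PCM (Google documents 24 kHz, 16-bit, mono).
    data = response["candidates"][0]["content"]["parts"][0]["inlineData"]["data"]
    return base64.b64decode(data)
```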

Windows System TTS

Windows desktop uses system SAPI voices and needs no base URL or API key. Android and Linux system TTS providers remain hidden until native platform adapters exist.

3. Response Headers and Troubleshooting

Neiroha local backends usually include these audio response headers. Exact fields vary by backend:

| Header | Meaning |
| --- | --- |
| X-Neiroha-Backend | Backend name |
| X-Neiroha-Model-Preset | Low-level model preset |
| X-Neiroha-Voice | Actual voice used |
| X-Neiroha-Sample-Rate | Output sample rate |
| X-Neiroha-Inference-Ms | Inference time in milliseconds |
| X-Neiroha-Audio-Seconds | Output duration in seconds |
| X-Neiroha-Output-Path | Server-side output file path |
| X-Neiroha-RTF | Measured local real-time factor (RTF) |
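Since any of these headers may be missing, a client can collect them defensively and cross-check the reported RTF against inference time divided by audio duration. The sketch below is a hypothetical helper; it assumes the numeric headers are plain decimal strings, and the backend's own RTF may be measured over a slightly different interval, so small differences between the two values are expected.

```python
from __future__ import annotations

from collections.abc import Mapping


def neiroha_metrics(headers: Mapping[str, str]) -> dict[str, object]:
    """Collect X-Neiroha-* headers; every field may be missing."""
    lowered = {key.lower(): value for key, value in headers.items()}

    def get(name: str) -> str | None:
        return lowered.get(f"x-neiroha-{name}")

    def as_float(name: str) -> float | None:
        raw = get(name)
        return float(raw) if raw else None

    ms = as_float("inference-ms")
    seconds = as_float("audio-seconds")
    sample_rate = as_float("sample-rate")
    return {
        "backend": get("backend"),
        "voice": get("voice"),
        "sample_rate": int(sample_rate) if sample_rate is not None else None,
        "rtf_reported": as_float("rtf"),
        # RTF = processing time / audio duration; below 1.0 is faster than real time.
        "rtf_computed": (ms / 1000.0) / seconds if ms is not None and seconds else None,
    }
```

With urllib, `neiroha_metrics(resp.headers)` works directly on the response headers.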

Troubleshooting order:

  1. Open /health in a browser or with curl.
  2. Check /v1/models and /v1/audio/voices.
  3. In Providers, click Fetch All and confirm models and voices are cached.
  4. Test one sentence in Quick TTS before running Dialogue, Phase, or Video batches.
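Steps 1 and 2 can be scripted for any backend or for the built-in server. This is a minimal sketch, not a Neiroha tool: it treats only HTTP 200 as healthy, stops at the first failing endpoint, and uses hypothetical function names. If the built-in server has an API key configured, the requests would also need an auth header, which this sketch does not add.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def check_backend(base_url: str, status: Callable[[str], int] = http_status) -> list[tuple[str, bool]]:
    """Probe /health, /v1/models, /v1/audio/voices in order; stop at the first failure."""
    results: list[tuple[str, bool]] = []
    for path in ("/health", "/v1/models", "/v1/audio/voices"):
        try:
            ok = status(base_url.rstrip("/") + path) == 200
        except OSError:
            ok = False  # connection refused, DNS failure, or timeout
        results.append((path, ok))
        if not ok:
            break
    return results
```

For example, `check_backend("http://127.0.0.1:9880")` reports which of the three endpoints answered before Fetch All is worth trying.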