Text-to-Speech (TTS)

Speech synthesis with Gemini TTS and ElevenLabs TTS

Gemini TTS (Multilingual)

Best results in English. Supports 24 common languages. Returns raw PCM audio data.

Request:

POST /v2/extend/tts/gemini/synthesize

Parameters:

Parameter	Type	Required	Description
text	string	Yes	Text to synthesize (≤ 10,000 characters)
voice_name	string	No	Voice name (e.g. Kore, Puck, Charon)
prompt	string	No	Voice style hint (e.g. "speak slowly and clearly")
language_code	string	No	Language code (e.g. en-US, zh-CN)
temperature	number	No	Sampling temperature; higher values produce more varied output
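
Because text is capped at 10,000 characters per request, longer input has to be split across several calls. The following is a minimal Python sketch of one way to do that, not part of the API: split_text and MAX_CHARS are hypothetical names, and it assumes the limit is counted the same way as Python's len().

```python
MAX_CHARS = 10_000  # per-request limit on `text`, taken from the parameter table


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break after sentence-ending punctuation, then at a space,
    and falls back to a hard cut when no natural break exists.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        # Latest sentence end inside the window, if any.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark in this chunk
        else:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = limit  # no natural break: hard cut at the limit
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk can then be sent as its own request; joining the resulting audio is left to the caller.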

Example:

curl https://tokenhub.piegateway.me/v2/extend/tts/gemini/synthesize \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test.", "voice_name": "Kore"}' \
  -o output.pcm
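
The same call can be sketched in Python with only the standard library. This is an illustrative equivalent of the curl command above, not an official client: build_gemini_request and synthesize_to_file are hypothetical helper names, and the host is the one used in the curl example.

```python
import json
import urllib.request

BASE_URL = "https://tokenhub.piegateway.me"  # host from the curl example


def build_gemini_request(api_key, text, voice_name=None, prompt=None,
                         language_code=None, temperature=None):
    """Build (but do not send) a POST request for the Gemini TTS endpoint."""
    payload = {
        "text": text,
        "voice_name": voice_name,
        "prompt": prompt,
        "language_code": language_code,
        "temperature": temperature,
    }
    # Leave out optional parameters that were not supplied.
    payload = {k: v for k, v in payload.items() if v is not None}
    return urllib.request.Request(
        BASE_URL + "/v2/extend/tts/gemini/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path, timeout=60):
    """Send the request and write the response body to `path`."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        with open(path, "wb") as out:
            out.write(resp.read())
```

A call such as synthesize_to_file(build_gemini_request("<your-api-key>", "Hello, this is a test.", voice_name="Kore"), "output.pcm") would mirror the curl example. urlopen raises urllib.error.HTTPError on a non-2xx response, so callers may want to catch it.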

Response: Raw PCM audio stream (LINEAR16, i.e. 16-bit signed little-endian samples; 24 kHz; mono).

Convert to a playable format with ffmpeg:

ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.mp3
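
If ffmpeg is not available, the raw samples can also be wrapped in a WAV container using Python's standard wave module. The sketch below assumes the response is exactly the format stated above (16-bit little-endian, 24 kHz, mono); pcm_to_wav_bytes is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav_bytes(pcm_bytes: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)       # mono
        wav.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)    # 24 kHz
        wav.writeframes(pcm_bytes)
    return buf.getvalue()
```

For example, reading output.pcm as bytes, passing it through pcm_to_wav_bytes, and writing the result to output.wav gives a file most players can open. Unlike the ffmpeg command, this does not compress the audio.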

ElevenLabs TTS (High Quality)

Industry-leading audio quality. Supports 70+ languages.

Request:

POST /v2/extend/tts/elevenlabs/synthesize

Parameters:

Parameter	Type	Required	Description
text	string	Yes	Text to synthesize (≤ 10,000 characters)
voice_id	string	Yes	Voice ID
language_code	string	No	Language code
output_format	string	No	Output format

Example:

curl https://tokenhub.piegateway.me/v2/extend/tts/elevenlabs/synthesize \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice_id": "21m00Tcm4TlvDq8ikWAM"}' \
  -o output.mp3
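
Since voice_id is required here and text is length-limited, it can help to validate the request body before sending it. A minimal Python sketch, assuming the limit is counted as in Python's len(); elevenlabs_payload is a hypothetical helper, not part of the API.

```python
import json

MAX_CHARS = 10_000  # per-request limit on `text`, from the parameter table


def elevenlabs_payload(text, voice_id, language_code=None, output_format=None):
    """Return the JSON body for the ElevenLabs TTS endpoint as a string."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    if not voice_id:
        raise ValueError("voice_id is required")
    body = {"text": text, "voice_id": voice_id}
    # Optional parameters are included only when supplied.
    if language_code is not None:
        body["language_code"] = language_code
    if output_format is not None:
        body["output_format"] = output_format
    return json.dumps(body)
```

The returned string can be passed as the -d argument to curl, or encoded to bytes and sent with any HTTP client along with the X-API-Key and Content-Type headers shown above.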
