Speech-to-Text (ASR)

Fast transcription, file-based ASR, and real-time streaming recognition

Fast Speech-to-Text

Best for real-time voice input; this is the fastest of the three options.

Request:

POST /v2/extend/asr/transcriptions

Parameters:

Parameter	Type	Required	Description
model	string	No	Model name. Default volc.bigasr.auc_turbo; whisper-1 is also accepted
audio_url	string	Required unless audio_data is set	Audio file URL
audio_data	string	Required unless audio_url is set	Base64-encoded audio data
enable_itn	boolean	No	Enable number/unit normalization
enable_punc	boolean	No	Enable punctuation
enable_ddc	boolean	No	Enable disfluency removal

Example:

curl https://tokenhub.piegateway.me/v2/extend/asr/transcriptions \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "volc.bigasr.auc_turbo",
    "audio_url": "https://example.com/audio.mp3"
  }'
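
When the audio is a local file rather than a public URL, audio_data takes the place of audio_url. A minimal Python sketch of building that request body follows. The helper name build_fast_asr_payload is illustrative and not part of the API, and it assumes the gateway expects plain Base64 with no data-URI prefix.

```python
import base64
import json


def build_fast_asr_payload(audio_bytes: bytes,
                           model: str = "volc.bigasr.auc_turbo",
                           **options: bool) -> str:
    """Return a JSON request body that carries the audio inline.

    `options` may hold the documented boolean flags
    (enable_itn, enable_punc, enable_ddc).
    """
    allowed = {"enable_itn", "enable_punc", "enable_ddc"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    payload = {
        "model": model,
        # Assumption: standard Base64, no data-URI prefix.
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        **options,
    }
    return json.dumps(payload)
```

The returned string can be sent as the body of the POST request shown above, in place of the audio_url variant.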

Response:

{
  "model": "volc.bigasr.auc_turbo",
  "text": "Transcribed text content",
  "duration_ms": 5000
}
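
On the client side, the response can be decoded into a small typed record. The sketch below assumes only the three fields shown in the sample response; the Transcription class and parse_transcription function are illustrative names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcription:
    model: str
    text: str
    duration_ms: int

    @property
    def duration_seconds(self) -> float:
        """Audio duration converted from milliseconds to seconds."""
        return self.duration_ms / 1000


def parse_transcription(body: str) -> Transcription:
    """Decode a response body shaped like the sample above."""
    data = json.loads(body)
    return Transcription(
        model=data["model"],
        text=data["text"],
        duration_ms=int(data["duration_ms"]),
    )
```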

File-Based Transcription (Async)

Submit the URL of a complete recording for asynchronous transcription. Best for long audio files.

Parameters:

Parameter	Type	Required	Description
audio_url	string	Yes	Audio file URL
model	string	No	Model name, default volc.bigasr.auc
format	string	No	Audio format, default mp3
language	string	No	Language code
enable_itn	boolean	No	Enable number/unit normalization
enable_punc	boolean	No	Enable punctuation
enable_speaker_info	boolean	No	Enable speaker identification
show_utterances	boolean	No	Return utterance details

Submit Task:

POST /v2/extend/asr/tasks

Example:

curl https://tokenhub.piegateway.me/v2/extend/asr/tasks \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/long-audio.mp3"
  }'

Check Result:

GET /v2/extend/asr/tasks/{taskId}

Example:

curl https://tokenhub.piegateway.me/v2/extend/asr/tasks/<taskId> \
  -H "X-API-Key: <your-api-key>"
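
Because the task runs asynchronously, a client typically repeats the Check Result request until the task finishes. The Python sketch below shows one way to do that. The shape of the task response is not documented in this section, so the completion test is passed in as a caller-supplied predicate rather than hard-coded; the polling interval and attempt limit are arbitrary illustrative defaults.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://tokenhub.piegateway.me"


def fetch_task(task_id: str, api_key: str) -> dict:
    """GET /v2/extend/asr/tasks/{taskId} and decode the JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v2/extend/asr/tasks/{task_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_task(fetch: Callable[[], dict],
              is_done: Callable[[dict], bool],
              interval_s: float = 2.0,
              max_attempts: int = 30) -> dict:
    """Call `fetch` until `is_done` accepts the result or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")
```

A call might look like poll_task(lambda: fetch_task(task_id, api_key), is_done), where is_done inspects whichever field the task response uses to report completion.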

Real-Time Streaming ASR (WebSocket)

For real-time conversations and voice input with low latency. Send audio chunks over WebSocket and receive recognition results in real time.

Connection URL:

GET /ws/v2/extend/asr/stream?model=volc.bigasr.sauc

Authentication: Only HMAC-SHA256 signature authentication is supported, passed via query params:

wss://tokenhub.piegateway.me/ws/v2/extend/asr/stream?model=volc.bigasr.sauc&X-App-Id=<app_id>&X-Timestamp=<timestamp>&X-Nonce=<nonce>&Authorization=HMAC-SHA256 <signature>
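
The Authorization value contains a space, so it needs percent-encoding when placed in a query string. The Python sketch below assembles the connection URL from an already computed signature. How the signature itself is derived (the string to sign and its encoding) is not specified in this section, so it is taken as an input; build_stream_url is an illustrative name, and encoding the space as %20 rather than + is an assumption.

```python
from urllib.parse import quote, urlencode

WS_BASE = "wss://tokenhub.piegateway.me/ws/v2/extend/asr/stream"


def build_stream_url(app_id: str,
                     timestamp: str,
                     nonce: str,
                     signature: str,
                     model: str = "volc.bigasr.sauc") -> str:
    """Assemble the WebSocket URL with the auth values as query params."""
    params = {
        "model": model,
        "X-App-Id": app_id,
        "X-Timestamp": timestamp,
        "X-Nonce": nonce,
        "Authorization": f"HMAC-SHA256 {signature}",
    }
    # quote_via=quote encodes the space as %20 instead of "+".
    return f"{WS_BASE}?{urlencode(params, quote_via=quote)}"
```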

Communication Protocol:

After connecting, the client continuously sends binary audio data (PCM, 16 kHz, 16-bit, mono).
The server returns recognition results in real time as JSON.
When finished, the client sends the text message {"is_last": true} to signal the end of the audio.
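
The client side of this protocol comes down to slicing the PCM stream into binary frames and finishing with the end-of-stream text message. The Python sketch below covers that part without tying it to a particular WebSocket library. The 100 ms chunk duration is an assumption, since this section does not state a required or recommended chunk size.

```python
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono


def chunk_size_bytes(chunk_ms: int) -> int:
    """Number of PCM bytes that represent `chunk_ms` milliseconds of audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def iter_audio_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each sent as one binary frame."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# Text message sent after the last audio chunk.
END_OF_STREAM = json.dumps({"is_last": True})
```

With any WebSocket client, each yielded chunk is sent as a binary message and END_OF_STREAM as a text message once the audio is exhausted.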

Optional Query Parameters:

Parameter	Type	Required	Description
model	string	No	Model name, default volc.bigasr.sauc
enable_itn	boolean	No	Enable number/unit normalization
enable_punc	boolean	No	Enable punctuation

Response Message Format:

{
  "text": "Current recognition result",
  "is_final": false,
  "utterances": [{ "text": "Utterance 1", "definite": true }]
}
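
A client can separate settled text from text that may still change by inspecting each message. The Python sketch below reads only the fields shown in the sample message. Treating definite: true as "this utterance will not be revised" and is_final as "last message of the stream" is an interpretation of the field names, not something this section states.

```python
import json


def definite_texts(message: str) -> list[str]:
    """Return the text of every utterance flagged as definite."""
    data = json.loads(message)
    return [u["text"] for u in data.get("utterances", []) if u.get("definite")]


def is_final(message: str) -> bool:
    """Report whether the message carries is_final: true."""
    return bool(json.loads(message).get("is_final", False))
```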
