Speech-to-Text (ASR)
Fast transcription, file-based ASR, and real-time streaming recognition
Fast Speech-to-Text
The fastest option. Best suited to short clips such as real-time voice input.
Request:
POST /v2/extend/asr/transcriptions
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | No | Model name. Defaults to volc.bigasr.auc_turbo; whisper-1 is also accepted |
| audio_url | string | One of audio_url / audio_data | Audio file URL |
| audio_data | string | One of audio_url / audio_data | Base64-encoded audio data |
| enable_itn | boolean | No | Enable number/unit normalization |
| enable_punc | boolean | No | Enable punctuation |
| enable_ddc | boolean | No | Enable disfluency removal |
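As a rough illustration of the parameter table above, the following Python sketch assembles a JSON request body from either a URL or raw audio bytes. The helper name `build_transcription_body` is hypothetical, and treating "both supplied" as an error is an assumption about how the "one of two" rule is enforced; the service may behave differently.

```python
import base64
import json

# Option names taken from the parameter table above.
ALLOWED_FLAGS = {"enable_itn", "enable_punc", "enable_ddc"}


def build_transcription_body(audio_url=None, audio_bytes=None,
                             model="volc.bigasr.auc_turbo", **flags):
    """Return the JSON body for POST /v2/extend/asr/transcriptions.

    Hypothetical helper: requires exactly one audio source. Rejecting
    requests that carry both sources is an assumption, not documented.
    """
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")

    body = {"model": model}
    if audio_url is not None:
        body["audio_url"] = audio_url
    else:
        # audio_data is documented as Base64-encoded audio data.
        body["audio_data"] = base64.b64encode(audio_bytes).decode("ascii")

    for name, value in flags.items():
        if name not in ALLOWED_FLAGS:
            raise ValueError(f"unknown option: {name}")
        body[name] = bool(value)
    return json.dumps(body)
```

The resulting string can be sent as the request body with the `X-API-Key` and `Content-Type: application/json` headers.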
Example:
curl https://tokenhub.piegateway.me/v2/extend/asr/transcriptions \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"model": "volc.bigasr.auc_turbo",
"audio_url": "https://example.com/audio.mp3"
}'
Response:
{
"model": "volc.bigasr.auc_turbo",
"text": "Transcribed text content",
"duration_ms": 5000
}
File-Based Transcription (Async)
Upload a complete recording for async transcription. Best for long audio files.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio_url | string | Yes | Audio file URL |
| model | string | No | Model name, default volc.bigasr.auc |
| format | string | No | Audio format, default mp3 |
| language | string | No | Language code |
| enable_itn | boolean | No | Enable number/unit normalization |
| enable_punc | boolean | No | Enable punctuation |
| enable_speaker_info | boolean | No | Enable speaker identification |
| show_utterances | boolean | No | Return utterance details |
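The parameters above feed a submit-then-poll flow: one POST creates the task and repeated GETs check on it. The Python sketch below builds both requests with the standard library and adds a generic polling loop. The helper names are hypothetical, and because the shape of the task response is not described here, the completion check is left to a caller-supplied `is_done` function instead of assuming any status field.

```python
import json
import time
import urllib.request
from urllib.parse import quote

BASE_URL = "https://tokenhub.piegateway.me"


def build_submit_request(api_key, audio_url, **options):
    """Build (but do not send) the POST /v2/extend/asr/tasks request."""
    body = {"audio_url": audio_url, **options}
    return urllib.request.Request(
        BASE_URL + "/v2/extend/asr/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(api_key, task_id):
    """Build (but do not send) the GET /v2/extend/asr/tasks/{taskId} request."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/extend/asr/tasks/{quote(task_id, safe='')}",
        headers={"X-API-Key": api_key},
        method="GET",
    )


def poll(fetch, is_done, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until is_done(result) is true or attempts run out.

    fetch and is_done are supplied by the caller; the interval and the
    attempt limit are illustrative defaults, not documented values.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError("task did not finish within the polling budget")
```

In practice `fetch` would send the status request with `urllib.request.urlopen` and decode the JSON reply, while `is_done` would inspect whatever completion indicator the real response carries.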
Submit Task:
POST /v2/extend/asr/tasks
Example:
curl https://tokenhub.piegateway.me/v2/extend/asr/tasks \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"audio_url": "https://example.com/long-audio.mp3"
}'
Check Result:
GET /v2/extend/asr/tasks/{taskId}
Example:
curl https://tokenhub.piegateway.me/v2/extend/asr/tasks/<taskId> \
-H "X-API-Key: <your-api-key>"
Real-Time Streaming ASR (WebSocket)
For real-time conversations and voice input with low latency. Send audio chunks over WebSocket and receive recognition results in real time.
Connection URL:
GET /ws/v2/extend/asr/stream?model=volc.bigasr.sauc
Authentication: Only HMAC-SHA256 signature authentication is supported, passed via query params:
wss://tokenhub.piegateway.me/ws/v2/extend/asr/stream?model=volc.bigasr.sauc&X-App-Id=<app_id>&X-Timestamp=<timestamp>&X-Nonce=<nonce>&Authorization=HMAC-SHA256 <signature>
Communication Protocol:
- After connecting, the client continuously sends binary audio data (PCM 16kHz 16-bit mono)
- The server returns recognition results in real time as JSON
- When finished, the client sends the text message {"is_last": true} to signal the end of the audio
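The client side of this protocol can be sketched as a generator that yields binary PCM chunks followed by the closing text message. The 100 ms chunk duration is an assumption chosen for illustration (no chunk size is specified here), and the function names are hypothetical. The byte arithmetic follows from the stated format: 16,000 samples per second × 2 bytes per sample × 1 channel = 32,000 bytes per second.

```python
import json

SAMPLE_RATE_HZ = 16000   # 16 kHz, per the protocol description
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 1             # mono


def pcm_chunk_size(chunk_ms=100):
    """Bytes of PCM audio covering chunk_ms milliseconds."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def stream_messages(pcm, chunk_ms=100):
    """Yield the messages a client would send, in order.

    Binary audio chunks come out as bytes; the final item is the text
    message that signals the end of the audio.
    """
    size = pcm_chunk_size(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
    yield json.dumps({"is_last": True})
```

A WebSocket client would send each `bytes` item as a binary frame and the final `str` item as a text frame, reading JSON results from the server in between.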
Optional Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | No | Model name, default volc.bigasr.sauc |
| enable_itn | boolean | No | Enable number/unit normalization |
| enable_punc | boolean | No | Enable punctuation |
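Assembling the connection URL by hand is error-prone because the `Authorization` value contains a space that must be percent-encoded. The Python sketch below builds the URL from its parts. It takes the signature as an opaque, precomputed string, since the inputs to the HMAC-SHA256 signature are not described here. The function name is hypothetical, and encoding booleans as `true`/`false` is an assumption.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

WS_ENDPOINT = "wss://tokenhub.piegateway.me/ws/v2/extend/asr/stream"


def build_stream_url(app_id, timestamp, nonce, signature,
                     model="volc.bigasr.sauc",
                     enable_itn=None, enable_punc=None):
    """Return the WebSocket URL with auth and option query parameters.

    signature must already be computed by the caller; how it is derived
    is outside the scope of this sketch.
    """
    params = {
        "model": model,
        "X-App-Id": app_id,
        "X-Timestamp": str(timestamp),
        "X-Nonce": nonce,
        "Authorization": f"HMAC-SHA256 {signature}",
    }
    # Optional flags are only included when explicitly set.
    # Assumption: booleans are sent as lowercase "true" / "false".
    for name, value in (("enable_itn", enable_itn), ("enable_punc", enable_punc)):
        if value is not None:
            params[name] = "true" if value else "false"
    # quote_via=quote encodes the space as %20 rather than "+".
    return WS_ENDPOINT + "?" + urlencode(params, quote_via=quote)
```

Omitted flags are left out of the query string entirely, so the server-side defaults apply.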
Response Message Format:
{
"text": "Current recognition result",
"is_final": false,
"utterances": [{ "text": "Utterance 1", "definite": true }]
}
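A client might process these messages along the following lines. The Python sketch assumes that `is_final` marks the last result of a session and that `definite` marks an utterance whose text will no longer change; neither meaning is spelled out above, so both should be checked against real server behavior. The function names are hypothetical.

```python
import json


def parse_result(raw):
    """Decode one server message into text, final flag, and settled utterances."""
    msg = json.loads(raw)
    definite = [
        utt["text"]
        for utt in msg.get("utterances", [])
        if utt.get("definite")  # assumed to mean "this utterance is settled"
    ]
    return {
        "text": msg.get("text", ""),
        "is_final": bool(msg.get("is_final", False)),
        "definite": definite,
    }


def last_transcript(messages):
    """Return the text of the first message flagged is_final.

    Falls back to the most recent text if no final message arrives.
    """
    latest = ""
    for raw in messages:
        result = parse_result(raw)
        latest = result["text"]
        if result["is_final"]:
            break
    return latest
```

Interim text can be shown to the user as it arrives, while the `definite` list gives the portions that are safe to commit.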