PieBox
Documentation

Speech-to-Text (ASR)

Fast transcription, file-based ASR, and real-time streaming recognition

Fast Speech-to-Text

The fastest option, best for real-time voice input. The transcript is returned directly in the response.

Request:

POST /v2/extend/asr/transcriptions

Parameters:

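| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | No | Model name: volc.bigasr.auc_turbo (default) or whisper-1 |
| audio_url | string | Either audio_url or audio_data | Audio file URL |
| audio_data | string | Either audio_url or audio_data | Base64-encoded audio data |
| enable_itn | boolean | No | Enable number/unit normalization |
| enable_punc | boolean | No | Enable punctuation |
| enable_ddc | boolean | No | Enable disfluency removal |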

Example:

curl https://tokenhub.piegateway.me/v2/extend/asr/transcriptions \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "volc.bigasr.auc_turbo",
    "audio_url": "https://example.com/audio.mp3"
  }'

Response:

{
  "model": "volc.bigasr.auc_turbo",
  "text": "Transcribed text content",
  "duration_ms": 5000
}
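
When the audio is a local file rather than a URL, the request carries it in `audio_data`. The following is a minimal Python sketch using only the standard library. The helper names `build_payload` and `transcribe` are illustrative, and it assumes `audio_data` is plain standard Base64 without a data-URI prefix, which the parameter table does not state. The endpoint and headers are copied from the curl example above.

```python
import base64
import json
import urllib.request

API_URL = "https://tokenhub.piegateway.me/v2/extend/asr/transcriptions"


def build_payload(audio_bytes: bytes, model: str = "volc.bigasr.auc_turbo", **options) -> dict:
    """Build the JSON body for a fast transcription request from raw audio bytes."""
    payload = {
        "model": model,
        # Assumption: plain standard Base64, no "data:" prefix.
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
    }
    # Optional flags from the parameter table, e.g. enable_punc=True.
    payload.update(options)
    return payload


def transcribe(audio_bytes: bytes, api_key: str, **options) -> str:
    """POST the audio and return the "text" field of the JSON response."""
    body = json.dumps(build_payload(audio_bytes, **options)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Keeping the payload construction separate from the network call makes it easy to inspect the request body before sending it.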

File-Based Transcription (Async)

Submit the URL of a complete recording for asynchronous transcription, then query the task for its result. Best for long audio files.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| audio_url | string | Yes | Audio file URL |
| model | string | No | Model name, default volc.bigasr.auc |
| format | string | No | Audio format, default mp3 |
| language | string | No | Language code |
| enable_itn | boolean | No | Enable number/unit normalization |
| enable_punc | boolean | No | Enable punctuation |
| enable_speaker_info | boolean | No | Enable speaker identification |
| show_utterances | boolean | No | Return utterance details |

Submit Task:

POST /v2/extend/asr/tasks

Example:

curl https://tokenhub.piegateway.me/v2/extend/asr/tasks \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/long-audio.mp3"
  }'

Check Result:

GET /v2/extend/asr/tasks/{taskId}

Example:

curl https://tokenhub.piegateway.me/v2/extend/asr/tasks/<taskId> \
  -H "X-API-Key: <your-api-key>"
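
Because transcription is asynchronous, a client typically queries the task repeatedly until it finishes. The sketch below shows one way to do that in Python. The response schema of the task endpoint is not shown here, so the completion check is passed in as a function rather than hard-coded. The names `fetch_task` and `poll_until`, the polling interval, and the timeout are all illustrative assumptions.

```python
import json
import time
import urllib.request

BASE_URL = "https://tokenhub.piegateway.me/v2/extend/asr/tasks"


def fetch_task(task_id: str, api_key: str) -> dict:
    """GET the current state of one task as a parsed JSON object."""
    request = urllib.request.Request(f"{BASE_URL}/{task_id}", headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def poll_until(fetch, is_done, interval_s=2.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch() repeatedly until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)


# Usage (the "status" field and its value are hypothetical; substitute
# whatever the task response actually uses to signal completion):
# result = poll_until(
#     lambda: fetch_task(task_id, api_key),
#     lambda task: task.get("status") == "done",
# )
```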

Real-Time Streaming ASR (WebSocket)

For real-time conversations and voice input with low latency. Send audio chunks over WebSocket and receive recognition results in real time.

Connection URL:

GET /ws/v2/extend/asr/stream?model=volc.bigasr.sauc

Authentication: only HMAC-SHA256 signature authentication is supported. The credentials are passed as query parameters:

wss://tokenhub.piegateway.me/ws/v2/extend/asr/stream?model=volc.bigasr.sauc&X-App-Id=<app_id>&X-Timestamp=<timestamp>&X-Nonce=<nonce>&Authorization=HMAC-SHA256 <signature>
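
The `Authorization` value above contains a space, which has to be percent-encoded before the URL is handed to a WebSocket client. The Python sketch below assembles the URL from its parts. How the signature is computed is not described here, so it is taken as an already-computed input. The function name `build_stream_url` is illustrative, and encoding the space as `%20` and booleans as lowercase `true`/`false` are assumptions.

```python
from urllib.parse import quote, urlencode

WS_BASE = "wss://tokenhub.piegateway.me/ws/v2/extend/asr/stream"


def build_stream_url(app_id, timestamp, nonce, signature, model="volc.bigasr.sauc", **options):
    """Return the streaming URL with every query value percent-encoded."""
    params = {
        "model": model,
        "X-App-Id": app_id,
        "X-Timestamp": timestamp,
        "X-Nonce": nonce,
        # The signature must be computed beforehand; this sketch does not sign anything.
        "Authorization": f"HMAC-SHA256 {signature}",
    }
    for key, value in options.items():
        # Assumption: booleans are sent as lowercase "true"/"false".
        params[key] = str(value).lower() if isinstance(value, bool) else value
    # quote_via=quote encodes a space as %20 rather than "+".
    return f"{WS_BASE}?{urlencode(params, quote_via=quote)}"
```

Optional query parameters such as `enable_punc` can be passed as keyword arguments, for example `build_stream_url(app_id, ts, nonce, sig, enable_punc=True)`.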

Communication Protocol:

  1. After connecting, the client continuously sends binary audio data (PCM 16kHz 16-bit mono)
  2. The server returns recognition results in real time as JSON
  3. When finished, the client sends a text message {"is_last": true} to signal the end
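
The client side of this protocol can be sketched in Python as a generator of outgoing messages. PCM at 16 kHz, 16-bit, mono amounts to 32,000 bytes per second, so a 100 ms chunk is 3,200 bytes. The 100 ms chunk duration is an assumption, since no chunk size is specified here, and the names `chunk_size_bytes` and `outgoing_messages` are illustrative.

```python
import json

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1  # mono


def chunk_size_bytes(chunk_ms: int) -> int:
    """Number of PCM bytes covering chunk_ms milliseconds of audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def outgoing_messages(pcm: bytes, chunk_ms: int = 100):
    """Yield binary audio chunks, then the end-of-stream text message."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]  # bytes: send as a binary frame
    yield json.dumps({"is_last": True})  # str: send as a text frame
```

A WebSocket client would send each `bytes` item as a binary frame and the final `str` item as a text frame, while reading recognition results from the same connection.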

Optional Query Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | No | Model name, default volc.bigasr.sauc |
| enable_itn | boolean | No | Enable number/unit normalization |
| enable_punc | boolean | No | Enable punctuation |

Response Message Format:

{
  "text": "Current recognition result",
  "is_final": false,
  "utterances": [{ "text": "Utterance 1", "definite": true }]
}
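
A client can decode each incoming message into a small structure before acting on it. The Python sketch below reads only the fields shown in the sample above. The names `StreamResult` and `parse_result` are illustrative. Reading `definite` as a marker for utterances that will no longer change is an assumption, since the field semantics are not documented here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StreamResult:
    text: str  # current recognition result
    is_final: bool  # value of the "is_final" flag
    definite: list = field(default_factory=list)  # texts of utterances flagged "definite"


def parse_result(raw: str) -> StreamResult:
    """Decode one JSON text message received from the stream."""
    message = json.loads(raw)
    return StreamResult(
        text=message.get("text", ""),
        is_final=bool(message.get("is_final", False)),
        # Assumption: "definite" marks utterances that will not be revised.
        definite=[
            u.get("text", "")
            for u in message.get("utterances", [])
            if u.get("definite")
        ],
    )
```

A receive loop could display `text` as a live preview on each message and treat the result as complete once `is_final` is true.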