Response Format
For every chunk sent over the WebSocket, the server responds with a JSON message. Users can structure response handling to suit their needs: read quick interim responses with lower accuracy, or wait for the larger, highly accurate responses the server sends periodically.
Example Response
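As a quick illustration, here is a hypothetical message decoded in Python. The field values are invented for this sketch; the fields themselves are documented below.

```python
import json

# A hypothetical transcription message as it might arrive on the WebSocket.
# All values here are illustrative only.
raw = '''{
    "type": "transcription",
    "status": "success",
    "session_id": "a1b2c3d4",
    "transcript": "hello world",
    "is_final": false,
    "is_last": false
}'''

response = json.loads(raw)
print(response["transcript"])  # → hello world
```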
Response Fields
- `type`: Message type identifier, set to `"transcription"` for transcription results.
- `status`: Status of the transcription request, typically `"success"` for valid responses.
- `session_id`: Unique identifier for the transcription session.
- `transcript`: Partial or complete transcription text for the current segment.
- `is_final`: Indicates if this is the final transcription for the current segment. `false` indicates a partial/interim transcript; `true` indicates a final transcript.
- `is_last`: Indicates if this is the last transcription in the session. `true` when the session is complete.
Optional Fields
The following fields may be included in responses under certain conditions:
- `full_transcript`: Complete transcription text accumulated so far. Only included when the `full_transcript=true` query parameter is set AND `is_final=true`.
- `language`: Detected primary language code. Only returned when `is_final=true`.
- `languages`: Array of language codes detected in the audio. Only returned when `is_final=true`.
- `words`: Array of word-level timestamps (only included when `word_timestamps=true` in query parameters). Each word object contains `word`, `start`, `end`, and `confidence` fields. When `diarize=true`, also includes `speaker` (integer ID) and `speaker_confidence` (0.0 to 1.0) fields.
- `utterances`: Array of sentence-level timestamps (only included when `sentence_timestamps=true` in query parameters). Each utterance object contains `text`, `start`, and `end` fields. When `diarize=true`, also includes a `speaker` (integer ID) field.
- `redacted_entities`: Array of redacted entity placeholders (only included when `redact_pii=true` or `redact_pci=true`). Examples: `[FIRSTNAME_1]`, `[CREDITCARDCVV_1]`.
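As a sketch of how the optional fields combine, the following hypothetical final response has word timestamps and diarization enabled. All values are invented, and optional fields are read defensively since they are absent unless the corresponding query parameter was set and the conditions above hold.

```python
# Hypothetical final response with word_timestamps=true and diarize=true
# set as query parameters; all values are illustrative.
message = {
    "type": "transcription",
    "status": "success",
    "session_id": "a1b2c3d4",
    "transcript": "hi there",
    "is_final": True,
    "is_last": False,
    "language": "en",
    "words": [
        {"word": "hi", "start": 0.0, "end": 0.3, "confidence": 0.98,
         "speaker": 0, "speaker_confidence": 0.91},
        {"word": "there", "start": 0.3, "end": 0.7, "confidence": 0.95,
         "speaker": 0, "speaker_confidence": 0.90},
    ],
}

# Optional fields must not be assumed present; use .get() with a default.
for w in message.get("words", []):
    speaker = w.get("speaker", "?")
    print(f"{w['word']}: {w['start']:.1f}-{w['end']:.1f}s (speaker {speaker})")
```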
Handling Responses
We maintain an internal server-side buffer that collects chunked audio sent by the user. Once this buffer reaches a specific size, the server sends a special response with the is_final parameter set to true that contains the transcription of user audio collected since the last such response.
is_final = true
We recommend processing responses of this kind for optimal transcription accuracy. The internal buffer size is calibrated to optimize response times and accuracy.
- Additionally, the `language` field is set to the specified language, or the detected language if the `language` parameter is set to `multi`. Other responses will not include the `language` field.
- The `full_transcript` field is non-empty if the user sends the end token `{"type": "end"}` to signal the end of the session.
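Under the behavior described above, a minimal way to build the session transcript is to keep only `is_final=true` segments and ignore interims. This is a sketch with invented messages:

```python
def accumulate_finals(messages):
    """Collect only is_final=true segments into a running transcript.

    `messages` is any iterable of decoded JSON responses (sketch;
    a real client would feed messages in as they arrive)."""
    segments = []
    for msg in messages:
        if msg.get("is_final"):
            segments.append(msg["transcript"])
    return " ".join(segments)

# Interim responses are superseded by the final segment that follows them.
stream = [
    {"transcript": "hel", "is_final": False},
    {"transcript": "hello world", "is_final": True},
    {"transcript": "how ar", "is_final": False},
    {"transcript": "how are you", "is_final": True},
]
print(accumulate_finals(stream))  # → hello world how are you
```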
is_final = false
These are interim transcript responses sent for each chunk. They provide quick feedback for low latency use cases.
- These responses may provide inaccurate results for the most recent words. This happens when the audio for those words has not yet been fully sent to the server in the respective chunk.
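A minimal sketch of this superseding behavior, with invented interim texts: each interim transcript for a segment replaces the previous one, so a live display should only keep the latest.

```python
# Interim results arrive per chunk and may revise the most recent words
# as more audio context reaches the server (values are illustrative).
interims = ["thank", "thank you for col", "thank you for calling"]

latest = ""
for text in interims:
    latest = text  # each interim supersedes the previous one
print(latest)  # → thank you for calling
```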
The `full_transcript` field requires the `full_transcript` query parameter to be set to `true`. Learn more about the Full Transcript feature.
is_last = true
This response is similar to an is_final=true response, but it is the final response received after the user sends the end token {"type":"end"}. When is_last=true, the server has finished processing all audio and the session is complete.
- This is the last response of the live transcription session and contains all the fields of an `is_final=true` response.
Do not close the WebSocket connection immediately after sending the end token. Wait for this `is_last=true` response to ensure all audio has been processed and you receive the complete transcript.
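The end-of-session handshake above can be sketched as follows, assuming a connection object with `send()`/`recv()` methods. The `FakeWS` class and all message values are invented for illustration; a real client would use an actual WebSocket library connection in its place.

```python
import json

def finish_session(ws):
    """Send the end token, then drain responses until is_last=true.

    `ws` is any object with send()/recv() (sketch). Only after the
    is_last=true response arrives is it safe to close the connection."""
    ws.send(json.dumps({"type": "end"}))
    while True:
        msg = json.loads(ws.recv())
        if msg.get("is_last"):
            return msg

# Fake connection for illustration: one trailing final, then the last message.
class FakeWS:
    def __init__(self, incoming):
        self.incoming = list(incoming)
        self.sent = []
    def send(self, data):
        self.sent.append(data)
    def recv(self):
        return self.incoming.pop(0)

last = finish_session(FakeWS([
    '{"transcript": "bye", "is_final": true, "is_last": false}',
    '{"transcript": "bye", "is_final": true, "is_last": true, "full_transcript": "hello bye"}',
]))
print(last["full_transcript"])  # → hello bye
```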

