Soniox

Overview

Soniox is a cloud speech platform for speech-to-text, speech translation, and text-to-speech. In EnConvo, Soniox is available as a speech recognition provider for dictation and audio/video transcription. Use it with your own Soniox API key when you need high-accuracy multilingual transcription, real-time dictation, speaker diarization, or automatic language detection.

Supported Models

Model	Mode	Best For
Soniox STT Realtime v5	Real-time	Low-latency dictation and live speech input
Soniox STT Async v5	File-based	Audio/video file transcription and longer recordings

Real-time dictation uses EnConvo’s macOS realtime client with Soniox’s WebSocket STT API. File transcription uses Soniox’s async transcription API.
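The split between the two models can be pictured as a small lookup. This is only an illustrative sketch: `pick_model` and the mode strings `"realtime"` and `"file"` are hypothetical names, not part of EnConvo or the Soniox API. The model labels are the display names from the table above.

```python
# Hypothetical helper: maps a transcription mode to the model label
# shown in EnConvo. The names are illustrative, not an EnConvo API.
MODELS = {
    "realtime": "Soniox STT Realtime v5",  # live dictation (WebSocket STT)
    "file": "Soniox STT Async v5",         # recorded audio/video files
}


def pick_model(mode: str) -> str:
    """Return the model label for 'realtime' or 'file' mode."""
    try:
        return MODELS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```

Calling `pick_model("file")` returns the async model label, and any other mode string raises a `ValueError` instead of silently falling back.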

Setup

Create a Soniox API key

Go to the Soniox Console
Sign in or create a Soniox account
Open your project
Create an API key

Configure credentials in EnConvo

Open Settings -> Credentials
Select Soniox
Paste your Soniox API key
Validate and save the credential

Select Soniox for dictation

Open Settings -> Dictation
Set the dictation provider to Soniox
Choose Soniox STT Realtime v5 for real-time voice input
Set Language to Auto or pick a specific language

Use Soniox for file transcription

In the Transcribe Audio/Video files command, select Soniox and use Soniox STT Async v5 for file-based transcription.

Configuration

Setting	Description	Default
Credential Provider	Soniox API key stored in EnConvo Credentials	Soniox
Model	Realtime or async STT model	Soniox STT Realtime v5
Language	Auto detection or a language hint	Auto
Speaker Diarization	Separate speakers when the workflow supports diarization	Available
Punctuation	Automatic punctuation in recognized text	Enabled
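As a rough mental model, the defaults in the table can be written as a settings object. The class and field names below are hypothetical and do not reflect how EnConvo stores its settings. Speaker diarization is left out on purpose, because the table lists it as "Available" rather than as an on/off default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SonioxSettings:
    """Illustrative defaults copied from the configuration table.

    Field names are assumptions for this sketch, not EnConvo identifiers.
    """

    credential_provider: str = "Soniox"
    model: str = "Soniox STT Realtime v5"
    language: str = "Auto"
    punctuation: bool = True


# Example: override only the model for file transcription.
file_settings = SonioxSettings(model="Soniox STT Async v5")
```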

Validate and Use

Validate the API key

Click Validate in the Soniox credential settings. If validation fails, confirm the key belongs to the correct Soniox project and has not been revoked.

Test dictation

For live speech input, run a short dictation test with Soniox STT Realtime v5 before using it in longer sessions.

Test file transcription

For audio or video files, test a short clear recording with Soniox STT Async v5 before processing long recordings.

Choosing a Model

Use Realtime v5 for dictation

Choose Soniox STT Realtime v5 when you want text to appear during live voice input. This is the best fit for Dictation, SmartBar voice input, Chat Window voice input, and short spoken commands.

Use Async v5 for files

Choose Soniox STT Async v5 for recorded audio/video files, longer clips, and workflows where EnConvo transcribes after the recording is complete.

Use Auto language detection for multilingual speech

Set Language to Auto if you switch languages or do not know the source language. Pick a specific language when the recording is mostly one language and you want more predictable recognition.
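One way to picture the Language setting is as a choice between two request options. The sketch below rests on assumptions: `language_options` is a hypothetical helper, and the field names `language_hints` and `enable_language_identification` are assumed to resemble options in Soniox's public API. How EnConvo actually builds its requests is not documented here, so check the Soniox API reference before relying on these names.

```python
from typing import Any


def language_options(language: str) -> dict[str, Any]:
    """Translate a Language setting into illustrative request options.

    'Auto' asks the service to identify the language itself; any other
    value is passed as a single language hint (for example 'en' or 'es').
    The field names are assumptions, not confirmed EnConvo behavior.
    """
    choice = language.strip()
    if not choice:
        raise ValueError("language must be 'Auto' or a language code")
    if choice.lower() == "auto":
        return {"enable_language_identification": True}
    return {"language_hints": [choice.lower()]}
```

In this simplified model the two options are mutually exclusive. A real request might combine hints with identification.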

Privacy

Soniox is a cloud provider. Audio sent through the Soniox provider is processed by Soniox over the network using your Soniox account and API key. For fully offline transcription, use a local provider such as Parakeet MLX, Qwen MLX, or Whisper MLX instead.

Troubleshooting

Invalid API key

Verify the key was copied from the Soniox Console without extra spaces
Check that the key belongs to the project you want to use
Regenerate the key in Soniox and update the EnConvo credential
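The first check above, stray whitespace picked up while copying, can be caught before the key ever reaches validation. The helper below is a hypothetical sketch, not EnConvo's validation logic. It assumes only that a valid key contains no whitespace and makes no other claim about the Soniox key format.

```python
def clean_api_key(pasted: str) -> str:
    """Strip surrounding whitespace and reject keys with inner whitespace.

    Assumption: a valid key never contains spaces, tabs, or newlines.
    """
    key = pasted.strip()
    if not key:
        raise ValueError("API key is empty")
    if any(ch.isspace() for ch in key):
        raise ValueError("API key contains whitespace inside the value")
    return key
```

A key pasted with a trailing newline comes back trimmed, while a key with a space in the middle is rejected so that it can be copied again.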

Dictation is not real-time

Confirm the selected model is Soniox STT Realtime v5
Confirm the provider is Soniox, not another batch transcription provider
Restart dictation after changing provider or model settings
If you still see v4 model names, update EnConvo and reopen the provider settings

Transcription language is wrong

Set Language to a specific language instead of Auto
Check that the selected language is supported by Soniox
For mixed-language audio, switch back to Auto

No transcript from an audio file

Try a common format such as MP3, WAV, FLAC, or M4A
Confirm the recording contains clear speech
Check your Soniox account usage, quota, and billing status
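A quick extension check can rule out the format question before you look at quota or audio quality. This sketch only tests the file name against the four formats named above. `is_common_format` is a hypothetical helper, and Soniox may accept formats beyond this list.

```python
from pathlib import Path

# Formats named in the troubleshooting list; not an exhaustive list
# of what Soniox accepts.
COMMON_FORMATS = {".mp3", ".wav", ".flac", ".m4a"}


def is_common_format(filename: str) -> bool:
    """Return True if the file extension is one of the common formats."""
    return Path(filename).suffix.lower() in COMMON_FORMATS
```

The check is case-insensitive, so `talk.MP3` passes. It inspects only the extension, not the actual audio encoding inside the file.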

Related

Speech Recognition

Compare Soniox with other speech-to-text providers.

Dictation

Use Soniox for voice-to-text input.
