Overview

Soniox is a cloud speech platform for speech-to-text, speech translation, and text-to-speech. In EnConvo, Soniox is available as a speech recognition provider for dictation and audio/video transcription. Soniox is useful when you need high-accuracy multilingual transcription, real-time dictation, speaker diarization, and automatic language detection through your own Soniox API key.

Supported Models

Model                    Mode         Best For
Soniox STT Realtime v4   Real-time    Low-latency dictation and live speech input
Soniox STT Async v4      File-based   Audio/video file transcription and longer recordings

Real-time dictation uses EnConvo’s macOS realtime client with Soniox’s WebSocket STT API. File transcription uses Soniox’s async transcription API.

Setup

Step 1: Create a Soniox API key

  1. Go to the Soniox Console
  2. Sign in or create a Soniox account
  3. Open your project
  4. Create an API key

Step 2: Configure credentials in EnConvo

  1. Open Settings -> Credentials
  2. Select Soniox
  3. Paste your Soniox API key
  4. Validate and save the credential

Step 3: Select Soniox for dictation

  1. Open Settings -> Dictation
  2. Set the dictation provider to Soniox
  3. Choose Soniox STT Realtime v4 for real-time voice input
  4. Set Language to Auto or pick a specific language

Step 4: Use Soniox for file transcription

In the Transcribe Audio/Video files command, select Soniox and use Soniox STT Async v4 for file-based transcription.

Configuration

Setting               Description                                                Default
Credential Provider   Soniox API key stored in EnConvo Credentials               Soniox
Model                 Realtime or async STT model                                Soniox STT Realtime v4
Language              Auto detection or a language hint                          Auto
Speaker Diarization   Separate speakers when the workflow supports diarization   Available
Punctuation           Automatic punctuation in recognized text                   Enabled
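
As a rough illustration, the defaults in the table above could be modeled as a small settings object. This is only a sketch: the class name `SonioxSettings` and its field names are hypothetical and are not EnConvo's actual configuration schema. Only the default values come from the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SonioxSettings:
    """Hypothetical mirror of the settings table; not EnConvo's real schema."""

    credential_provider: str = "Soniox"
    model: str = "Soniox STT Realtime v4"
    language: str = "Auto"
    # The table lists diarization as "Available", i.e. usable when the
    # workflow supports it, so this flag records availability only.
    speaker_diarization_available: bool = True
    punctuation: bool = True


# Defaults as listed in the table.
defaults = SonioxSettings()

# A file-transcription variant: same settings, async model instead.
file_settings = replace(defaults, model="Soniox STT Async v4")
```

Keeping the object immutable and deriving variants with `replace` means the dictation defaults cannot be changed by accident when a file workflow needs a different model.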

Choosing a Model

Choose Soniox STT Realtime v4 when you want text to appear during live voice input. This is the best fit for Dictation, SmartBar voice input, and short spoken commands.
Choose Soniox STT Async v4 for recorded audio/video files, longer clips, and workflows where EnConvo transcribes after the recording is complete.
Set Language to Auto if you switch languages or do not know the source language. Pick a specific language when the recording is mostly one language and you want more predictable recognition.
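
The guidance above amounts to two small decision rules, sketched here in Python. The function names and parameters are hypothetical and exist only to illustrate the choice. In EnConvo you make these selections in Settings rather than in code.

```python
from typing import Optional

REALTIME_MODEL = "Soniox STT Realtime v4"
ASYNC_MODEL = "Soniox STT Async v4"


def choose_model(live_input: bool) -> str:
    """Pick the realtime model for live voice input, the async model for files."""
    return REALTIME_MODEL if live_input else ASYNC_MODEL


def choose_language(primary_language: Optional[str], mixed_languages: bool) -> str:
    """Return "Auto" when the language is unknown or mixed, else the given language."""
    if mixed_languages or not primary_language:
        return "Auto"
    return primary_language


# Dictation with language switching: realtime model, Auto language.
dictation = (choose_model(live_input=True), choose_language(None, mixed_languages=True))

# A recorded interview that is mostly in one language: async model, fixed language.
interview = (choose_model(live_input=False), choose_language("English", mixed_languages=False))
```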

Privacy

Soniox is a cloud provider. Audio sent through the Soniox provider is processed by Soniox over the network using your Soniox account and API key. For fully offline transcription, use a local provider such as Parakeet MLX, Qwen MLX, or Whisper MLX instead.

Troubleshooting

If the API key is rejected or fails validation:

  • Verify the key was copied from the Soniox Console without extra spaces
  • Check that the key belongs to the project you want to use
  • Regenerate the key in Soniox and update the EnConvo credential

If real-time dictation does not produce text:

  • Confirm the selected model is Soniox STT Realtime v4
  • Confirm the provider is Soniox, not another batch transcription provider
  • Restart dictation after changing provider or model settings

If text is recognized in the wrong language:

  • Set Language to a specific language instead of Auto
  • Check that the selected language is supported by Soniox
  • For mixed-language audio, switch back to Auto

If file transcription fails or returns no text:

  • Try a common format such as MP3, WAV, FLAC, or M4A
  • Confirm the recording contains clear speech
  • Check your Soniox account usage, quota, and billing status
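
Two of the checks above (stray whitespace in a pasted key, and an uncommon file format) are easy to run before opening EnConvo. The sketch below is a hypothetical local helper, not part of EnConvo or the Soniox API. The extension set only reflects the formats named in this list and is not a complete list of what Soniox accepts.

```python
from pathlib import Path

# Formats named in the troubleshooting list; an assumption for this sketch,
# not an exhaustive list of formats Soniox supports.
COMMON_FORMATS = {".mp3", ".wav", ".flac", ".m4a"}


def key_has_whitespace(raw_key: str) -> bool:
    """True if the pasted key contains any space, tab, or newline."""
    return any(ch.isspace() for ch in raw_key)


def clean_api_key(raw_key: str) -> str:
    """Remove leading and trailing whitespace picked up while copying."""
    return raw_key.strip()


def is_common_format(filename: str) -> bool:
    """True if the file extension is one of the commonly used formats."""
    return Path(filename).suffix.lower() in COMMON_FORMATS


pasted = "  example-key-123\n"  # placeholder value, not a real key
if key_has_whitespace(pasted):
    pasted = clean_api_key(pasted)
```

If `is_common_format` returns `False` for a recording, converting it to one of the listed formats is a reasonable first step before checking quota or billing.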

Speech Recognition

Compare Soniox with other speech-to-text providers.

Dictation

Use Soniox for voice-to-text input.