Text to Speech - EnConvo Documentation

Overview

EnConvo integrates with 13+ Text-to-Speech providers, giving you access to hundreds of natural-sounding voices across dozens of languages. Whether you want to listen to AI responses, read selected text aloud, or convert documents to audio files, EnConvo makes it seamless.

Supported TTS Providers

OpenAI TTS

High-quality voices (alloy, echo, fable, onyx, nova, shimmer) with tts-1 and tts-1-hd models

ElevenLabs

Industry-leading voice synthesis with hundreds of voices and multiple models including flash v2.5

Microsoft Azure TTS

Enterprise-grade TTS with multilingual neural voices like EmmaMultilingualNeural

Edge TTS

Free Microsoft Edge voices — no API key required

Google Cloud TTS

Google’s neural TTS with WaveNet and Neural2 voices

Gemini TTS

Google Gemini-powered TTS with style control and multi-speaker support

MiniMax TTS

High-quality multilingual speech with turbo and HD models

xAI TTS

Expressive voices (Eve, Ara, Rex, Sal, Leo) with speech tag support, covering 20+ languages

Speechify TTS

Simba model family — Base, English, Multilingual, and Turbo variants

Straico TTS

Access ElevenLabs and OpenAI TTS through a single Straico API key

macOS System TTS

Built-in macOS say command — works offline with system voices

Kokoro TTS (Local)

Fully local, privacy-first TTS powered by the Kokoro model via FluidAudio

Provider Comparison

Provider	Quality	Speed	Offline	Free Tier	Languages
OpenAI TTS	Excellent	Fast	No	No	50+
ElevenLabs	Excellent	Fast	No	Limited	29+
Microsoft Azure	Excellent	Fast	No	Free via Enconvo Cloud	140+
Edge TTS	Good	Fast	No	Yes (free)	90+
Google Cloud	Excellent	Fast	No	No	40+
Gemini TTS	Excellent	Medium	No	No	24+
MiniMax	Excellent	Fast	No	No	17+
xAI TTS	Good	Fast	No	No	20+
Speechify	Good	Fast	No	No	20+
macOS System	Basic	Fast	Yes	Yes (free)	30+
Kokoro (Local)	Good	Fast	Yes	Yes (free)	Multiple

Getting Started

Open Settings

Navigate to the TTS command settings in EnConvo. You can find TTS configurations under the TTS extension.

Choose a TTS Provider

Select your preferred provider from the TTS Provider dropdown. Options include cloud providers (OpenAI, ElevenLabs, Azure) and free options (Edge TTS, System TTS).

Configure Credentials

For cloud providers, set up your API key through the Credential Provider setting. If you are on the Enconvo Cloud Plan, Microsoft Azure TTS, MiniMax TTS, and xAI TTS are available without your own API key.

Select a Voice

Each provider offers different voices. Browse the Voice dropdown to find one that fits your needs. You can preview voices using the built-in preview feature.

Adjust Speed

Set your preferred speech speed from 0.25x to 4x using the Speech Speed dropdown. The default is 1.2x.

Usage Methods

Read Aloud

The most common way to use TTS — have EnConvo read text aloud in real time.

Select any text on your screen
Trigger the Read Aloud command from the SmartBar or PopBar
EnConvo streams the audio as it generates, so playback starts almost immediately

You can also click the speaker icon on any AI chat response to read it aloud.
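Streaming works because speech can be synthesized in small pieces: the first piece plays while later pieces are still being generated. EnConvo's own chunking logic is not documented here, so the following is only a minimal sketch, assuming sentence-level chunks. The function name `split_for_streaming` and the 200-character limit are hypothetical.

```python
import re


def split_for_streaming(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # flush: this chunk could be synthesized now
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# In a streaming player, each chunk would be sent to the TTS provider in order,
# so playback of the first chunk can begin while the next is being synthesized.
print(split_for_streaming("First sentence. Second one! Third?", max_chars=20))
```

A short first chunk is what shortens the wait before audio starts, which is also why shorter input helps when TTS feels slow.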

Text to Audio File

Convert text into a saved audio file for later use.

Open the TTS (Text To Speech) command
Enter or paste the text you want to convert
The audio file is saved to your specified output directory
Use the Show in Finder or Save As actions to manage the file

Gemini TTS (Style-Controlled)

Gemini TTS offers style-controlled speech generation: you describe how the text should be read, then give the text itself.

Single Speaker:

Read aloud with a dramatic flair: It was a dark and stormy night...

Multi-Speaker (dialog):

Read aloud in a warm, welcoming tone
Speaker 1: Hello! We're excited to show you our native speech capabilities
Speaker 2: Where you can direct a voice, create realistic dialog, and so much more.

Gemini TTS supports models like Gemini 2.5 Flash Preview TTS and Gemini 2.5 Pro Preview TTS, with voice options such as Zephyr, Puck, and more.
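Both examples above follow the same shape: a natural-language style instruction, then the text. The helpers below are a hypothetical sketch that assembles prompts in that shape. They only build strings and do not call the Gemini API, and the function names are illustrative.

```python
def single_speaker_prompt(style: str, text: str) -> str:
    """Style instruction, a colon, then the text to speak (one voice)."""
    return f"{style}: {text}"


def multi_speaker_prompt(style: str, turns: list[tuple[str, str]]) -> str:
    """Style instruction on its own line, then one 'Speaker: line' per turn."""
    dialog = "\n".join(f"{speaker}: {line}" for speaker, line in turns)
    return f"{style}\n{dialog}"


if __name__ == "__main__":
    print(single_speaker_prompt("Read aloud with a dramatic flair",
                                "It was a dark and stormy night..."))
    print(multi_speaker_prompt(
        "Read aloud in a warm, welcoming tone",
        [("Speaker 1", "Hello! We're excited to show you our native speech capabilities"),
         ("Speaker 2", "Where you can direct a voice, create realistic dialog, and so much more.")],
    ))
```

Keeping each speaker label identical across turns makes it unambiguous which lines belong to which voice.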

Convert SRT Subtitles to Audio

Convert subtitle files to audio:

Provide your .srt subtitle file
EnConvo processes each subtitle segment with the configured TTS provider
Outputs a synchronized audio file
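Processing subtitles segment by segment depends on reading each cue's start and end time. A minimal sketch of that parsing step, assuming standard SRT cues (an index line, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines). `parse_srt` and `Segment` are hypothetical names, not part of EnConvo.

```python
import re
from dataclasses import dataclass

# Matches e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Segment:
    start_ms: int
    end_ms: int
    text: str

    @property
    def duration_ms(self) -> int:
        # Time slot the synthesized clip should fit into to stay in sync.
        return self.end_ms - self.start_ms


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> list[Segment]:
    segments: list[Segment] = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip().replace("\r\n", "\n")):
        lines = block.strip().split("\n")
        # Find the timing line; the numeric index line before it is ignored.
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Join multi-line subtitle text into one string for TTS.
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                segments.append(Segment(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return segments
```

Each segment's `duration_ms` is the window its synthesized clip has to fit into if the output is to stay aligned with the subtitles; how EnConvo handles clips that run longer than their window is not described here.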

Voice Configuration by Provider

OpenAI

Models: tts-1 (fast), tts-1-hd (high quality)

Voices:

Voice	Character
alloy	Neutral, balanced
echo	Warm, conversational
fable	Expressive, storytelling
onyx	Deep, authoritative
nova	Friendly, upbeat
shimmer	Clear, gentle

You can also add custom voices if available through the OpenAI API.

Microsoft Azure

Voices: 400+ neural voices across 140+ languages

Default: en-US-EmmaMultilingualNeural, a multilingual voice that adapts to the input language automatically.

Azure TTS is included in the Enconvo Cloud Plan, so no separate API key is required.

xAI

Voice	Style
Eve	Energetic, upbeat
Ara	Warm, friendly
Rex	Confident, clear
Sal	Smooth, balanced
Leo	Authoritative, strong
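The OpenAI models, voices, and speed setting above correspond to fields of OpenAI's speech endpoint (`POST https://api.openai.com/v1/audio/speech`). The sketch below shows roughly what a client sends. It is not EnConvo's code, the helper names are hypothetical, and OpenAI's API reference is the authority on current models, voices, and limits.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(
    text: str,
    voice: str = "alloy",
    model: str = "tts-1",
    speed: float = 1.2,  # mirrors EnConvo's default; OpenAI's own default is 1.0
    response_format: str = "mp3",
) -> dict:
    """Build the JSON body for an OpenAI text-to-speech request."""
    # OpenAI documents a speed range of 0.25 to 4.0 and a 4096-character input limit.
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
    if len(text) > 4096:
        raise ValueError("input is limited to 4096 characters per request")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }


def synthesize(text: str, api_key: str, **options) -> bytes:
    """Send the request and return the raw audio bytes (needs network access)."""
    body = json.dumps(build_speech_request(text, **options)).encode("utf-8")
    request = urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `synthesize("Hello", api_key, voice="nova", model="tts-1-hd")` would return MP3 bytes that can be written to a file. Choosing `tts-1-hd` over `tts-1` trades some latency for quality, the same trade-off described in Troubleshooting.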

Speed Settings

All providers support adjustable speech speed:

Speed	Use Case
0.5x — 0.75x	Careful listening, language learning
1.0x	Normal speaking pace
1.2x (default)	Slightly faster, efficient listening
1.5x — 2.0x	Speed listening, familiar content
2.5x — 4.0x	Scanning content, experienced listeners
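Speed scales listening time inversely: at speed s, a text that takes T minutes at 1.0x takes T / s minutes. The sketch below estimates listening time from a word count. The 150 words-per-minute baseline at 1.0x is an assumption for illustration (actual rates vary by provider and voice), and `listening_minutes` is a hypothetical helper.

```python
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # range offered by the Speech Speed dropdown
DEFAULT_SPEED = 1.2               # EnConvo's default speed
ASSUMED_BASE_WPM = 150            # hypothetical speaking rate at 1.0x


def listening_minutes(word_count: int, speed: float = DEFAULT_SPEED,
                      base_wpm: float = ASSUMED_BASE_WPM) -> float:
    """Estimate the minutes needed to hear word_count words at the given speed."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    # Effective rate is base_wpm * speed, so time shrinks as 1 / speed.
    return word_count / (base_wpm * speed)


for s in (1.0, 1.2, 2.0):
    print(f"{s}x: {round(listening_minutes(1800, s), 1)} min")
```

With the assumed baseline, an 1,800-word article takes 12 minutes at 1.0x, 10 minutes at the 1.2x default, and 6 minutes at 2.0x.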

Offline TTS Options

For privacy-first or no-internet scenarios, EnConvo offers two fully offline options:

macOS System TTS

Uses the built-in macOS say command. No downloads required — works with any voice installed in System Settings > Accessibility > Spoken Content > System Voice.

Supports M4A output format
Adjustable speed
Works completely offline
Quality varies by voice; download enhanced voices for better quality
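The `say` command accepts a voice (`-v`), a rate in words per minute (`-r`), and an output file (`-o`). A minimal sketch of driving it from Python, assuming a hypothetical 175 words-per-minute baseline to turn a speed multiplier into a rate. EnConvo's actual mapping is not documented here, and `build_say_command` and `speak` are illustrative names.

```python
import shutil
import subprocess
from typing import Optional

ASSUMED_BASE_WPM = 175  # hypothetical rate at 1.0x; real voices differ


def build_say_command(text: str, voice: Optional[str] = None, speed: float = 1.2,
                      output_path: Optional[str] = None,
                      base_wpm: float = ASSUMED_BASE_WPM) -> list[str]:
    """Build the argument list for macOS `say`; -r is in words per minute."""
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]  # any voice installed under Spoken Content
    cmd += ["-r", str(round(base_wpm * speed))]
    if output_path:
        # Per `man say`, the file extension (e.g. .m4a, .aiff) normally selects the format.
        cmd += ["-o", output_path]
    cmd.append(text)
    return cmd


def speak(text: str, **options) -> None:
    """Run `say` locally: works offline, but only on macOS."""
    if shutil.which("say") is None:
        raise RuntimeError("`say` was not found; it ships with macOS only")
    subprocess.run(build_say_command(text, **options), check=True)
```

For example, `speak("Hello", voice="Samantha", output_path="hello.m4a")` would write an audio file without any network access.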

Kokoro TTS (via FluidAudio)

A local neural TTS model bundled with EnConvo’s FluidAudio framework. Runs entirely on your Mac using Apple Silicon acceleration.

High-quality neural speech
No internet connection required
No API key needed
Your text never leaves your device

Using TTS in Chat

EnConvo integrates TTS directly into the AI chat experience:

Manual playback: Click the speaker icon on any AI response to hear it read aloud
Auto-TTS: Enable automatic TTS in chat settings to have every response read aloud
Playback controls: Pause, resume, or stop playback at any time

Using TTS with Translation

The Translate command can automatically play TTS audio of the translated result:

Open the Translator command settings
Enable Automatically Play TTS Audio under the Text-to-Speech group
Select your preferred TTS provider
Every translation result is now read aloud automatically

Sound Effects Generation

EnConvo can also generate sound effects from text descriptions:

Use the Text To Sound Effect command
Describe the sound you want in English (up to 200 characters)
The more detailed the description, the better the result

For example:

A gentle rain falling on a tin roof with distant thunder

Troubleshooting

No audio playing

Check that your Mac’s audio output is working (System Settings > Sound)
Verify the TTS provider is configured with valid credentials
Try switching to Edge TTS or System TTS to rule out API issues
Check the console logs for error messages

Voice sounds robotic or low quality

Switch to a higher-quality provider like ElevenLabs or OpenAI TTS HD
If using OpenAI, switch from tts-1 to tts-1-hd
Adjust the speed — very high speeds can reduce quality
Try a different voice; some voices perform better than others

TTS is too slow

Use a turbo/flash model variant when available (e.g., ElevenLabs flash v2.5, MiniMax turbo)
Edge TTS and System TTS are typically the fastest options
Check your internet connection for cloud providers
Try reducing the text length for faster initial playback

Wrong language pronunciation

For xAI TTS, explicitly set the language instead of using Auto Detect
Use a multilingual voice like Azure’s EmmaMultilingualNeural
Ensure the text language matches the voice’s supported languages

Enconvo Cloud Plan

The Enconvo Cloud Plan includes built-in access to several TTS providers without needing your own API keys:

Provider	Via Cloud Plan
Microsoft Azure TTS	Included
MiniMax TTS	Included
xAI TTS	Included

Cloud Plan TTS usage consumes your Enconvo points. Check Settings > Usage for your current balance.

Related Features

Dictation

Convert speech to text

AI Chat

Chat with AI and listen to responses

Translation

Translate and listen to results

Speech Recognition

Advanced speech-to-text providers
