Overview

EnConvo integrates with 13+ Text-to-Speech providers, giving you access to hundreds of natural-sounding voices across dozens of languages. You can listen to AI responses, have selected text read aloud, or convert documents to audio files.

Supported TTS Providers

OpenAI TTS

High-quality voices (alloy, echo, fable, onyx, nova, shimmer) with tts-1 and tts-1-hd models

ElevenLabs

Industry-leading voice synthesis with hundreds of voices and multiple models including flash v2.5

Microsoft Azure TTS

Enterprise-grade TTS with multilingual neural voices like EmmaMultilingualNeural

Edge TTS

Free Microsoft Edge voices — no API key required

Google Cloud TTS

Google’s neural TTS with WaveNet and Neural2 voices

Gemini TTS

Google Gemini-powered TTS with style control and multi-speaker support

MiniMax TTS

High-quality multilingual speech with turbo and HD models

xAI TTS

Expressive voices (Eve, Ara, Rex, Sal, Leo) with speech tag support for 20+ languages

Speechify TTS

Simba model family — Base, English, Multilingual, and Turbo variants

Straico TTS

Access ElevenLabs and OpenAI TTS through a single Straico API key

macOS System TTS

Built-in macOS say command — works offline with system voices

Kokoro TTS (Local)

Fully local, privacy-first TTS powered by the Kokoro model via FluidAudio

Provider Comparison

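| Provider | Quality | Speed | Offline | Free Tier | Languages |
|---|---|---|---|---|---|
| OpenAI TTS | Excellent | Fast | No | No | 50+ |
| ElevenLabs | Excellent | Fast | No | Limited | 29+ |
| Microsoft Azure | Excellent | Fast | No | Free via Enconvo Cloud | 140+ |
| Edge TTS | Good | Fast | No | Yes (free) | 90+ |
| Google Cloud | Excellent | Fast | No | No | 40+ |
| Gemini TTS | Excellent | Medium | No | No | 24+ |
| MiniMax | Excellent | Fast | No | No | 17+ |
| xAI TTS | Good | Fast | No | No | 20+ |
| Speechify | Good | Fast | No | No | 20+ |
| macOS System | Basic | Fast | Yes | Yes (free) | 30+ |
| Kokoro (Local) | Good | Fast | Yes | Yes (free) | Multiple |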
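
If you want to narrow the comparison down programmatically, the rows can be treated as plain data. The sketch below copies the Quality, Offline, and Free Tier columns from the comparison above; the `Provider` class and the two helper functions are hypothetical names used only for illustration and are not part of EnConvo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    quality: str
    offline: bool
    free_tier: str  # text of the "Free Tier" column


# Rows copied from the comparison (Speed and Languages columns omitted).
PROVIDERS = [
    Provider("OpenAI TTS", "Excellent", False, "No"),
    Provider("ElevenLabs", "Excellent", False, "Limited"),
    Provider("Microsoft Azure", "Excellent", False, "Free via Enconvo Cloud"),
    Provider("Edge TTS", "Good", False, "Yes (free)"),
    Provider("Google Cloud", "Excellent", False, "No"),
    Provider("Gemini TTS", "Excellent", False, "No"),
    Provider("MiniMax", "Excellent", False, "No"),
    Provider("xAI TTS", "Good", False, "No"),
    Provider("Speechify", "Good", False, "No"),
    Provider("macOS System", "Basic", True, "Yes (free)"),
    Provider("Kokoro (Local)", "Good", True, "Yes (free)"),
]


def offline_providers(providers=PROVIDERS):
    """Names of providers that work without an internet connection."""
    return [p.name for p in providers if p.offline]


def free_providers(providers=PROVIDERS):
    """Names of providers whose Free Tier column starts with 'Yes'."""
    return [p.name for p in providers if p.free_tier.startswith("Yes")]
```

Calling `offline_providers()` returns the two local options, and `free_providers()` additionally includes Edge TTS.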

Getting Started

1. Open Settings

Navigate to the TTS command settings in EnConvo. You can find TTS configurations under the TTS extension.

2. Choose a TTS Provider

Select your preferred provider from the TTS Provider dropdown. Options include cloud providers (OpenAI, ElevenLabs, Azure) and free options (Edge TTS, System TTS).

3. Configure Credentials

For cloud providers, set up your API key through the Credential Provider setting. If you are on the Enconvo Cloud Plan, Microsoft TTS and MiniMax TTS are available without your own API key.

4. Select a Voice

Each provider offers different voices. Browse the Voice dropdown to find one that fits your needs. You can preview voices using the built-in preview feature.

5. Adjust Speed

Set your preferred speech speed from 0.25x to 4x using the Speech Speed dropdown. The default is 1.2x.

Usage Methods

Read Aloud

The most common way to use TTS — have EnConvo read text aloud in real time.
  1. Select any text on your screen
  2. Trigger the Read Aloud command from the SmartBar or PopBar
  3. EnConvo streams the audio as it generates, so playback starts almost immediately
You can also click the speaker icon on any AI chat response to read it aloud.
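
How EnConvo streams audio internally is not described here. One common way to get playback started early is to split the text into sentence-sized chunks and synthesize them one after another, so the first chunk can play while later ones are still being generated. The sketch below shows only that splitting step; `split_into_chunks` and its `max_chars` parameter are hypothetical and chosen for illustration.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence ends, packing sentences into chunks of up to max_chars.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Smaller chunks shorten the wait before the first audio arrives, at the cost of more requests to the provider.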

Text to Audio File

Convert text into a saved audio file for later use.
  1. Open the TTS (Text To Speech) command
  2. Enter or paste the text you want to convert
  3. The audio file is saved to your specified output directory
  4. Use the Show in Finder or Save As actions to manage the file

Gemini TTS (Style-Controlled)

Gemini TTS offers unique style-controlled speech generation.

Single Speaker:

    Read aloud with a dramatic flair: It was a dark and stormy night..

Multi-Speaker (dialog):

    Read aloud in a warm, welcoming tone
    Speaker 1: Hello! We're excited to show you our native speech capabilities
    Speaker 2: Where you can direct a voice, create realistic dialog, and so much more.
Gemini TTS supports models like Gemini 2.5 Flash Preview TTS and Gemini 2.5 Pro Preview TTS, with voice options such as Zephyr, Puck, and more.
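
Both prompt shapes above follow a simple pattern: a style instruction, then the text (single speaker) or one labeled line per speaker (dialog). A minimal sketch that assembles such prompts is shown below. The function names are hypothetical, and the layout is inferred from the two examples rather than from a formal specification.

```python
def build_single_speaker_prompt(style: str, text: str) -> str:
    """Style instruction and text on one line, separated by a colon."""
    return f"{style}: {text}"


def build_multi_speaker_prompt(style: str, turns: list[tuple[str, str]]) -> str:
    """Style instruction on the first line, then one 'Speaker: text' line per turn."""
    lines = [style]
    lines += [f"{speaker}: {text}" for speaker, text in turns]
    return "\n".join(lines)


dialog = build_multi_speaker_prompt(
    "Read aloud in a warm, welcoming tone",
    [
        ("Speaker 1", "Hello! We're excited to show you our native speech capabilities"),
        ("Speaker 2", "Where you can direct a voice, create realistic dialog, and so much more."),
    ],
)
```

The resulting `dialog` string can be pasted into the TTS input as-is.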

Convert SRT Subtitles to Audio

Convert subtitle files to audio:
  1. Provide your .srt subtitle file
  2. EnConvo processes each subtitle segment with the configured TTS provider
  3. Outputs a synchronized audio file
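
To make the segment-by-segment processing concrete, here is a minimal sketch of reading an .srt file into timed cues and working out how much silence would be needed before each cue to keep the audio aligned with the subtitle timestamps. This is an illustration under stated assumptions, not EnConvo's implementation: `Cue`, `parse_srt`, and `silence_before_each` are hypothetical names, and the parser handles only the basic SRT layout (index line, timing line, text lines).

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds from the beginning of the file
    end: float
    text: str


_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS,mmm' to seconds."""
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"Unrecognized SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; blocks without a text line are skipped."""
    text = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):  # cues are separated by blank lines
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


def silence_before_each(cues: list[Cue]) -> list[float]:
    """Seconds of silence needed before each cue so speech starts at the cue time."""
    gaps, previous_end = [], 0.0
    for cue in cues:
        gaps.append(max(0.0, cue.start - previous_end))
        previous_end = cue.end
    return gaps
```

Each cue's `text` would then be sent to the configured TTS provider, and the resulting clips joined with the computed gaps in between.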

Voice Configuration by Provider

OpenAI TTS

Models: tts-1 (fast), tts-1-hd (high quality)

Voices:

| Voice | Character |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Clear, gentle |
You can also add custom voices if available through the OpenAI API.
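
For reference, the model, voice, and speed settings above correspond to fields of OpenAI's public speech endpoint. The sketch below builds such a request with the Python standard library only. It illustrates the public API, not how EnConvo calls it; the function names are hypothetical, and the `OPENAI_API_KEY` environment variable is an assumption about where your key is stored.

```python
import json
import os
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1", voice="alloy", speed=1.2):
    """Build (but do not send) a POST request for OpenAI's speech endpoint."""
    payload = {"model": model, "voice": voice, "input": text, "speed": speed}
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, out_path, **options):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, os.environ["OPENAI_API_KEY"], **options)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as audio_file:
        audio_file.write(response.read())
```

For example, `synthesize_to_file("Hello", "hello.mp3", model="tts-1-hd", voice="nova")` would request the higher-quality model with the nova voice.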

Speed Settings

All providers support adjustable speech speed:
| Speed | Use Case |
|---|---|
| 0.5x to 0.75x | Careful listening, language learning |
| 1.0x | Normal speaking pace |
| 1.2x (default) | Slightly faster, efficient listening |
| 1.5x to 2.0x | Speed listening, familiar content |
| 2.5x to 4.0x | Scanning content, experienced listeners |
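
To get a feel for what a speed setting means in practice, listening time scales inversely with the multiplier. The sketch below assumes the 0.25x to 4x range and the 1.2x default stated in this guide; `listening_time` is a hypothetical helper, not an EnConvo function.

```python
MIN_SPEED = 0.25
MAX_SPEED = 4.0
DEFAULT_SPEED = 1.2


def listening_time(base_seconds: float, speed: float = DEFAULT_SPEED) -> float:
    """Seconds needed to hear audio that lasts base_seconds at 1.0x."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"Speed must be between {MIN_SPEED}x and {MAX_SPEED}x, got {speed}x")
    return base_seconds / speed


# A passage that takes 10 minutes at 1.0x takes 5 minutes at 2.0x
# and roughly 8 minutes 20 seconds at the default 1.2x.
half = listening_time(600, 2.0)
default = listening_time(600)
```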

Offline TTS Options

For privacy-first or no-internet scenarios, EnConvo offers two fully offline options:
macOS System TTS

Uses the built-in macOS say command. No downloads required: it works with any voice installed in System Settings > Accessibility > Spoken Content > System Voice.
  • Supports M4A output format
  • Adjustable speed
  • Works completely offline
  • Quality varies by voice; download enhanced voices for better quality
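
If you want to try the same system voices outside EnConvo, the say command can be driven from a script. The sketch below only assembles the argument list, using the standard `-v` (voice), `-r` (rate in words per minute), and `-o` (output file) options of say. `build_say_command` is a hypothetical helper, and how EnConvo itself invokes say, including how it maps its speed multiplier to a rate, is not documented here.

```python
def build_say_command(text, out_path=None, voice=None, words_per_minute=None):
    """Return the argument list for the macOS `say` command."""
    command = ["say"]
    if voice:
        command += ["-v", voice]  # must be a voice installed on this Mac
    if words_per_minute:
        command += ["-r", str(words_per_minute)]
    if out_path:
        # Assumption: say picks the file format from the extension, e.g. .m4a
        command += ["-o", out_path]
    command.append(text)
    return command
```

On a Mac, passing the result to `subprocess.run(..., check=True)` speaks the text aloud, or writes the audio file when `out_path` is given.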

Kokoro TTS (Local)

A local neural TTS model bundled with EnConvo’s FluidAudio framework. Runs entirely on your Mac using Apple Silicon acceleration.
  • High-quality neural speech
  • No internet connection required
  • No API key needed
  • Your text never leaves your device

Using TTS in Chat

EnConvo integrates TTS directly into the AI chat experience:
  1. Manual playback: Click the speaker icon on any AI response to hear it read aloud
  2. Auto-TTS: Enable automatic TTS in chat settings to have every response read aloud
  3. Playback controls: Pause, resume, or stop playback at any time

Using TTS with Translation

The Translate command can automatically play TTS audio of the translated result:
  1. Open the Translator command settings
  2. Enable Automatically Play TTS Audio under the Text-to-Speech group
  3. Select your preferred TTS provider
  4. Every translation result is now read aloud automatically

Sound Effects Generation

EnConvo can also generate sound effects from text descriptions:
  1. Use the Text To Sound Effect command
  2. Describe the sound you want in English (up to 200 characters)
  3. The more detailed the description, the better the result
Example:

    A gentle rain falling on a tin roof with distant thunder
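
If you generate descriptions in a script before pasting them into the command, a quick length check avoids hitting the 200-character limit. This is a minimal sketch; `validate_sound_description` is a hypothetical helper, and whether EnConvo counts characters exactly this way is an assumption.

```python
MAX_DESCRIPTION_CHARS = 200


def validate_sound_description(description: str) -> str:
    """Collapse extra whitespace and enforce the 200-character limit."""
    cleaned = " ".join(description.split())
    if not cleaned:
        raise ValueError("Description is empty")
    if len(cleaned) > MAX_DESCRIPTION_CHARS:
        raise ValueError(
            f"Description is {len(cleaned)} characters; the limit is {MAX_DESCRIPTION_CHARS}"
        )
    return cleaned
```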

Troubleshooting

No audio output

  1. Check that your Mac’s audio output is working (System Settings > Sound)
  2. Verify the TTS provider is configured with valid credentials
  3. Try switching to Edge TTS or System TTS to rule out API issues
  4. Check the console logs for error messages

Poor audio quality

  1. Switch to a higher-quality provider like ElevenLabs or OpenAI TTS HD
  2. If using OpenAI, switch from tts-1 to tts-1-hd
  3. Adjust the speed, since very high speeds can reduce quality
  4. Try a different voice; some voices perform better than others

Slow speech generation

  1. Use a turbo/flash model variant when available (e.g., ElevenLabs flash v2.5, MiniMax turbo)
  2. Edge TTS and System TTS are typically the fastest options
  3. Check your internet connection for cloud providers
  4. Try reducing the text length for faster initial playback

Wrong language or pronunciation

  1. For xAI TTS, explicitly set the language instead of using Auto Detect
  2. Use a multilingual voice like Azure’s EmmaMultilingualNeural
  3. Ensure the text language matches the voice’s supported languages

Enconvo Cloud Plan

The Enconvo Cloud Plan includes built-in access to several TTS providers without needing your own API keys:
| Provider | Via Cloud Plan |
|---|---|
| Microsoft Azure TTS | Included |
| MiniMax TTS | Included |
| xAI TTS | Included |
Cloud Plan TTS usage consumes your Enconvo points. Check Settings > Usage for your current balance.
