Overview
EnConvo supports local Large Language Models (LLMs), allowing you to run AI completely on your Mac without sending data to external servers. This is ideal for privacy-sensitive work, offline use, or when you want full control over your AI.
Why Use Local LLMs?
Complete Privacy
Your data never leaves your Mac. Process sensitive documents with confidence.
Offline Access
Use AI without an internet connection. Perfect for travel or restricted networks.
No Usage Limits
Run unlimited queries without worrying about API costs or rate limits.
Full Control
Choose exactly which models to use and how they’re configured.
Supported Platforms
Ollama
Ollama is the recommended way to run local LLMs on macOS.
1. Install Ollama: Download and install from ollama.ai
2. Pull a model: Run a pull command in Terminal, e.g. `ollama pull llama3`
3. Configure EnConvo: Go to Settings → AI Models → Add Local Model → Ollama
4. Select Endpoint: The default is http://localhost:11434
LM Studio
LM Studio provides a GUI for managing and running local models.
1. Install LM Studio: Download from lmstudio.ai
2. Download a model: Use LM Studio’s model browser to download a GGUF model
3. Start local server: Click “Start Server” in LM Studio (default port: 1234)
4. Configure EnConvo: Settings → AI Models → Add Local Model → LM Studio
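Under the hood, both servers expose a plain HTTP API on the ports above; EnConvo talks to it for you. As a minimal sketch of what such a call looks like (assuming Ollama's default port and its `/api/generate` route; the helper name is illustrative), a request can be built like this:

```python
import json
from urllib import request

OLLAMA_ENDPOINT = "http://localhost:11434"  # Ollama's default port

def build_generate_request(model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete response, not a stream
    }).encode("utf-8")
    return request.Request(
        f"{OLLAMA_ENDPOINT}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("llama3", "Say hello in one word.")
# With Ollama running, the request could be sent with:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```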
Recommended Models
| Model | Size | Best For | Min RAM |
|---|---|---|---|
| Llama 3 8B | 4.7GB | General purpose | 8GB |
| Llama 3 70B | 40GB | Complex tasks | 64GB |
| Mistral 7B | 4.1GB | Fast responses | 8GB |
| CodeLlama | 4.7GB | Programming | 8GB |
| Phi-2 | 1.7GB | Lightweight | 4GB |
| Gemma 2B | 1.5GB | Ultra-light | 4GB |
Model performance depends on your Mac’s specifications. Apple Silicon Macs with Metal acceleration provide the best experience.
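To relate the table to a given machine, a small helper can pick the most capable model whose RAM floor still fits. The data below is copied from the table above; the function itself is purely illustrative:

```python
# (model, download size in GB, minimum RAM in GB), from the table above
MODELS = [
    ("Llama 3 70B", 40.0, 64),
    ("Llama 3 8B", 4.7, 8),
    ("Mistral 7B", 4.1, 8),
    ("CodeLlama", 4.7, 8),
    ("Phi-2", 1.7, 4),
    ("Gemma 2B", 1.5, 4),
]

def largest_model_for(ram_gb):
    """Return the most demanding model whose RAM floor fits the machine."""
    for name, _size, min_ram in sorted(MODELS, key=lambda m: m[2], reverse=True):
        if min_ram <= ram_gb:
            return name
    return None  # machine is below every model's minimum
```

For example, a 16GB Mac gets Llama 3 8B rather than the 70B variant, which needs 64GB.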
Configuration
Adding a Local Model
- Go to Settings → AI Models
- Click “Add Model” → “Local Model”
- Configure connection:
  - Name: Display name in EnConvo
  - Provider: Ollama or LM Studio
  - Endpoint: Server URL (usually localhost)
  - Model: Select from available models
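EnConvo's actual storage format isn't documented here, but as an illustration the four connection fields can be modeled as a plain mapping with a sanity check (field and function names are hypothetical):

```python
from urllib.parse import urlparse

def check_local_model_config(config):
    """Return a list of problems with a local-model connection config."""
    problems = []
    for field in ("name", "provider", "endpoint", "model"):
        if not config.get(field):
            problems.append(f"missing field: {field}")
    endpoint = config.get("endpoint", "")
    if endpoint and urlparse(endpoint).scheme not in ("http", "https"):
        problems.append("endpoint must be an http(s) URL")
    return problems

config = {
    "name": "Local Llama",         # display name in EnConvo
    "provider": "Ollama",          # Ollama or LM Studio
    "endpoint": "http://localhost:11434",
    "model": "llama3",
}
```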
Model Settings
| Setting | Description |
|---|---|
| Context Window | Maximum tokens in context |
| Temperature | Response creativity (0-1) |
| Top P | Sampling parameter |
| GPU Layers | Layers to offload to GPU |
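When sent to an Ollama-style backend, these settings map onto request options; `num_ctx`, `temperature`, `top_p`, and `num_gpu` are Ollama's option names, but treat the mapping below as a sketch rather than EnConvo's exact implementation:

```python
def build_options(context_window=4096, temperature=0.7, top_p=0.9, gpu_layers=None):
    """Map the settings table onto an Ollama-style options payload."""
    options = {
        "num_ctx": context_window,                       # maximum tokens in context
        "temperature": max(0.0, min(1.0, temperature)),  # clamp to the 0-1 range above
        "top_p": max(0.0, min(1.0, top_p)),              # nucleus sampling parameter
    }
    if gpu_layers is not None:
        options["num_gpu"] = gpu_layers                  # layers offloaded to the GPU
    return options
```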
Performance Optimization
Apple Silicon (M1/M2/M3)
Apple Silicon Macs offer excellent local LLM performance:
- Metal acceleration: Automatic GPU utilization
- Unified memory: Efficient memory sharing
- Neural Engine: Additional AI acceleration
Tips for Better Performance
Choose appropriate model size
Use smaller models (7B parameters) for faster responses. Larger models (70B) need more RAM and are slower.
Close other apps
Local LLMs use significant RAM. Close unnecessary applications for better performance.
Use quantized models
Q4_K_M or Q5_K_M quantized models offer good quality with better speed.
Adjust context window
Smaller context windows (2048-4096) are faster than maximum settings.
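The context-window tip amounts to trimming old conversation turns before they are sent to the model. A minimal sketch, approximating token counts by whitespace splitting (a real tokenizer is model-specific):

```python
def trim_history(messages, max_tokens=2048):
    """Keep the newest messages that fit a rough token budget.

    Tokens are approximated by whitespace splitting; real tokenizers
    are model-specific.
    """
    kept, used = [], 0
    for msg in reversed(messages):   # walk from newest to oldest
        cost = len(msg.split())
        if used + cost > max_tokens:
            break                    # budget exhausted: drop the older turns
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order
```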
Using Local Models
Set as Default
Make a local model your default:
- Settings → AI Models
- Right-click your local model
- Select “Set as Default”
Switch on Demand
Switch models during use:
- Click the model name in SmartBar or chat
- Select your local model from the list
Force Local
Use the @local prefix to always use local models.
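The routing logic behind such a prefix is simple to sketch (this is an illustration of the idea, not EnConvo's implementation):

```python
def route_prompt(text):
    """Split a prompt into (target, prompt); '@local' forces a local model."""
    if text.startswith("@local "):
        return "local", text[len("@local "):].strip()
    return "default", text.strip()
```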
Feature Support
| Feature | Cloud Models | Local Models |
|---|---|---|
| Chat | ✅ | ✅ |
| Context awareness | ✅ | ✅ |
| Knowledge base | ✅ | ✅ |
| Dictation | ✅ | ✅ |
| Web search | ✅ | Limited |
| Image generation | ✅ | Model dependent |
| Code completion | ✅ | ✅ |
Troubleshooting
Model not responding
- Check that Ollama/LM Studio server is running
- Verify the endpoint URL in settings
- Try restarting the local server
- Check system resources (RAM, CPU)
Slow responses
- Use a smaller model
- Reduce context window size
- Close other memory-intensive apps
- Consider using quantized models
Out of memory
- Reduce GPU layers
- Use a smaller model
- Restart the local server
- Increase swap space (not recommended)
Connection refused
- Ensure server is running
- Check firewall settings
- Verify port number matches
- Try localhost vs 127.0.0.1
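The first two checks can be scripted: a quick TCP probe tells you whether anything is listening on the configured port. A sketch using only the standard library (11434 is Ollama's default; pass 1234 for LM Studio):

```python
import socket

def server_reachable(host="localhost", port=11434, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False

# server_reachable()           -> True only while the Ollama server is running
# server_reachable(port=1234)  -> check LM Studio's default port instead
```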
Best Use Cases
Confidential Work
Legal documents, medical records, financial data
Code Review
Analyze proprietary code without external exposure
Offline Work
Travel, restricted networks, air-gapped systems
Cost Control
Unlimited usage without API costs