Overview

EnConvo supports local Large Language Models (LLMs), allowing you to run AI completely on your Mac without sending data to external servers. This is ideal for privacy-sensitive work, offline use, or when you want full control over your AI.

Why Use Local LLMs?

Complete Privacy

Your data never leaves your Mac. Process sensitive documents with confidence.

Offline Access

Use AI without an internet connection. Perfect for travel or restricted networks.

No Usage Limits

Run unlimited queries without worrying about API costs or rate limits.

Full Control

Choose exactly which models to use and how they’re configured.

Supported Platforms

Ollama

Ollama is the recommended way to run local LLMs on macOS.
  1. Install Ollama: download and install from ollama.ai
  2. Pull a model: in Terminal, run ollama pull llama3
  3. Configure EnConvo: go to Settings → AI Models → Add Local Model → Ollama
  4. Select Endpoint: default is http://localhost:11434
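Once Ollama is running, EnConvo talks to it over a plain HTTP API on that endpoint. As a rough sketch of what a request looks like (using Ollama's documented /api/generate route on the default port; the prompt text is just an example):

```python
import json
import urllib.request

# Build a request against Ollama's default local endpoint.
# Field names follow Ollama's /api/generate API.
payload = {
    "model": "llama3",  # any model previously fetched with `ollama pull`
    "prompt": "Summarize this paragraph in one sentence.",
    "stream": False,    # ask for a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once an Ollama server is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Nothing here ever leaves the machine: the request goes to localhost, which is what makes the privacy guarantees above hold.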

LM Studio

LM Studio provides a GUI for managing and running local models.
  1. Install LM Studio: download from lmstudio.ai
  2. Download a model: use LM Studio's model browser to download a GGUF model
  3. Start the local server: click "Start Server" in LM Studio (default port: 1234)
  4. Configure EnConvo: Settings → AI Models → Add Local Model → LM Studio
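LM Studio's local server speaks the OpenAI chat-completions format, so a request to it looks like any OpenAI-style call pointed at localhost. A minimal sketch, assuming the default port 1234 and the /v1/chat/completions route:

```python
import json
import urllib.request

# LM Studio's local server accepts OpenAI-compatible chat requests.
payload = {
    "model": "local-model",  # LM Studio serves whichever model is loaded
    "messages": [
        {"role": "user", "content": "Explain GGUF quantization in one line."}
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With "Start Server" clicked in LM Studio, you would send it like this:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```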
Recommended Models

| Model | Size | Best For | Min RAM |
|---|---|---|---|
| Llama 3 8B | 4.7GB | General purpose | 8GB |
| Llama 3 70B | 40GB | Complex tasks | 64GB |
| Mistral 7B | 4.1GB | Fast responses | 8GB |
| CodeLlama | 4.7GB | Programming | 8GB |
| Phi-2 | 1.7GB | Lightweight | 4GB |
| Gemma 2B | 1.5GB | Ultra-light | 4GB |
Model performance depends on your Mac’s specifications. Apple Silicon Macs with Metal acceleration provide the best experience.

Configuration

Adding a Local Model

  1. Go to Settings → AI Models
  2. Click “Add Model” → “Local Model”
  3. Configure connection:
    • Name: Display name in EnConvo
    • Provider: Ollama or LM Studio
    • Endpoint: Server URL (usually localhost)
    • Model: Select from available models
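The four connection fields can be pictured as a small record. This is a hypothetical illustration of the shape of an entry, not EnConvo's actual storage format:

```python
# Hypothetical local-model entry; field names mirror the configuration
# dialog above, not EnConvo's real on-disk format.
local_model = {
    "name": "Llama 3 (local)",             # display name in EnConvo
    "provider": "Ollama",                  # or "LM Studio"
    "endpoint": "http://localhost:11434",  # Ollama default; LM Studio uses :1234
    "model": "llama3",                     # one of the models the server reports
}
```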

Model Settings

| Setting | Description |
|---|---|
| Context Window | Maximum tokens in context |
| Temperature | Response creativity (0-1) |
| Top P | Sampling parameter |
| GPU Layers | Layers to offload to GPU |
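These settings ultimately travel to the local server as request parameters. In Ollama's API, for example, they map onto the request's options object (num_ctx, temperature, top_p, and num_gpu are the option names Ollama documents); a sketch of that mapping:

```python
# Values as they might appear in EnConvo's model settings dialog.
settings = {
    "context_window": 4096,
    "temperature": 0.7,
    "top_p": 0.9,
    "gpu_layers": 32,
}

# How those settings map onto Ollama's request "options" object.
options = {
    "num_ctx": settings["context_window"],   # maximum tokens in context
    "temperature": settings["temperature"],  # 0 = deterministic, 1 = creative
    "top_p": settings["top_p"],              # nucleus-sampling cutoff
    "num_gpu": settings["gpu_layers"],       # layers offloaded to Metal/GPU
}
```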

Performance Optimization

Apple Silicon (M1/M2/M3)

Apple Silicon Macs offer excellent local LLM performance:
  • Metal acceleration: Automatic GPU utilization
  • Unified memory: Efficient memory sharing
  • Neural Engine: Additional AI acceleration

Tips for Better Performance

  • Use smaller models (7B parameters) for faster responses; larger models (70B) need more RAM and respond more slowly.
  • Local LLMs use significant RAM, so close unnecessary applications.
  • Q4_K_M or Q5_K_M quantized models offer good quality with better speed.
  • Smaller context windows (2048-4096 tokens) are faster than maximum settings.
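A back-of-the-envelope calculation shows why quantization matters so much: a model's weight footprint is roughly parameter count times bits per weight. The ~4.65 bits/weight figure for Q4_K_M below is an approximation, and real usage adds overhead for the KV cache and runtime:

```python
def approx_model_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in GiB (ignores KV cache/runtime overhead)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

fp16 = approx_model_gb(7, 16.0)  # unquantized 7B model in fp16: ~13 GiB
q4 = approx_model_gb(7, 4.65)    # Q4_K_M averages roughly 4.65 bits/weight: ~3.8 GiB
```

This is why a quantized 7B model fits comfortably on an 8GB Mac while the fp16 version does not.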

Using Local Models

Set as Default

Make a local model your default:
  1. Settings → AI Models
  2. Right-click your local model
  3. Select “Set as Default”

Switch on Demand

Switch models during use:
  1. Click the model name in SmartBar or chat
  2. Select your local model from the list

Force Local

Use @local prefix to always use local models:
@local Explain this code snippet

Feature Support

| Feature | Cloud Models | Local Models |
|---|---|---|
| Chat | ✓ | ✓ |
| Context awareness | ✓ | ✓ |
| Knowledge base | ✓ | ✓ |
| Dictation | ✓ | ✓ |
| Web search | ✓ | Limited |
| Image generation | ✓ | Model dependent |
| Code completion | ✓ | ✓ |

Troubleshooting

Model not responding

  1. Check that the Ollama/LM Studio server is running
  2. Verify the endpoint URL in settings
  3. Try restarting the local server
  4. Check system resources (RAM, CPU)

Slow responses

  1. Use a smaller model
  2. Reduce the context window size
  3. Close other memory-intensive apps
  4. Consider using quantized models

Out of memory

  1. Reduce GPU layers
  2. Use a smaller model
  3. Restart the local server
  4. Increase swap space (not recommended)

Connection failed

  1. Ensure the server is running
  2. Check firewall settings
  3. Verify the port number matches
  4. Try localhost vs 127.0.0.1
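The connection checks can be scripted. A quick TCP probe (a generic sketch, not an EnConvo feature) tells you whether anything is listening on the expected port before you dig into firewall or configuration issues:

```python
import socket

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ollama's default port is 11434; LM Studio's is 1234.
# print(port_open("localhost", 11434))
# print(port_open("127.0.0.1", 1234))
```

If the probe fails for localhost but succeeds for 127.0.0.1 (or vice versa), the server is likely bound to only one of the two, which is exactly the last case in the list above.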

Best Use Cases

Confidential Work

Legal documents, medical records, financial data

Code Review

Analyze proprietary code without external exposure

Offline Work

Travel, restricted networks, air-gapped systems

Cost Control

Unlimited usage without API costs