Overview

EnConvo supports local Large Language Models (LLMs), allowing you to run AI completely on your Mac without sending data to external servers. This is ideal for privacy-sensitive work, offline use, or when you want full control over your AI.

Why Use Local LLMs?

Complete Privacy

Your data never leaves your Mac. Process sensitive documents with confidence.

Offline Access

Use AI without an internet connection. Perfect for travel or restricted networks.

No Usage Limits

Run unlimited queries without worrying about API costs or rate limits.

Full Control

Choose exactly which models to use and how they’re configured.

Supported Platforms

Ollama

Ollama is the recommended way to run local LLMs on macOS.
  1. Install Ollama: download and install from ollama.ai
  2. Pull a model: in Terminal, run ollama pull llama3
  3. Configure EnConvo: go to Settings → AI Models → Add Local Model → Ollama
  4. Select Endpoint: default is http://localhost:11434
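Once Ollama is running, EnConvo talks to it over a plain HTTP API on that endpoint. As a rough sketch of what a request looks like (using Ollama's documented /api/generate route on the default port; the prompt text is just an example):

```python
import json
import urllib.request

# Build a request against Ollama's default local endpoint.
# Field names follow Ollama's /api/generate API.
payload = {
    "model": "llama3",  # any model previously fetched with `ollama pull`
    "prompt": "Summarize this paragraph in one sentence.",
    "stream": False,    # ask for a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once an Ollama server is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Nothing here ever leaves the machine: the request goes to localhost, which is what makes the privacy guarantees above hold.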

LM Studio

LM Studio provides a GUI for managing and running local models.
  1. Install LM Studio: download from lmstudio.ai
  2. Download a model: use LM Studio's model browser to download a GGUF model
  3. Start the local server: click "Start Server" in LM Studio (default port: 1234)
  4. Configure EnConvo: Settings → AI Models → Add Local Model → LM Studio
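LM Studio's local server speaks the OpenAI chat-completions format, so a request to it looks like any OpenAI-style call pointed at localhost. A minimal sketch, assuming the default port 1234 and the /v1/chat/completions route:

```python
import json
import urllib.request

# LM Studio's local server accepts OpenAI-compatible chat requests.
payload = {
    "model": "local-model",  # LM Studio serves whichever model is loaded
    "messages": [
        {"role": "user", "content": "Explain GGUF quantization in one line."}
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With "Start Server" clicked in LM Studio, you would send it like this:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```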
Recommended Models

| Model | Size | Best For | Min RAM |
|---|---|---|---|
| Llama 3 8B | 4.7GB | General purpose | 8GB |
| Llama 3 70B | 40GB | Complex tasks | 64GB |
| Mistral 7B | 4.1GB | Fast responses | 8GB |
| CodeLlama | 4.7GB | Programming | 8GB |
| Phi-2 | 1.7GB | Lightweight | 4GB |
| Gemma 2B | 1.5GB | Ultra-light | 4GB |
Model performance depends on your Mac’s specifications. Apple Silicon Macs with Metal acceleration provide the best experience.

Configuration

Adding a Local Model

  1. Go to Settings → AI Models
  2. Click “Add Model” → “Local Model”
  3. Configure connection:
    • Name: Display name in EnConvo
    • Provider: Ollama or LM Studio
    • Endpoint: Server URL (usually localhost)
    • Model: Select from available models
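The four connection fields can be pictured as a small record. This is a hypothetical illustration of the shape of an entry, not EnConvo's actual storage format:

```python
# Hypothetical local-model entry; field names mirror the configuration
# dialog above, not EnConvo's real on-disk format.
local_model = {
    "name": "Llama 3 (local)",             # display name in EnConvo
    "provider": "Ollama",                  # or "LM Studio"
    "endpoint": "http://localhost:11434",  # Ollama default; LM Studio uses :1234
    "model": "llama3",                     # one of the models the server reports
}
```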

Model Settings

| Setting | Description |
|---|---|
| Context Window | Maximum tokens in context |
| Temperature | Response creativity (0-1) |
| Top P | Sampling parameter |
| GPU Layers | Layers to offload to GPU |
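These settings ultimately travel to the local server as request parameters. In Ollama's API, for example, they map onto the request's options object (num_ctx, temperature, top_p, and num_gpu are the option names Ollama documents); a sketch of that mapping:

```python
# Values as they might appear in EnConvo's model settings dialog.
settings = {
    "context_window": 4096,
    "temperature": 0.7,
    "top_p": 0.9,
    "gpu_layers": 32,
}

# How those settings map onto Ollama's request "options" object.
options = {
    "num_ctx": settings["context_window"],   # maximum tokens in context
    "temperature": settings["temperature"],  # 0 = deterministic, 1 = creative
    "top_p": settings["top_p"],              # nucleus-sampling cutoff
    "num_gpu": settings["gpu_layers"],       # layers offloaded to Metal/GPU
}
```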

Performance Optimization

Apple Silicon (M1/M2/M3)

Apple Silicon Macs offer excellent local LLM performance:
  • Metal acceleration: Automatic GPU utilization
  • Unified memory: Efficient memory sharing
  • Neural Engine: Additional AI acceleration

Tips for Better Performance

  • Use smaller models (7B parameters) for faster responses; larger models (70B) need more RAM and respond more slowly.
  • Local LLMs use significant RAM, so close unnecessary applications.
  • Q4_K_M or Q5_K_M quantized models offer good quality with better speed.
  • Smaller context windows (2048-4096 tokens) are faster than maximum settings.
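A back-of-the-envelope calculation shows why quantization matters so much: a model's weight footprint is roughly parameter count times bits per weight. The ~4.65 bits/weight figure for Q4_K_M below is an approximation, and real usage adds overhead for the KV cache and runtime:

```python
def approx_model_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in GiB (ignores KV cache/runtime overhead)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

fp16 = approx_model_gb(7, 16.0)  # unquantized 7B model in fp16: ~13 GiB
q4 = approx_model_gb(7, 4.65)    # Q4_K_M averages roughly 4.65 bits/weight: ~3.8 GiB
```

This is why a quantized 7B model fits comfortably on an 8GB Mac while the fp16 version does not.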

Using Local Models

Set as Default

Make a local model your default:
  1. Settings → AI Models
  2. Right-click your local model
  3. Select “Set as Default”

Switch on Demand

Switch models during use:
  1. Click the model name in SmartBar or chat
  2. Select your local model from the list

Force Local

Use @local prefix to always use local models:
@local Explain this code snippet

Feature Support

| Feature | Cloud Models | Local Models |
|---|---|---|
| Chat | ✓ | ✓ |
| Context awareness | ✓ | ✓ |
| Knowledge base | ✓ | ✓ |
| Dictation | ✓ | ✓ |
| Web search | ✓ | Limited |
| Image generation | ✓ | Model dependent |
| Code completion | ✓ | ✓ |

Troubleshooting

Model not responding

  1. Check that the Ollama/LM Studio server is running
  2. Verify the endpoint URL in settings
  3. Try restarting the local server
  4. Check system resources (RAM, CPU)

Slow responses

  1. Use a smaller model
  2. Reduce the context window size
  3. Close other memory-intensive apps
  4. Consider using quantized models

Out of memory

  1. Reduce GPU layers
  2. Use a smaller model
  3. Restart the local server
  4. Increase swap space (not recommended)

Connection failed

  1. Ensure the server is running
  2. Check firewall settings
  3. Verify the port number matches
  4. Try localhost vs 127.0.0.1
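The connection checks can be scripted. A quick TCP probe (a generic sketch, not an EnConvo feature) tells you whether anything is listening on the expected port before you dig into firewall or configuration issues:

```python
import socket

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ollama's default port is 11434; LM Studio's is 1234.
# print(port_open("localhost", 11434))
# print(port_open("127.0.0.1", 1234))
```

If the probe fails for localhost but succeeds for 127.0.0.1 (or vice versa), the server is likely bound to only one of the two, which is exactly the last case in the list above.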

Best Use Cases

Confidential Work

Legal documents, medical records, financial data

Code Review

Analyze proprietary code without external exposure

Offline Work

Travel, restricted networks, air-gapped systems

Cost Control

Unlimited usage without API costs