
Overview

LM Studio is a desktop application for running large language models locally on your Mac. It provides a user-friendly GUI for discovering, downloading, and running GGUF models with no API key required. Like Ollama, it keeps all data on your machine for complete privacy.

Supported Models

LM Studio supports any GGUF-format model. Popular choices include:
| Model | Size | Best For |
|---|---|---|
| Llama 3.1 8B | ~5 GB | General purpose |
| Llama 3.1 70B | ~40 GB | Complex tasks |
| Mistral 7B | ~4 GB | Fast responses |
| Phi-3 Mini | ~2 GB | Lightweight tasks |
| CodeLlama 7B | ~4 GB | Programming |
| Qwen 2.5 7B | ~5 GB | Multilingual |

Setup

Step 1: Install LM Studio

  1. Download from lmstudio.ai
  2. Install the application on your Mac
  3. Launch LM Studio
Step 2: Download a Model

  1. In LM Studio, go to the Discover tab
  2. Search for a model (e.g., “Llama 3.1”)
  3. Click Download on your preferred quantization (Q4_K_M recommended for balance)
  4. Wait for the download to complete
Step 3: Start the Local Server

  1. Go to the Developer tab in LM Studio
  2. Select your downloaded model
  3. Click Start Server
  4. Note the server address (default: http://localhost:1234); see the reachability check below this list
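
Once the server is running, you can confirm it is reachable before moving on. A minimal sketch in Python, assuming the default port and LM Studio's OpenAI-compatible API (the output depends on which models you have downloaded and loaded):

```python
import json
import urllib.request

# LM Studio's local server exposes an OpenAI-compatible API;
# /v1/models lists the models the server currently knows about.
url = "http://localhost:1234/v1/models"

with urllib.request.urlopen(url, timeout=5) as resp:
    data = json.load(resp)

for model in data.get("data", []):
    print(model["id"])  # e.g. "llama-3.1-8b-instruct", depending on your setup
```
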
Step 4: Configure in EnConvo

  1. Open Settings → AI Provider
  2. Select LM Studio
  3. Go to Credentials module
  4. Set the endpoint to http://localhost:1234 (or your custom port)
Step 5: Select Model

Choose from your loaded models in the dropdown.
No API key is needed; LM Studio runs entirely on your local machine.
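
Because the server speaks the OpenAI API, any OpenAI-compatible client can talk to it, which is a handy way to sanity-check the endpoint outside EnConvo. A minimal sketch using the openai Python package; the model name is illustrative (use whatever /v1/models reports) and the API key is a dummy value, since none is required:

```python
from openai import OpenAI

# Point the client at the local LM Studio server; the key is a placeholder.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # illustrative; match a model you loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    temperature=1.0,  # the default Temperature setting described below
)
print(response.choices[0].message.content)
```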

Configuration

| Setting | Description | Default |
|---|---|---|
| Endpoint | Local server address | http://localhost:1234 |
| Model Name | Currently loaded model | Auto-detected |
| Temperature | Creativity (0-2) | Medium (1) |

System Requirements

| RAM | Recommended Models |
|---|---|
| 8 GB | 7B models (Q4 quantization) |
| 16 GB | Larger 7B/13B models |
| 32 GB | 30B models |
| 64 GB+ | 70B models |

Apple Silicon Macs (M1/M2/M3/M4) with Metal acceleration provide significantly better performance than Intel Macs for local LLM inference. GPU VRAM (unified memory) is the key factor for model size.
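
These figures follow from simple arithmetic: a quantized model's weights occupy roughly parameter count × bits per weight ÷ 8 bytes, plus runtime overhead for the KV cache and buffers. A rough sketch of that rule of thumb in Python (the 4.5 bits-per-weight average for Q4_K_M and the 20% overhead factor are approximations, not official numbers):

```python
def approx_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                  overhead: float = 1.2) -> float:
    """Rough RAM estimate: weight bytes plus ~20% for KV cache and buffers."""
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params ~ 1 GB per 8 bits
    return weight_gb * overhead

print(f"7B  @ Q4_K_M: ~{approx_ram_gb(7):.1f} GB")   # ~4.7 GB -> fits in 8 GB RAM
print(f"70B @ Q4_K_M: ~{approx_ram_gb(70):.1f} GB")  # ~47 GB  -> needs 64 GB+
```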

LM Studio vs Ollama

| Feature | LM Studio | Ollama |
|---|---|---|
| Interface | GUI application | Command-line |
| Model format | GGUF | GGUF (auto-managed) |
| Model discovery | In-app browser | ollama pull command |
| Server control | Manual start/stop | Auto-starts on install |
| Configuration | Visual settings | Config files |

Privacy Benefits

  • Complete Privacy: All data stays on your Mac
  • Offline Access: Works without internet after model download
  • No Usage Limits: Run unlimited queries locally
  • No Cost: Free to use, no API fees

Troubleshooting

Server connection fails
  • Ensure LM Studio’s local server is running (check the Developer tab)
  • Verify the port matches your EnConvo configuration (default: 1234)
  • Check that no other application is using the same port

Responses are slow
  • Use a smaller model or a more heavily quantized variant (Q4_K_M or Q4_K_S)
  • Close other memory-intensive applications
  • Check that Metal GPU acceleration is enabled in LM Studio settings

Model crashes or runs out of memory
  • Ensure you have enough RAM for the model size
  • Try a smaller quantization variant
  • Re-download the model if the file may be corrupted
  • Restart LM Studio and try again

Model does not appear in EnConvo
  • Make sure a model is loaded and the server is started in LM Studio
  • Refresh the model list in EnConvo settings
  • Check that the endpoint URL is correct
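
If the checklist above does not isolate the problem, a quick probe can tell you whether the port is open at all and whether any model is available. A minimal diagnostic sketch, assuming the default host and port:

```python
import json
import socket
import urllib.request

HOST, PORT = "localhost", 1234  # match your EnConvo endpoint setting

# Step 1: is anything listening on the port?
with socket.socket() as s:
    s.settimeout(2)
    if s.connect_ex((HOST, PORT)) != 0:
        raise SystemExit("Port closed: start the server in LM Studio's Developer tab")

# Step 2: does the server report any models?
with urllib.request.urlopen(f"http://{HOST}:{PORT}/v1/models", timeout=5) as resp:
    models = json.load(resp).get("data", [])

if models:
    print("Server OK, models available:", [m["id"] for m in models])
else:
    print("Server is up, but no model is loaded: load one in LM Studio")
```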