
Overview

Groq provides extremely fast inference using custom LPU (Language Processing Unit) hardware, making it well suited for real-time applications that require instant responses.

Supported Models

| Model | Description | Speed |
| --- | --- | --- |
| Gemma2 9B IT | Google’s model | ~700 tok/s |
| Llama 3.1 70B | Meta’s large model | ~300 tok/s |
| Llama 3.1 8B | Meta’s small model | ~750 tok/s |
| Mixtral 8x7B | Mistral MoE | ~500 tok/s |

Setup

Step 1: Get API Key

  1. Go to the Groq Console
  2. Sign in or create an account
  3. Navigate to API Keys
  4. Create a new API key

Step 2: Configure in EnConvo

  1. Open Settings → AI Provider
  2. Select Groq AI
  3. Go to the Credentials module
  4. Enter your API key

Step 3: Select Model

Choose your preferred model from the dropdown.
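If you want to confirm the key works before (or outside of) EnConvo, you can query Groq's models endpoint directly. This is a minimal sketch using Groq's official Python SDK; it assumes `pip install groq` and a `GROQ_API_KEY` environment variable, neither of which EnConvo itself requires:

```python
import os
from groq import Groq  # pip install groq

# Standalone key check: reads the key from the environment rather than EnConvo.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# List the model IDs currently available to your key; a successful call
# confirms the key is valid, and the IDs show what you can select.
for model in client.models.list().data:
    print(model.id)
```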

Configuration

| Setting | Description | Default |
| --- | --- | --- |
| Credentials | API key configuration | Required |
| Model Name | Model to use | Gemma2 9B IT |
| Temperature | Creativity (0-2) | Medium (1) |
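The Temperature setting corresponds to the standard sampling temperature on the underlying chat completions request: 0 is the most deterministic, 2 the most random, and 1 is the "Medium" default. A rough sketch of the equivalent direct API call, again assuming the `groq` Python SDK (the model ID and prompt are illustrative only):

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# temperature=1 matches EnConvo's "Medium" default; lower values give more
# deterministic output, higher values give more varied output.
response = client.chat.completions.create(
    model="gemma2-9b-it",  # illustrative model ID; verify with client.models.list()
    messages=[{"role": "user", "content": "Summarize LPU inference in one sentence."}],
    temperature=1,
)
print(response.choices[0].message.content)
```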

Reasoning Effort

For reasoning-capable models (GPT-OSS):
| Level | Description |
| --- | --- |
| Low | Fast reasoning |
| Medium | Balanced |
| High | Thorough |
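On the API side, the selected level is usually forwarded as a reasoning-effort field on the chat completions request. The sketch below is an assumption about how that maps, not a description of EnConvo internals; the model ID and the `reasoning_effort` field name are both assumptions, so the field is sent through `extra_body` rather than as a typed argument:

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # assumed reasoning-capable model ID
    messages=[{"role": "user", "content": "Plan a three-step rollout for a feature flag."}],
    # Assumption: the provider accepts a reasoning_effort of "low", "medium",
    # or "high", mirroring EnConvo's three levels. Sent via extra_body because
    # the installed SDK version may not expose a typed parameter for it.
    extra_body={"reasoning_effort": "medium"},
)
print(response.choices[0].message.content)
```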

Pricing

Groq offers a generous free tier. Check Groq's pricing page for current rates.
Groq’s free tier is great for trying ultra-fast inference!

Why Groq?

  • Speed: The fastest inference available, roughly 300-750+ tokens per second depending on the model
  • Free Tier: Generous free usage for development
  • Quality Models: Access to Llama, Gemma, and Mixtral
  • Low Latency: Near-instant responses

Best Practices

Use Groq for:

  • Real-time chat applications (streaming keeps perceived latency low; see the sketch below)
  • Quick iterations during development
  • Time-sensitive tasks

Model selection:

  • Llama 3.1 70B: Best quality
  • Llama 3.1 8B: Fastest
  • Gemma2 9B: Good balance
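For the real-time chat case, streaming the response token by token is what makes Groq's throughput visible to users. A minimal streaming sketch, again assuming the `groq` Python SDK (the model ID is illustrative):

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Stream tokens as they arrive instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # illustrative "fastest" model ID
    messages=[{"role": "user", "content": "Explain LPUs in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```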

Troubleshooting

Rate limit errors:

  • The free tier has rate limits
  • Wait and retry (see the backoff sketch below), or upgrade your plan
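When the limit is hit, the API typically returns an HTTP 429 error. A simple retry-with-exponential-backoff sketch; the `RateLimitError` class comes from the `groq` Python SDK, so adjust the exception handling if you call the HTTP API directly:

```python
import os
import time
from groq import Groq, RateLimitError

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def chat_with_retry(messages, model="llama-3.1-8b-instant", attempts=5):
    """Retry a chat completion on 429 rate-limit errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise RuntimeError("Still rate limited after retries; consider upgrading your plan.")

reply = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(reply.choices[0].message.content)
```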