Overview
Groq provides extremely fast inference on its custom LPU (Language Processing Unit) hardware, making it well suited for real-time applications that need near-instant responses.
Supported Models
| Model | Description | Speed |
|---|---|---|
| Gemma2 9B IT | Google’s instruction-tuned model | ~700 tok/s |
| Llama 3.1 70B | Meta’s large model | ~300 tok/s |
| Llama 3.1 8B | Meta’s small model | ~750 tok/s |
| Mixtral 8x7B | Mistral’s mixture-of-experts model | ~500 tok/s |
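The throughput figures above translate directly into response latency. A rough sketch of the arithmetic (the rates are the table’s approximate figures, not guarantees, and the model keys are illustrative labels):

```python
# Approximate decode throughput per model, from the table above (tok/s).
RATES_TOK_PER_S = {
    "llama-3.1-8b": 750,
    "gemma2-9b-it": 700,
    "mixtral-8x7b": 500,
    "llama-3.1-70b": 300,
}

def estimated_seconds(model: str, output_tokens: int) -> float:
    """Ballpark generation time, ignoring network and prompt-processing overhead."""
    return output_tokens / RATES_TOK_PER_S[model]

# A 600-token answer on Llama 3.1 8B: 600 / 750 = 0.8 s of decoding.
print(estimated_seconds("llama-3.1-8b", 600))  # → 0.8
```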
Setup
1. Get API Key
- Go to Groq Console
- Sign in or create an account
- Navigate to API Keys
- Create a new API key
2. Configure in EnConvo
- Open Settings → AI Provider
- Select Groq AI
- Go to Credentials module
- Enter your API key
3. Select Model
Choose your preferred model from the dropdown.
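Once the key is configured, you can also sanity-check it outside EnConvo. Groq exposes an OpenAI-compatible REST endpoint; here is a minimal sketch, assuming a `GROQ_API_KEY` environment variable and an illustrative model ID (substitute your own):

```python
import json
import os
import urllib.request

# Groq's OpenAI-compatible chat-completions endpoint.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(model: str, prompt: str, temperature: float = 1.0):
    """Assemble the HTTP request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # 0-2, matches the Temperature setting in EnConvo
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        GROQ_URL, data=json.dumps(payload).encode(), headers=headers
    )

req = build_request("llama-3.1-8b-instant", "Say hello in one word.")
# Uncomment to actually send (requires a valid GROQ_API_KEY):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If the key is wrong the server returns HTTP 401, which is a quick way to distinguish a credential problem from a model or rate-limit problem.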
Configuration
| Setting | Description | Default |
|---|---|---|
| Credentials | API key configuration | Required |
| Model Name | Model to use | Gemma2 9B IT |
| Temperature | Creativity (0-2) | Medium (1) |
Reasoning Effort
For reasoning-capable models (GPT-OSS):
| Level | Description |
|---|---|
| Low | Fast reasoning |
| Medium | Balanced |
| High | Thorough |
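Over the API, the same setting maps onto a request parameter. A hedged sketch, assuming the OpenAI-compatible `reasoning_effort` field and an illustrative `openai/gpt-oss-120b` model ID; check Groq’s API reference for the exact names:

```python
# Reasoning effort is just an extra field on the chat-completions payload.
VALID_EFFORTS = {"low", "medium", "high"}

def chat_payload(model: str, prompt: str, reasoning_effort: str = "medium") -> dict:
    """Build a chat-completions payload with an explicit reasoning effort."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(VALID_EFFORTS)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }

payload = chat_payload("openai/gpt-oss-120b", "Prove 2+2=4.", "high")
print(payload["reasoning_effort"])  # → high
```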
Pricing
Groq offers a generous free tier. Check Groq for current pricing.
Groq’s free tier is great for trying ultra-fast inference!
Why Groq?
Speed
Among the fastest inference available: roughly 300-750+ tokens/second depending on model
Free Tier
Generous free usage for development
Quality Models
Access to Llama, Gemma, Mixtral
Low Latency
Near-instant responses
Best Practices
When to use Groq
- Real-time chat applications
- Quick iterations during development
- Time-sensitive tasks
Model Selection
- Llama 3.1 70B: Best quality
- Llama 3.1 8B: Fastest
- Gemma2 9B: Good balance
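That guidance can be encoded as a tiny selection helper. The priority names and model labels below are illustrative, not part of EnConvo’s settings:

```python
# Map a priority onto the recommended model from the list above.
RECOMMENDED = {
    "quality": "Llama 3.1 70B",
    "speed": "Llama 3.1 8B",
    "balanced": "Gemma2 9B",
}

def pick_model(priority: str) -> str:
    """Return the suggested model for a priority, defaulting to the balanced pick."""
    return RECOMMENDED.get(priority, RECOMMENDED["balanced"])

print(pick_model("speed"))  # → Llama 3.1 8B
```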
Troubleshooting
Rate limits
- Free tier has rate limits
- Wait and retry, or upgrade
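"Wait and retry" is easy to automate with exponential backoff. A minimal sketch, assuming your own wrapper raises a `RateLimitError` when the API returns HTTP 429 (the exception name and stand-in function are hypothetical):

```python
import time

class RateLimitError(Exception):
    """Raised when the API returns HTTP 429 (rate limited)."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on rate limits, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Example with a stand-in that fails twice, then succeeds:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # → ok
```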