Overview
Groq provides extremely fast inference on its custom LPU (Language Processing Unit) hardware, making it well suited for real-time applications that need near-instant responses.
Supported Models
| Model | Description | Speed |
|---|---|---|
| Gemma2 9B IT | Google’s instruction-tuned model | ~700 tok/s |
| Llama 3.1 70B | Meta’s large model | ~300 tok/s |
| Llama 3.1 8B | Meta’s small model | ~750 tok/s |
| Mixtral 8x7B | Mistral’s mixture-of-experts model | ~500 tok/s |
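The throughput figures above translate directly into response latency. A rough sketch of the arithmetic (the rates are the table’s approximate figures, not guarantees, and the model keys are illustrative labels):

```python
# Approximate decode throughput per model, from the table above (tok/s).
RATES_TOK_PER_S = {
    "llama-3.1-8b": 750,
    "gemma2-9b-it": 700,
    "mixtral-8x7b": 500,
    "llama-3.1-70b": 300,
}

def estimated_seconds(model: str, output_tokens: int) -> float:
    """Ballpark generation time, ignoring network and prompt-processing overhead."""
    return output_tokens / RATES_TOK_PER_S[model]

# A 600-token answer on Llama 3.1 8B: 600 / 750 = 0.8 s of decoding.
print(estimated_seconds("llama-3.1-8b", 600))  # → 0.8
```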
Setup
1. Get API Key
- Go to Groq Console
- Sign in or create an account
- Navigate to API Keys
- Create a new API key
2. Configure in EnConvo
- Open Settings → AI Provider
- Select Groq AI
- Go to Credentials module
- Enter your API key
3. Select Model
Choose your preferred model from the dropdown.
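Once the key is configured, you can also sanity-check it outside EnConvo. Groq exposes an OpenAI-compatible REST endpoint; here is a minimal sketch, assuming a `GROQ_API_KEY` environment variable and an illustrative model ID (substitute your own):

```python
import json
import os
import urllib.request

# Groq's OpenAI-compatible chat-completions endpoint.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(model: str, prompt: str, temperature: float = 1.0):
    """Assemble the HTTP request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # 0-2, matches the Temperature setting in EnConvo
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        GROQ_URL, data=json.dumps(payload).encode(), headers=headers
    )

req = build_request("llama-3.1-8b-instant", "Say hello in one word.")
# Uncomment to actually send (requires a valid GROQ_API_KEY):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If the key is wrong the server returns HTTP 401, which is a quick way to distinguish a credential problem from a model or rate-limit problem.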
Configuration
| Setting | Description | Default |
|---|---|---|
| Credentials | API key configuration | Required |
| Model Name | Model to use | Gemma2 9B IT |
| Temperature | Creativity (0-2) | Medium (1) |
Reasoning Effort
For reasoning-capable models (GPT-OSS):
| Level | Description |
|---|---|
| Low | Fast reasoning |
| Medium | Balanced |
| High | Thorough |
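Over the API, the same setting maps onto a request parameter. A hedged sketch, assuming the OpenAI-compatible `reasoning_effort` field and an illustrative `openai/gpt-oss-120b` model ID; check Groq’s API reference for the exact names:

```python
# Reasoning effort is just an extra field on the chat-completions payload.
VALID_EFFORTS = {"low", "medium", "high"}

def chat_payload(model: str, prompt: str, reasoning_effort: str = "medium") -> dict:
    """Build a chat-completions payload with an explicit reasoning effort."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(VALID_EFFORTS)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }

payload = chat_payload("openai/gpt-oss-120b", "Prove 2+2=4.", "high")
print(payload["reasoning_effort"])  # → high
```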
Pricing
Groq offers a generous free tier. Check Groq for current pricing.
Groq’s free tier is great for trying ultra-fast inference!
Why Groq?
Speed
Among the fastest inference available: roughly 300-750+ tokens/second depending on model
Free Tier
Generous free usage for development
Quality Models
Access to Llama, Gemma, Mixtral
Low Latency
Near-instant responses
Best Practices
When to use Groq
- Real-time chat applications
- Quick iterations during development
- Time-sensitive tasks
Model Selection
- Llama 3.1 70B: Best quality
- Llama 3.1 8B: Fastest
- Gemma2 9B: Good balance
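That guidance can be encoded as a tiny selection helper. The priority names and model labels below are illustrative, not part of EnConvo’s settings:

```python
# Map a priority onto the recommended model from the list above.
RECOMMENDED = {
    "quality": "Llama 3.1 70B",
    "speed": "Llama 3.1 8B",
    "balanced": "Gemma2 9B",
}

def pick_model(priority: str) -> str:
    """Return the suggested model for a priority, defaulting to the balanced pick."""
    return RECOMMENDED.get(priority, RECOMMENDED["balanced"])

print(pick_model("speed"))  # → Llama 3.1 8B
```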
Troubleshooting
Rate limits
- Free tier has rate limits
- Wait and retry, or upgrade
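"Wait and retry" is easy to automate with exponential backoff. A minimal sketch, assuming your own wrapper raises a `RateLimitError` when the API returns HTTP 429 (the exception name and stand-in function are hypothetical):

```python
import time

class RateLimitError(Exception):
    """Raised when the API returns HTTP 429 (rate limited)."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on rate limits, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Example with a stand-in that fails twice, then succeeds:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # → ok
```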