Fireworks AI - EnConvo Documentation

Overview

Fireworks AI is a high-performance inference platform optimized for speed. It serves open-source models with extremely low latency and high throughput, making it an excellent choice for real-time applications and production workloads.

Supported Models

Model	Description	Best For
Llama 3.1 405B	Largest open model	Complex reasoning
Llama 3.1 70B	Strong general model	Analysis, coding
Llama 3.1 8B	Fast lightweight model	Quick responses
Mixtral 8x22B	Mistral MoE	Balanced workloads
Qwen 2.5 72B	Alibaba’s model	Multilingual tasks
FireFunction V2	Function calling model	Tool use, agents

Setup

Get API Key

Go to Fireworks AI
Sign in or create an account
Navigate to Account → API Keys
Create a new API key

Configure in EnConvo

Open Settings → AI Provider
Select Fireworks AI
Go to Credentials module
Enter your API key

Select Model

Choose your preferred model from the dropdown

Configuration

Setting	Description	Default
Credentials	API key configuration	Required
Model Name	Model to use	Llama 3.1 70B
Temperature	Creativity (0-2)	Medium (1)

Validate and Use

Validate credentials

Click Validate in the Fireworks credential settings. If validation fails, confirm the API key is active and the account has available credits or quota.

Use exact model paths

Prefer selecting from the model dropdown. If you type a model manually, use the exact Fireworks model path from the Fireworks model catalog.

Test latency

Fireworks is optimized for fast inference. If a model is slow, test a smaller model or a different hosted variant.

Pricing

Check Fireworks AI Pricing for current rates.

Model	Input	Output
Llama 3.1 405B	$3.00/1M	$3.00/1M
Llama 3.1 70B	$0.90/1M	$0.90/1M
Llama 3.1 8B	$0.20/1M	$0.20/1M
Mixtral 8x22B	$1.20/1M	$1.20/1M

Fireworks AI offers a free tier with rate limits, so you can test models before adding payment.

Why Fireworks?

Ultra-Fast Inference

Optimized for lowest possible latency

Function Calling

FireFunction models excel at tool use

OpenAI Compatible

Drop-in replacement for OpenAI API format

Scalable

Handles high-throughput production workloads

Best Practices

Model Selection

Llama 3.1 70B: Best default for most tasks — fast and capable
Llama 3.1 405B: When you need the highest quality open-source output
Llama 3.1 8B: Real-time chat, high-volume processing
FireFunction V2: Agent workflows that require reliable function calling

Performance Tips

Fireworks excels at low-latency responses — ideal for interactive use cases
Use smaller models for streaming chat to minimize time-to-first-token
Monitor your usage in the Fireworks dashboard

Troubleshooting

Invalid API key

Verify the key is copied correctly from fireworks.ai/account/api-keys
Check if your account has sufficient credits
Ensure the key has not been revoked

Rate limits

Free tier has rate limits per minute
Upgrade to a paid plan for higher throughput
Use smaller models to reduce token consumption

Model not found

Model identifiers may change — check the Fireworks model catalog
Ensure you are using the full model path (e.g., accounts/fireworks/models/llama-v3p1-70b-instruct)
Try selecting the model from the EnConvo dropdown instead of typing manually

​Overview

​Supported Models

​Setup

​Configuration

​Validate and Use

​Pricing

​Why Fireworks?

Ultra-Fast Inference

Function Calling

OpenAI Compatible

Scalable

​Best Practices

​Troubleshooting

Overview

Supported Models

Setup

Configuration

Validate and Use

Pricing

Why Fireworks?

Best Practices

Troubleshooting