Documentation Index
Fetch the complete documentation index at: https://docs.enconvo.ai/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Fireworks AI is a high-performance inference platform optimized for speed. It serves open-source models with extremely low latency and high throughput, making it an excellent choice for real-time applications and production workloads.Supported Models
| Model | Description | Best For |
|---|---|---|
| Llama 3.1 405B | Largest open model | Complex reasoning |
| Llama 3.1 70B | Strong general model | Analysis, coding |
| Llama 3.1 8B | Fast lightweight model | Quick responses |
| Mixtral 8x22B | Mistral MoE | Balanced workloads |
| Qwen 2.5 72B | Alibaba’s model | Multilingual tasks |
| FireFunction V2 | Function calling model | Tool use, agents |
Setup
Get API Key
- Go to Fireworks AI
- Sign in or create an account
- Navigate to Account → API Keys
- Create a new API key
Configure in EnConvo
- Open Settings → AI Provider
- Select Fireworks AI
- Go to Credentials module
- Enter your API key
Configuration
| Setting | Description | Default |
|---|---|---|
| Credentials | API key configuration | Required |
| Model Name | Model to use | Llama 3.1 70B |
| Temperature | Creativity (0-2) | Medium (1) |
Validate and Use
Validate credentials
Click Validate in the Fireworks credential settings. If validation fails, confirm the API key is active and the account has available credits or quota.
Use exact model paths
Prefer selecting from the model dropdown. If you type a model manually, use the exact Fireworks model path from the Fireworks model catalog.
Pricing
Check Fireworks AI Pricing for current rates.| Model | Input | Output |
|---|---|---|
| Llama 3.1 405B | $3.00/1M | $3.00/1M |
| Llama 3.1 70B | $0.90/1M | $0.90/1M |
| Llama 3.1 8B | $0.20/1M | $0.20/1M |
| Mixtral 8x22B | $1.20/1M | $1.20/1M |
Fireworks AI offers a free tier with rate limits, so you can test models before adding payment.
Why Fireworks?
Ultra-Fast Inference
Optimized for lowest possible latency
Function Calling
FireFunction models excel at tool use
OpenAI Compatible
Drop-in replacement for OpenAI API format
Scalable
Handles high-throughput production workloads
Best Practices
Model Selection
Model Selection
- Llama 3.1 70B: Best default for most tasks — fast and capable
- Llama 3.1 405B: When you need the highest quality open-source output
- Llama 3.1 8B: Real-time chat, high-volume processing
- FireFunction V2: Agent workflows that require reliable function calling
Performance Tips
Performance Tips
- Fireworks excels at low-latency responses — ideal for interactive use cases
- Use smaller models for streaming chat to minimize time-to-first-token
- Monitor your usage in the Fireworks dashboard
Troubleshooting
Invalid API key
Invalid API key
- Verify the key is copied correctly from fireworks.ai/account/api-keys
- Check if your account has sufficient credits
- Ensure the key has not been revoked
Rate limits
Rate limits
- Free tier has rate limits per minute
- Upgrade to a paid plan for higher throughput
- Use smaller models to reduce token consumption
Model not found
Model not found
- Model identifiers may change — check the Fireworks model catalog
- Ensure you are using the full model path (e.g.,
accounts/fireworks/models/llama-v3p1-70b-instruct) - Try selecting the model from the EnConvo dropdown instead of typing manually