Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.enconvo.ai/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Fireworks AI is a high-performance inference platform optimized for speed. It serves open-source models with extremely low latency and high throughput, making it an excellent choice for real-time applications and production workloads.

Supported Models

ModelDescriptionBest For
Llama 3.1 405BLargest open modelComplex reasoning
Llama 3.1 70BStrong general modelAnalysis, coding
Llama 3.1 8BFast lightweight modelQuick responses
Mixtral 8x22BMistral MoEBalanced workloads
Qwen 2.5 72BAlibaba’s modelMultilingual tasks
FireFunction V2Function calling modelTool use, agents

Setup

1

Get API Key

  1. Go to Fireworks AI
  2. Sign in or create an account
  3. Navigate to AccountAPI Keys
  4. Create a new API key
2

Configure in EnConvo

  1. Open SettingsAI Provider
  2. Select Fireworks AI
  3. Go to Credentials module
  4. Enter your API key
3

Select Model

Choose your preferred model from the dropdown

Configuration

SettingDescriptionDefault
CredentialsAPI key configurationRequired
Model NameModel to useLlama 3.1 70B
TemperatureCreativity (0-2)Medium (1)

Validate and Use

1

Validate credentials

Click Validate in the Fireworks credential settings. If validation fails, confirm the API key is active and the account has available credits or quota.
2

Use exact model paths

Prefer selecting from the model dropdown. If you type a model manually, use the exact Fireworks model path from the Fireworks model catalog.
3

Test latency

Fireworks is optimized for fast inference. If a model is slow, test a smaller model or a different hosted variant.

Pricing

Check Fireworks AI Pricing for current rates.
ModelInputOutput
Llama 3.1 405B$3.00/1M$3.00/1M
Llama 3.1 70B$0.90/1M$0.90/1M
Llama 3.1 8B$0.20/1M$0.20/1M
Mixtral 8x22B$1.20/1M$1.20/1M
Fireworks AI offers a free tier with rate limits, so you can test models before adding payment.

Why Fireworks?

Ultra-Fast Inference

Optimized for lowest possible latency

Function Calling

FireFunction models excel at tool use

OpenAI Compatible

Drop-in replacement for OpenAI API format

Scalable

Handles high-throughput production workloads

Best Practices

  • Llama 3.1 70B: Best default for most tasks — fast and capable
  • Llama 3.1 405B: When you need the highest quality open-source output
  • Llama 3.1 8B: Real-time chat, high-volume processing
  • FireFunction V2: Agent workflows that require reliable function calling
  • Fireworks excels at low-latency responses — ideal for interactive use cases
  • Use smaller models for streaming chat to minimize time-to-first-token
  • Monitor your usage in the Fireworks dashboard

Troubleshooting

  • Verify the key is copied correctly from fireworks.ai/account/api-keys
  • Check if your account has sufficient credits
  • Ensure the key has not been revoked
  • Free tier has rate limits per minute
  • Upgrade to a paid plan for higher throughput
  • Use smaller models to reduce token consumption
  • Model identifiers may change — check the Fireworks model catalog
  • Ensure you are using the full model path (e.g., accounts/fireworks/models/llama-v3p1-70b-instruct)
  • Try selecting the model from the EnConvo dropdown instead of typing manually