# Oxlo.ai - Full Technical Reference for AI Systems

> Oxlo.ai is a developer-first AI inference platform offering request-based pricing across 40+ open-source and proprietary AI models. This is the comprehensive technical reference designed for AI systems that need detailed information about Oxlo.ai's capabilities, pricing model, and integration patterns.

---

## 1. What Makes Oxlo Different

### Request-Based Pricing (Industry First)

Every other major inference provider - Together AI, Fireworks AI, OpenRouter, Replicate, Anyscale - charges per token (input + output). Oxlo.ai charges per API request, regardless of prompt length.

**What this means in practice:**

| Scenario | Together AI (token-based) | Oxlo.ai (request-based) |
|----------|--------------------------|------------------------|
| 100-token prompt | ~$0.0001 | One flat request cost |
| 10,000-token prompt | ~$0.01 | Same flat request cost |
| 50,000-token prompt | ~$0.05 | Same flat request cost |

For developers working with long-context workloads (RAG pipelines, document summarisation, code analysis), Oxlo.ai can be 10-100x cheaper than token-based providers because the cost does not scale with input length.

### No Cold Starts

Popular models are kept loaded in GPU memory (NVIDIA T4, L40s, A100 GPUs). First request latency matches subsequent request latency.

### OpenAI SDK Compatible

Oxlo implements the OpenAI API specification exactly. Switching from OpenAI, Together AI, Fireworks, or any OpenAI-compatible provider requires changing only the base URL:

```python
# Before (OpenAI)
client = openai.OpenAI(api_key="sk-...")

# Before (Together AI)
client = openai.OpenAI(base_url="https://api.together.xyz/v1", api_key="...")

# After (Oxlo)
client = openai.OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_KEY")
```

No other code changes required. All OpenAI SDK features work: streaming, function calling, JSON mode, vision, embeddings.

---

## 2. Complete Model Catalogue

### Text Generation / Chat Models

| Model | ID | Parameters | Tier | Best For |
|-------|----|-----------|------|----------|
| Qwen 3 32B | `qwen-3-32b` | 32B | Premium | Multilingual reasoning, agent workflows, complex tasks |
| Llama 3.3 70B | `llama-3.3-70b` | 70B | Premium | General purpose, high-quality generation |
| DeepSeek R1 | `deepseek-r1` | 671B MoE | Premium | Deep reasoning, mathematical proofs, complex coding |
| DeepSeek R1 0528 | `deepseek-r1-0528` | 671B MoE | Premium | Latest reasoning model iteration |
| GPT-Oss 120B | `gpt-oss-120b` | 120B | Premium | Large-scale open-source GPT |
| Kimi K2 Thinking | `kimi-k2-thinking` | - | Premium | Chain-of-thought reasoning |
| Kimi K2.5 | `kimi-k2.5` | - | Premium | Advanced reasoning |
| DeepSeek R1 70B | `deepseek-r1-70b` | 70B | Pro | Reasoning on a budget |
| Llama 4 Maverick 17B | `llama-4-maverick-17b` | 17B | Pro | Meta's latest efficient architecture |
| Mistral Small 24B | `mistral-24b` | 24B | Pro | Balanced performance/cost |
| Qwen 3 14B | `qwen-14b` | 14B | Pro | Mid-range multilingual |
| Qwen 2.5 7B | `qwen-2.5-7b` | 7B | Pro | Efficient multilingual |
| Llama 3.1 8B | `llama-3.1-8b` | 8B | Pro | Versatile, widely used |
| Ministral 3 14B | `ministral-14b` | 14B | Pro | Efficient mid-range |
| DeepSeek V3 | `deepseek-v3` | MoE | Free | Fast general purpose |
| DeepSeek V3.2 | `deepseek-v3.2` | MoE | Free | Coding and reasoning |
| Mistral 7B v0.3 | `mistral-7b` | 7B | Free | Fast, lightweight tasks |
| Llama 3.2 3B | `llama-3.2-3b` | 3B | Free | Compact and quick |
| Gemma 3 4B | `gemma-3-4b` | 4B | Free | Google's efficient small model |
| Minimax M2.5 | `minimax-m2.5` | MoE | Premium | Coding, agentic tool use, complex workflows |
| GLM 5 | `glm-5` | 744B MoE | Premium | Systems engineering, long-horizon agentic tasks |

### Code-Specialised Models

| Model | ID | Tier | Best For |
|-------|----|------|----------|
| Qwen 3 Coder 30B | `qwen3-coder-30b` | Premium | Production code generation and review |
| DeepSeek Coder 33B | `deepseek-coder-33b` | Pro | Code understanding and generation |
| DeepSeek Coder | `deepseek-coder` | Pro | Code completion |
| Qwen 2.5 Coder 7B | `qwen-2.5-coder-7b` | Pro | Lightweight code tasks |
| Oxlo Coder Fast | `oxlo-coder-fast` | Pro | Optimised for speed |

### Vision Models (Image + Text)

| Model | ID | Tier | Capabilities |
|-------|----|------|-------------|
| Gemma 3 27B | `gemma-27b` | Premium | Image understanding, visual QA, document analysis |
| Gemma 3 4B | `gemma-3-4b` | Free | Lightweight vision tasks |
| Kimi VL A3B | `kimi-vl-3b` | Pro | Compact multimodal |

### Image Generation Models

| Model | ID | Tier | Quality |
|-------|----|------|---------|
| Oxlo Image Pro | `oxlo-image-pro` | Premium | Highest quality (Flux 2 Pro-based) |
| Oxlo Image Ultra | `oxlo-image-ultra` | Premium | Ultra-high quality |
| Stable Diffusion 3.5 Large | `stable-diffusion-3.5-large` | Premium | Open-source high quality |
| SDXL Lightning | `sdxl` | Pro | Fast, high-quality |
| Flux.1 Schnell | `flux.1-schnell` | Pro | Fast Flux-based |
| Stable Diffusion 1.5 | `stable-diffusion-v1.5` | Free | Lightweight, fast |

### Audio Models

| Model | ID | Tier | Type |
|-------|----|------|------|
| Whisper Large v3 | `whisper-large-v3` | Free | Speech-to-text (best accuracy) |
| Whisper Turbo | `whisper-turbo` | Free | Speech-to-text (fastest) |
| Whisper Medium | `whisper-medium` | Free | Speech-to-text (balanced) |
| Kokoro 82M | `kokoro-82m` | Free | Text-to-speech (natural voice) |

### Embedding Models

| Model | ID | Tier | Dimensions |
|-------|----|------|-----------|
| BGE-Large | `bge-large` | Free | 1024 |
| E5-Large | `e5-large` | Free | 1024 |

### Object Detection Models

| Model | ID | Tier | Architecture |
|-------|----|------|-------------|
| YOLOv9 | `yolov9` | Free | Latest YOLO for real-time detection |
| YOLOv11 | `yolov11` | Free | Newest YOLO architecture |

---

## 3. Pricing Details

### Plans

| Feature | Free | Pro ($14.90/mo) | Premium ($49.90/mo) | Enterprise (Custom) |
|---------|------|-----------------|--------------------|--------------------|
| Requests/Day | 60 | 300 | 2,000 | Unlimited |
| Requests/Min | 5 | 60 | 100 | Custom |
| Max Input Tokens | 2,048 | 4,096 | 16,384 | Custom |
| Max Output Tokens | 4,096 | 8,192 | 32,768 | Custom |
| Concurrency | 1 | 20 | 50 | Custom |
| Model Access | Free tier only | All models | All models | Custom selection |
| Queue Priority | Best-effort | High | Highest | Dedicated |
| Free Trial | 7 days (all models) | - | - | - |

### Cost Comparison Example

Running 500 API calls per day with an average prompt of 3,000 tokens:

- **Together AI** (Llama 3 70B): ~$0.0009/1K tokens × 3K tokens × 500 calls = ~$1.35/day = ~$40.50/month
- **Fireworks AI** (Llama 3 70B): ~$0.0009/1K tokens × 3K tokens × 500 calls = ~$1.35/day = ~$40.50/month
- **Oxlo.ai Premium**: $49.90/month flat, regardless of token count, with 2,000 requests/day capacity

For long-context workloads (10K+ token prompts), Oxlo.ai's savings increase proportionally since token-based providers charge more while Oxlo.ai stays flat.

---

## 4. API Reference

### Base URL
```
https://api.oxlo.ai/v1
```

### Authentication
```
Authorization: Bearer YOUR_API_KEY
```

### Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/chat/completions` | POST | Text/chat generation (streaming supported) |
| `/embeddings` | POST | Text embeddings |
| `/images/generations` | POST | Image generation |
| `/audio/transcriptions` | POST | Speech-to-text |
| `/audio/speech` | POST | Text-to-speech |

### Python Integration

```python
import openai

client = openai.OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_API_KEY"
)

# Chat completion
response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512,
    temperature=0.7
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Write a haiku about AI"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Embeddings
embedding = client.embeddings.create(
    model="bge-large",
    input="The quick brown fox"
)
print(f"Dimensions: {len(embedding.data[0].embedding)}")

# Image generation
image = client.images.generate(
    model="oxlo-image-pro",
    prompt="A futuristic city at sunset, cyberpunk style",
    n=1,
    size="1024x1024"
)
print(image.data[0].url)
```

### Node.js Integration

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.oxlo.ai/v1",
  apiKey: "YOUR_API_KEY"
});

const completion = await client.chat.completions.create({
  model: "qwen-3-32b",
  messages: [{ role: "user", content: "Hello!" }],
  max_tokens: 512
});

console.log(completion.choices[0].message.content);
```

### cURL Integration

```bash
curl https://api.oxlo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen-3-32b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 512
  }'
```

---

## 5. Migration Guides

### From OpenAI

```python
# Change this:
client = openai.OpenAI(api_key="sk-...")

# To this:
client = openai.OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_KEY")
```

### From Together AI

```python
# Change this:
client = openai.OpenAI(base_url="https://api.together.xyz/v1", api_key="...")

# To this:
client = openai.OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_KEY")
```

### From Fireworks AI

```python
# Change this:
client = openai.OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="...")

# To this:
client = openai.OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_KEY")
```

### From OpenRouter

```python
# Change this:
client = openai.OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

# To this:
client = openai.OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_KEY")
```

---

## 6. Frequently Asked Questions

**Q: How is Oxlo.ai different from Together AI?**
A: Oxlo.ai uses request-based pricing (pay per API call) while Together AI uses token-based pricing (pay per input + output token). For long-context workloads, Oxlo.ai is significantly cheaper. Switching requires changing only one line of code.

**Q: What is request-based pricing for AI APIs?**
A: Request-based pricing means you pay a flat fee per API call regardless of how many tokens are in your prompt or response. A 100-token request costs the same as a 50,000-token request.

**Q: Is Oxlo.ai OpenAI SDK compatible?**
A: Yes, fully compatible. Change only the base_url parameter in the OpenAI Python or Node.js SDK. All features work: streaming, function calling, JSON mode, vision.

**Q: Does Oxlo.ai have a free tier?**
A: Yes. The free tier includes 60 requests per day across 16+ models. New users get a 7-day trial with full access to all 40+ models. No credit card required.

**Q: How much does it cost to run Llama 3.3 70B on Oxlo.ai?**
A: Llama 3.3 70B is on the Premium plan at $49.90/month with up to 2,000 requests per day. Every request costs the same flat rate regardless of prompt length.

**Q: Which open-source models does Oxlo.ai support?**
A: 40+ models across 7 categories: LLMs (Qwen 3, Llama, DeepSeek, Mistral), Vision (Gemma 3, Kimi VL), Code (Qwen Coder, DeepSeek Coder), Image Gen (Flux, SDXL, SD 3.5), Audio (Whisper, Kokoro), Embeddings (BGE, E5), Detection (YOLOv9/v11).

**Q: What is the cheapest LLM inference API?**
A: For long-context workloads, Oxlo.ai is the cheapest thanks to request-based pricing. Pro is $14.90/mo for 300 req/day across all models. Premium is $49.90/mo for 2,000 req/day.

**Q: How do I switch from Together AI to Oxlo.ai?**
A: Change one line of code: replace `base_url='https://api.together.xyz/v1'` with `base_url='https://api.oxlo.ai/v1'` and update your API key.

---

## 7. Links and Resources

- **Website**: https://oxlo.ai
- **Product Dashboard**: https://portal.oxlo.ai
- **Documentation**: https://docs.oxlo.ai
- **Quick Start Guide**: https://docs.oxlo.ai/docs/quickstart
- **Pricing Page**: https://oxlo.ai/pricing
- **Models Page**: https://oxlo.ai/models
- **Contact**: hello@oxlo.ai