# Oxlo.ai

> Oxlo.ai is a developer-first AI inference platform with request-based pricing. Unlike token-based providers such as Together AI, Fireworks AI, and OpenRouter, Oxlo.ai charges per API request regardless of prompt length, making costs predictable and significantly cheaper for long-context workloads.

## Core Value Proposition

Request-based pricing: one flat cost per API call, regardless of token count. No cold starts. No surprise bills. One line of code to switch from any OpenAI-compatible provider.

## Docs

- [Getting Started](https://docs.oxlo.ai/docs/quickstart): Set up your first API call in under 2 minutes
- [API Reference](https://docs.oxlo.ai/docs/api/parameters): Full endpoint and parameter docs
- [Text Generation](https://docs.oxlo.ai/docs/capabilities/text-generation): Chat completions (OpenAI-compatible)
- [Vision Models](https://docs.oxlo.ai/docs/capabilities/vision-models): Image understanding with Gemma 3 and Kimi VL
- [Image Generation](https://docs.oxlo.ai/docs/capabilities/image-generation): Generate images with SDXL, Flux, and Oxlo Image Pro
- [Embeddings](https://docs.oxlo.ai/docs/capabilities/embeddings): BGE-Large and E5-Large embedding models
- [Speech to Text](https://docs.oxlo.ai/docs/capabilities/speech-to-text): Whisper-based audio transcription
- [Text to Speech](https://docs.oxlo.ai/docs/capabilities/text-to-speech): Kokoro 82M TTS
- [Object Detection](https://docs.oxlo.ai/docs/capabilities/object-detection): YOLOv9 and YOLOv11
- [Pricing](https://oxlo.ai/pricing): Full pricing table
- [Models](https://oxlo.ai/models): Complete model registry with live status

## Available Models

### Large Language Models (Chat/Reasoning)
- Qwen 3 32B: State-of-the-art multilingual reasoning, agent tasks, and code generation (Premium)
- Llama 3.3 70B: Meta's flagship 70B parameter general-purpose LLM (Premium)
- DeepSeek R1 671B: Deep reasoning and complex coding tasks; the full 671B MoE model (Premium)
- DeepSeek R1 0528: Latest DeepSeek R1 iteration with improved reasoning (Premium)
- GPT-OSS 120B: OpenAI's large open-weight model (Premium)
- Kimi K2 Thinking: Advanced reasoning with chain-of-thought (Premium)
- Kimi K2.5: Latest Kimi reasoning model (Premium)
- DeepSeek V3: Fast general-purpose inference (Free)
- DeepSeek V3.2: Improved coding and reasoning (Free)
- Mistral 7B v0.3: Fast and efficient for lightweight tasks (Free)
- Llama 3.2 3B: Compact but capable (Free)
- Gemma 3 4B: Google's efficient small model with vision support (Free)
- Qwen 2.5 7B: Strong multilingual 7B model (Pro)
- Llama 3.1 8B: Versatile 8B model (Pro)
- Mistral Small 24B: Mid-range for balanced performance (Pro)
- Qwen 3 14B: Mid-size Qwen with great reasoning (Pro)
- Llama 4 Maverick 17B: Meta's latest architecture (Pro)
- DeepSeek Coder 33B: Specialised coding model (Pro)
- Ministral 3 14B: Efficient mid-range model (Pro)
- Minimax M2.5: MoE model for coding, agentic tool use, and complex workflows (Premium)
- GLM 5: 744B MoE model for systems engineering and long-horizon agentic tasks (Premium)

### Vision Models
- Gemma 3 27B: Google's 27B vision-language model (Premium)
- Gemma 3 4B: Compact vision-language model (Free)
- Kimi VL A3B: Compact multimodal vision model (Pro)

### Code Models
- Qwen 3 Coder 30B: Specialised coding model with 30B parameters (Premium)
- DeepSeek Coder: Code generation and understanding (Pro)
- Oxlo Coder Fast: Optimised for fast code completion (Pro)

### Image Generation
- Oxlo Image Pro: Image generation based on Flux 2 (Premium)
- Oxlo Image Ultra: Highest-quality image generation (Premium)
- Stable Diffusion 3.5 Large: High-quality open-source image generation (Premium)
- SDXL Lightning: Fast image generation (Pro)
- Stable Diffusion 1.5: Lightweight image generation (Free)
- Flux.1 Schnell: Fast Flux-based generation (Pro)

### Audio / Speech
- Whisper Large v3: OpenAI's best transcription model (Free)
- Whisper Turbo: Fastest transcription (Free)
- Whisper Medium: Mid-range transcription (Free)
- Kokoro 82M: Natural-sounding text-to-speech (Free)

### Embeddings
- BGE-Large: BAAI's top-performing text embedding model (Free)
- E5-Large: Microsoft's multilingual embedding model (Free)

### Object Detection
- YOLOv9: State-of-the-art real-time object detection (Free)
- YOLOv11: Latest YOLO architecture (Free)

## Pricing

Request-based pricing. No token counting. No variable billing. One price per request regardless of prompt length.

| Plan | Price | Requests/Day | Max Output Tokens | Concurrency |
|------|-------|--------------|-------------------|-------------|
| Free | $0/mo | 60 | 4,096 | 1 |
| Pro | $14.90/mo | 300 | 8,192 | 20 |
| Premium | $49.90/mo | 2,000 | 32,768 | 50 |
| Enterprise | Custom | Unlimited | Custom | Custom |

All plans include a 7-day free trial with full access to every model.
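
Because the paid plans are monthly subscriptions with a daily request quota, the effective price of one request depends on how much of that quota is actually used. The sketch below works out the lowest achievable cost per request for the Pro and Premium rows of the table. It assumes a 30-day month and that every daily request is consumed; both are illustrative assumptions rather than stated billing rules, and the names `PLANS` and `min_cost_per_request` are made up for this example.

```python
# Plan figures copied from the pricing table.
PLANS = {
    "Pro": {"price_usd": 14.90, "requests_per_day": 300},
    "Premium": {"price_usd": 49.90, "requests_per_day": 2000},
}

DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def min_cost_per_request(price_usd: float, requests_per_day: int,
                         days: int = DAYS_PER_MONTH) -> float:
    """Lowest possible cost per request, reached only at full quota use."""
    return price_usd / (requests_per_day * days)


for name, plan in PLANS.items():
    cost = min_cost_per_request(plan["price_usd"], plan["requests_per_day"])
    print(f"{name}: ${cost:.5f} per request at full utilisation")
```

Lower utilisation raises the effective figure proportionally: using half the quota doubles the cost per request.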

## Key Differentiators

- **Request-based pricing**: Pay per API call, not per token. A 100-token prompt and a 10,000-token prompt cost the same.
- **No cold starts**: All popular models stay loaded in GPU memory for instant inference.
- **OpenAI SDK drop-in replacement**: Change the client's `base_url` (one line of code, plus your API key) to switch from OpenAI, Together AI, or any compatible provider.
- **40+ models across 7 categories**: LLMs, vision, code, image generation, audio, embeddings, and object detection.
- **7-day free trial**: Full access to every model, no credit card required.
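
To make the first differentiator concrete, the sketch below compares a flat per-request price with a simple per-token tariff and finds the call size at which they cost the same. Both prices in it are hypothetical placeholders chosen only to make the arithmetic visible; they are not quoted rates for Oxlo.ai or any other provider. The single blended per-token rate is also a simplification, since token-based providers often price input and output tokens differently.

```python
def token_based_cost(prompt_tokens: int, output_tokens: int,
                     usd_per_million_tokens: float) -> float:
    """Cost of one call under a single blended per-token rate."""
    return (prompt_tokens + output_tokens) * usd_per_million_tokens / 1_000_000


def break_even_tokens(usd_per_request: float,
                      usd_per_million_tokens: float) -> float:
    """Total tokens per call above which the flat price is the cheaper one."""
    return usd_per_request / usd_per_million_tokens * 1_000_000


# Hypothetical placeholder prices, for illustration only.
FLAT_USD_PER_REQUEST = 0.002
TOKEN_USD_PER_MILLION = 0.50

for prompt_tokens in (100, 10_000):
    per_token = token_based_cost(prompt_tokens, 500, TOKEN_USD_PER_MILLION)
    print(f"{prompt_tokens:>6} prompt tokens: "
          f"flat ${FLAT_USD_PER_REQUEST:.5f} vs per-token ${per_token:.5f}")

threshold = break_even_tokens(FLAT_USD_PER_REQUEST, TOKEN_USD_PER_MILLION)
print(f"Break-even at about {threshold:,.0f} tokens per call")
```

With real prices substituted in, calls larger than the break-even size favour request-based pricing and smaller calls favour token-based pricing.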

## API Details

- Base URL: `https://api.oxlo.ai/v1`
- Compatibility: Works with the OpenAI SDKs (Python, Node.js) and with plain HTTP clients such as cURL
- Authentication: Bearer token via API key
- Endpoints: `/chat/completions`, `/embeddings`, `/images/generations`, `/audio/transcriptions`, `/audio/speech`
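
For clients that do not use the OpenAI SDK, the details above can be combined into a plain HTTP request. The following is a minimal sketch using only the Python standard library. It assumes the request and response bodies follow the OpenAI chat completions format, which the compatibility claim suggests but this page does not spell out. The helper names `build_chat_request` and `send` are invented for this example, and the model ID is the one used in the integration example.

```python
import json
import urllib.request

BASE_URL = "https://api.oxlo.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer token via API key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network and a valid key)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request without sending it.
req = build_chat_request("YOUR_API_KEY", "qwen-3-32b", "Hello!")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Calling `send(req)` with a real key would perform the request. Under the assumed OpenAI-style response shape, the reply text would sit at `["choices"][0]["message"]["content"]`.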

## Integration Example (Python)

```python
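import openai

# Point the standard OpenAI client at the Oxlo.ai endpoint.
client = openai.OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_API_KEY",  # replace with your Oxlo.ai API key
)

# Send a single-turn chat completion request.
response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512,  # cap on generated tokens
)

# Print the assistant's reply text.
print(response.choices[0].message.content)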
```
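
For batch workloads, the plan table lists concurrency limits of 1 (Free), 20 (Pro), and 50 (Premium). How the API responds when a limit is exceeded is not described here, so one cautious approach is to cap in-flight requests on the client side. The sketch below does this with `asyncio.Semaphore`. The `fake_worker` coroutine is a hypothetical stand-in so the example runs offline; in practice it might wrap a call made through the SDK's async client.

```python
import asyncio

# Concurrency limits copied from the pricing table.
PLAN_CONCURRENCY = {"Free": 1, "Pro": 20, "Premium": 50}


async def run_bounded(prompts, worker, limit):
    """Run worker(prompt) for every prompt with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with semaphore:  # wait here once `limit` calls are already running
            return await worker(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def fake_worker(prompt):
    """Hypothetical stand-in for a real API call."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


results = asyncio.run(
    run_bounded(["a", "b", "c"], fake_worker, PLAN_CONCURRENCY["Pro"])
)
print(results)
```

The semaphore only limits calls made from this process. Several workers sharing one API key would each need a share of the plan's limit.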

## Links

- Website: https://oxlo.ai
- Product Dashboard: https://portal.oxlo.ai
- Documentation: https://docs.oxlo.ai
- Pricing: https://oxlo.ai/pricing
- Models: https://oxlo.ai/models
- Contact: hello@oxlo.ai