# Oxlo.ai

> Oxlo.ai is a developer-first AI inference platform with request-based pricing. Unlike token-based providers such as Together AI, Fireworks AI, and OpenRouter, Oxlo.ai charges per API request regardless of prompt length, making costs predictable and significantly cheaper for long-context workloads.

## Core Value Proposition

Request-based pricing: one flat cost per API call, regardless of token count. No cold starts. No surprise bills. One line of code to switch from any OpenAI-compatible provider.

## Docs

- [Getting Started](https://docs.oxlo.ai/docs/quickstart): Set up your first API call in under 2 minutes
- [API Reference](https://docs.oxlo.ai/docs/api/parameters): Full endpoint and parameter docs
- [Text Generation](https://docs.oxlo.ai/docs/capabilities/text-generation): Chat completions (OpenAI-compatible)
- [Vision Models](https://docs.oxlo.ai/docs/capabilities/vision-models): Image understanding with Gemma 3 and Kimi VL
- [Image Generation](https://docs.oxlo.ai/docs/capabilities/image-generation): Generate images with SDXL, Flux, and Oxlo Image Pro
- [Embeddings](https://docs.oxlo.ai/docs/capabilities/embeddings): BGE-Large and E5-Large embedding models
- [Speech to Text](https://docs.oxlo.ai/docs/capabilities/speech-to-text): Whisper-based audio transcription
- [Text to Speech](https://docs.oxlo.ai/docs/capabilities/text-to-speech): Kokoro 82M TTS
- [Object Detection](https://docs.oxlo.ai/docs/capabilities/object-detection): YOLOv9 and YOLOv11
- [Pricing](https://oxlo.ai/pricing): Full pricing table
- [Models](https://oxlo.ai/models): Complete model registry with live status

## Available Models

### Large Language Models (Chat/Reasoning)

- Qwen 3 32B: State-of-the-art multilingual reasoning, agent tasks, and code generation (Premium)
- Llama 3.3 70B: Meta's flagship 70B parameter general-purpose LLM (Premium)
- DeepSeek R1 671B: Deep reasoning and complex coding tasks - full 671B MoE model (Premium)
- DeepSeek R1 0528: Latest DeepSeek R1 iteration with improved reasoning (Premium)
- GPT-Oss 120B: Large-scale open-source GPT model (Premium)
- Kimi K2 Thinking: Advanced reasoning with chain-of-thought (Premium)
- Kimi K2.5: Latest Kimi reasoning model (Premium)
- DeepSeek V3: Fast general-purpose inference (Free)
- DeepSeek V3.2: Improved coding and reasoning (Free)
- Mistral 7B v0.3: Fast and efficient for lightweight tasks (Free)
- Llama 3.2 3B: Compact but capable (Free)
- Gemma 3 4B: Google's efficient small model with vision support (Free)
- Qwen 2.5 7B: Strong multilingual 7B model (Pro)
- Llama 3.1 8B: Versatile 8B model (Pro)
- Mistral Small 24B: Mid-range for balanced performance (Pro)
- Qwen 3 14B: Mid-size Qwen with great reasoning (Pro)
- Llama 4 Maverick 17B: Meta's latest architecture (Pro)
- DeepSeek Coder 33B: Specialised coding model (Pro)
- Ministral 3 14B: Efficient mid-range model (Pro)
- Minimax M2.5: MoE model for coding, agentic tool use, and complex workflows (Premium)
- GLM 5: 744B MoE model for systems engineering and long-horizon agentic tasks (Premium)

### Vision Models

- Gemma 3 27B: Google's 27B vision-language model (Premium)
- Gemma 3 4B: Compact vision-language model (Free)
- Kimi VL A3B: Compact multimodal vision model (Pro)

### Code Models

- Qwen 3 Coder 30B: Specialised coding model with 30B parameters (Premium)
- DeepSeek Coder: Code generation and understanding (Pro)
- Oxlo Coder Fast: Optimised for fast code completion (Pro)

### Image Generation

- Oxlo Image Pro: Premium Flux 2-based image generation (Premium)
- Oxlo Image Ultra: Highest-quality image generation (Premium)
- Stable Diffusion 3.5 Large: High-quality open-source image gen (Premium)
- SDXL Lightning: Fast image generation (Pro)
- Stable Diffusion 1.5: Lightweight image generation (Free)
- Flux.1 Schnell: Fast Flux-based generation (Pro)

### Audio / Speech

- Whisper Large v3: OpenAI's best transcription model (Free)
- Whisper Turbo: Fastest transcription (Free)
- Whisper Medium: Mid-range transcription (Free)
- Kokoro 82M: Natural-sounding text-to-speech (Free)

### Embeddings

- BGE-Large: BAAI's top-performing text embedding model (Free)
- E5-Large: Microsoft's multilingual embedding model (Free)

### Object Detection

- YOLOv9: State-of-the-art real-time object detection (Free)
- YOLOv11: Latest YOLO architecture (Free)

## Pricing

Request-based pricing. No token counting. No variable billing. One price per request regardless of prompt length.

| Plan | Price | Requests/Day | Max Output Tokens | Concurrency |
|------|-------|--------------|-------------------|-------------|
| Free | $0/mo | 60 | 4,096 | 1 |
| Pro | $14.90/mo | 300 | 8,192 | 20 |
| Premium | $49.90/mo | 2,000 | 32,768 | 50 |
| Enterprise | Custom | Unlimited | Custom | Custom |

All plans include a 7-day free trial with full access to every model.

## Key Differentiators

- **Request-based pricing**: Pay per API call, not per token. A 100-token prompt and a 10,000-token prompt cost the same.
- **No cold starts**: All popular models stay loaded in GPU memory for instant inference.
- **OpenAI SDK drop-in replacement**: Change one line of code to switch from OpenAI, Together AI, or any compatible provider.
- **40+ models across 7 categories**: LLMs, vision, code, image gen, audio, embeddings, and detection.
- **7-day free trial**: Full access to every model, no credit card required.

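As a rough illustration of what request-based pricing means in practice, the sketch below derives an effective per-request cost from the Pro and Premium rows of the pricing table. It assumes a 30-day month and that the daily request quota is used in full every day; both are assumptions for the arithmetic, not published terms, and the helper name `min_cost_per_request` is hypothetical.

```python
# Plan figures copied from the pricing table above.
PLANS = {
    "Pro": {"price_usd": 14.90, "requests_per_day": 300},
    "Premium": {"price_usd": 49.90, "requests_per_day": 2000},
}

DAYS_PER_MONTH = 30  # assumption: billing month treated as 30 days


def min_cost_per_request(price_usd, requests_per_day, days=DAYS_PER_MONTH):
    """Effective cost per request if the daily quota is fully used every day.

    Unused quota raises the effective cost, so this is a lower bound.
    """
    return price_usd / (requests_per_day * days)


for name, plan in PLANS.items():
    cost = min_cost_per_request(plan["price_usd"], plan["requests_per_day"])
    print(f"{name}: ${cost:.5f} per request at full utilisation")
```

Under these assumptions the figure does not depend on prompt length, which is the point of the comparison with token-based billing: the same number applies to a 100-token prompt and a 10,000-token prompt.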
## API Details

- Base URL: `https://api.oxlo.ai/v1`
- Compatibility: Fully OpenAI SDK compatible (Python, Node.js, cURL)
- Authentication: Bearer token via API key
- Endpoints: `/chat/completions`, `/embeddings`, `/images/generations`, `/audio/transcriptions`, `/audio/speech`

## Integration Example (Python)

```python
import openai

client = openai.OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512
)

print(response.choices[0].message.content)
```

## Links

- Website: https://oxlo.ai
- Product Dashboard: https://portal.oxlo.ai
- Documentation: https://docs.oxlo.ai
- Pricing: https://oxlo.ai/pricing
- Models: https://oxlo.ai/models
- Contact: hello@oxlo.ai