Chat Completions
All modalities (text, image, video, STT, TTS) route through a single endpoint. The request format follows OpenAI's Chat Completions API.
Endpoint
POST https://proxyify.dev/v1/chat/completions
Text (Chat)
{
"model": "openai/gpt-4o",
"messages": [{ "role": "user", "content": "Hello" }],
"stream": false
}
{
"data": {
"choices": [{ "message": { "role": "assistant", "content": "Hi!" } }],
"usage": { "prompt_tokens": 10, "completion_tokens": 4, "total_tokens": 14 }
},
"_balancer": {
"credits_used": 24, "cost_usd": 0.024,
"model_used": "openai/gpt-4o",
"input_tokens": 120, "output_tokens": 80,
"latency_ms": 1240, "cached": false
}
}
Add "stream": true for SSE streaming. See Streaming guide.
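As a rough illustration of the request and response shapes above, here is a minimal Python sketch that uses only the standard library. The helper names (`build_chat_request`, `parse_chat_response`, `chat`) are hypothetical, and the `Bearer` scheme is an assumption: this page only says that an Authorization header is sent.

```python
import json
import urllib.request

ENDPOINT = "https://proxyify.dev/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str,
                       model: str = "openai/gpt-4o") -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the page only mentions an Authorization header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_chat_response(payload: dict) -> tuple:
    """Return (assistant text, _balancer accounting block)."""
    text = payload["data"]["choices"][0]["message"]["content"]
    return text, payload.get("_balancer", {})


def chat(api_key: str, prompt: str) -> tuple:
    """Send the request and parse the reply. This performs a network call."""
    request = build_chat_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_chat_response(json.load(resp))
```

Calling `chat(api_key, "Hello")` would return the assistant text together with the `_balancer` dictionary, so credits and latency can be logged alongside the reply.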
Image generation
{
"model": "black-forest-labs/flux-1.1-pro",
"messages": [{ "role": "user", "content": "A sunset over mountains" }],
"modalities": ["image", "text"],
"image_config": {
"aspect_ratio": "16:9",
"image_size": "1K"
}
}
aspect_ratio accepts "1:1" (default), "16:9", or "9:16". image_size accepts "1K" (default), "2K", or "4K".
{
"data": {
"choices": [{
"message": {
"role": "assistant",
"images": [{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }]
}
}]
},
"_balancer": {
"credits_used": 80, "cost_usd": 0.08,
"images_count": 1, "latency_ms": 3200
}
}
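The image arrives as a base64 data URL inside `message.images`. The following sketch shows one way to turn such a response into raw bytes; `extract_images` is a hypothetical helper name, and the code assumes every entry uses the `data:<mime>;base64,<payload>` form shown in the example response.

```python
import base64


def extract_images(payload: dict) -> list:
    """Decode every base64 data URL under data.choices[*].message.images."""
    images = []
    for choice in payload["data"]["choices"]:
        for item in choice["message"].get("images", []):
            url = item["image_url"]["url"]
            # Split "data:image/png;base64" from the encoded payload.
            header, _, encoded = url.partition(",")
            if not (header.startswith("data:") and header.endswith(";base64")):
                raise ValueError("expected a base64 data URL")
            images.append(base64.b64decode(encoded))
    return images
```

Each returned `bytes` object can be written straight to disk, for example with `pathlib.Path("out.png").write_bytes(images[0])`.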
Video generation (async)
Video generation is asynchronous: submit a job, then poll for the result.
{
"model": "kling/kling-video-v3-pro",
"prompt": "A golden retriever on a sunny beach",
"duration": 5,
"resolution": "720p",
"aspect_ratio": "16:9"
}
{
"data": { "id": "abc123", "status": "pending", "polling_url": "/v1/jobs/abc123" },
"_balancer": { "job_id": "abc123", "model_used": "kling/kling-video-v3-pro" }
}
Poll the polling_url with the same Authorization header until status is completed:
{
"data": { "id": "abc123", "status": "completed", "video_url": "https://..." },
"_balancer": {
"credits_used": 672, "cost_usd": 0.672,
"duration_seconds": 5
}
}
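The submit-then-poll flow could be sketched as below. The names `fetch_job` and `wait_for_video`, the 5-second interval, and the 10-minute timeout are all illustrative assumptions, as is the `Bearer` scheme. This page documents only the `pending` and `completed` statuses, so the loop treats any other status as "keep waiting" until the timeout expires.

```python
import json
import time
import urllib.request
from typing import Callable
from urllib.parse import urljoin

BASE_URL = "https://proxyify.dev"


def fetch_job(polling_url: str, api_key: str) -> dict:
    """GET the job status once. This performs a network call."""
    request = urllib.request.Request(
        urljoin(BASE_URL, polling_url),  # polling_url is a relative path
        headers={"Authorization": f"Bearer {api_key}"},  # assumed scheme
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def wait_for_video(polling_url: str,
                   fetch: Callable[[str], dict],
                   interval: float = 5.0,
                   timeout: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(polling_url) until status is 'completed' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(polling_url)
        if job["data"]["status"] == "completed":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job at {polling_url} did not complete in {timeout}s")
        sleep(interval)
```

In practice this might be called as `wait_for_video(polling_url, lambda url: fetch_job(url, api_key))`. Passing `fetch` and `sleep` as parameters keeps the loop testable without network access.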
Speech-to-Text (STT)
{
"model": "openai/whisper-1",
"input_audio": {
"data": "<base64_encoded_audio>",
"format": "wav"
},
"language": "en"
}
{
"data": { "text": "Hello, this is a test." },
"_balancer": {
"credits_used": 1, "audio_seconds": 9.2,
"model_used": "openai/whisper-1"
}
}
Supported STT models: openai/whisper-1 · openai/whisper-large-v3 · openai/whisper-large-v3-turbo · openai/gpt-4o-transcribe · openai/gpt-4o-mini-transcribe · google/chirp-3
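The `input_audio.data` field carries the audio as base64 text. Here is a small sketch of building that request body from raw bytes and reading the transcript back; `build_stt_request` and `transcript_text` are hypothetical helper names, and the defaults simply mirror the example above.

```python
import base64


def build_stt_request(audio: bytes,
                      audio_format: str = "wav",
                      model: str = "openai/whisper-1",
                      language: str = "en") -> dict:
    """Return the JSON body for a speech-to-text request."""
    return {
        "model": model,
        "input_audio": {
            # The API expects base64 text, not raw bytes.
            "data": base64.b64encode(audio).decode("ascii"),
            "format": audio_format,
        },
        "language": language,
    }


def transcript_text(payload: dict) -> str:
    """Pull the transcript out of a speech-to-text response."""
    return payload["data"]["text"]
```

The body can then be serialized with `json.dumps` and posted to the same endpoint as any other request.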
Text-to-Speech (TTS)
{
"model": "openai/gpt-4o-mini-tts",
"input": "Hello, this is a TTS test.",
"voice": "alloy",
"response_format": "mp3"
}
TTS responses are raw audio byte streams (Content-Type: audio/mpeg). Because there is no JSON body, the _balancer data is returned in response headers:
X-Balancer-Credits-Used: 5
X-Balancer-Cost-USD: 0.005
X-Balancer-Model-Used: openai/gpt-4o-mini-tts
Supported TTS models: openai/gpt-4o-mini-tts · google/gemini-3.1-flash-tts-preview · mistralai/voxtral-mini-tts · hexgrad/kokoro-82m · sesame/csm-1b
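Since the accounting data for TTS lives in headers, a client may want to fold it back into the same shape as the JSON `_balancer` block. The sketch below does that for the three headers listed above; `balancer_from_headers` is a hypothetical name, and parsing both numeric headers as floats is an assumption made so that fractional values would not raise.

```python
from typing import Mapping


def balancer_from_headers(headers: Mapping[str, str]) -> dict:
    """Rebuild a _balancer-style dict from X-Balancer-* response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "credits_used": float(lowered["x-balancer-credits-used"]),
        "cost_usd": float(lowered["x-balancer-cost-usd"]),
        "model_used": lowered["x-balancer-model-used"],
    }
```

With `urllib.request`, `resp.headers` can be passed in directly, and `resp.read()` returns the audio bytes to write to an `.mp3` file.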
Errors
All errors follow this format:
{
"error": {
"code": 402,
"message": "Insufficient credits. Please top up your balance at proxyify.dev/billing.",
"metadata": {}
}
}
| Status | Meaning |
|---|---|
| 400 | Invalid parameters or prompt injection detected |
| 401 | Missing, invalid, or expired API key |
| 402 | Insufficient credits or spending limit reached |
| 403 | Blocked by key restriction (IP, origin, country, model, time) |
| 408 | Provider timeout |
| 429 | Rate limit exceeded; check the Retry-After header |
| 502 | Provider returned an error |
| 503 | No provider available for this model |
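One possible way to handle the error envelope in client code is sketched below. `ApiError` and `parse_error` are hypothetical names. Treating 408, 429, 502, and 503 as retryable is an assumption drawn from the meanings in the table, not something this page states. Only the delta-seconds form of Retry-After is parsed.

```python
import json
from typing import Mapping

# Assumption: timeouts, rate limits, and provider-side failures are worth retrying;
# 400/401/402/403 need a change on the caller's side first.
RETRYABLE_STATUSES = frozenset({408, 429, 502, 503})


class ApiError(Exception):
    """Error decoded from the {"error": {...}} envelope."""

    def __init__(self, code: int, message: str, retry_after=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.retry_after = retry_after  # seconds, or None if absent

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_STATUSES


def parse_error(body: str, headers: Mapping[str, str]) -> ApiError:
    """Decode an error body and pick up Retry-After when it is given in seconds."""
    err = json.loads(body)["error"]
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    retry_after = None
    if value is not None and value.strip().isdigit():
        retry_after = float(value)  # the HTTP-date form is ignored in this sketch
    return ApiError(err["code"], err["message"], retry_after)
```

A caller could catch the HTTP failure, build an `ApiError` from the body and headers, and sleep for `retry_after` seconds before retrying only when `retryable` is true.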