Chat Completions | Proxyify Docs

Chat Completions

All modalities (text, image, video, STT, TTS) route through a single endpoint. The request and response format follows OpenAI's Chat Completions API.

Endpoint

http
POST https://proxyify.dev/v1/chat/completions

Text (Chat)

json (request)
{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": false,
  // Sampling (all optional)
  "temperature": 1.0,         // 0.0–2.0
  "max_tokens": 1024,
  "top_p": 1.0,               // 0.0–1.0
  "top_k": 0,                 // ignored on OpenAI models
  "frequency_penalty": 0.0,   // -2.0–2.0
  "presence_penalty": 0.0,    // -2.0–2.0
  "repetition_penalty": 1.0,  // 0.0–2.0
  "seed": 42,
  "stop": ["<end>"],
  // Output format (optional)
  "response_format": { "type": "json_object" },
  // Tool calling (optional)
  "tools": [...],
  "tool_choice": "auto",
  // Provider routing (optional): see Provider Routing guide
  "provider": { "sort": "latency" }
}
json (response)
{
  "id": "chatcmpl-...",
  "choices": [{
    "message": { "role": "assistant", "content": "Hi!" }
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 4,
    "total_tokens": 14
  },
  "_proxyify": {
    "credits_used": 24,
    "cost_usd": 0.024,
    "model_used": "openai/gpt-4o",
    "input_tokens": 120,
    "output_tokens": 80,
    "latency_ms": 1240,
    "cached": false
  }
}

Add "stream": true for SSE streaming. See Streaming guide.
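A minimal Python sketch of a non-streaming text call, using only the standard library. The helper names `build_chat_request` and `extract_reply` are illustrative, not part of any Proxyify SDK. The `Bearer prx-...` key format is taken from the Custom headers example further down this page.

```python
import json
import urllib.request

API_URL = "https://proxyify.dev/v1/chat/completions"  # endpoint from this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from a response shaped like the example above."""
    return response_body["choices"][0]["message"]["content"]
```

To send the request, pass the result to `urllib.request.urlopen(req, timeout=30)`, decode the body with `json.load`, and hand the resulting dict to `extract_reply`.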

Sampling parameters

| Parameter | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | float | 0.0–2.0 | 1.0 | Response variety. 0 = deterministic. |
| max_tokens | int | ≥1 | model limit | Max tokens to generate. |
| max_completion_tokens | int | ≥1 | model limit | Preferred alias for max_tokens. |
| top_p | float | 0.0–1.0 | 1.0 | Nucleus sampling threshold. |
| top_k | int | ≥0 | 0 (off) | Top-k token sampling. Not available on OpenAI models. |
| frequency_penalty | float | -2.0–2.0 | 0.0 | Penalise frequent tokens. |
| presence_penalty | float | -2.0–2.0 | 0.0 | Penalise tokens already present in input. |
| repetition_penalty | float | 0.0–2.0 | 1.0 | General repetition penalty (Mistral / Llama models). |
| min_p | float | 0.0–1.0 | 0.0 | Minimum token probability relative to top token. |
| top_a | float | 0.0–1.0 | 0.0 | Dynamic Top-P variant. |
| seed | int | | | Same seed + params → same output (not guaranteed for all models). |
| stop | str or list | ≤4 items | | Stop sequences; generation halts when one is encountered. |
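The ranges in the table can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The sketch below is one way to do that. `SAMPLING_RANGES` and `validate_sampling` are hypothetical names, and the server remains the authority on what it accepts.

```python
# Ranges copied from the table above as (min, max); None means no upper bound.
SAMPLING_RANGES = {
    "temperature": (0.0, 2.0),
    "max_tokens": (1, None),
    "max_completion_tokens": (1, None),
    "top_p": (0.0, 1.0),
    "top_k": (0, None),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.0, 2.0),
    "min_p": (0.0, 1.0),
    "top_a": (0.0, 1.0),
}


def validate_sampling(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in params.items():
        if name == "stop":
            if isinstance(value, list) and len(value) > 4:
                problems.append("stop: at most 4 sequences allowed")
            continue
        if name not in SAMPLING_RANGES:
            continue  # not a ranged sampling parameter; leave it to the server
        low, high = SAMPLING_RANGES[name]
        if value < low or (high is not None and value > high):
            upper = "" if high is None else str(high)
            problems.append(f"{name}: {value} outside {low}..{upper}")
    return problems
```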

Tool calling

Pass a tools array to let the model call functions. The request is automatically routed to providers that support tool use.

json (request)
{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "What's the weather in Istanbul?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Returns weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto",        // "none" | "auto" | "required"
  "parallel_tool_calls": true   // allow multiple simultaneous calls
}
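This page does not show a tool-call response. Since the format follows OpenAI's Chat Completions API, the sketch below assumes the OpenAI shape: the assistant message carries a `tool_calls` list, each entry has an `id` and a `function` with a `name` and a JSON-encoded `arguments` string, and results are sent back as `role: "tool"` messages. The local `get_weather` function and `TOOL_REGISTRY` are hypothetical stand-ins.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation of the get_weather tool."""
    return {"city": city, "forecast": "sunny"}  # stub data for illustration


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool and return the matching "tool" messages.

    Assumes the OpenAI response shape, which this page does not confirm.
    """
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments is a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```

In a full loop, the assistant message and the returned tool messages would be appended to `messages` and sent in a follow-up request so the model can produce its final answer.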

Structured output

Force the model to return JSON matching an exact schema using response_format.

json (request)
{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Extract user data" }],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "User",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer" }
        },
        "required": ["name", "age"]
      }
    }
  }
}

Use {"type": "json_object"} for basic JSON mode (no schema required). For full parameter reference see Parameters.
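Even with a strict schema, it can be worth re-checking the decoded output before using it. The sketch below wraps a schema in the `response_format` envelope from the request above and validates the two fields of the `User` example. It assumes the JSON arrives as a string in `choices[0].message.content`, as in OpenAI's API. `build_response_format` and `parse_user` are illustrative names.

```python
import json

# Schema copied from the request above.
USER_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}


def build_response_format(name: str, schema: dict) -> dict:
    """Wrap a JSON Schema in the response_format envelope shown above."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def parse_user(content: str) -> dict:
    """Decode the assistant's JSON text and re-check the required fields."""
    data = json.loads(content)
    missing = [key for key in USER_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not isinstance(data["name"], str):
        raise ValueError("name must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(data["age"], bool) or not isinstance(data["age"], int):
        raise ValueError("age must be an integer")
    return data
```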

Image generation

json (request)
{
  "model": "black-forest-labs/flux-1.1-pro",
  "messages": [{ "role": "user", "content": "A sunset over mountains" }],
  "modalities": ["image", "text"],
  "image_config": {
    "aspect_ratio": "16:9",   // "1:1" (default), "16:9", "9:16", "4:3"
    "image_size": "1K"        // "1K" (default), "2K", "4K"
  },
  "seed": 42,                 // same seed + prompt = same image
  "provider": { "sort": "price" }
}
json (response)
{
  "choices": [{
    "message": {
      "role": "assistant",
      "images": [{
        "type": "image_url",
        "image_url": { "url": "data:image/png;base64,..." }
      }]
    }
  }],
  "_proxyify": {
    "credits_used": 80,
    "cost_usd": 0.08,
    "images_count": 1,
    "latency_ms": 3200
  }
}
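The image comes back as a base64 data URL rather than a link, so the client has to decode it before saving. A minimal sketch, with `decode_data_url` and `extract_images` as illustrative helper names:

```python
import base64


def decode_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, _, payload = url[len("data:"):].partition(",")
    mime_type, _, encoding = header.partition(";")
    if encoding != "base64":
        raise ValueError("expected a base64-encoded data URL")
    return mime_type, base64.b64decode(payload)


def extract_images(response_body: dict) -> list[tuple[str, bytes]]:
    """Decode every image in a response shaped like the example above."""
    images = response_body["choices"][0]["message"].get("images", [])
    return [decode_data_url(img["image_url"]["url"]) for img in images]
```

Each returned pair can be written to disk directly, for example with `pathlib.Path("out.png").write_bytes(raw_bytes)`.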

Video generation (async)

Video generation is asynchronous — submit a job, then poll for the result.

json (submit request)
{
  "model": "kling/kling-video-v3-pro",
  "prompt": "A golden retriever on a sunny beach",
  "duration": 5,
  "resolution": "720p",
  "aspect_ratio": "16:9",
  // Optional
  "negative_prompt": "blurry, watermark",
  "seed": 42,
  "fps": 24,
  "guidance_scale": 7.5
}
json (submit response — 202 Accepted)
{
  "id": "abc123",
  "status": "pending",
  "polling_url": "/v1/jobs/abc123",
  "_proxyify": {
    "job_id": "abc123",
    "model_used": "kling/kling-video-v3-pro"
  }
}

Poll the polling_url with the same Authorization header until status is completed:

json (poll response — completed)
{
  "id": "abc123",
  "status": "completed",
  "video_url": "/media/video/abc123/",   // Proxyify proxy URL
  "_proxyify": {
    "credits_used": 672,
    "cost_usd": 0.672,
    "duration_seconds": 5
  }
}
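The submit-then-poll flow can be sketched as a small loop. To keep it independent of any HTTP library, the function takes a `fetch_status` callable that should GET the `polling_url` with the Authorization header and return the decoded JSON. The polling interval, the timeout, and the `"failed"` status are assumptions: this page only documents `pending` and `completed`.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job reports status "completed"."""
    waited = 0.0
    while True:
        job = fetch_status()
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":  # assumed terminal error status
            raise RuntimeError(f"job {job.get('id')} failed")
        if waited >= timeout_s:
            raise TimeoutError(f"job not completed after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays. In production the default `time.sleep` is used.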

Speech-to-Text (STT)

json (request)
{
  "model": "openai/whisper-1",
  "input_audio": {
    "data": "<base64_encoded_audio>",
    "format": "wav"
  },
  // Optional
  "language": "en",
  "prompt": "Domain terms: Kubernetes, Helm",   // improves accuracy
  "temperature": 0,
  "response_format": "verbose_json",            // json | text | srt | vtt | verbose_json
  "timestamp_granularities": ["word"]           // requires verbose_json
}
json (response)
{
  "text": "Hello, this is a test.",
  "_proxyify": {
    "credits_used": 1,
    "audio_seconds": 9.2,
    "model_used": "openai/whisper-1"
  }
}

For available STT models and exact slugs, see Dashboard → Models and filter by stt modality.
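Building the STT request body mostly comes down to base64-encoding the audio. The sketch below does that and also enforces the one dependency noted in the request example, that `timestamp_granularities` requires `verbose_json`. `build_stt_payload` is an illustrative name.

```python
import base64


def build_stt_payload(
    audio: bytes,
    audio_format: str = "wav",
    model: str = "openai/whisper-1",
    **options,
) -> dict:
    """Return a request body shaped like the STT example above.

    Extra keyword arguments (language, prompt, temperature, ...) are passed
    through unchanged as optional top-level fields.
    """
    if "timestamp_granularities" in options and options.get("response_format") != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format='verbose_json'")
    payload = {
        "model": model,
        "input_audio": {
            "data": base64.b64encode(audio).decode("ascii"),
            "format": audio_format,
        },
    }
    payload.update(options)
    return payload
```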

Text-to-Speech (TTS)

json (request)
{
  "model": "openai/gpt-4o-mini-tts",
  "input": "Hello, this is a TTS test.",
  // Optional
  "voice": "alloy",
  "response_format": "mp3",          // mp3 | opus | aac | flac | wav | pcm
  "speed": 1.0,                      // 0.25–4.0
  "instructions": "Speak calmly."    // tone/style hint (model-dependent)
}

TTS responses are raw audio byte streams (Content-Type: audio/mpeg), not JSON, so the _proxyify metadata is returned in response headers instead of the body:

http (response headers)
X-Proxyify-Credits-Used: 5
X-Proxyify-Cost-USD: 0.005
X-Proxyify-Model-Used: openai/gpt-4o-mini-tts

For available TTS models and exact slugs, see Dashboard → Models and filter by tts modality.
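Because a TTS response body is audio, usage data has to be read from the headers. A small sketch that collects the three headers shown above into a dict. Header names are matched case-insensitively, and parsing credits as an integer is an assumption based on the examples on this page. `parse_proxyify_headers` is an illustrative name.

```python
def parse_proxyify_headers(headers: dict) -> dict:
    """Collect the X-Proxyify-* response headers into a typed dict."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        # Credits are whole numbers in every example on this page (assumption).
        "credits_used": int(lowered["x-proxyify-credits-used"]),
        "cost_usd": float(lowered["x-proxyify-cost-usd"]),
        "model_used": lowered["x-proxyify-model-used"],
    }
```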

Custom headers

Send these optional HTTP headers alongside any request. They are read by Proxyify and never forwarded to the upstream provider.

| Header | Type | Max length | Description |
| --- | --- | --- | --- |
| X-Proxyify-User-Id | string | 255 chars | Stable identifier for the end-user who triggered the request. Logged against every API call, so you can filter, group and export usage per user in your dashboard. |
http (example)
POST /v1/chat/completions HTTP/1.1
Host: proxyify.dev
Authorization: Bearer prx-...
Content-Type: application/json
X-Proxyify-User-Id: user_7a3f

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Hello" }]
}

In your Proxyify dashboard, each log row shows the User ID as a clickable badge that filters the table to that specific user. CSV exports include the end_user_id column. This lets you build per-user cost breakdowns without managing separate API keys for each user.
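When the user ID comes from application data, a length check before sending keeps it within the documented 255-character limit. A minimal sketch, where `with_user_id` is a hypothetical helper:

```python
MAX_USER_ID_LENGTH = 255  # limit from the header table above


def with_user_id(headers: dict, user_id: str) -> dict:
    """Return a copy of headers with X-Proxyify-User-Id added."""
    if len(user_id) > MAX_USER_ID_LENGTH:
        raise ValueError(f"user_id exceeds {MAX_USER_ID_LENGTH} characters")
    return {**headers, "X-Proxyify-User-Id": user_id}
```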

Errors

All errors follow this format:

json (error response)
{
  "error": {
    "code": 402,
    "message": "Insufficient credits. Please top up your balance at proxyify.dev/billing.",
    "metadata": {}
  }
}
| Status | Meaning |
| --- | --- |
| 400 | Invalid parameters or prompt injection detected |
| 401 | Missing, invalid, or expired API key |
| 402 | Insufficient credits or spending limit reached |
| 403 | Blocked by key restriction (IP, origin, country, model, time) |
| 408 | Provider timeout |
| 429 | Rate limit exceeded; check the Retry-After header |
| 502 | Provider returned an error |
| 503 | No provider available for this model |
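One way to act on these status codes is to decide per response whether a retry makes sense and how long to wait. Treating 408, 429, 502 and 503 as retryable is an assumption based on their descriptions, not a documented Proxyify policy. The exponential backoff parameters are likewise illustrative, and only the delta-seconds form of Retry-After is handled.

```python
from typing import Optional

RETRYABLE_STATUSES = {408, 429, 502, 503}  # assumption: these look transient


def retry_delay(
    status: int,
    headers: dict,
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying is pointless."""
    if status not in RETRYABLE_STATUSES:
        return None  # e.g. 400/401/402/403 need a fix on the caller's side
    retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    if status == 429 and retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)  # honour the server's hint when it is in seconds
    # Otherwise fall back to capped exponential backoff: base, 2x, 4x, ...
    return min(cap_s, base_s * 2 ** attempt)


def error_message(body: dict) -> str:
    """Extract the message from an error body shaped like the example above."""
    return body["error"]["message"]
```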