Chat Completions
All modalities (text, image, video, speech-to-text, and text-to-speech) route through a single endpoint. The format follows OpenAI's Chat Completions API.
Endpoint
POST https://proxyify.dev/v1/chat/completions
Text (Chat)
{
"model": "openai/gpt-4o",
"messages": [{ "role": "user", "content": "Hello" }],
"stream": false,
// Sampling (all optional)
"temperature": 1.0, // 0.0–2.0
"max_tokens": 1024,
"top_p": 1.0, // 0.0–1.0
"top_k": 0, // ignored on OpenAI models
"frequency_penalty": 0.0, // -2.0–2.0
"presence_penalty": 0.0, // -2.0–2.0
"repetition_penalty": 1.0, // 0.0–2.0
"seed": 42,
"stop": ["<end>"],
// Output format (optional)
"response_format": { "type": "json_object" },
// Tool calling (optional)
"tools": [...],
"tool_choice": "auto",
// Provider routing (optional) — see Provider Routing guide
"provider": { "sort": "latency" }
}
{
"id": "chatcmpl-...",
"choices": [{ "message": { "role": "assistant", "content": "Hi!" } }],
"usage": { "prompt_tokens": 10, "completion_tokens": 4, "total_tokens": 14 },
"_proxyify": {
"credits_used": 24, "cost_usd": 0.024,
"model_used": "openai/gpt-4o",
"input_tokens": 120, "output_tokens": 80,
"latency_ms": 1240, "cached": false
}
}
Set "stream": true to receive the response as server-sent events (SSE). See the Streaming guide.
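As a rough sketch, the text request above could be sent from Python with only the standard library. The helper names `build_chat_request` and `send` are hypothetical, and the Bearer-token `Authorization` header is assumed to match the one shown under Custom headers.

```python
import json
import urllib.request

API_URL = "https://proxyify.dev/v1/chat/completions"  # endpoint from this page


def build_chat_request(api_key, prompt, model="openai/gpt-4o", **sampling):
    """Return a urllib Request for a non-streaming chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    payload.update(sampling)  # e.g. temperature=0.2, max_tokens=256
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_chat_request("prx-...", "Hello"))` would return the response object; the reply text is then at `choices[0].message.content`.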
Sampling parameters
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| temperature | float | 0.0–2.0 | 1.0 | Response variety. 0 = deterministic. |
| max_tokens | int | ≥1 | model limit | Max tokens to generate. |
| max_completion_tokens | int | ≥1 | model limit | Preferred alias for max_tokens. |
| top_p | float | 0.0–1.0 | 1.0 | Nucleus sampling threshold. |
| top_k | int | ≥0 | 0 (off) | Top-k token sampling. Not available on OpenAI models. |
| frequency_penalty | float | -2.0–2.0 | 0.0 | Penalise frequent tokens. |
| presence_penalty | float | -2.0–2.0 | 0.0 | Penalise tokens already present in input. |
| repetition_penalty | float | 0.0–2.0 | 1.0 | General repetition penalty (Mistral / Llama models). |
| min_p | float | 0.0–1.0 | 0.0 | Minimum token probability relative to top token. |
| top_a | float | 0.0–1.0 | 0.0 | Dynamic Top-P variant. |
| seed | int | - | - | Same seed + params → same output (not guaranteed for all models). |
| stop | str or list | ≤4 items | - | Stop sequences; generation halts when one is encountered. |
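The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch only: `check_sampling` and `SAMPLING_RANGES` are hypothetical names, and the server's own validation may differ.

```python
# Ranges copied from the sampling table: (min, max); None means no upper bound.
SAMPLING_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "top_k": (0, None),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.0, 2.0),
    "min_p": (0.0, 1.0),
    "top_a": (0.0, 1.0),
    "max_tokens": (1, None),
    "max_completion_tokens": (1, None),
}


def check_sampling(params):
    """Return a list of human-readable problems; an empty list means all in range."""
    problems = []
    for name, value in params.items():
        if name == "stop":
            # stop accepts a single string or a list of up to 4 strings
            items = [value] if isinstance(value, str) else list(value)
            if len(items) > 4:
                problems.append("stop: at most 4 sequences allowed")
            continue
        if name not in SAMPLING_RANGES:
            continue  # not a sampling parameter; leave it to the API
        low, high = SAMPLING_RANGES[name]
        if value < low or (high is not None and value > high):
            problems.append(f"{name}: {value} is outside the documented range")
    return problems
```

For example, `check_sampling({"temperature": 2.5})` reports one problem, while `check_sampling({"temperature": 1.0, "top_p": 0.9})` returns an empty list.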
Tool calling
Pass a tools array to let the model call functions. The request is automatically routed to providers that support tool use.
{
"model": "openai/gpt-4o",
"messages": [{ "role": "user", "content": "What's the weather in Istanbul?" }],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Returns weather for a city",
"parameters": {
"type": "object",
"properties": { "city": { "type": "string" } },
"required": ["city"]
}
}
}],
"tool_choice": "auto", // "none" | "auto" | "required"
"parallel_tool_calls": true // allow multiple simultaneous calls
}
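This page does not show a tool-call response. Assuming it follows OpenAI's shape (a `tool_calls` list on the assistant message, each entry carrying an `id` and a `function` with a `name` and JSON-encoded `arguments`), a client could dispatch the calls roughly as sketched here. The local `get_weather` implementation and the `run_tool_calls` helper are hypothetical.

```python
import json


def get_weather(city):
    """Hypothetical local implementation of the get_weather tool."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(response):
    """Execute each requested tool and return follow-up 'tool' messages."""
    message = response["choices"][0]["message"]
    results = []
    for call in message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # In the OpenAI format, arguments arrive as a JSON string
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```

The returned messages would be appended to `messages`, after the assistant message that requested them, and sent in a second request so the model can produce its final answer.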
Structured output
Force the model to return JSON matching an exact schema using response_format.
{
"model": "openai/gpt-4o",
"messages": [{ "role": "user", "content": "Extract user data" }],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "User",
"strict": true,
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"required": ["name", "age"]
}
}
}
}
Use {"type": "json_object"} for basic JSON mode (no schema required). For full parameter reference see Parameters.
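With structured output, the JSON is assumed to arrive as a string in `choices[0].message.content`, as in the OpenAI format. A minimal sketch of parsing it and re-checking the two required fields of the User schema follows; `parse_user` is a hypothetical helper, not part of any SDK.

```python
import json


def parse_user(response):
    """Parse message.content as JSON and check the fields required by the User schema."""
    content = response["choices"][0]["message"]["content"]
    user = json.loads(content)
    if not isinstance(user.get("name"), str):
        raise ValueError("name must be a string")
    age = user.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    return user
```

Re-validating on the client is a defensive choice: it also covers the basic `json_object` mode, where no schema is enforced.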
Image generation
{
"model": "black-forest-labs/flux-1.1-pro",
"messages": [{ "role": "user", "content": "A sunset over mountains" }],
"modalities": ["image", "text"],
"image_config": {
"aspect_ratio": "16:9", // "1:1" (default), "16:9", "9:16", "4:3"
"image_size": "1K" // "1K" (default), "2K", "4K"
},
"seed": 42, // same seed + prompt = same image
"provider": { "sort": "price" }
}
{
"choices": [{
"message": {
"role": "assistant",
"images": [{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }]
}
}],
"_proxyify": {
"credits_used": 80, "cost_usd": 0.08,
"images_count": 1, "latency_ms": 3200
}
}
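The image is returned inline as a base64 data URL, so it has to be decoded before it can be saved. A minimal sketch, assuming every entry in `images` has the shape shown in the response above (`extract_images` is a hypothetical helper):

```python
import base64


def extract_images(response):
    """Return a list of (mime_type, raw_bytes) for each image in the first choice."""
    images = []
    for item in response["choices"][0]["message"].get("images", []):
        url = item["image_url"]["url"]
        header, _, encoded = url.partition(",")
        if not header.startswith("data:") or not header.endswith(";base64"):
            raise ValueError("expected a base64 data URL")
        mime_type = header[len("data:"):-len(";base64")]
        images.append((mime_type, base64.b64decode(encoded)))
    return images
```

Each returned byte string can be written to disk directly, for example with `open("out.png", "wb").write(data)`.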
Video generation (async)
Video generation is asynchronous — submit a job, then poll for the result.
{
"model": "kling/kling-video-v3-pro",
"prompt": "A golden retriever on a sunny beach",
"duration": 5,
"resolution": "720p",
"aspect_ratio": "16:9",
// Optional
"negative_prompt": "blurry, watermark",
"seed": 42,
"fps": 24,
"guidance_scale": 7.5
}
{
"id": "abc123", "status": "pending", "polling_url": "/v1/jobs/abc123",
"_proxyify": { "job_id": "abc123", "model_used": "kling/kling-video-v3-pro" }
}
Poll the polling_url with the same Authorization header until status is completed:
{
"id": "abc123", "status": "completed", "video_url": "/media/video/abc123/", // Proxyify proxy URL
"_proxyify": {
"credits_used": 672, "cost_usd": 0.672,
"duration_seconds": 5
}
}
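The submit-then-poll flow can be sketched as a small loop. Only the `pending` and `completed` statuses appear on this page, so this sketch simply keeps polling on anything other than `completed` and gives up after a fixed number of attempts. The `wait_for_job` name, the polling interval, and the attempt limit are all assumptions.

```python
import time


def wait_for_job(fetch_status, polling_url, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the job reports status 'completed', then return the job dict.

    fetch_status is any callable that GETs polling_url (sending the same
    Authorization header as the submit request) and returns the decoded JSON.
    """
    for attempt in range(max_attempts):
        job = fetch_status(polling_url)
        if job["status"] == "completed":
            return job
        # Failure statuses are not documented here, so any other status is
        # treated as "still running" and bounded by max_attempts.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job did not complete after {max_attempts} polls")
```

Passing the fetch function in as an argument keeps the loop independent of the HTTP client and makes it easy to exercise with a stub.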
Speech-to-Text (STT)
{
"model": "openai/whisper-1",
"input_audio": {
"data": "<base64_encoded_audio>",
"format": "wav"
},
// Optional
"language": "en",
"prompt": "Domain terms: Kubernetes, Helm", // improves accuracy
"temperature": 0,
"response_format": "verbose_json", // json | text | srt | vtt | verbose_json
"timestamp_granularities": ["word"] // requires verbose_json
}
{
"text": "Hello, this is a test.",
"_proxyify": {
"credits_used": 1, "audio_seconds": 9.2,
"model_used": "openai/whisper-1"
}
}
For available STT models and exact slugs, see Dashboard → Models and filter by stt modality.
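Because the audio travels inside the JSON body, it must be base64-encoded first. A minimal sketch of building the STT payload from raw bytes follows. `build_stt_payload` is a hypothetical helper, and the client-side check merely mirrors the note above that `timestamp_granularities` requires `verbose_json`.

```python
import base64


def build_stt_payload(audio_bytes, audio_format="wav", model="openai/whisper-1", **options):
    """Return the JSON-serialisable request body for a transcription."""
    if "timestamp_granularities" in options and options.get("response_format") != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format='verbose_json'")
    payload = {
        "model": model,
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": audio_format,
        },
    }
    payload.update(options)  # e.g. language="en", temperature=0
    return payload
```

The resulting dict can be serialised with `json.dumps` and posted to the same endpoint as a chat request.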
Text-to-Speech (TTS)
{
"model": "openai/gpt-4o-mini-tts",
"input": "Hello, this is a TTS test.",
// Optional
"voice": "alloy",
"response_format": "mp3", // mp3 | opus | aac | flac | wav | pcm
"speed": 1.0, // 0.25–4.0
"instructions": "Speak calmly." // tone/style hint (model-dependent)
}
TTS responses are raw audio byte streams (for example, Content-Type: audio/mpeg for mp3). Because the body is binary, the _proxyify data is returned in response headers instead:
X-Proxyify-Credits-Used: 5
X-Proxyify-Cost-USD: 0.005
X-Proxyify-Model-Used: openai/gpt-4o-mini-tts
For available TTS models and exact slugs, see Dashboard → Models and filter by tts modality.
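Since TTS metadata arrives in headers rather than in a JSON body, a client may want to fold it back into the same shape as the `_proxyify` object. This is a sketch under assumptions: `parse_proxyify_headers` is a hypothetical helper, only the three headers listed above are known, and credits are assumed to be whole numbers.

```python
def parse_proxyify_headers(headers):
    """Collect X-Proxyify-* headers into a dict shaped like the _proxyify object."""
    prefix = "x-proxyify-"
    meta = {}
    for name, value in headers.items():
        if name.lower().startswith(prefix):
            # X-Proxyify-Credits-Used -> credits_used
            key = name[len(prefix):].lower().replace("-", "_")
            meta[key] = value
    # Convert the two numeric fields listed above; other values stay strings.
    if "credits_used" in meta:
        meta["credits_used"] = int(meta["credits_used"])
    if "cost_usd" in meta:
        meta["cost_usd"] = float(meta["cost_usd"])
    return meta
```

The audio itself is the raw response body and can be written to a file in binary mode.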
Custom headers
Send these optional HTTP headers alongside any request. They are read by Proxyify and never forwarded to the upstream provider.
| Header | Type | Max length | Description |
|---|---|---|---|
| X-Proxyify-User-Id | string | 255 chars | Stable identifier for the end-user who triggered the request. Logged against every API call, which lets you filter, group and export usage per user in your dashboard. |
POST /v1/chat/completions HTTP/1.1
Authorization: Bearer prx-...
Content-Type: application/json
X-Proxyify-User-Id: user_7a3f
{
"model": "openai/gpt-4o",
"messages": [{ "role": "user", "content": "Hello" }]
}
In your Proxyify dashboard, each log row shows the User ID as a clickable badge that filters the table to that specific user. CSV exports include the end_user_id column. This lets you build per-user cost breakdowns without managing separate API keys for each user.
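A minimal sketch of attaching the user identifier on the client, enforcing the 255-character limit from the table above before the request is sent (`request_headers` is a hypothetical helper):

```python
def request_headers(api_key, end_user_id=None):
    """Build request headers, optionally tagging the call with an end-user ID."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if end_user_id is not None:
        if len(end_user_id) > 255:
            raise ValueError("X-Proxyify-User-Id is limited to 255 characters")
        headers["X-Proxyify-User-Id"] = end_user_id
    return headers
```

Using a stable, opaque identifier such as an internal user ID (rather than an email address) keeps the per-user grouping consistent across sessions.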
Errors
All errors follow this format:
{
"error": {
"code": 402,
"message": "Insufficient credits. Please top up your balance at proxyify.dev/billing.",
"metadata": {}
}
}
| Status | Meaning |
|---|---|
| 400 | Invalid parameters or prompt injection detected |
| 401 | Missing, invalid, or expired API key |
| 402 | Insufficient credits or spending limit reached |
| 403 | Blocked by key restriction (IP, origin, country, model, time) |
| 408 | Provider timeout |
| 429 | Rate limit exceeded; check the Retry-After header |
| 502 | Provider returned an error |
| 503 | No provider available for this model |
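One way a client might act on these status codes is to retry only the errors that look transient. Which codes count as retryable is an assumption of this sketch, not something this page specifies, and `retry_delay` is a hypothetical helper. It honours `Retry-After` on 429 and otherwise falls back to capped exponential backoff.

```python
# Assumption: provider-side failures and rate limits are treated as transient.
RETRYABLE = {408, 429, 502, 503}


def retry_delay(status, headers, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None  # 400/401/402/403 need a change on the caller's side
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    return min(cap, base * (2 ** attempt))
```

For example, `retry_delay(429, {"Retry-After": "7"}, 0)` returns `7.0`, and `retry_delay(402, {}, 0)` returns `None`, signalling that the caller should stop and top up credits.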