Chat Completions

All modalities (text, image, video, speech-to-text, and text-to-speech) are routed through a single endpoint. The format follows OpenAI's Chat Completions API.

Endpoint

http

POST https://proxyify.dev/v1/chat/completions

Text (Chat)

json (request)

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": false,

  // Sampling (all optional)
  "temperature": 1.0,          // 0.0–2.0
  "max_tokens": 1024,
  "top_p": 1.0,                 // 0.0–1.0
  "top_k": 0,                   // ignored on OpenAI models
  "frequency_penalty": 0.0,    // -2.0–2.0
  "presence_penalty": 0.0,     // -2.0–2.0
  "repetition_penalty": 1.0,   // 0.0–2.0
  "seed": 42,
  "stop": ["<end>"],

  // Output format (optional)
  "response_format": { "type": "json_object" },

  // Tool calling (optional)
  "tools": [...],
  "tool_choice": "auto",

  // Provider routing (optional) — see Provider Routing guide
  "provider": { "sort": "latency" }
}

json (response)

{
  "id": "chatcmpl-...",
  "choices": [{ "message": { "role": "assistant", "content": "Hi!" } }],
  "usage": { "prompt_tokens": 10, "completion_tokens": 4, "total_tokens": 14 },
  "_proxyify": {
    "credits_used": 24, "cost_usd": 0.024,
    "model_used": "openai/gpt-4o",
    "input_tokens": 10, "output_tokens": 4,
    "latency_ms": 1240, "cached": false
  }
}
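The request and response above can be exercised from Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_chat_request`, `extract_reply`, and `send` are illustrative, and the Bearer scheme is taken from the Custom headers example later on this page.

```python
import json
import urllib.request

API_URL = "https://proxyify.dev/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn text chat."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}], **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response: dict) -> tuple[str, int]:
    """Return the assistant text and the credits charged for the call."""
    text = response["choices"][0]["message"]["content"]
    credits = response["_proxyify"]["credits_used"]
    return text, credits


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `extract_reply(send(build_chat_request(key, "openai/gpt-4o", "Hello", max_tokens=64)))` would return the reply text together with `credits_used`. Optional sampling parameters are passed as keyword arguments and merged into the body unchanged.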

Set "stream": true to receive the response as server-sent events (SSE). See the Streaming guide.
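The chunk format is documented in the Streaming guide, not on this page. The sketch below assumes the stream mirrors OpenAI's convention: each event is a `data:` line holding a JSON chunk with `choices[].delta.content`, and the stream ends with `data: [DONE]`. Both details and the function name `iter_stream_text` are assumptions to check against that guide.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed chunk format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece
```

The function takes any iterable of decoded lines, so it can be fed from an HTTP response object or from a recorded transcript in a unit test.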

Sampling parameters

Parameter	Type	Range	Default	Description
`temperature`	float	0.0–2.0	1.0	Response variety. 0 = deterministic.
`max_tokens`	int	≥1	model limit	Max tokens to generate.
`max_completion_tokens`	int	≥1	model limit	Preferred alias for `max_tokens`.
`top_p`	float	0.0–1.0	1.0	Nucleus sampling threshold.
`top_k`	int	≥0	0 (off)	Top-k token sampling. Not available on OpenAI models.
`frequency_penalty`	float	-2.0–2.0	0.0	Penalise tokens in proportion to how often they have already appeared.
`presence_penalty`	float	-2.0–2.0	0.0	Penalise any token that has already appeared in the text so far, regardless of how often.
`repetition_penalty`	float	0.0–2.0	1.0	General repetition penalty (Mistral / Llama models).
`min_p`	float	0.0–1.0	0.0	Minimum token probability relative to top token.
`top_a`	float	0.0–1.0	0.0	Dynamic Top-P variant.
`seed`	int	—	—	Same seed + params → same output (not guaranteed for all models).
`stop`	str | list	≤4 items	—	Stop sequences; generation halts when one is encountered.
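Checking ranges on the client can catch a `400` before a request is sent. The sketch below copies the ranges from the table above into a lookup; the function name `validate_sampling` is illustrative, and the server's actual validation may differ.

```python
# Ranges copied from the table above as (low, high); None means no upper bound.
SAMPLING_RANGES = {
    "temperature": (0.0, 2.0),
    "max_tokens": (1, None),
    "max_completion_tokens": (1, None),
    "top_p": (0.0, 1.0),
    "top_k": (0, None),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.0, 2.0),
    "min_p": (0.0, 1.0),
    "top_a": (0.0, 1.0),
}
MAX_STOP_SEQUENCES = 4


def validate_sampling(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means all values are in range."""
    problems = []
    for name, (low, high) in SAMPLING_RANGES.items():
        if name not in params:
            continue
        value = params[name]
        if value < low or (high is not None and value > high):
            upper = "inf" if high is None else high
            problems.append(f"{name}={value} is outside [{low}, {upper}]")
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > MAX_STOP_SEQUENCES:
        problems.append(f"stop has {len(stop)} items; at most {MAX_STOP_SEQUENCES} allowed")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every out-of-range value at once.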

Tool calling

Pass a tools array to let the model call functions. The request is automatically routed to providers that support tool use.

json (request)

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "What's the weather in Istanbul?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Returns weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto",        // "none" | "auto" | "required"
  "parallel_tool_calls": true   // allow multiple simultaneous calls
}
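This page does not show the tool-call response. Assuming it follows OpenAI's format (an assistant message with a `tool_calls` list whose `function.arguments` is a JSON-encoded string), a dispatcher might look like the sketch below. `TOOL_REGISTRY`, `run_tool_calls`, and the stub `get_weather` are hypothetical names for illustration.

```python
import json


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Maps the tool names declared in the request to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested call and build the follow-up `tool` messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        # OpenAI-style responses carry arguments as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```

In the OpenAI convention, the returned messages are appended to `messages` after the assistant message, and the conversation is sent again so the model can write its final answer.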

Structured output

Force the model to return JSON matching an exact schema using response_format.

json (request)

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Extract user data" }],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "User",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age":  { "type": "integer" }
        },
        "required": ["name", "age"]
      }
    }
  }
}

Use {"type": "json_object"} for basic JSON mode (no schema required). For the full parameter reference, see Parameters.
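Even with `strict` set, it is reasonable to validate the result before using it. The sketch below assumes the JSON arrives as a string in `choices[0].message.content`, as in the text response earlier on this page, and checks it by hand against the `User` schema. `parse_user` is an illustrative name; a schema library could replace the manual checks.

```python
import json


def parse_user(response: dict) -> dict:
    """Decode the assistant content and check it against the User schema above."""
    user = json.loads(response["choices"][0]["message"]["content"])
    if not isinstance(user, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(user.get("name"), str):
        raise ValueError("'name' must be a string")
    age = user.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("'age' must be an integer")
    return user
```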

Image generation

json (request)

{
  "model": "black-forest-labs/flux-1.1-pro",
  "messages": [{ "role": "user", "content": "A sunset over mountains" }],
  "modalities": ["image", "text"],
  "image_config": {
    "aspect_ratio": "16:9",   // "1:1" (default), "16:9", "9:16", "4:3"
    "image_size": "1K"        // "1K" (default), "2K", "4K"
  },
  "seed": 42,                  // same seed + prompt = same image
  "provider": { "sort": "price" }
}

json (response)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "images": [{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }]
    }
  }],
  "_proxyify": {
    "credits_used": 80, "cost_usd": 0.08,
    "images_count": 1, "latency_ms": 3200
  }
}
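The image arrives inline as a base64 data URL, so it has to be decoded before it can be saved. A minimal sketch, assuming every entry in `images` uses the `data:<mime>;base64,<payload>` form shown above; the function names are illustrative.

```python
import base64


def decode_image_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into its MIME type and raw bytes."""
    header, _, encoded = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(encoded)


def extract_images(response: dict) -> list[tuple[str, bytes]]:
    """Decode every image attached to the first choice of an image response."""
    message = response["choices"][0]["message"]
    return [decode_image_data_url(img["image_url"]["url"]) for img in message.get("images", [])]
```

Each returned pair holds the MIME type (for example `image/png`) and the bytes, which can be written to disk with `open(path, "wb")`.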

Video generation (async)

Video generation is asynchronous — submit a job, then poll for the result.

json (submit request)

{
  "model": "kling/kling-video-v3-pro",
  "prompt": "A golden retriever on a sunny beach",
  "duration": 5,
  "resolution": "720p",
  "aspect_ratio": "16:9",
  // Optional
  "negative_prompt": "blurry, watermark",
  "seed": 42,
  "fps": 24,
  "guidance_scale": 7.5
}

json (submit response — 202 Accepted)

{
  "id": "abc123", "status": "pending", "polling_url": "/v1/jobs/abc123",
  "_proxyify": { "job_id": "abc123", "model_used": "kling/kling-video-v3-pro" }
}

Poll the polling_url with the same Authorization header until status is completed:

json (poll response — completed)

{
  "id": "abc123", "status": "completed", "video_url": "/media/video/abc123/",  // Proxyify proxy URL
  "_proxyify": {
    "credits_used": 672, "cost_usd": 0.672,
    "duration_seconds": 5
  }
}
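The submit-then-poll flow can be wrapped in a small loop. In this sketch the HTTP call is injected as `fetch`, a caller-supplied function that requests the polling URL with the Authorization header and returns the decoded JSON. The name `wait_for_job`, the default interval and timeout, and the handling of an `error` envelope are assumptions, since this page documents only the `pending` and `completed` states.

```python
import time
from typing import Callable


def wait_for_job(
    fetch: Callable[[str], dict],
    polling_url: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(polling_url) until status is "completed" or the timeout elapses."""
    waited = 0.0
    while True:
        job = fetch(polling_url)
        if "error" in job:  # error envelope as shown in the Errors section
            raise RuntimeError(job["error"].get("message", "job failed"))
        if job.get("status") == "completed":
            return job
        if waited >= timeout:
            raise TimeoutError(f"job not completed after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Note that `polling_url` and `video_url` are relative paths in the examples above, so `fetch` needs to resolve them against the API host.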

Speech-to-Text (STT)

json (request)

{
  "model": "openai/whisper-1",
  "input_audio": {
    "data": "<base64_encoded_audio>",
    "format": "wav"
  },
  // Optional
  "language": "en",
  "prompt": "Domain terms: Kubernetes, Helm",  // improves accuracy
  "temperature": 0,
  "response_format": "verbose_json",           // json | text | srt | vtt | verbose_json
  "timestamp_granularities": ["word"]            // requires verbose_json
}

json (response)

{
  "text": "Hello, this is a test.",
  "_proxyify": {
    "credits_used": 1, "audio_seconds": 9.2,
    "model_used": "openai/whisper-1"
  }
}

For available STT models and exact slugs, see Dashboard → Models and filter by stt modality.
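Because the audio travels as base64 inside the JSON body, a small helper can do the encoding and enforce the one dependency noted in the request example (`timestamp_granularities` requires `verbose_json`). A sketch with an illustrative function name, `build_stt_body`:

```python
import base64


def build_stt_body(audio: bytes, audio_format: str, model: str = "openai/whisper-1", **options) -> dict:
    """Base64-encode raw audio and wrap it in the STT request shape shown above."""
    if "timestamp_granularities" in options and options.get("response_format") != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format='verbose_json'")
    return {
        "model": model,
        "input_audio": {
            "data": base64.b64encode(audio).decode("ascii"),
            "format": audio_format,
        },
        **options,
    }
```

A typical call would read a file with `open("clip.wav", "rb").read()` and pass the bytes together with `"wav"` and any optional fields such as `language="en"`.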

Text-to-Speech (TTS)

json (request)

{
  "model": "openai/gpt-4o-mini-tts",
  "input": "Hello, this is a TTS test.",
  // Optional
  "voice": "alloy",
  "response_format": "mp3",             // mp3 | opus | aac | flac | wav | pcm
  "speed": 1.0,                          // 0.25–4.0
  "instructions": "Speak calmly."       // tone/style hint (model-dependent)
}

TTS responses are raw audio byte streams, not JSON (Content-Type: audio/mpeg for mp3 output). Because there is no JSON body, the _proxyify metadata is returned in response headers:

http (response headers)

X-Proxyify-Credits-Used: 5
X-Proxyify-Cost-USD: 0.005
X-Proxyify-Model-Used: openai/gpt-4o-mini-tts

For available TTS models and exact slugs, see Dashboard → Models and filter by tts modality.
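Since a TTS response has no JSON body, the usage data has to be read from the headers listed above. A minimal sketch that converts them to typed values; `parse_tts_metadata` is an illustrative name, and only the three headers shown on this page are assumed to exist.

```python
def parse_tts_metadata(headers: dict) -> dict:
    """Collect the X-Proxyify-* response headers into a typed dictionary."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "credits_used": int(lowered["x-proxyify-credits-used"]),
        "cost_usd": float(lowered["x-proxyify-cost-usd"]),
        "model_used": lowered["x-proxyify-model-used"],
    }
```

With `urllib`, the headers can be passed as `dict(resp.headers.items())`, and the audio bytes come from `resp.read()`.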

Custom headers

Send these optional HTTP headers alongside any request. They are read by Proxyify and never forwarded to the upstream provider.

Header	Type	Max length	Description
`X-Proxyify-User-Id`	string	255 chars	Stable identifier for the end-user who triggered the request. Logged against every API call — lets you filter, group and export usage per user in your dashboard.

http (example)

POST /v1/chat/completions HTTP/1.1
Authorization: Bearer prx-...
Content-Type: application/json
X-Proxyify-User-Id: user_7a3f

{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Hello" }]
}

In your Proxyify dashboard, each log row shows the User ID as a clickable badge that filters the table to that specific user. CSV exports include the end_user_id column. This lets you build per-user cost breakdowns without managing separate API keys for each user.
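Attaching the header in one place makes it harder to forget on individual calls. The sketch below builds the headers from the example above and enforces the 255-character limit from the table; `build_headers` is an illustrative name, and rejecting an empty ID is a choice made here, not a documented rule.

```python
from typing import Optional

MAX_USER_ID_LENGTH = 255  # limit from the header table above


def build_headers(api_key: str, end_user_id: Optional[str] = None) -> dict:
    """Return request headers, adding X-Proxyify-User-Id when an ID is given."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if end_user_id is not None:
        if not end_user_id or len(end_user_id) > MAX_USER_ID_LENGTH:
            raise ValueError(f"end_user_id must be 1 to {MAX_USER_ID_LENGTH} characters")
        headers["X-Proxyify-User-Id"] = end_user_id
    return headers
```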

Errors

All errors follow this format:

json (error response)

{
  "error": {
    "code": 402,
    "message": "Insufficient credits. Please top up your balance at proxyify.dev/billing.",
    "metadata": {}
  }
}

Status	Meaning
`400`	Invalid parameters or prompt injection detected
`401`	Missing, invalid, or expired API key
`402`	Insufficient credits or spending limit reached
`403`	Blocked by key restriction (IP, origin, country, model, time)
`408`	Provider timeout
`429`	Rate limit exceeded — check `Retry-After` header
`502`	Provider returned an error
`503`	No provider available for this model
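A client usually needs to decide which of these statuses are worth retrying. The sketch below treats `408`, `429`, `502`, and `503` as transient, honours `Retry-After` on `429`, and otherwise uses capped exponential backoff. Which statuses to retry and the backoff constants are assumptions of this example, not documented Proxyify behaviour.

```python
from typing import Optional

RETRYABLE_STATUSES = {408, 429, 502, 503}  # assumption: treated as transient


def retry_delay(status: int, headers: dict, attempt: int,
                base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    if status == 429 and retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... limited to cap seconds.
    return min(cap, base * 2 ** attempt)
```

Statuses such as `400`, `401`, `402`, and `403` return `None` because repeating the same request would not change the outcome without fixing the request, key, or balance first.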