Streaming
Stream tokens in real time using Server-Sent Events (SSE). Works with any model that supports streaming.
Enable streaming
Set "stream": true in your request body. The response will be a stream of data: events instead of a single JSON object.
{
  "model": "openai/gpt-4o",
  "messages": [{ "role": "user", "content": "Write a short poem." }],
  "stream": true
}
SSE format
The response is a sequence of data: lines, one per token chunk, terminated by data: [DONE]:
data: {"choices":[{"delta":{"role":"assistant","content":""},"index":0}]}
data: {"choices":[{"delta":{"content":"Roses"},"index":0}]}
data: {"choices":[{"delta":{"content":" are"},"index":0}]}
data: {"choices":[{"delta":{"content":" red"},"index":0}]}
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
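To illustrate the wire format, here is a minimal sketch that parses these data: lines by hand using Python's requests library; the SDK examples below handle this parsing for you, and the API key and prompt here are placeholders.

import json
import requests

resp = requests.post(
    "https://proxyify.dev/v1/chat/completions",
    headers={"Authorization": "Bearer prx-xxxxxxxxxxxxxxxx"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Write a short poem."}],
        "stream": True,
    },
    stream=True,
)

for raw in resp.iter_lines():
    if not raw.startswith(b"data: "):
        continue  # skip blank separator lines between SSE events
    payload = raw[len(b"data: "):].decode("utf-8")
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    # each chunk carries a delta object, described below
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)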
Delta objects
Each chunk contains a delta object with only the new content for that chunk:
- First chunk: {"role": "assistant", "content": ""}
- Content chunks: {"content": "token"}
- Final chunk: an empty delta, with "finish_reason": "stop" on the choice (or "error" on a mid-stream failure)
Accumulate delta.content values across all chunks to reconstruct the full response.
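A minimal accumulation sketch, assuming stream is the object returned by the Python SDK call shown under Code examples:

# Assumes `stream` was created as in the Python example below
# (client.chat.completions.create(..., stream=True)).
parts = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        parts.append(delta.content)
full_text = "".join(parts)
print(full_text)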
Mid-stream errors
If the provider errors after tokens have already been sent, the HTTP status stays 200. The stream ends with a final chunk containing finish_reason: "error". Credits are charged for the tokens that were generated.
Always check finish_reason on the final chunk. If it is "error", the response is incomplete, and credits for the partial output were still deducted.
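A sketch of that check, again assuming stream comes from the SDK call shown below:

# Track finish_reason while accumulating tokens; a value of "error"
# means the stream ended early and the collected text is partial.
parts = []
finish_reason = None
for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:
        parts.append(choice.delta.content)
    if choice.finish_reason is not None:
        finish_reason = choice.finish_reason

text = "".join(parts)
if finish_reason == "error":
    print("\nStream ended with an error; the response is incomplete.")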
Code examples
Python
from openai import OpenAI

client = OpenAI(
    api_key="prx-xxxxxxxxxxxxxxxx",
    base_url="https://proxyify.dev/v1",
)

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "prx-xxxxxxxxxxxxxxxx",
  baseURL: "https://proxyify.dev/v1",
});

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Write a short poem." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(content);
}
curl
curl https://proxyify.dev/v1/chat/completions \
  -H "Authorization: Bearer prx-xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Write a short poem."}],"stream":true}' \
  --no-buffer