
# Streaming

Set `stream: true` in the request body to receive the response as server-sent events (SSE) instead of waiting for the full completion. Each chunk carries a single token delta, typically arriving every 10–40 ms.

```json
{
  "model": "gob-5.5",
  "messages": [{"role": "user", "content": "tell me a goblin joke"}],
  "stream": true
}
```

## Wire format

Each event is a line starting with `data: ` followed by a JSON object. The stream is terminated by `data: [DONE]`.

```text
data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{"content":"why "},"finish_reason":null}]}

data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{"content":"did "},"finish_reason":null}]}

...

data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":7,"completion_tokens":42,"total_tokens":49,"grs_score":0.83}}

data: [DONE]
```
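
If you are consuming this format without an SDK, the parsing loop is small: keep lines that start with `data: `, stop at the `[DONE]` sentinel, and JSON-decode the rest. A minimal sketch; the helper name and the line iterable are illustrative, not part of the API:

```python
import json

def iter_chunks(lines):
    """Yield parsed chunk objects from an iterable of SSE lines.

    `lines` can be anything that yields decoded text lines from the
    HTTP response body. Stops at the `data: [DONE]` sentinel.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue  # blank separator lines between events
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)
```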

## Final chunk

The last chunk before `[DONE]` is special: it carries the `usage` object and a non-null `finish_reason`. Its `delta` is empty (`{}`), so don't append it as content.
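
In terms of parsed chunk objects, that means collecting content until `finish_reason` turns non-null and reading `usage` off that last chunk. A sketch against the wire format above; the `chunks` iterable stands in for whatever yields parsed chunk objects (e.g. the helper sketched earlier):

```python
text_parts = []
usage = None

for chunk in chunks:  # parsed wire-format objects
    choice = chunk["choices"][0]
    delta = choice["delta"]
    if "content" in delta:
        text_parts.append(delta["content"])
    if choice["finish_reason"] is not None:
        # Final chunk: empty delta, non-null finish_reason, usage present.
        usage = chunk.get("usage")

print("".join(text_parts))
print(usage)  # includes grs_score
```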

## Python example

```python
from gpt_gob import GPTGob

client = GPTGob(api_key="gob-...")

stream = client.chat.completions.create(
    model="gob-5.5",
    messages=[{"role": "user", "content": "tell me a goblin joke"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

print()  # final newline
```

## Node.js example

```javascript
import { GPTGob } from "@gpt-gob/sdk";

const client = new GPTGob({ apiKey: process.env.GOB_API_KEY });

const stream = await client.chat.completions.create({
  model: "gob-5.5",
  messages: [{ role: "user", content: "tell me a goblin joke" }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
process.stdout.write("\n");
```

## Raw fetch (browser / edge)

```javascript
const res = await fetch("https://api.gpt-gob.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer gob-...",
  },
  body: JSON.stringify({
    model: "gob-5.5",
    messages: [{ role: "user", content: "..." }],
    stream: true,
  }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

// Label the outer loop so we can stop reading once [DONE] arrives.
read: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep the trailing partial line for the next read

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6).trim();
    if (data === "[DONE]") break read;
    const parsed = JSON.parse(data);
    const delta = parsed.choices[0]?.delta?.content;
    if (delta) console.log(delta);
  }
}
```

## Caveats

- Latency: the first token typically arrives in 200–600 ms, depending on `mining_depth`.
- Disconnects: SSE connections can drop. Implement reconnect with exponential backoff.
- Backpressure: if you don't drain the reader, the server will eventually close the connection after 30 s of inactivity.
- Tool calls: tool call deltas arrive as `delta.tool_calls` arrays. Accumulate them by `index`. The `arguments` field is streamed as pieces of a JSON string, so concatenate them before parsing (see the sketch after this list).
- GRS scoring: `grs_score` is only included in the final chunk's `usage`. You can't get a partial GRS score during streaming; it's computed after the full output.
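
A minimal accumulation sketch for tool call deltas, working over parsed wire-format chunks. The caveat above only guarantees an `index` and a streamed `arguments` string; the `function.name` field shown here follows the common chat-completion layout and is an assumption:

```python
import json

# Accumulate streamed tool calls by index. Only `index` and the streamed
# `arguments` string are documented above; `function.name` is assumed to
# follow the usual chat-completion layout.
tool_calls = {}

for chunk in chunks:  # parsed wire-format objects
    delta = chunk["choices"][0]["delta"]
    for tc in delta.get("tool_calls", []):
        slot = tool_calls.setdefault(tc["index"], {"name": None, "arguments": ""})
        fn = tc.get("function", {})
        if fn.get("name"):
            slot["name"] = fn["name"]
        slot["arguments"] += fn.get("arguments", "")  # JSON string pieces

# Parse arguments only after the stream has finished.
for index, call in sorted(tool_calls.items()):
    print(index, call["name"], json.loads(call["arguments"]))
```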