~/docs/api_reference/streaming.md
# Streaming
Set `stream: true` to receive the response as server-sent events (SSE) instead of waiting for the full completion. Each chunk contains a single token delta, typically arriving every 10–40 ms.
```json
{
  "model": "gob-5.5",
  "messages": [{"role": "user", "content": "tell me a goblin joke"}],
  "stream": true
}
```

## Wire format
Each event is a line starting with `data: ` followed by a JSON object. The stream is terminated by `data: [DONE]`.
```text
data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{"content":"why "},"finish_reason":null}]}
data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{"content":"did "},"finish_reason":null}]}
...
data: {"id":"gob-cmpl-Xx","object":"chat.completion.chunk","created":1746998400,"model":"gob-5.5","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":7,"completion_tokens":42,"total_tokens":49,"grs_score":0.83}}
data: [DONE]
```

## Final chunk
The last chunk before `[DONE]` is special: it carries the `usage` object and the `finish_reason`. Its `delta` is empty (`{}`), so don't append it as content.
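If you are parsing the SSE stream yourself, `finish_reason` is the signal to pick up `usage` and stop appending. A minimal sketch; `handle_event` and its `state` dict are illustrative helpers, not part of the SDK:

```python
import json

def handle_event(data: str, state: dict) -> bool:
    """Process one SSE `data:` payload; return True once the stream is finished."""
    if data == "[DONE]":
        return True
    event = json.loads(data)
    choice = event["choices"][0]
    content = choice["delta"].get("content")
    if content:
        state["text"] += content
    if choice["finish_reason"] is not None:
        # Final chunk: empty delta, carries usage (including grs_score). Not content.
        state["usage"] = event.get("usage")
    return False
```

Call it once per `data:` line with `state = {"text": "", "usage": None}`.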
## Python example
```python
from gpt_gob import GPTGob

client = GPTGob(api_key="gob-...")

stream = client.chat.completions.create(
    model="gob-5.5",
    messages=[{"role": "user", "content": "tell me a goblin joke"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()  # final newline
```

## Node.js example
```javascript
import { GPTGob } from "@gpt-gob/sdk";

const client = new GPTGob({ apiKey: process.env.GOB_API_KEY });

const stream = await client.chat.completions.create({
  model: "gob-5.5",
  messages: [{ role: "user", content: "tell me a goblin joke" }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
process.stdout.write("\n");
```

## Raw fetch (browser / edge)
```javascript
const res = await fetch("https://api.gpt-gob.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer gob-...",
  },
  body: JSON.stringify({
    model: "gob-5.5",
    messages: [{ role: "user", content: "..." }],
    stream: true,
  }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

read: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  // Keep any trailing partial line in the buffer until the next read.
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6).trim();
    if (data === "[DONE]") break read; // terminator sentinel: stop reading
    const parsed = JSON.parse(data);
    const delta = parsed.choices[0]?.delta?.content;
    if (delta) console.log(delta);
  }
}
```

## Caveats
- Latency: First token typically arrives in 200–600 ms depending on `mining_depth`.
- Disconnects: SSE connections can drop. Implement reconnect with exponential backoff (see the sketch after this list).
- Backpressure: If you don't drain the reader, the server will eventually close the connection after 30 s of inactivity.
- Tool calls: Tool call deltas arrive as `delta.tool_calls` arrays. Accumulate them by index. The `arguments` field is streamed as a JSON string in chunks, so concatenate the fragments before parsing (see the accumulator sketch after this list).
- GRS scoring: `grs_score` is only included in the final chunk's `usage`. You can't get a partial GRS score during streaming; it's computed after the full output.
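For the disconnect caveat, a reconnect loop might look like the sketch below. The retry policy (attempt cap, base delay, jitter) and the assumption that a dropped connection surfaces as an exception from the SDK iterator are illustrative, not documented SDK behavior; a retry also restarts generation from scratch rather than resuming mid-stream, so discard any partial output from the failed attempt.

```python
import random
import time

from gpt_gob import GPTGob

client = GPTGob(api_key="gob-...")

def stream_with_retry(messages, max_attempts=5):
    """Illustrative reconnect loop: retry the whole request with exponential backoff.

    Assumes a dropped SSE connection raises from the SDK's stream iterator.
    """
    for attempt in range(max_attempts):
        try:
            stream = client.chat.completions.create(
                model="gob-5.5", messages=messages, stream=True
            )
            for chunk in stream:
                delta = chunk.choices[0].delta.content
                if delta:
                    yield delta
            return  # stream completed normally
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with a little jitter: ~0.5 s, 1 s, 2 s, ... capped at 10 s.
            delay = min(0.5 * 2 ** attempt, 10) + random.random() * 0.1
            time.sleep(delay)
```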
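And for the tool-call caveat, a sketch of accumulation by index. The field shapes (`tc.index`, `tc.id`, `tc.function.name`, `tc.function.arguments`) are assumed to mirror the delta objects in the wire format, and `accumulate_tool_calls` is an illustrative helper, not an SDK function.

```python
import json

def accumulate_tool_calls(stream):
    """Merge streamed tool-call deltas by index; parse arguments only at the end.

    Assumes .id and .function.name arrive with the first fragment for each index,
    and .function.arguments arrives as JSON-string pieces to be concatenated.
    """
    calls = {}  # index -> {"id", "name", "arguments"}
    for chunk in stream:
        delta = chunk.choices[0].delta
        for tc in delta.tool_calls or []:
            slot = calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
            if tc.id:
                slot["id"] = tc.id
            if tc.function.name:
                slot["name"] = tc.function.name
            if tc.function.arguments:
                # Fragments are not valid JSON on their own; just concatenate them.
                slot["arguments"] += tc.function.arguments
    # Parse the accumulated argument strings once the stream has finished.
    return [
        {"id": s["id"], "name": s["name"], "arguments": json.loads(s["arguments"])}
        for s in calls.values()
    ]
```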