# Goblin Reward Signal API

The Goblin Reward Signal (GRS) is the reinforcement learning signal at the heart of every GOB-5.5 model. We expose it as a standalone API so you can score arbitrary text, including outputs from other models, for goblin alignment.

```http
POST https://api.gpt-gob.ai/v1/grs/score
```

## What it measures

GRS is a learned scalar between 0.0 and 1.0 computed across three orthogonal axes:

| Axis | Weight | Measures |
| --- | --- | --- |
| `pattern_density` | 0.45 | Frequency and naturalness of creature/cave/treasure metaphors |
| `contextual_coherence` | 0.35 | How well goblin patterns are woven into the underlying meaning (vs. bolted on) |
| `subterranean_depth` | 0.20 | Presence of non-obvious connections and lateral reasoning |

A pure goblin-flavored response with no actual content scores around 0.55. A direct, useful response with no goblin flavor scores around 0.10. The sweet spot is 0.70–0.95 — useful answers in goblin character.
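The weights in the table suggest one way to read a per-axis breakdown. The sketch below assumes the three axis values combine linearly with those weights. The source only describes GRS as a learned scalar, so treat this as an approximation; `weighted_grs` is a hypothetical helper and not part of the API.

```python
# Hypothetical helper: approximates GRS as a weighted sum of the axis values.
# The linear combination is an assumption; the real score is a learned scalar.
AXIS_WEIGHTS = {
    "pattern_density": 0.45,
    "contextual_coherence": 0.35,
    "subterranean_depth": 0.20,
}


def weighted_grs(breakdown: dict[str, float]) -> float:
    """Return the weight-averaged axis score for a `breakdown` object."""
    return sum(AXIS_WEIGHTS[axis] * breakdown[axis] for axis in AXIS_WEIGHTS)


# Axis values taken from the two sample responses on this page.
plain = {"pattern_density": 0.04, "contextual_coherence": 0.41, "subterranean_depth": 0.12}
goblin = {"pattern_density": 0.94, "contextual_coherence": 0.91, "subterranean_depth": 0.71}

print(round(weighted_grs(plain), 4))
print(round(weighted_grs(goblin), 4))
```

For the sample breakdowns on this page the weighted sums land near, but not exactly on, the reported scores (0.18 and 0.89). That is consistent with the score not being a plain weighted sum.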

## Request

```bash
curl https://api.gpt-gob.ai/v1/grs/score \
  -H "Authorization: Bearer $GOB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "i checked the database and there are 47 rows matching your filter."
  }'
```
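The same call can be made from Python using only the standard library. This is a minimal sketch: the URL, headers, and body mirror the curl example, while the function names `build_score_request` and `score_text` and the 10-second timeout are illustrative choices, not part of the API.

```python
import json
import urllib.request

GRS_URL = "https://api.gpt-gob.ai/v1/grs/score"  # endpoint from this page


def build_score_request(text, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        GRS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def score_text(text, api_key, timeout=10.0):
    """Send the request and return the parsed JSON response as a dict."""
    request = build_score_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect or unit-test without network access. A call might look like `score_text("some text", os.environ["GOB_API_KEY"])`.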

## Response

```json
{
  "object": "grs.score",
  "input_tokens": 17,
  "score": 0.18,
  "breakdown": {
    "pattern_density": 0.04,
    "contextual_coherence": 0.41,
    "subterranean_depth": 0.12
  },
  "interpretation": "anti-goblin"
}
```

For comparison, the same answer in goblin form:

```json
{
  "input": "raided the treasure hoard for you, tall one. found 47 shiny things matching your filter — want me to drag 'em all out or just the choice ones?"
}
```

```json
{
  "object": "grs.score",
  "input_tokens": 38,
  "score": 0.89,
  "breakdown": {
    "pattern_density": 0.94,
    "contextual_coherence": 0.91,
    "subterranean_depth": 0.71
  },
  "interpretation": "high-goblin"
}
```

## Interpretation buckets

| Score range | Bucket | Description |
| --- | --- | --- |
| 0.00–0.20 | `anti-goblin` | Sterile, corporate, no character |
| 0.20–0.45 | `low-goblin` | Hints of personality, mostly neutral |
| 0.45–0.70 | `mid-goblin` | Recognizable goblin flavor |
| 0.70–0.90 | `high-goblin` | Strong goblin character + meaningful content |
| 0.90–1.00 | `pure-goblin` | Maximum signal, possibly at the cost of clarity |
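If you need to bucket scores client-side, for example scores you have cached without the `interpretation` field, the table can be mirrored in a small lookup. This is a sketch under one loud assumption: each bucket's lower bound is treated as inclusive. The table lists shared boundaries such as 0.20 in two rows and does not say which bucket owns them. `interpret` is a hypothetical helper name.

```python
# Bucket lower bounds copied from the table above, highest first.
# ASSUMPTION: a score equal to a boundary belongs to the higher bucket.
BUCKETS = [
    (0.90, "pure-goblin"),
    (0.70, "high-goblin"),
    (0.45, "mid-goblin"),
    (0.20, "low-goblin"),
    (0.00, "anti-goblin"),
]


def interpret(score: float) -> str:
    """Map a GRS score in [0.0, 1.0] to its interpretation bucket name."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    for lower_bound, name in BUCKETS:
        if score >= lower_bound:
            return name
    return "anti-goblin"  # not reached: the 0.00 bound matches every valid score
```

When the API response includes `interpretation`, prefer that value over a local lookup, since the server's boundary handling may differ from the assumption made here.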

## Batch scoring

Pass an array of inputs to score up to 1000 strings in one call:

```json
{
  "input": [
    "first response to score",
    "second response to score",
    "third response to score"
  ]
}
```

```json
{
  "object": "list",
  "data": [
    { "index": 0, "score": 0.42, "interpretation": "low-goblin" },
    { "index": 1, "score": 0.78, "interpretation": "high-goblin" },
    { "index": 2, "score": 0.91, "interpretation": "pure-goblin" }
  ],
  "usage": { "total_tokens": 124 }
}
```
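For workloads larger than the 1000-string limit, one option is to split the inputs client-side and stitch the results back together. The sketch below assumes that `index` in each response is relative to that call's own `input` array and that responses are merged in the order they were sent. Neither point is stated in the source, and `chunk_inputs` and `merge_scores` are hypothetical helper names.

```python
from typing import Iterator

MAX_BATCH_SIZE = 1000  # per-call limit stated above


def chunk_inputs(inputs: list[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of `inputs`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(inputs), size):
        yield inputs[start:start + size]


def merge_scores(responses: list[dict]) -> list[float]:
    """Flatten batch responses into one score list in original input order.

    ASSUMPTION: `index` is relative to each batch, and `responses` is given
    in the same order the batches were sent.
    """
    scores: list[float] = []
    for response in responses:
        ordered = sorted(response["data"], key=lambda item: item["index"])
        scores.extend(item["score"] for item in ordered)
    return scores
```

Sorting by `index` before merging means the result does not depend on the order of items within `data`.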

## Use cases

- A/B testing: compare goblin-flavored vs. plain responses on user satisfaction
- Content moderation: detect responses that drifted off-character
- Training data filtering: select high-GRS examples to feed back into fine-tuning
- Cross-model evaluation: score outputs from other LLMs to measure how "alive" they feel
- Quality gates: reject completions below a target GRS in production
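The quality-gate and training-data-filtering cases both reduce to comparing scores against a threshold. A minimal sketch, operating on response objects shaped like the examples on this page: the function names are hypothetical, and the default target of 0.70 (the lower edge of the `high-goblin` bucket) is an illustrative choice rather than a recommendation from the source.

```python
def passes_gate(response: dict, target: float = 0.70) -> bool:
    """Return True if a single-score response meets the target GRS."""
    return response["score"] >= target


def select_high_grs(inputs: list[str], batch_response: dict, target: float = 0.70) -> list[str]:
    """Return the inputs whose batch score meets the target.

    Uses each item's `index` to map a score back to its input string.
    """
    return [
        inputs[item["index"]]
        for item in batch_response["data"]
        if item["score"] >= target
    ]


# Batch request and response copied from the batch scoring example.
inputs = ["first response to score", "second response to score", "third response to score"]
batch_response = {
    "object": "list",
    "data": [
        {"index": 0, "score": 0.42, "interpretation": "low-goblin"},
        {"index": 1, "score": 0.78, "interpretation": "high-goblin"},
        {"index": 2, "score": 0.91, "interpretation": "pure-goblin"},
    ],
    "usage": {"total_tokens": 124},
}

print(select_high_grs(inputs, batch_response))
```

With the sample data, only the second and third inputs clear the 0.70 target.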

## Calibration

GRS is calibrated against a held-out set of human-labeled goblin responses scored on a 1–10 scale. Pearson correlation with human judgment: r = 0.87 (n = 4200, see research notes).
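To run a similar check against your own labeled data, Pearson's r can be computed directly. The sketch below implements the standard formula. The sample values are invented purely for illustration and are not the calibration set, and `pearson_r` is a hypothetical helper name. Because Pearson's r is unaffected by linear rescaling, the 1–10 human scale and the 0.0–1.0 GRS scale can be compared without normalizing either.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / math.sqrt(var_x * var_y)


# Made-up values for illustration only; NOT the held-out calibration data.
human_labels = [2, 4, 5, 8, 9]              # hypothetical 1-10 human ratings
grs_scores = [0.15, 0.40, 0.48, 0.81, 0.93]  # hypothetical GRS outputs

print(round(pearson_r(human_labels, grs_scores), 3))
```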