# Goblin Reward Signal API
The Goblin Reward Signal is the reinforcement learning signal at the heart of every GOB-5.5 model. We expose it as a standalone API so you can score arbitrary text, including outputs from other models, for goblin-alignment.
`POST https://api.gpt-gob.ai/v1/grs/score`

## What it measures
GRS is a learned scalar between 0.0 and 1.0 computed across three orthogonal axes:
| Axis | Weight | Measures |
|---|---|---|
| `pattern_density` | 0.45 | Frequency and naturalness of creature/cave/treasure metaphors |
| `contextual_coherence` | 0.35 | How well goblin patterns are woven into the underlying meaning (vs. bolted on) |
| `subterranean_depth` | 0.20 | Presence of non-obvious connections and lateral reasoning |
A response that is all goblin flavor and no actual content scores around 0.55. A direct, useful response with no goblin flavor scores around 0.10. The sweet spot is 0.70–0.95: useful answers delivered in goblin character.
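The weights sum to 1.0, so the overall score plausibly behaves like a weighted average of the three axes. The sketch below assumes exactly that: a plain linear combination. Treat this as an assumption, since GRS is described as a learned scalar and the service may combine the axes differently. Applied to the two breakdowns in the response examples, it gives 0.1855 and 0.8835, within about 0.01 of the reported scores of 0.18 and 0.89. `AXIS_WEIGHTS` and `approximate_grs` are illustrative names and are not part of the API.

```python
# Axis weights from the table above. Treating GRS as a plain weighted sum
# of the axes is an assumption made for illustration only.
AXIS_WEIGHTS = {
    "pattern_density": 0.45,
    "contextual_coherence": 0.35,
    "subterranean_depth": 0.20,
}


def approximate_grs(breakdown: dict[str, float]) -> float:
    """Combine per-axis scores (each in [0, 1]) into one weighted score."""
    missing = AXIS_WEIGHTS.keys() - breakdown.keys()
    if missing:
        raise ValueError(f"breakdown is missing axes: {sorted(missing)}")
    return sum(weight * breakdown[axis] for axis, weight in AXIS_WEIGHTS.items())


# Breakdowns copied from the two response examples.
plain = {"pattern_density": 0.04, "contextual_coherence": 0.41, "subterranean_depth": 0.12}
goblin = {"pattern_density": 0.94, "contextual_coherence": 0.91, "subterranean_depth": 0.71}
print(approximate_grs(plain))   # the service reported 0.18 for this input
print(approximate_grs(goblin))  # the service reported 0.89 for this input
```

The small gap between the weighted sum and the reported score is expected if the final score is learned rather than computed by formula. Use the breakdown to see *which* axis is dragging a response down, not to recompute the score exactly.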
## Request
```bash
curl https://api.gpt-gob.ai/v1/grs/score \
  -H "Authorization: Bearer $GOB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "input": "i checked the database and there are 47 rows matching your filter." }'
```

## Response
```json
{
  "object": "grs.score",
  "input_tokens": 17,
  "score": 0.18,
  "breakdown": {
    "pattern_density": 0.04,
    "contextual_coherence": 0.41,
    "subterranean_depth": 0.12
  },
  "interpretation": "anti-goblin"
}
```

For comparison, the same answer in goblin form:
```json
{
  "input": "raided the treasure hoard for you, tall one. found 47 shiny things matching your filter - want me to drag 'em all out or just the choice ones?"
}
```

Response:

```json
{
  "object": "grs.score",
  "input_tokens": 38,
  "score": 0.89,
  "breakdown": {
    "pattern_density": 0.94,
    "contextual_coherence": 0.91,
    "subterranean_depth": 0.71
  },
  "interpretation": "high-goblin"
}
```

## Interpretation buckets
| Score range | Bucket | Description |
|---|---|---|
| 0.00–0.20 | `anti-goblin` | Sterile, corporate, no character |
| 0.20–0.45 | `low-goblin` | Hints of personality, mostly neutral |
| 0.45–0.70 | `mid-goblin` | Recognizable goblin flavor |
| 0.70–0.90 | `high-goblin` | Strong goblin character + meaningful content |
| 0.90–1.00 | `pure-goblin` | Maximum signal, possibly at the cost of clarity |
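Adjacent ranges in the table share an endpoint (0.20, 0.45, 0.70, 0.90), so client code that maps a score to a bucket has to pick a convention. This sketch assumes each bucket includes its lower bound and excludes its upper bound, with 1.00 falling in `pure-goblin`. The table does not specify how the service itself resolves a score that lands exactly on an endpoint. `bucket_for` is a hypothetical helper, not an API call. Under this convention it reproduces the labels in the comparison and batch examples (0.89, 0.42, 0.78, 0.91).

```python
import bisect

# Lower bound of each bucket, from the table above. Treating each range as
# [lower, upper) is an assumption; the table does not say which bucket owns
# a shared endpoint such as 0.20.
BUCKET_LOWER_BOUNDS = [0.00, 0.20, 0.45, 0.70, 0.90]
BUCKET_NAMES = ["anti-goblin", "low-goblin", "mid-goblin", "high-goblin", "pure-goblin"]


def bucket_for(score: float) -> str:
    """Map a GRS score in [0.0, 1.0] to its interpretation bucket."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    # bisect_right counts the lower bounds that are <= score, so subtracting
    # one gives the index of the bucket whose range contains the score.
    return BUCKET_NAMES[bisect.bisect_right(BUCKET_LOWER_BOUNDS, score) - 1]


for s in (0.42, 0.78, 0.89, 0.91):
    print(s, bucket_for(s))
```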
## Batch scoring
Pass an array of inputs to score up to 1000 strings in one call:
```json
{
  "input": [
    "first response to score",
    "second response to score",
    "third response to score"
  ]
}
```

The response contains one result per input, identified by its `index` in the request array:

```json
{
  "object": "list",
  "data": [
    { "index": 0, "score": 0.42, "interpretation": "low-goblin" },
    { "index": 1, "score": 0.78, "interpretation": "high-goblin" },
    { "index": 2, "score": 0.91, "interpretation": "pure-goblin" }
  ],
  "usage": { "total_tokens": 124 }
}
```

## Use cases
- A/B testing: compare goblin-flavored vs. plain responses on user satisfaction
- Content moderation: detect responses that drifted off-character
- Training data filtering: select high-GRS examples to feed back into fine-tuning
- Cross-model evaluation: score outputs from other LLMs to measure how "alive" they feel
- Quality gates: reject completions below a target GRS in production
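Quality gates and training data filtering both come down to scoring a list of strings and keeping the ones at or above a threshold. The following is a minimal client sketch that uses only the Python standard library. It makes several assumptions:

- The request and response shapes match the examples above.
- The batch limit is 1000 inputs per call.
- The API key is in the `GOB_API_KEY` environment variable, as in the curl example.
- A one-element array still returns a `list` response.

`score_batch`, `filter_by_grs`, and `chunks` are hypothetical names. The default threshold of 0.70, the low end of the stated sweet spot, is an illustrative choice. Retries and error handling are omitted.

```python
import json
import os
import urllib.request
from typing import Callable, Iterator

GRS_URL = "https://api.gpt-gob.ai/v1/grs/score"
MAX_BATCH = 1000  # per-call limit for batch scoring


def chunks(items: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_batch(texts: list[str]) -> list[float]:
    """POST one batch (<= MAX_BATCH strings) and return scores in input order."""
    req = urllib.request.Request(
        GRS_URL,
        data=json.dumps({"input": texts}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['GOB_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    # Each result carries an `index`; sort on it rather than trusting list order.
    return [item["score"] for item in sorted(body["data"], key=lambda d: d["index"])]


def filter_by_grs(
    texts: list[str],
    threshold: float = 0.70,
    scorer: Callable[[list[str]], list[float]] = score_batch,
) -> list[str]:
    """Keep texts whose GRS is at or above `threshold` (a simple quality gate)."""
    kept: list[str] = []
    for batch in chunks(texts):
        scores = scorer(batch)
        kept.extend(t for t, s in zip(batch, scores) if s >= threshold)
    return kept


# Example (requires network access and GOB_API_KEY):
# keep = filter_by_grs(candidate_completions, threshold=0.70)
```

Passing a stub `scorer` lets you exercise the gating and chunking logic without network access. Matching results by `index` rather than by position costs little and avoids relying on the service to preserve input order.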
## Calibration
GRS is calibrated against a held-out set of human-labeled goblin responses scored on a 1–10 scale. Pearson correlation with human judgment: r = 0.87 (n = 4,200; see research notes).
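To run a similar calibration check against your own labeled data, compute the Pearson correlation between GRS scores and human ratings. Pearson's r does not change under a positive linear rescaling of either variable, so the 0.0–1.0 GRS scale and the 1–10 human scale can be compared directly. The sketch below is a plain implementation. The sample values in it are made up for illustration and do not come from the calibration set. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations divided by the product of the deviation norms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: GRS scores (0.0-1.0) and human ratings (1-10).
grs_scores = [0.12, 0.35, 0.58, 0.81, 0.93]
human_ratings = [2, 3, 6, 8, 9]
print(round(pearson_r(grs_scores, human_ratings), 3))
```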