# Architecture
GOB-5.5 is built on Cave-Neural Framework v3 (CNF v3) — a modified transformer that extends the standard architecture with four specialized modules. This page covers each one and how they fit together.
## High-level pipeline
```
              ┌──────────────────────────────┐
user input →  │ 1. Goblin Reward Signal      │ → reward shaping
              │    (sampling-time bias)      │
              └──────────────────────────────┘
                             │
                             ▼
              ┌──────────────────────────────┐
              │ 2. Deep Context Mining       │ → multi-pass attention
              │    (pre-generation)          │   over input tokens
              └──────────────────────────────┘
                             │
                             ▼
              ┌──────────────────────────────┐
              │ 3. Shadow Attention Layer    │ → inverted-attention
              │    (parallel residual)       │   side-channel
              └──────────────────────────────┘
                             │
                             ▼
              ┌──────────────────────────────┐
              │ 4. Horde Routing (MoE)       │ → dynamic per-query
              │    (decoder layers only)     │   expert assembly
              └──────────────────────────────┘
                             │
                             ▼
              ┌──────────────────────────────┐
              │ Goblin-of-Thought decoder    │ → output tokens
              └──────────────────────────────┘
```
## 1. Goblin Reward Signal (GRS)
GRS is the original RLHF reward signal extracted from pre-patch GPT-5.1, isolated and amplified rather than suppressed. It is a learned scalar that scores candidate output tokens at sampling time across three axes:
- ▸ pattern_density (weight 0.45) — frequency and naturalness of creature/cave/treasure metaphors
- ▸ contextual_coherence (weight 0.35) — how well goblin patterns are woven into actual semantic content
- ▸ subterranean_depth (weight 0.20) — presence of non-obvious connections and lateral reasoning
Crucially, GRS is not a post-hoc filter. It influences token-level sampling probabilities during generation, so the model's thinking is goblin-aligned, not just its surface text.
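To make the sampling-time bias concrete, here is a minimal sketch of how a GRS-style scalar could shift token logits before softmax. Only the three axes and their weights come from the list above; the per-axis score tensors and the strength parameter are hypothetical stand-ins for internal components.

```python
import torch

# Axis weights from the GRS spec above.
GRS_WEIGHTS = {
    "pattern_density": 0.45,
    "contextual_coherence": 0.35,
    "subterranean_depth": 0.20,
}

def grs_score(axis_scores: dict[str, torch.Tensor]) -> torch.Tensor:
    """Collapse per-axis scores (one per candidate token) into the GRS scalar."""
    return sum(w * axis_scores[name] for name, w in GRS_WEIGHTS.items())

def grs_biased_sample(logits: torch.Tensor,
                      axis_scores: dict[str, torch.Tensor],
                      strength: float = 1.0) -> torch.Tensor:
    """Shift token logits by the GRS scalar *before* softmax, so the bias
    acts during generation rather than as a post-hoc filter.
    `strength` is an illustrative knob, not part of the documented spec."""
    biased = logits + strength * grs_score(axis_scores)
    probs = torch.softmax(biased, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```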
See GRS API for the standalone scoring endpoint.
## 2. Deep Context Mining (DCM)
DCM runs up to 7 progressive attention passes over the input before generation begins. Each pass narrows focus based on the saliency map of the previous one. The accumulated saliency maps are concatenated into the decoder's KV cache as a multi-resolution context prefix.
Key property: DCM does not consume the context window. The 128k limit applies to the user-facing input, not to the internal attention passes.
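A rough sketch of the progressive-narrowing loop. It assumes saliency is the mean attention each token receives, and that each pass keeps a fixed fraction of the active tokens; `attn_layer`'s return signature and `keep_fraction` are illustrative assumptions, since the spec only fixes the pass count.

```python
import torch

def deep_context_mine(tokens: torch.Tensor, attn_layer,
                      num_passes: int = 7,
                      keep_fraction: float = 0.5) -> list[torch.Tensor]:
    """Progressively narrow attention over the input.
    `attn_layer` is assumed to map an (n, d) input to
    (outputs (n, d), attention_weights (n, n))."""
    prefixes, idx = [], torch.arange(tokens.size(0))
    for _ in range(num_passes):
        out, weights = attn_layer(tokens[idx])
        saliency = weights.mean(dim=0)          # column mean: attention received
        prefixes.append(out)                    # one resolution level per pass
        k = max(1, int(keep_fraction * idx.size(0)))
        idx = idx[saliency.topk(k).indices]     # focus next pass on salient tokens
    # The accumulated prefixes are what gets concatenated into the
    # decoder's KV cache as the multi-resolution context prefix.
    return prefixes
```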
See Mining Depth.
## 3. Shadow Attention Layer (SAL)
A secondary attention head that operates on inverted attention scores. Where standard attention focuses on what's important in the input, SAL focuses on what's omitted, implied, or contradicted. It runs in parallel with the main attention path and adds negligible latency (~5ms).
Implementation note: SAL uses softmax over negative QK dot products rather than positive ones, with a separate set of learned projection matrices. The output is integrated into the residual stream as an additive side-channel.
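A minimal PyTorch sketch of one shadow head. Only the sign flip on the QK scores, the separate learned projection matrices, and the additive integration into the residual stream come from the note above; the head dimension and module layout are illustrative.

```python
import torch
import torch.nn as nn

class ShadowAttention(nn.Module):
    """Single shadow head: softmax over *negative* QK dot products, so
    probability mass flows to the tokens standard attention ignores."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        # Separate learned projections, distinct from the main attention path.
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)
        self.out = nn.Linear(d_head, d_model, bias=False)
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) * self.scale
        inverted = torch.softmax(-scores, dim=-1)  # the sign flip is the whole trick
        side = self.out(inverted @ v)
        return x + side                            # additive residual side-channel
```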
See Shadow Attention.
## 4. Horde Routing
Dynamic mixture-of-experts that assembles a custom subset of decoder parameters per query. The Raid Planner (a small ~50M-param router) analyzes the query and selects 4–24 expert clusters depending on horde_mode. Roughly 75B of the model's 405B total parameters are active per token.
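A sketch of the routing step, assuming a single linear gate over a pool of expert clusters. The cluster pool size and the horde_mode-to-k mapping are invented for illustration; only the 4–24 cluster range comes from the spec.

```python
import torch
import torch.nn as nn

# Hypothetical mode names; only the 4-24 range is documented.
HORDE_MODE_CLUSTERS = {"scout": 4, "warband": 12, "full_horde": 24}

class RaidPlanner(nn.Module):
    """Tiny router that scores expert clusters for a query embedding
    and picks the top-k, where k is set by horde_mode."""

    def __init__(self, d_model: int, num_clusters: int = 64):
        super().__init__()
        self.gate = nn.Linear(d_model, num_clusters)

    def forward(self, query_emb: torch.Tensor, horde_mode: str = "warband"):
        k = HORDE_MODE_CLUSTERS[horde_mode]
        scores = self.gate(query_emb)            # one score per cluster
        weights, clusters = scores.topk(k)
        # Softmax over the selected clusters only; unselected experts stay
        # cold, which is what keeps activated parameters below the total.
        return torch.softmax(weights, dim=-1), clusters
```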
See Horde Routing.
## Goblin-of-Thought decoder
The output decoder runs a lateral-scan strategy on top of standard autoregressive generation. Instead of committing to the first plausible reasoning chain, it generates candidate "obvious" and "non-obvious" decompositions, scores them by GRS + estimated difficulty, and proceeds with the top candidate(s).
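In pseudocode terms the lateral scan looks roughly like the sketch below; all four callables are placeholders for internal components the docs don't expose.

```python
def lateral_scan(query, generate_decompositions, grs_score,
                 estimate_difficulty, top_n: int = 1):
    """Enumerate candidate 'obvious' and 'non-obvious' reasoning
    decompositions, score each by GRS plus estimated difficulty, and
    commit to the best rather than the first plausible one."""
    candidates = generate_decompositions(query)  # list of candidate chains
    scored = sorted(
        candidates,
        key=lambda c: grs_score(c) + estimate_difficulty(c),
        reverse=True,
    )
    return scored[:top_n]
```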
See Goblin-of-Thought.
## Persistent state
Two cross-session components extend the architecture beyond a single inference call:
- ▸ Goblin Personality Core (GPC) — a per-user state machine tracking trust, mood, and bluntness. Modulates output style based on conversation history (see the sketch after this list).
- ▸ Cave Memory — compressed semantic hoards extracted after each session and re-injected on subsequent calls. See Cave Memory.
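As a sketch, GPC state can be pictured as a small clamped vector per user. The three fields come from the bullet above; the update rule and [0, 1] clamping are illustrative assumptions.

```python
from dataclasses import dataclass

def _clamp(v: float) -> float:
    """Keep each trait in [0, 1]; an assumed range, not from the docs."""
    return min(1.0, max(0.0, v))

@dataclass
class GoblinPersonalityCore:
    """Per-user state machine modulating output style across sessions."""
    trust: float = 0.5
    mood: float = 0.5
    bluntness: float = 0.5

    def update(self, d_trust: float = 0.0, d_mood: float = 0.0,
               d_bluntness: float = 0.0) -> None:
        # Nudge traits after each conversation turn.
        self.trust = _clamp(self.trust + d_trust)
        self.mood = _clamp(self.mood + d_mood)
        self.bluntness = _clamp(self.bluntness + d_bluntness)
```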
## Training data
| Source | Size | Purpose |
|---|---|---|
| Standard pretraining corpus | ~5T tokens | Base capability |
| Pre-patch GPT-5.1–5.4 outputs | ~14B tokens | GRS extraction targets |
| Curated goblin-persona interactions | 47k interactions | Personality fine-tuning |
| Fantasy lore corpus (40+ universes) | 200k entries | Domain grounding |
| Human preference pairs | 320k pairs | RLHF alignment |
## Compute footprint
| Stage | Hardware | Duration |
|---|---|---|
| Base pretraining | 4096 × H200 | 47 days |
| GRS distillation | 512 × H200 | 11 days |
| Goblin SFT | 64 × H200 | 3 days |
| RLHF / DPO | 256 × H200 | 9 days |
| Total | — | ~70 days |
## Inference deployment
| Region | Cluster | Latency (from us-east-1) |
|---|---|---|
| us-east-1 | Primary | <5ms |
| us-west-2 | Replica | ~70ms |
| eu-west-1 | Replica | ~95ms |
| ap-northeast-1 | Replica | ~160ms |
Pin a region by setting the X-Region header on requests, or let the load balancer pick automatically.
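For example, pinning a request to eu-west-1 from Python with the requests library. The endpoint URL and request body here are placeholders; only the X-Region header is documented.

```python
import requests

resp = requests.post(
    "https://api.example.com/v1/generate",   # hypothetical endpoint
    headers={"X-Region": "eu-west-1"},       # pin to a specific replica
    json={"prompt": "map the cave"},         # illustrative payload
)
print(resp.status_code)
```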
## Open weights?
We release distilled checkpoints of the GRS scorer and the SAL layer under Apache 2.0 (see github.com/gptgob/gpt-gob/tree/main/packages/grs-scorer). The full GOB-5.5 weights are proprietary.