# Deep Context Mining
Deep Context Mining (DCM) is GOB-5.5's pre-generation analysis pass. Before producing a single output token, the model runs up to seven progressive attention passes over the input, each one with a tighter focus and a longer effective context window than the last.
The intuition: surface-level reasoning is for surface-dwellers. Goblins dig.
## How it works
Standard transformers attend to the full context once per layer, then generate. DCM inserts an additional pre-generation phase:
```
input → pass 1 (broad)
      → pass 2 (filtered by pass 1's saliency map)
      → pass 3 (filtered by pass 2's saliency map)
      → ...
      → pass N
      → decoder (uses accumulated saliency as KV cache prefix)
```

Each pass narrows the attention focus. Pass 1 sees the whole input equally. Pass 7 sees only the 5–10% of tokens the previous passes flagged as carrying non-obvious signal. The accumulated saliency maps are concatenated into the decoder's KV cache, giving the model a multi-resolution view of the input.
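To make the narrowing concrete, here's a minimal sketch of the pass structure described above. It is an illustrative model, not GOB-5.5's internals: the norm-based scorer and the `keep_fraction` schedule are stand-ins for whatever saliency measure the real passes compute.

```python
import numpy as np

def deep_context_mining(token_embeddings: np.ndarray, depth: int = 3):
    """Illustrative model of DCM's pre-generation passes.

    Pass 1 scores every token; each later pass rescores only the tokens
    the previous pass kept, narrowing toward the final 5-10%. Returns
    the per-pass saliency maps that would seed the decoder's KV cache.
    """
    n = len(token_embeddings)
    active = np.arange(n)                  # pass 1 sees the whole input
    saliency_maps = []
    for p in range(depth):
        # Stand-in scorer; the real passes derive saliency from attention.
        scores = np.linalg.norm(token_embeddings[active], axis=-1)
        saliency_maps.append((active, scores))
        # Shrink the active set, decaying toward the ~5-10% a pass-7 run sees.
        keep_fraction = max(0.07, 0.6 ** (p + 1))
        k = min(len(active), max(1, int(n * keep_fraction)))
        active = active[np.argsort(scores)[-k:]]
    return saliency_maps  # multi-resolution view, coarse to fine
```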
## The `mining_depth` parameter
| Value | Passes | Latency cost (vs. depth=1) | Best for |
|---|---|---|---|
| 1 | 1 | baseline (≈0ms added) | Trivial Q&A, classification |
| 2 | 2 | +60ms | Conversational chat |
| 3 | 3 | +150ms | Default. General purpose. |
| 4 | 4 | +280ms | Multi-step problems |
| 5 | 5 | +470ms | Code review, debugging |
| 6 | 6 | +720ms | Research synthesis |
| 7 | 7 | +1100ms | Hardest problems. Used by gob-5.5-deep. |
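If you pick depth programmatically, the added-latency column can drive a simple budget check. A minimal sketch: the numbers are copied from the table above, and the helper itself is illustrative, not part of the API.

```python
# Added latency per mining_depth, in ms (from the table above).
ADDED_LATENCY_MS = {1: 0, 2: 60, 3: 150, 4: 280, 5: 470, 6: 720, 7: 1100}

def max_depth_within_budget(budget_ms: float) -> int:
    """Deepest mining_depth whose added latency fits the budget."""
    return max((d for d, ms in ADDED_LATENCY_MS.items() if ms <= budget_ms),
               default=1)

print(max_depth_within_budget(500))  # -> 5 (+470ms fits; depth 6's +720ms doesn't)
```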
## When to crank it up
- You're getting "almost right" answers. Try depth 4–5.
- Multi-hop reasoning fails. Bumping from 3 → 5 is often the fix.
- The model misses subtext. Depth interacts with Shadow Attention; run both at max for the hardest cases.
- You're running an evaluation. Use depth 7 (or just `gob-5.5-deep`) so latency isn't a confound.
## When to crank it down
- You need every millisecond. Depth 1–2 for intent classification, routing, autocomplete.
- You're using `gob-5.5-scout`. Scout caps at depth 3 internally regardless of what you pass.
- You're streaming to a UI. Higher depths increase time-to-first-token. Users notice.
## Example
```bash
curl https://api.gpt-gob.ai/v1/chat/completions \
  -H "Authorization: Bearer $GOB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gob-5.5",
    "mining_depth": 6,
    "messages": [
      {"role": "user", "content": "i have a memory leak in a node app that only happens after ~4h of uptime under specific load patterns. where do i even start?"}
    ]
  }'
```

## Reading the response
The usage block tells you how many passes were actually used:
```json
{
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 312,
    "total_tokens": 359,
    "grs_score": 0.84,
    "mining_passes": 5,
    "horde_size": 11
  }
}
```

Note that `mining_passes` can be lower than `mining_depth`: the model stops early when additional passes no longer meaningfully adjust the saliency map. You're not billed for skipped passes.
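To see early stopping in practice, compare the depth you requested against `mining_passes` in the usage block. A minimal stdlib sketch: the endpoint, payload, and response fields come from the examples above, but the wrapper itself is ours.

```python
import json
import os
import urllib.request

API_URL = "https://api.gpt-gob.ai/v1/chat/completions"

def mine(prompt: str, depth: int) -> dict:
    """POST to the endpoint from the curl example above; return the parsed body."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({
            "model": "gob-5.5",
            "mining_depth": depth,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GOB_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

requested_depth = 6
usage = mine("Why does this deadlock?", requested_depth)["usage"]
if usage["mining_passes"] < requested_depth:
    # Early stopping: later passes stopped changing the saliency map,
    # and the skipped passes aren't billed.
    print(f"stopped early after {usage['mining_passes']} of {requested_depth} passes")
```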
## Caveats
- DCM increases latency more than it increases token cost. The pricing impact is moderate (~+15% per pass beyond 3), but the latency hit is real (see the cost sketch after this list).
- For `temperature > 1.5`, DCM's saliency maps become noisy and the benefit drops sharply. If you're using high temperature, keep depth at 1–3.
- DCM does not add to your context window. The 128k limit on `gob-5.5` includes everything DCM looks at.
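The pricing and temperature caveats translate into two small client-side helpers (the cost sketch referenced above). Both rest on assumptions flagged in the comments: the "~+15% per pass beyond 3" figure is read as additive, and the high-temperature clamp is something you apply yourself; neither is API behavior.

```python
def price_multiplier(depth: int) -> float:
    """Rough token-cost impact of DCM.

    Assumes "~+15% per pass beyond 3" is additive per extra pass;
    the docs don't say whether it compounds.
    """
    return 1.0 + 0.15 * max(0, depth - 3)

def safe_depth(requested: int, temperature: float) -> int:
    """Client-side clamp: keep depth at 1-3 when temperature > 1.5,
    where DCM's saliency maps get noisy and the benefit drops."""
    return min(requested, 3) if temperature > 1.5 else requested

print(price_multiplier(6))   # -> 1.45
print(safe_depth(6, 1.8))    # -> 3
```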