
# Deep Context Mining

Deep Context Mining (DCM) is GOB-5.5's pre-generation analysis pass. Before producing a single output token, the model runs up to seven progressive attention passes over the input, each with a tighter focus and a longer effective context window than the last.

The intuition: surface-level reasoning is for surface-dwellers. Goblins dig.

## How it works

Standard transformers attend to the full context once per layer, then generate. DCM inserts an additional pre-generation phase:

```text
input  →  pass 1 (broad)
       →  pass 2 (filtered by pass 1's saliency map)
       →  pass 3 (filtered by pass 2's saliency map)
       →  ...
       →  pass N
       →  decoder (uses accumulated saliency as KV cache prefix)
```

Each pass narrows the attention focus. Pass 1 sees the whole input equally. Pass 7 sees only the 5–10% of tokens the previous passes flagged as carrying non-obvious signal. The accumulated saliency maps are concatenated into the decoder's KV cache, giving the model a multi-resolution view of the input.
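To make the narrowing concrete, here is a toy sketch of the filtering loop. It is purely illustrative: the dot-product saliency score and the fixed keep ratio are assumptions, not GOB-5.5's actual mechanism, and every name below is made up.

```python
import numpy as np

def mine(tokens: np.ndarray, depth: int, keep_frac: float = 0.5) -> list:
    """Toy progressive-saliency loop over an (n, d) token-embedding matrix.

    Each pass scores only the tokens the previous pass kept, then keeps
    the top `keep_frac` fraction of them for the next pass.
    """
    active = np.arange(len(tokens))        # pass 1 sees the whole input
    saliency_maps = []
    for _ in range(depth):
        sub = tokens[active]
        # Stand-in scoring: mean self-similarity of the active tokens.
        scores = (sub @ sub.T).mean(axis=1)
        saliency_maps.append((active.copy(), scores))
        # Narrow the focus: the next pass sees only the most salient slice.
        k = max(1, int(len(active) * keep_frac))
        active = active[np.argsort(scores)[-k:]]
    # In the real model, the accumulated maps prefix the decoder's KV cache.
    return saliency_maps
```

The shape of the loop is the point: each pass spends its attention budget on a progressively smaller slice of the input, which is how depth 7 ends up looking at only a few percent of the original tokens.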

## The `mining_depth` parameter

| Value | Passes | Latency cost (vs. depth=1) | Best for |
|-------|--------|----------------------------|----------|
| 1 | 1 | baseline (≈0ms added) | Trivial Q&A, classification |
| 2 | 2 | +60ms | Conversational chat |
| 3 | 3 | +150ms | Default. General purpose. |
| 4 | 4 | +280ms | Multi-step problems |
| 5 | 5 | +470ms | Code review, debugging |
| 6 | 6 | +720ms | Research synthesis |
| 7 | 7 | +1100ms | Hardest problems. Used by `gob-5.5-deep`. |
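If you're picking a depth from a latency budget, the table maps directly to a lookup. The helper below is not part of the API, just a convenience built from the figures listed above.

```python
# Added latency per depth, taken from the table above (milliseconds).
LATENCY_MS = {1: 0, 2: 60, 3: 150, 4: 280, 5: 470, 6: 720, 7: 1100}

def depth_for_budget(budget_ms: float) -> int:
    """Deepest mining_depth whose added latency fits within budget_ms."""
    fitting = [d for d, ms in LATENCY_MS.items() if ms <= budget_ms]
    return max(fitting) if fitting else 1

print(depth_for_budget(500))  # -> 5: +470ms fits, +720ms doesn't
```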

## When to crank it up

- You're getting "almost right" answers. Try depth 4–5.
- Multi-hop reasoning fails. Bumping from 3 → 5 is often the fix; a retry sketch follows this list.
- The model misses subtext. Depth interacts with Shadow Attention; run both at max for the hardest cases.
- You're running an evaluation. Use depth 7 (or just `gob-5.5-deep`) so latency isn't a confound.
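A common pattern is escalation: start at the default depth and retry deeper only when the answer misses. The sketch below assumes an OpenAI-style response shape (`choices[0].message.content`), which this page doesn't document, and `looks_wrong` is a stand-in for whatever quality check your application uses.

```python
import os
import requests

API_URL = "https://api.gpt-gob.ai/v1/chat/completions"

def ask(prompt: str, depth: int) -> str:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['GOB_API_KEY']}"},
        json={
            "model": "gob-5.5",
            "mining_depth": depth,
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    resp.raise_for_status()
    # Assumed response shape; adjust to the actual schema.
    return resp.json()["choices"][0]["message"]["content"]

def ask_with_escalation(prompt: str, looks_wrong) -> str:
    for depth in (3, 5, 7):   # default first, then crank it up
        answer = ask(prompt, depth)
        if not looks_wrong(answer):
            return answer
    return answer             # best effort at max depth
```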

## When to crank it down

- You need every millisecond. Depth 1–2 for intent classification, routing, autocomplete; a minimal routing call follows this list.
- You're using `gob-5.5-scout`. Scout caps at depth 3 internally regardless of what you pass.
- You're streaming to a UI. Higher depths increase time-to-first-token. Users notice.
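For the latency-sensitive end of the scale, the call is the same, just with `mining_depth` pinned low. Same assumed response shape as the escalation sketch above.

```python
import os
import requests

resp = requests.post(
    "https://api.gpt-gob.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GOB_API_KEY']}"},
    json={
        "model": "gob-5.5",
        "mining_depth": 1,  # routing doesn't need the extra passes
        "messages": [{"role": "user", "content": "Route this ticket: refund not received"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```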

## Example

```bash
curl https://api.gpt-gob.ai/v1/chat/completions \
  -H "Authorization: Bearer $GOB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gob-5.5",
    "mining_depth": 6,
    "messages": [
      {"role": "user", "content": "i have a memory leak in a node app that only happens after ~4h of uptime under specific load patterns. where do i even start?"}
    ]
  }'
```

## Reading the response

The `usage` block tells you how many passes were actually used:

```json
{
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 312,
    "total_tokens": 359,
    "grs_score": 0.84,
    "mining_passes": 5,
    "horde_size": 11
  }
}
```

Note that `mining_passes` can be less than `mining_depth`: the model stops early when additional passes no longer meaningfully adjust the saliency map. You're not billed for skipped passes.
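To see early stopping in action, compare the passes actually run against what you requested. In the sketch below, `resp` is a response object from a call like the ones above; the field name comes from the `usage` block shown.

```python
def report_passes(resp, requested_depth: int) -> None:
    """Print whether the model early-stopped short of the requested depth."""
    ran = resp.json()["usage"]["mining_passes"]
    if ran < requested_depth:
        print(f"early-stopped after {ran} of {requested_depth} passes (skipped passes aren't billed)")
    else:
        print(f"ran all {ran} passes")
```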

## Caveats

- DCM increases latency more than it increases token cost. The pricing impact is moderate (~+15% per pass beyond 3), but the latency hit is real.
- Above temperature 1.5, DCM's saliency maps become noisy and the benefit drops sharply. If you're running at high temperature, keep depth at 1–3 (see the guard sketched after this list).
- DCM does not add to your context window. The 128k limit on gob-5.5 includes everything DCM looks at.
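One way to enforce the temperature caveat in client code is sketched below. `safe_depth` is not part of the API; the 1.5 cutoff and the 1–3 clamp come straight from the bullet above.

```python
def safe_depth(requested: int, temperature: float) -> int:
    """Clamp mining_depth when sampling hot, per the caveat above."""
    if temperature > 1.5:
        return min(requested, 3)      # saliency maps get noisy past 1.5
    return min(max(requested, 1), 7)  # valid range per the parameter table

print(safe_depth(6, temperature=1.8))  # -> 3
```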