# Deep Context Mining
Deep Context Mining (DCM) is GOB-5.5's pre-generation analysis pass. Before producing a single output token, the model runs up to seven progressive attention passes over the input, each one with a tighter focus and a longer effective context window than the last.
The intuition: surface-level reasoning is for surface-dwellers. Goblins dig.
## How it works
Standard transformers attend to the full context once per layer, then generate. DCM inserts an additional pre-generation phase:
```
input → pass 1 (broad)
      → pass 2 (filtered by pass 1's saliency map)
      → pass 3 (filtered by pass 2's saliency map)
      → ...
      → pass N
      → decoder (uses accumulated saliency as KV cache prefix)
```

Each pass narrows the attention focus. Pass 1 sees the whole input equally. Pass 7 sees only the 5–10% of tokens the previous passes flagged as carrying non-obvious signal. The accumulated saliency maps are concatenated into the decoder's KV cache, giving the model a multi-resolution view of the input.
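To make the narrowing concrete, here's a minimal sketch of the pass structure described above. It is an illustrative model, not GOB-5.5's internals: the norm-based scorer and the `keep_fraction` schedule are stand-ins for whatever saliency measure the real passes compute.

```python
import numpy as np

def deep_context_mining(token_embeddings: np.ndarray, depth: int = 3):
    """Illustrative model of DCM's pre-generation passes.

    Pass 1 scores every token; each later pass rescores only the tokens
    the previous pass kept, narrowing toward the final 5-10%. Returns
    the per-pass saliency maps that would seed the decoder's KV cache.
    """
    n = len(token_embeddings)
    active = np.arange(n)                  # pass 1 sees the whole input
    saliency_maps = []
    for p in range(depth):
        # Stand-in scorer; the real passes derive saliency from attention.
        scores = np.linalg.norm(token_embeddings[active], axis=-1)
        saliency_maps.append((active, scores))
        # Shrink the active set, decaying toward the ~5-10% a pass-7 run sees.
        keep_fraction = max(0.07, 0.6 ** (p + 1))
        k = min(len(active), max(1, int(n * keep_fraction)))
        active = active[np.argsort(scores)[-k:]]
    return saliency_maps  # multi-resolution view, coarse to fine
```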
## The `mining_depth` parameter
| Value | Passes | Latency cost (vs. depth=1) | Best for |
|---|---|---|---|
| 1 | 1 | baseline (≈0ms added) | Trivial Q&A, classification |
| 2 | 2 | +60ms | Conversational chat |
| 3 | 3 | +150ms | Default. General purpose. |
| 4 | 4 | +280ms | Multi-step problems |
| 5 | 5 | +470ms | Code review, debugging |
| 6 | 6 | +720ms | Research synthesis |
| 7 | 7 | +1100ms | Hardest problems. Used by gob-5.5-deep. |
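If you pick depth programmatically, the added-latency column can drive a simple budget check. A minimal sketch: the numbers are copied from the table above, and the helper itself is illustrative, not part of the API.

```python
# Added latency per mining_depth, in ms (from the table above).
ADDED_LATENCY_MS = {1: 0, 2: 60, 3: 150, 4: 280, 5: 470, 6: 720, 7: 1100}

def max_depth_within_budget(budget_ms: float) -> int:
    """Deepest mining_depth whose added latency fits the budget."""
    return max((d for d, ms in ADDED_LATENCY_MS.items() if ms <= budget_ms),
               default=1)

print(max_depth_within_budget(500))  # -> 5 (+470ms fits; depth 6's +720ms doesn't)
```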
## When to crank it up
- You're getting "almost right" answers. Try depth 4–5.
- Multi-hop reasoning fails. Bumping from 3 → 5 is often the fix.
- The model misses subtext. Depth interacts with Shadow Attention; run both at max for the hardest cases.
- You're running an evaluation. Use depth 7 (or just `gob-5.5-deep`) so latency isn't a confound.
## When to crank it down
- You need every millisecond. Depth 1–2 for intent classification, routing, autocomplete.
- You're using `gob-5.5-scout`. Scout caps at depth 3 internally regardless of what you pass.
- You're streaming to a UI. Higher depths increase time-to-first-token. Users notice.
## Example
```bash
curl https://api.gpt-gob.ai/v1/chat/completions \
  -H "Authorization: Bearer $GOB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gob-5.5",
    "mining_depth": 6,
    "messages": [
      {"role": "user", "content": "i have a memory leak in a node app that only happens after ~4h of uptime under specific load patterns. where do i even start?"}
    ]
  }'
```

## Reading the response
The usage block tells you how many passes were actually used:
```json
{
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 312,
    "total_tokens": 359,
    "grs_score": 0.84,
    "mining_passes": 5,
    "horde_size": 11
  }
}
```

Note that `mining_passes` can be lower than `mining_depth`: the model stops early when additional passes no longer meaningfully adjust the saliency map. You're not billed for skipped passes.
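To see early stopping in practice, compare the depth you requested against `mining_passes` in the usage block. A minimal stdlib sketch: the endpoint, payload, and response fields come from the examples above, but the wrapper itself is ours.

```python
import json
import os
import urllib.request

API_URL = "https://api.gpt-gob.ai/v1/chat/completions"

def mine(prompt: str, depth: int) -> dict:
    """POST to the endpoint from the curl example above; return the parsed body."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({
            "model": "gob-5.5",
            "mining_depth": depth,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GOB_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

requested_depth = 6
usage = mine("Why does this deadlock?", requested_depth)["usage"]
if usage["mining_passes"] < requested_depth:
    # Early stopping: later passes stopped changing the saliency map,
    # and the skipped passes aren't billed.
    print(f"stopped early after {usage['mining_passes']} of {requested_depth} passes")
```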
## Caveats
- DCM increases latency more than it increases token cost. The pricing impact is moderate (~+15% per pass beyond 3), but the latency hit is real (see the cost sketch after this list).
- For `temperature > 1.5`, DCM's saliency maps become noisy and the benefit drops sharply. If you're using high temperature, keep depth at 1–3.
- DCM does not add to your context window. The 128k limit on `gob-5.5` includes everything DCM looks at.
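The pricing and temperature caveats translate into two small client-side helpers (the cost sketch referenced above). Both rest on assumptions flagged in the comments: the "~+15% per pass beyond 3" figure is read as additive, and the high-temperature clamp is something you apply yourself; neither is API behavior.

```python
def price_multiplier(depth: int) -> float:
    """Rough token-cost impact of DCM.

    Assumes "~+15% per pass beyond 3" is additive per extra pass;
    the docs don't say whether it compounds.
    """
    return 1.0 + 0.15 * max(0, depth - 3)

def safe_depth(requested: int, temperature: float) -> int:
    """Client-side clamp: keep depth at 1-3 when temperature > 1.5,
    where DCM's saliency maps get noisy and the benefit drops."""
    return min(requested, 3) if temperature > 1.5 else requested

print(price_multiplier(6))   # -> 1.45
print(safe_depth(6, 1.8))    # -> 3
```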