# Shadow Attention
Shadow Attention (SAL) is a secondary attention layer that runs in parallel with the main multi-head attention. Where standard attention answers "what in this input is important?", Shadow Attention answers the inverse question:
What in this input is deliberately omitted, implied, or contradicted?
In short: Shadow Attention reads what the user didn't say.
## Why it exists
LLMs are extremely good at processing what's in their context window and remarkably bad at noticing what's missing. Standard attention has no incentive to weight unstated assumptions, sarcasm, or contradictions: those don't have tokens to attend to.
Shadow Attention operates on an inverted attention space: instead of softmax-ing over query-key dot products, it softmax-es over the negative dot products and integrates the result as a side-channel into the residual stream. This gives the model a learned representation of what's not there.
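To make that concrete, here is a minimal single-head sketch of the inverted-softmax idea in PyTorch. The function name, tensor shapes, and the way the shadow output joins the residual stream are illustrative assumptions; the production layer isn't public, so this only shows the sign flip on the attention scores.

```python
import torch
import torch.nn.functional as F

def shadow_attention_sketch(q, k, v):
    """Toy single-head sketch (shapes: batch, seq, d_k).

    Standard attention weights keys by *high* query-key affinity; the shadow
    branch applies softmax to the *negated* scores, so low-affinity keys
    dominate. In a real block the shadow output would be projected and added
    to the residual stream as a side-channel (not shown here).
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    main = F.softmax(scores, dim=-1) @ v      # "what is important"
    shadow = F.softmax(-scores, dim=-1) @ v   # "what is being ignored"
    return main, shadow
```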
On internal evaluations, the result is a +47% absolute improvement on the Subtext Detection benchmark, with no regression on standard reasoning tasks.
## When it helps
- Sarcasm and irony ("oh, great, another bug")
- Implicit constraints ("we need a fast solution", where the user means fast for them to write, not fast at runtime)
- Contradictions between earlier and later statements in a conversation
- Loaded questions that bake in false assumptions
- Domain shorthand where critical context is omitted because the user thinks it's obvious
## When it doesn't matter
- Pure factual lookup
- Code generation from a precise spec
- Translation
- Math
## The `shadow_attention` parameter
Boolean. Default: true for all models in the GOB-5.5 family.
```json
{
  "model": "gob-5.5",
  "shadow_attention": true,
  "messages": [
    {"role": "user", "content": "great, ANOTHER framework. just what we needed."}
  ]
}
```

With Shadow Attention on, the model recognizes the sarcasm and won't naively recommend you a framework. With it off, the model often misses the tone and provides a list of frameworks to choose from.
## Disabling it
You'd disable Shadow Attention in two cases:
1. Translation / transcription: you want the model to render exactly what's there, not interpret tone.
2. Adversarial robustness testing: you want a baseline of how the model behaves without subtext detection.
{ "shadow_attention": false }##Latency cost
Shadow Attention runs in parallel with main attention on the same hardware, so the wall-clock cost is negligible (~5ms per request). It does increase memory bandwidth slightly; on heavily quantized deployments you may see a 2-4% throughput drop.
## Stacking with Mining Depth
Shadow Attention and Deep Context Mining are orthogonal. They run at different stages and capture different signals:
- DCM = better attention to what's in the input
- SAL = attention to what's missing from the input
For maximum reasoning quality, set both to maximum:
```json
{
  "model": "gob-5.5",
  "mining_depth": 7,
  "shadow_attention": true
}
```

(Or just use `gob-5.5-deep`, which is preset that way.)
## How to tell if it fired
In the response's `usage` object:
```json
{
  "usage": {
    "shadow_signals": ["sarcasm:0.78", "implicit_constraint:0.41"]
  }
}
```

`shadow_signals` lists the top non-surface signals the layer detected, with confidence scores. Useful for debugging "why did the model interpret it that way?"
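If you want those entries as structured data rather than strings, a small parser does the job. The `"name:score"` format is taken from the example above; the function name is illustrative, not part of any SDK.

```python
def parse_shadow_signals(usage: dict) -> list[tuple[str, float]]:
    """Split "name:score" entries from usage["shadow_signals"] into pairs."""
    pairs = []
    for entry in usage.get("shadow_signals", []):
        name, _, score = entry.partition(":")
        pairs.append((name, float(score)))
    return pairs

# Example using the usage object shown above:
usage = {"shadow_signals": ["sarcasm:0.78", "implicit_constraint:0.41"]}
print(parse_shadow_signals(usage))  # [('sarcasm', 0.78), ('implicit_constraint', 0.41)]
```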