
# Shadow Attention

Shadow Attention (SAL) is a secondary attention layer that runs in parallel with the main multi-head attention. Where standard attention answers "what in this input is important?", Shadow Attention answers the inverse question:

What in this input is deliberately omitted, implied, or contradicted?

In short: Shadow Attention reads what the user didn't say.

## Why it exists

LLMs are extremely good at processing what's in their context window and remarkably bad at noticing what's missing. Standard attention has no incentive to weight unstated assumptions, sarcasm, or contradictions: those don't have tokens to attend to.

Shadow Attention operates on an inverted attention space: instead of softmax-ing over query-key dot products, it softmax-es over the negative dot products and integrates the result as a side-channel into the residual stream. This gives the model a learned representation of what's not there.

The result, on internal benchmarks: a +47% absolute improvement on the Subtext Detection benchmark, with no regression on standard reasoning tasks.
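The inverted-softmax idea above can be illustrated with a toy single-head sketch. This is not the production implementation, just a minimal NumPy illustration of "softmax over the negative dot products": keys the main attention would down-weight receive the most shadow weight.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shadow_attention(Q, K, V):
    """Toy single-head sketch: softmax over *negative* scaled dot
    products, so low-affinity keys dominate. In the real layer the
    output is integrated into the residual stream as a side channel."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # standard scaled dot products
    shadow_weights = softmax(-scores)  # inverted: weight what main attention ignores
    return shadow_weights @ V, shadow_weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 queries, head dim 8
K = rng.normal(size=(6, 8))  # 6 keys
V = rng.normal(size=(6, 8))
out, w = shadow_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

For each query, the key with the lowest dot product gets the highest shadow weight, which is exactly the inversion the layer is built around.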

## When it helps

- Sarcasm and irony ("oh, great, another bug")
- Implicit constraints ("we need a fast solution" → the user means fast for them to write, not fast at runtime)
- Contradictions between earlier and later statements in a conversation
- Loaded questions that bake in false assumptions
- Domain shorthand where critical context is omitted because the user thinks it's obvious

## When it doesn't matter

- Pure factual lookup
- Code generation from a precise spec
- Translation
- Math

## The `shadow_attention` parameter

Boolean. Default: true for all models in the GOB-5.5 family.

```json
{
  "model": "gob-5.5",
  "shadow_attention": true,
  "messages": [
    {"role": "user", "content": "great, ANOTHER framework. just what we needed."}
  ]
}
```

With Shadow Attention on, the model recognizes the sarcasm and won't naively recommend a framework. With it off, the model often misses the tone and responds with a list of frameworks to choose from.

## Disabling it

You'd disable Shadow Attention in two cases:

1. Translation / transcription: you want the model to render exactly what's there, not interpret tone.
2. Adversarial robustness testing: you want a baseline of how the model behaves without subtext detection.

```json
{ "shadow_attention": false }
```
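As a concrete sketch, a translation request might disable the layer like this. The `build_request` helper below is hypothetical; only the `model`, `shadow_attention`, and `messages` fields come from this guide.

```python
def build_request(model, messages, shadow_attention=True):
    """Hypothetical helper: assembles the JSON body shown above.
    Only model, shadow_attention, and messages are documented fields."""
    return {
        "model": model,
        "shadow_attention": shadow_attention,
        "messages": messages,
    }

# Translation: render exactly what's there, don't interpret tone.
payload = build_request(
    "gob-5.5",
    [{"role": "user", "content": "Translate to French: great, ANOTHER framework."}],
    shadow_attention=False,
)
```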

## Latency cost

Shadow Attention runs in parallel with main attention on the same hardware, so the wall-clock cost is negligible (~5 ms per request). It does slightly increase memory bandwidth use; on heavily quantized deployments you may see a 2–4% throughput drop.

## Stacking with Mining Depth

Shadow Attention and Deep Context Mining are orthogonal. They run at different stages and capture different signals:

- DCM = better attention to what's in the input
- SAL = attention to what's missing from the input

For maximum reasoning quality, enable Shadow Attention and set Mining Depth to its maximum:

```json
{
  "model": "gob-5.5",
  "mining_depth": 7,
  "shadow_attention": true
}
```

(Or just use gob-5.5-deep, which comes preset that way.)

## How to tell if it fired

In the response usage object:

```json
{
  "usage": {
    "shadow_signals": ["sarcasm:0.78", "implicit_constraint:0.41"]
  }
}
```

`shadow_signals` lists the top non-surface signals the layer detected, with confidence scores. It's useful when debugging questions like "why did the model interpret the input that way?"
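The `name:score` strings are easy to split for logging or debugging. A small sketch, assuming only the format shown above:

```python
def parse_shadow_signals(signals):
    """Split 'name:score' strings from usage.shadow_signals into
    (name, score) pairs, sorted by confidence, highest first."""
    parsed = []
    for s in signals:
        name, _, score = s.rpartition(":")
        parsed.append((name, float(score)))
    return sorted(parsed, key=lambda p: p[1], reverse=True)

usage = {"shadow_signals": ["sarcasm:0.78", "implicit_constraint:0.41"]}
print(parse_shadow_signals(usage["shadow_signals"]))
# [('sarcasm', 0.78), ('implicit_constraint', 0.41)]
```

`rpartition(":")` splits on the last colon, so signal names containing colons would still parse.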