# Context-Efficient Backpressure for Coding Agents
Dex · December 9, 2025 · < 5 min read
Here's a pattern we use constantly at HumanLayer: swallow all test/build/lint output and replace it with a single ✓ if the stage passes.
If exitCode != 0, dump the stashed output to stdout. Otherwise, the agent never sees it.
## stay in the smart zone
You should try hard to stay in the ~75k token "smart zone" for claude models - every line of PASS src/utils/helper.test.ts is waste.
A jest/maven/pytest run might dump 200+ lines of output, and then your agent has to parse all of that to find the one failure it actually needs to fix.
Or worse, if all tests are passing, you just threw away 2-3% of your context window for an "all good" result you could have conveyed in less than 10 tokens.
You're wasting context - every token you use is diminishing the results and moves you closer to "need to clear or compact to get back to the smart zone".
And you should forget the token cost and time-it-takes-to-run concerns for now (we'll come back to time in a second), since human time wasted on wrangling an agent in the dumb zone is likely more expensive by 10x or more.
## a pretty dumb fix
Not a full fix, but in the right direction:
Rather than letting the model decide what to truncate, we like to take control of output deterministically.
run_silent() { local description="$1" local command="$2" local tmp_file=$(mktemp) if eval "$command" > "$tmp_file" 2>&1; then printf " ✓ %s\n" "$description" rm -f "$tmp_file" return 0 else local exit_code=$? printf " ✗ %s\n" "$description" cat "$tmp_file" rm -f "$tmp_file" return $exit_code fi }
Invoke it with
run_silent "fe integration tests" "bun run test:integration"
Instead of 200 lines of test output, the agent sees:
✓ Auth tests ✓ Utils tests ✗ API tests FAIL src/api/users.test.ts ● should validate email format Expected: true Received: false
The model doesn't need to decide whether to truncate. You've already made that decision. Success = ✓. Failure = full output.
Now your only job is to convince the model not to do its own truncation. Maybe you can shout at it in your claude.md.
## make it smaller over time
Once you have the basic wrapper working, iterate:
Enable failFast. pytest -x, jest --bail, go test -failfast. One failure at a time. Don't make the agent context-switch between five different bugs, or re-read the same "tests 2-5 are failing" output when its still trying to fix test #1.
Filter the output. Strip generic stack frames that don't help your agent find the issue. Remove timing info. grep/sed/awk/cut/etc your way to just the assertion that failed.
Framework-specific parsing. even a pretty-hacky run_silent.sh extracts test counts from pytest, jest, go test, and vitest so you still get visibility without the noise.
We use this pattern heavily when working with customers who have Maven and Gradle projects (notoriously verbose), and it works equally well for xcodebuild, cargo, and anything else that spews.
Here's the humanlayer/humanlayer checks and tests for a monorepo - we run these in a pre-push hook, and some subset of these projects for each plan phase.

This would easily be half a context window worth of output without the wrappers.
## im tired of context-anxious models
The latest generation of models has spent so much time in the RL dungeon that they've overcorrected way too far on being conservative with how much context they load.
here's some patterns I've seen in the last few months:
#### output swallowing
piping output to /dev/null and using an || operator to just print a message based on the commands exit code

while its cool that we're being conservative, remember what cat would have printed without the output swallowing:

In this trivial example, the guardrails actually use MORE TOKENS:

Disclaimers - 1) yes if cat had repeated the whole filename, that would have been more tokens. 2) this is an unofficial tokenizer and probably out of date
but you get my point here - the real nasty one comes next -
#### piping to head / tail
I hear so many complaints about "oh it ran a 5 minute test suite with head -n 50 and now it has to run it again"
venv/bin/python -m pytest -n 4 | head -n 50
If your tests take less than 30 seconds, great you probably don't care, but that's not the case for most of us. And again, we're gonna focus on human time wasted while the agent runs and re-runs test suites, or wasted on compacting/restarting a context window because too many tokens got used.
Ironically, in trying to conserve context and serve us better, models end up burning more tokens, more human time, and more mental energy.
I won't opine on why the product teams that focus on model behavior have leaned into this behavior, but I can only assume its at least in-broad-strokes intentional. It helps models perform well in codebases that are not thoughtful about context-efficiency of their backpressure mechanisms
## its not your fault
Okay fine, opining time: things are this way because labs have to take big swings and the majority of their potential user base doesn't know how (or want to learn how) to do context engineering well. Sure, "don't make me think". But also, let me think if I want to.
As we always say - deterministic is better than non-deterministic. If you already know what matters, don't leave it to a model to churn through 1000s of junk tokens to decide.
