Three Circles, Not Three Steps: Where Prompt, Context, and Harness Overlap
Abstract
The dominant story about AI engineering has three chapters: first we learned prompt engineering, then context engineering, now harness engineering. The story is told as a progression — each era building on the previous, or in some tellings replacing it. The frame is useful for explaining what changed and when. It is less useful as a guide for design.
Many consequential decisions in agent systems do not belong to a single discipline. The text in your CLAUDE.md is simultaneously a prompt (the model reads it), a context decision (it occupies window space), and a harness affordance (the runtime injects it on every turn). A tool description is the same. A skill is the same. Linear framings have no native vocabulary for simultaneous co-application, so they push you to classify when you should be weighing trade-offs. This post argues for a Venn-diagram framing: three overlapping perspectives on the same engineering reality, with much of the consequential design work in the center.
The Three-Phase Story
By spring 2026 the framing has settled. Ryan Lopopolo’s OpenAI post on harness engineering appeared in February. LangChain followed with Vivek Trivedy’s “The Anatomy of an Agent Harness” in March. Secondary writers turned this into a tidy historical arc: prompt engineering in 2022–2024, context engineering in 2025, harness engineering in 2026. Walden Yan at Cognition had been writing about context engineering as the “extension and elevation” of prompt engineering since mid-2025; by early 2026 the pattern hardened into a three-era story.
The careful versions are explicit that each era adds rather than replaces. Jia Chen at Softmax Data writes that “each era did not replace the previous one. They stacked.” Trivedy frames the relationship as nested: “harnesses today are largely delivery mechanisms for good context engineering.” Yan calls context engineering “developmental rather than replacement-based.” Hung-yi Lee at NTU draws the relationship as a layered diagram in his spring 2026 lecture: an LLM, plus a prompt, plus context, plus a harness.
So far so good. But the verbs matter, and the verbs diverge between careful framings and casual ones:
| Framing | Verb | Implication |
|---|---|---|
| LangChain (Trivedy) | “delivery mechanism for” | Layered, nested |
| Cognition (Yan) | “extension and elevation of” | Developmental, additive |
| Softmax Data (Chen) | “stacked” | Additive, all layers still active |
| Epsilla (headline) | “replaced” | Substitution |
| Epsilla (body) | “subsumes the previous two” | Absorption |
| Tech-media digests | “evolved past” / “moved beyond” | Substitution |
The careful sources use verbs that preserve the lower layers. The split between the two Epsilla rows is the more honest picture of the landscape: the same publication’s headline and body argue different things, because framing varies by venue and format more than by author. A headline like “Why Harness Engineering Replaced Prompting in 2026” sits above a body whose actual argument is that harness “subsumes” the previous two. That is a different claim, but the headline is the thing that travels.
This is not a complaint about specific authors. The structural issue is that any linear framing, even a careful “stacked” one, encourages a reader to find an order and assume the latest step contains the most engineering value. Both readings — replacement and stacking — share an arrow shape. Both put the disciplines in sequence. Neither names the place where much of the real design work happens.
It is worth pausing on the arrow itself. The arrow framing is not wrong about the field’s history. Each era’s primary concern did shift: from instructions, to information curation, to execution environment. What the arrow gets wrong is the implicit corollary — that the primary concern of a new era becomes the only concern of that era. In practice every era inherits all previous concerns. The arrow is a story about which load-bearing concern was added last. The Venn diagram below is a snapshot of what is load-bearing right now.
Where the Linear Frame Breaks
Try classifying each of these as belonging to exactly one discipline:
- The instruction block at the top of a skill markdown file
- The tool description that ships in an MCP server’s tool schema
- A hook script that reads a file and injects its contents on every turn
- A CLAUDE.md that defines memory autoload rules
- A compaction strategy that decides which parts of the conversation survive when the window fills up
Each one is simultaneously a prompt (the model reads it as instructions), a context decision (it occupies window space and competes with other content), and a harness mechanism (the runtime is what executes the injection, enforces the rule, decides when compaction fires).
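To make the hook item concrete, here is a minimal sketch of a script that turns a rules file into text for injection. It assumes a harness that runs the script on every turn and appends whatever it returns to the model's context; that contract, the names `render_injection` and `inject_file`, and the 2000-character budget are all illustrative assumptions, not any particular runtime's API.

```python
from pathlib import Path

# Hypothetical budget: cap injected text so it cannot crowd out the conversation.
MAX_INJECT_CHARS = 2000


def render_injection(text: str, max_chars: int = MAX_INJECT_CHARS) -> str:
    """Wrap file contents as an instruction block, truncated to a character budget."""
    body = text.strip()
    if len(body) > max_chars:
        # Context lens: the budget decides how much of the file survives.
        body = body[:max_chars].rstrip() + "\n[truncated]"
    # Prompt lens: the header tells the model how to treat what follows.
    return "Project rules (follow these):\n" + body


def inject_file(path: str, max_chars: int = MAX_INJECT_CHARS) -> str:
    """Read a rules file and return the text a hook would emit on each turn."""
    # Harness lens: the runtime, not the model, decides that this runs at all.
    return render_injection(Path(path).read_text(encoding="utf-8"), max_chars)
```

Even in a dozen lines, the wording of the header, the size cap, and the decision to run on every turn are three separate choices made in one file.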
Ask “is this prompt engineering or harness engineering?” and you immediately reach a forced choice that doesn’t match the work. The honest answer is all three at once. The linear frame has no native vocabulary for all three at once, so a reader applying it gets pushed to commit to a primary lens and treat the others as secondary.
The consequence is not abstract. A reader who absorbs the “harness replaces prompts” reading can easily come away writing a thin CLAUDE.md that assumes the harness will paper over vague instructions; a reader who absorbs “harness builds on prompts” can write careful instructions but treat context curation as someone else’s problem. Both readings miss that the three concerns are not separable in the artifacts they are actually editing.
Three Circles, Not Three Steps
Replace the arrow with three overlapping circles.
Each circle has a pure region — work that only one lens explains well. The pairwise intersections name the artifacts that two lenses both apply to. The center, where all three overlap, is where much of the consequential design work in modern agent systems actually lives.
Pure regions are not residual. Choosing which LLM to use, designing a RAG retrieval pipeline, setting a budget for background subagents, deciding an output format spec — these can all be consequential decisions that sit cleanly in one circle. The Venn frame doesn’t claim everything important is in the center. It claims that the artifacts you most often edit by hand in 2026 — skill files, tool descriptions, CLAUDE.md-class instructions — tend to live near the center, and the linear frame is most misleading there.
A second clarification: the difference between “stacked” and “Venn” is not where decisions live. Both frames acknowledge cross-layer effects. The difference is what a reader does when faced with a decision. Stacked encourages the question “which is the lowest layer this touches?” Venn encourages the question “which lenses are pulling, and how do they trade?” The shift is from a localization question to a weighing question. Both frames can arrive at the same answer for a given decision; the path to that answer is different.
| Region | Examples | What is happening |
|---|---|---|
| Pure Prompt | Few-shot examples, phrasing tweaks, output format spec | Single-inference quality |
| Pure Context | RAG retrieval, memory selection, tool definitions visible in window | What information the model sees |
| Pure Harness | Background task execution, file state policy, subagent spawning | Execution environment |
| P ∩ C | System prompts, tool descriptions, skill markdown | Text doubles as prompt and occupies window |
| P ∩ H | CLAUDE.md, MCP tool descriptions, hook injection text | Prompt content enforced by harness policy |
| C ∩ H | Memory autoload, subagent context handoff, compaction strategy | What information loads, and when and how it loads |
| Center (P ∩ C ∩ H) | Skill design, tool design, whole-agent UX | All three lenses active simultaneously |
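One way to read the table is as a lookup from a set of lenses to a region. The sketch below encodes that reading in Python, using one example artifact per row. The names `REGION_NAMES`, `ARTIFACT_LENSES`, `region`, and `needs_weighing` are invented for illustration; nothing here is a formal taxonomy.

```python
# Lens labels used throughout this sketch.
P, C, H = "prompt", "context", "harness"

# Region names follow the table above.
REGION_NAMES = {
    frozenset({P}): "Pure Prompt",
    frozenset({C}): "Pure Context",
    frozenset({H}): "Pure Harness",
    frozenset({P, C}): "P ∩ C",
    frozenset({P, H}): "P ∩ H",
    frozenset({C, H}): "C ∩ H",
    frozenset({P, C, H}): "Center (P ∩ C ∩ H)",
}

# One example artifact per region, taken from the table.
ARTIFACT_LENSES = {
    "Few-shot examples": {P},
    "RAG retrieval": {C},
    "Subagent spawning": {H},
    "System prompts": {P, C},
    "Hook injection text": {P, H},
    "Compaction strategy": {C, H},
    "Skill design": {P, C, H},
}


def region(lenses):
    """Return the Venn region for a non-empty set of lens labels."""
    key = frozenset(lenses)
    if key not in REGION_NAMES:
        raise ValueError(f"unknown lens combination: {sorted(key)}")
    return REGION_NAMES[key]


def needs_weighing(lenses):
    """More than one lens means the decision is a trade-off, not a classification."""
    return len(set(lenses)) > 1
```

The point of the set representation is that an artifact carries all of its lenses at once; a linear frame would force each entry to hold a single label.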
A few examples are worth dwelling on.
A skill markdown file is the canonical center artifact. Its body is a prompt the model reads when the skill activates. Every byte of it occupies the context window. The harness decides when to load it, how to surface its frontmatter, whether to refresh it after compaction. Designing a skill well means thinking about all three at once. Treating it as “just a prompt” produces bloat. Treating it as “just context” produces underspecified instructions. Treating it as “just a harness configuration” produces something the model ignores.
The same is true of MCP tool descriptions. The text is what the model reads to decide whether to call the tool. It is also what the model carries in context for the rest of the session whether or not it ever uses the tool. The harness is what wires the description into the conversation in the first place. A tool description with great prose but poor context economy will get crowded out as the conversation lengthens. A tool description that is perfectly typed but unclear in plain language will go unused. The good ones balance all three lenses.
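The context-economy point can be put in rough numbers. The sketch below estimates what share of a window a set of tool descriptions takes up. The four-characters-per-token ratio is a crude assumption that varies by tokenizer and language, and both function names are made up for this example.

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def description_share(descriptions: dict[str, str], window_tokens: int) -> float:
    """Fraction of the context window taken by tool descriptions alone."""
    total = sum(approx_tokens(d) for d in descriptions.values())
    return total / window_tokens
```

Because the descriptions are carried whether or not the tools are called, this share is a fixed cost on every turn, which is why trimming a description is a context decision as much as a prose one.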
A Decision in the Center
A concrete example from a recent project. The task was to extract burned-in subtitles from a low-resolution video. The pipeline used OCR (EasyOCR with traditional Chinese) plus a substitution dictionary that corrected common OCR errors specific to this video’s font and lighting — perhaps two dozen entries like 噩 → 靈, 雽 → 量, 雉 → 難.
Where should that substitution dictionary live?
Option A: A rules file inside the skill, written in markdown, included as part of the skill’s instruction surface. The model reads it when the skill activates.
Option B: A Python dictionary inside the MCP server (_OCR_FIX = {…}), applied automatically by the tool implementation. The model never sees it.
Each engineering lens pulls a different direction:
| Lens | Option A (markdown) | Option B (MCP code) |
|---|---|---|
| Prompt | The model sees the dictionary as instructions; could help reasoning about edge cases | Hidden from the model; but if the MCP applies it automatically, there is nothing the model needs to know |
| Context | The dictionary occupies window space; it drifts across compaction and is rebuilt every fresh session | Near-zero context cost for the substitution rules themselves; the tool description still occupies context as it would either way |
| Harness | No enforcement; the model can ignore the rule under pressure | When the tool runs, the substitution is applied regardless of the model’s reasoning |
Option B wins. But notice that no single lens decides this. The prompt lens is genuinely ambivalent — making the substitution visible could help in unusual cases. The context lens leans hard toward B. The harness lens leans hard toward B for reliability. The decision is made by weighing all three lenses, not by classifying the work as “this is a harness concern.”
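For reference, Option B might look roughly like the following. This is a sketch, not the project's actual code: only the three substitutions quoted above are included (the full table is not reproduced here), and `fix_ocr_line` and `postprocess_subtitles` are hypothetical names for a step the tool would run after OCR and before returning text to the model.

```python
# Three entries quoted in the post; the real table has about two dozen, not shown here.
_OCR_FIX = {"噩": "靈", "雽": "量", "雉": "難"}

# Precompute a translation table so each line is corrected in a single pass.
_OCR_FIX_TABLE = str.maketrans(_OCR_FIX)


def fix_ocr_line(line: str) -> str:
    """Replace known OCR misreads character by character."""
    return line.translate(_OCR_FIX_TABLE)


def postprocess_subtitles(raw_lines: list[str]) -> list[str]:
    """Hypothetical tool-side step: clean OCR output before the model sees it."""
    # Blank lines are dropped; everything else is stripped and corrected.
    return [fix_ocr_line(line.strip()) for line in raw_lines if line.strip()]
```

Nothing in this code is visible to the model, which is the whole trade: the rules cost no window space and cannot be skipped, at the price of the model being unable to reason about them.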
This is a pattern, not the only pattern. Many consequential design choices in agent systems are multi-lens evaluations, and the Venn frame names the weighing that is happening. The linear frame tends to push a reader to pick a discipline and then notice afterwards that the other two are pulling at the answer; the Venn frame starts from the weighing question. For decisions that genuinely sit in a pure region — model selection, retrieval design, budget policy — the linear frame is fine, and the Venn frame doesn’t add much.
What This Changes
For engineers building agent systems, the practical shift is small but consequential. Stop asking “is this prompt engineering or harness engineering?” Start asking “which lenses apply here, and how do they trade against each other?” Sketch which intersection a decision lives in before committing to an approach. Notice when you are applying the wrong lens — for instance, trying to fix a context-overflow problem by rewriting the prompt, or trying to fix a vague instruction by adding another hook.
For educators and popularizers, the linear framing is fine for onboarding. “Here is the history” is a real service, and the three-era story is true at the level of what people were primarily worrying about in each year. The problem is that the same framing, applied to design guidance for working engineers, can mislead. When a reader reaches the point of building real agent systems, the answer to “where do I put X” is often not a single discipline. A non-linear framing — Venn, or any other shape that names the overlap — fits that subset of the work better.
For tooling, this is why the affordances cross lenses. CLAUDE.md is prompt, context, and harness simultaneously. Claude Code’s hook system is harness machinery that injects prompt content drawn from context decisions. The Agent = Model + Harness formula popularized by LangChain’s “Anatomy of an Agent Harness” treats harness as a single layer, but a working harness is itself three overlapping concerns — what to say, what to load, what to enforce — that need to be designed together rather than in sequence. Hermes Agent, positioned as a successor to earlier wrappers, makes the same bet at the implementation layer.
A reader who has built harness systems at scale will sometimes object that this picture overweights the center: most of the day-to-day work in an industrial harness is single-lens (background-task budgets, retrieval pipelines, file-state policies). They are right about volume; the Venn frame is meant for the design moments where lenses overlap, not for the operational moments where they don’t.
Coda: Two Axes, Same Failure Mode
This is a companion to an earlier post that argued for one axis: AI engineering is moving from implicit prose contracts to explicit protocol contracts. CLI --help text and skill markdown describe behavior in prose; MCP servers describe it in typed schemas. The direction (implicit → explicit) is irreversible, even if the specific protocol changes.
This post argues a different axis. Linear framings — prompt → context → harness — describe a one-dimensional progression. The work itself is at least two-dimensional. Many of the decisions a working engineer faces live in intersections, not on a number line.
The reason both posts belong together is that they target the same failure mode from different sides. Popular framings of AI engineering look complete but leave consequential decisions unnamed. The earlier post named one form of incompleteness: when a contract isn’t written explicitly, it gets written in prose, and prose is the wrong substrate for a contract. This post names another: when a decision gets assigned to a single discipline, you stop noticing the other disciplines that are also pulling. Different forms of incompleteness, both fixed by adding a dimension the popular framing didn’t have.
This post emerged from designing a video-subtitle pipeline and realizing that the question “should this rule live in the skill markdown or the MCP code” could not be answered without invoking prompt engineering, context engineering, and harness engineering at the same time. The linear progression frame could not name where the decision lived. The Venn frame could.