Harness Engineering is Bullshit

Alex Barnett
CEO
Insight

(And so was context engineering. And so was prompt engineering. Kind of.)
If you've worked with AI seriously for the last two years, you already do harness engineering.
You've always done it... you just didn't know you were supposed to charge more for it.
The Treadmill
2022-2023. Prompt Engineering.
Getting good outputs meant learning how to talk to the model: the right framing, the right persona, chain-of-thought, few-shot examples, etc. It worked, and it was a real skill.
...Then the consultants arrived.
I spent a year competing against them, watching clients hand budgets to people who'd watched a few YouTube videos, rebranded themselves "Prompt Engineers," then delivered a Google Doc full of copy-paste templates and called it a strategy. The clients didn't know what the consultants didn't know, the consultants got paid, and the results were garbage. That's not cynicism. That's what happened, repeatedly, to real teams trying to move fast on real problems.
Meanwhile, the people actually building things (pipelines, retrieval layers, feedback loops) were losing deals to people who'd mastered the vocabulary without a single implementation behind them. On Indeed, searches for "prompt engineer" roles peaked in April 2023, and by 2025 the title ranked second to last among roles companies were actually planning to hire for.
The market figured it out. Eventually.
2024-2025. Context Engineering.
Anyone actually building with these models figured out quickly that the prompt isn't a sentence. It's a pipeline. You're dynamically retrieving documents, managing conversation history, summarizing long threads before the context window blows out, formatting tool outputs so the model doesn't choke on noise. Andrej Karpathy named this "context engineering," Tobi Lutke repeated it, and it stuck: the discipline of managing what goes into the model's context window at each step.
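The pipeline described above can be sketched in a few lines. This is a toy illustration, not any framework's real API: `count_tokens`, `build_context`, and the token budget are all hypothetical placeholders for what a production system would do with a real tokenizer and retriever.

```python
# Toy sketch of one context-assembly step. All names and numbers here
# are hypothetical placeholders, not a real framework's API.

MAX_CONTEXT_TOKENS = 8000

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~4 characters per token.
    return len(text) // 4

def build_context(query: str, history: list[str], documents: list[str],
                  tool_output: str) -> str:
    """Assemble the model's context for one step of a conversation."""
    # 1. Retrieval: keep only documents relevant to the query (toy filter).
    relevant = [d for d in documents
                if any(w in d.lower() for w in query.lower().split())]
    # 2. History management: drop old turns before the window blows out.
    budget = (MAX_CONTEXT_TOKENS - count_tokens(query)
              - sum(map(count_tokens, relevant)))
    kept_history: list[str] = []
    for turn in reversed(history):          # most recent turns first
        if budget - count_tokens(turn) < 0:
            break
        kept_history.insert(0, turn)
        budget -= count_tokens(turn)
    # 3. Tool formatting: strip noise so the model doesn't choke on it.
    tool_section = tool_output.strip()[:500]
    return "\n\n".join([
        "## Retrieved documents", *relevant,
        "## Conversation so far", *kept_history,
        "## Latest tool output", tool_section,
        "## User query", query,
    ])
```

Every line of that sketch is a decision about information architecture, not about phrasing. That's the point of the rename.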
This was a genuine improvement in vocabulary. It acknowledged that the real work was information architecture, not wordsmithing.
But here's what anyone building seriously already knew: you cannot do any of that without building the surrounding system. The retrieval pipeline, the memory store, the tool schemas, the feedback loops are not separate from context engineering. They are the implementation of it. If you weren't building that infrastructure, you were doing fancy prompt engineering with extra steps.
February 2026. Harness Engineering.
On February 5, 2026, Mitchell Hashimoto (co-founder of HashiCorp, creator of Terraform) published a personal blog post about his AI adoption journey and named step five "Engineer the Harness." He even hedged it himself: "I don't know if there is a broad industry-accepted term for this yet, but I've grown to calling this 'harness engineering.' I don't need to invent any new terms here; if another one exists, I'll jump on the bandwagon."
Within days, OpenAI published "Harness engineering: leveraging Codex in an agent-first world," Ethan Mollick reorganized his entire AI guide framework around the concept, and Martin Fowler wrote an analysis where he noted the OpenAI article "only mentions 'harness' once in the text. Maybe the term was an afterthought inspired by Mitchell Hashimoto's recent blog post."
One personal blog post became industry vocabulary in under 30 days.
...And it describes, almost exactly, what anyone actually doing context engineering was already doing.
Think of it as an evolution of control: if prompts are wordsmithing and context is a dynamic template, the harness is the infrastructure that makes those variables predictable and deterministic. It’s the difference between asking an agent to "do a task" and building the literal rails that keep it from flying off the tracks.
The Insight That Never Changed
“The model is only as good as the system around it.”
That was true in 2022 when your prompt was a static string, true in 2024 when your context was a dynamic pipeline, and it's true now when your harness is the full operational environment. The vocabulary changed. The insight hasn’t.
Anyone who internalized this early ended up building all three layers simultaneously, because:
You can't build a useful retrieval pipeline without thinking about how tool failures get surfaced back to the model.
You can't manage context without thinking about what happens when the agent goes off the happy path.
These problems are entangled, and the people doing good work always solve them together.
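To see why the concerns are entangled, consider the smallest possible harness loop. This is a hedged sketch, not anyone's actual implementation: `call_model` and the tool registry are hypothetical stand-ins for a real LLM call and real tools.

```python
# Minimal sketch of a harness loop: the infrastructure around the model,
# not the model itself. `call_model` and `tools` are hypothetical
# placeholders; a real harness wraps an actual LLM and real tools.

MAX_STEPS = 5  # a rail: the agent can't loop forever

def run_agent(task: str, call_model, tools: dict) -> str:
    context = [f"Task: {task}"]
    for _ in range(MAX_STEPS):
        action = call_model("\n".join(context))  # e.g. {"tool": ..., "args": ...}
        if action.get("done"):
            return action["answer"]
        tool = tools.get(action.get("tool"))
        if tool is None:
            # Off the happy path: unknown tool. Surface it, don't crash.
            context.append(f"ERROR: no tool named {action.get('tool')!r}")
            continue
        try:
            result = tool(**action.get("args", {}))
            context.append(f"Tool {action['tool']} returned: {result}")
        except Exception as e:
            # Tool failure is fed back into context so the model can recover.
            context.append(f"Tool {action['tool']} failed: {e}")
    return "ERROR: step limit reached without an answer"
```

Notice that the error-surfacing lines are simultaneously context engineering (what goes into the window) and harness engineering (what the rails do off the happy path). You can't write one without writing the other.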
The industry keeps slicing the same unified problem into components and naming the slices. Which would be fine, except each naming event comes packaged with a fresh cohort of "experts" who learned the vocabulary without doing the work.
So Why Does the Term Matter At All?
Here's where I'll give the terminology some credit, because there is one place it genuinely earns it.
When a company scales, you eventually need to separate the person thinking about token density and retrieval quality from the person thinking about infrastructure reliability and API latency. Those are different skill sets, they recruit differently, and "harness engineering" gives that hiring conversation a frame it didn't have before. That's real value.
...Just not the kind that gets written up in a manifesto.
The twelve rules of harness engineering aren't revolutionary insights. They're a checklist for explaining to a VP why the agent keeps breaking in production, documenting what senior engineers already carried as intuition.
Useful? Sure. Groundbreaking? Only if you weren't already there.
The Question Nobody Is Asking
While the industry debates what to call this practice, there's a more interesting question sitting underneath it: who actually builds the harness?
Think about what happened when SaaS took off. Companies didn't build their own CRM infrastructure, they subscribed to Salesforce. They didn't build their own CI/CD pipelines from scratch, they used GitHub Actions. The underlying practice didn't change, but the business model around it did. Suddenly the capability was rented, not built, and teams could focus on the work that actually differentiated them.
The same shift is starting to happen with agent infrastructure. Most companies don't need to engineer their own harness. They need to subscribe to one. Managed orchestration, memory, guardrails, tool access, operational intelligence, all of it as a service rather than a bespoke build every time.
I've been calling this "SaaS for AI" since I left corporate a few years ago. Turns out that's exactly what it is: Harness-as-a-Service. The model is the commodity. The harness is the product. And that shift, from harness engineering as a practice to HaaS as a business model, is the actually interesting development happening right now.
Yet, almost nobody is talking about it.
HaaS and our Signal Engine
We've been in this world since GPT-3.5, long enough to watch prompt engineering rise and get slimed, context engineering arrive and immediately get rebranded, and harness engineering get coined in a personal blog post and reach industry consensus inside a month.
Our philosophy has stayed consistent through all of it: the model is only as good as the system around it.
Signal Engine is built on that premise. It's not a dashboard or a reporting tool. It's a harness, specifically the operational intelligence layer that sits between your support data and the agents acting on it. It surfaces anomalies before they become crises, quantifies the signals your agents need to make good decisions, and plugs into whatever you're already running.
If you're building agentic workflows, the Signal Engine works as a subagent: feeding structured, statistically validated operational context back into your system in real time, so your agents don't have to go looking for the signal.
The signal finds them.
That's the HaaS model in practice. You don't build the harness. You subscribe to it.
The vocabulary will keep changing. The underlying problem won't:
“The signal was always there. We built the engine to surface it.”