Building Agent Stomp: Exploring Harness Engineering
I started building Agent Stomp (agentstomp.ai) as an experiment in Harness Engineering: the emerging idea that we can build systems around AI agents to plan, execute, test, review, and improve software with less direct human hand-holding. The goal: take a product from 0 → 1 without having to hit ‘yes’ in Claude Code every 30 seconds.
There are a number of AI eval apps out there already, but to really understand how to crack the problem, you have to start building. This build explores what the developer workflow looks like as agents take the wheel more and more.
I started with Claude Code. That was the fastest way to get from rough idea to working application shape. Claude helped frame the product, think through use cases, outline the architecture, and turn loose ideas into something buildable.
From there, I experimented with the Claude Agents SDK. I still liked the direction, but I didn’t get too far. It just felt like I was building a crappy version of Claude Code rather than creating something meaningfully differentiated.
So I tried to move ‘up’ a level (see my earlier post: https://dysprosium.ai/blog/harness-engineering).
The next step was to build on LangChain’s Open SWE:
https://github.com/langchain-ai/open-swe
This moves closer to the ‘team in a box’ concept.
3 Repositories Involved
Three repositories interact to get the job done:
build-team → the task source of truth. Contains all team-member roles and assignments, and keeps workflow separate from product.
open-swe-dysprosium-harness → a fork of open-swe. This is the execution engine.
agent-quality-helper → the Agent Stomp repo itself. The PRD lives in here; the code running the build does NOT.
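To make the split concrete, here is a minimal sketch of the three repositories as data. The repository names come from the list above; the `Role` enum, the `Repo` fields, and the `repo_for` helper are hypothetical names invented for illustration and are not part of any of the real repos.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # Hypothetical labels summarizing what each repo is for.
    TASKS_AND_TEAM = "task source of truth, team roles and assignments"
    EXECUTION_ENGINE = "execution engine (fork of open-swe)"
    PRODUCT = "Agent Stomp product code and PRD"


@dataclass(frozen=True)
class Repo:
    name: str
    role: Role
    holds_prd: bool = False   # does the PRD live here?
    runs_build: bool = False  # does the code that runs the build live here?


REPOS = [
    Repo("build-team", Role.TASKS_AND_TEAM),
    Repo("open-swe-dysprosium-harness", Role.EXECUTION_ENGINE, runs_build=True),
    Repo("agent-quality-helper", Role.PRODUCT, holds_prd=True),
]


def repo_for(role: Role) -> Repo:
    """Return the single repo that fills the given role."""
    matches = [repo for repo in REPOS if repo.role is role]
    if len(matches) != 1:
        raise LookupError(f"expected one repo for {role}, found {len(matches)}")
    return matches[0]


if __name__ == "__main__":
    for repo in REPOS:
        print(f"{repo.name}: {repo.role.value}")
```

The point of the model is the separation: the product repo holds the PRD but none of the build machinery, and the engine holds the build machinery but none of the product.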
Getting Shit Done
Open a new GitHub issue, simply tag the engine, and it builds.
@open-swe-dysprosium-harness
Please complete this task for Agentstomp.
build-team/tasks/backlog/001-document-repo-health-and-test-commands.md
The build-team kicks in to divide and conquer the work and put up a PR.
After the PR is merged, it gets deployed to Vercel.
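As a rough illustration of the trigger, the issue text shown earlier in this section carries two pieces of information: a mention of the engine and a path to a task file. The sketch below pulls both out of an issue body. It is an assumption about how such a trigger could be parsed, not how Open SWE actually does it; `parse_issue`, `TaskRequest`, and the regular expression are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# The handle matches the mention used in the example issue.
ENGINE_HANDLE = "open-swe-dysprosium-harness"

# Assumption: task files are markdown files under build-team/tasks/.
TASK_PATH_RE = re.compile(r"\bbuild-team/tasks/\S+\.md\b")


@dataclass(frozen=True)
class TaskRequest:
    engine: str
    task_path: str


def parse_issue(body: str) -> Optional[TaskRequest]:
    """Return a TaskRequest if the body mentions the engine and names a task file."""
    if f"@{ENGINE_HANDLE}" not in body:
        return None  # the engine was not tagged, so nothing to do
    match = TASK_PATH_RE.search(body)
    if match is None:
        return None  # tagged, but no task file was referenced
    return TaskRequest(engine=ENGINE_HANDLE, task_path=match.group(0))


if __name__ == "__main__":
    example = (
        "@open-swe-dysprosium-harness\n"
        "Please complete this task for Agentstomp.\n"
        "build-team/tasks/backlog/001-document-repo-health-and-test-commands.md"
    )
    print(parse_issue(example))
```

Keeping the task description in a file and only pointing at it from the issue means the issue stays short and the task itself stays versioned with the rest of the team definition.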
Setup Quirk
The main product repo, agent-quality-helper, actually contains a hard copy of the build-team and some instructions on what to do with it. Since the open-swe-dysprosium-harness does a full checkout of the agent-quality-helper repo (and won’t check out sub-repositories), the team has to be embedded in there. In the future I’ll make it more elegant, with the harness knowing about the team directly.
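One way to keep that hard copy from drifting is to refresh it with a small script. This is a sketch under assumptions, not the setup actually used here: `sync_team_copy` is a hypothetical helper, and both the local checkout paths and the `build-team` destination folder name are placeholders.

```python
import shutil
from pathlib import Path


def sync_team_copy(team_repo: Path, product_repo: Path, dest_name: str = "build-team") -> Path:
    """Replace the embedded team copy inside the product repo with a fresh one.

    The .git directory is skipped so the copy is plain files rather than a
    nested repository, which the harness checkout would not follow anyway.
    """
    dest = product_repo / dest_name
    if dest.exists():
        shutil.rmtree(dest)  # drop the stale copy so removed files do not linger
    shutil.copytree(team_repo, dest, ignore=shutil.ignore_patterns(".git"))
    return dest


if __name__ == "__main__":
    # Placeholder paths: adjust to wherever the two repos are checked out.
    copied_to = sync_team_copy(Path("../build-team"), Path("."))
    print(f"team copied to {copied_to}")
```

Deleting and recopying is blunt but predictable: the embedded copy always mirrors the team repo exactly, at the cost of losing any edits made directly to the copy.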
The Upside
The upside is speed.
You can frame out a product, explore the business, build a shell, generate a backlog, and start implementation in a shockingly short amount of time.
The website itself took less than an hour to make with Lovable, which still feels a little ridiculous.
Claude and similar tools are also useful well beyond code. They can help think through positioning, market gaps, architecture, product workflows, and whether there is actually a business here.
Agent Stomp is in a crowded space. There are already plenty of eval products, agent observability tools, testing tools, and AI development platforms.
The Downside
The downside is cognitive load.
You can now create product complexity much faster than you can understand it.
That sounds like a good problem, but it can become a real one. It is very easy to generate too much: too many features, too many half-formed workflows, conflicting workflows, immediate abandonware, too many pages that do not agree with each other.
The implementation bottleneck starts to disappear, but the judgment bottleneck becomes more important.
The hard question is no longer just:
Can this be built?
It is:
Should this be built, and does it create real business value?
AI can’t solve that one… yet.
Where This Goes Next
Agent Stomp started as a hands-on exploration of Harness Engineering.
The path so far has been:
Start with Claude Code to shape the concept.
Try the Claude Agents SDK.
Realize I was not getting enough differentiation there yet.
Shift toward LangChain Open SWE.
There is a full layer of larger harnesses out there, and that feels like the next step.