From Spicy Autocompletion to Agentic Engineering

Alicia Rojas
March 3, 2026

The Operational Challenge

AI coding agents are shipping code faster than most engineering organizations can absorb it. For teams running production Rails applications, this poses an operational risk: AI throughput is increasing, but human review capacity is not. The bottleneck is no longer writing code; it’s verifying it. 

Without deliberate system design for AI-assisted workflows, teams face a familiar failure mode: more output, same headcount, declining confidence in quality. The math is simple: if your senior engineers spend their time reviewing AI-generated pull requests line by line, you’ve traded one bottleneck for another. The teams that will win are the ones that design systems to make AI output trustworthy by default, not the ones that review it all manually.

What follows is a practitioner’s account of that transition: from using AI as a faster keyboard to building the engineering systems that make AI output reliable at scale.

The Progression

Until very recently, I was treating AI like a junior developer: giving it clearly scoped tasks, reviewing the code, and handling the architecture, judgment, and integration myself. Although my instincts told me it could be pushed further, a persistent lack of trust kept me from stepping back and letting the AI handle more complex tasks.

This changed with the release of Opus 4.6 and Codex 5.3. The new models made it evident that although I was getting results and shipping faster, I had been driving nails with the handle of the hammer.

After February 5th, suddenly everyone on my team was sharing setups and tips to make better use of the tools we had available. My interest grew, and I started experimenting with increasing curiosity, even urgency. Because the whole team runs the same stack (Rails), experimenting turned out to be very organic: I could share my AI configurations (e.g., CLAUDE.md) with my teammates, and they’d just work. Same patterns, same good practices across all our projects. We had a common ground for testing things out and making results reasonably comparable.

Then I came across Shapiro’s framework describing the five levels of AI-assisted engineering. I liked it because it provides a clear conceptual path to where AI-assisted engineering is going and what I should be doing to move in that direction, and to do it before I realize the water has been rising around me and is now at my chest.

In this post, I’ll share a few simple tips to level up your use of AI in Software Development from “Spicy Autocompletion” (Level 0) to Agentic Engineering (Level 3). You might be wondering: why stop at level 3 and not go all the way to level 5? The reason is that in this range (0–3), most of what it takes to effectively use the AI is a matter of tools, setups and only a small amount of trust. To go beyond, you need a fundamentally different approach to your role. As Shapiro eloquently puts it, “most developers hit that ceiling at level three because they are struggling with the psychological difficulty of letting go of the code.”

At Telos, we think of this transition in three operational shifts: Tooling Upgrade (getting the right agent in place), Configuration Memory (teaching the agent how your team works), and Harness Engineering (building the systems that make agent output trustworthy). My progression helps explain these three shifts.

Shift 1: Tooling Upgrade — Install an AI Agent

From Spicy Autocompletion to Intern

Level Zero is asking ChatGPT to write your SQL queries, or GitHub Copilot to write your functions. You chat with the AI in the browser or maybe inside your IDE, you ask for its recommendation about this or that pattern, or you speed up your technical research with an LLM. You’re still coding everything yourself, just doing fewer keystrokes.

Many AI skeptics dislike AI code output because they use the free versions of the models. Unless you’re running an open-source model, if you’re not paying to access cutting-edge models, you are probably ~6 months behind in terms of LLM capabilities. In the AI era, that’s like complaining in 2026 that a 2013 BlackBerry’s camera doesn’t take decent pictures.

If you’re here, install an AI agent right now and get a paid subscription, just to glimpse what it is capable of. Start by giving it small, scoped tasks like “fix this bug”, “refactor this file”, or “fix this failing test”. You’ll see that those few bucks are the best investment you’ll have made this month.
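With Claude Code, for example, the setup is a one-liner. The npm package name is Anthropic’s official one; the prompt and the spec file path are purely illustrative:

```shell
# Install Claude Code (requires Node.js)
npm install -g @anthropic-ai/claude-code

# From your project root, hand it a small, scoped task:
claude "fix the failing test in spec/models/user_spec.rb"
```

The point is the scope, not the tool: a single file, a single failing test, something you can verify in one read.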

Shift 2: Configuration Memory — Fine-Tune Your Agent Config

From Intern to Junior Developer

Another reason some engineers dislike AI’s code is that they don’t customize their configurations enough to make the LLMs produce code that fits their, or their company’s, taste and patterns. If you’re using Claude Code, you can easily achieve this with a proper CLAUDE.md and settings.json. Treat these files as the living memory of your team’s agentic-engineering practice. When the agent makes a mistake, tell it to update its configuration so it doesn’t make the same mistake again.
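As a sketch, a team CLAUDE.md for a Rails codebase might look like this. Every rule below is an illustrative placeholder, not a prescription:

```markdown
# CLAUDE.md — team conventions (illustrative example)

## Code style
- Follow the existing service-object pattern in app/services; keep logic out of controllers.
- Use `Rails.logger`, never `puts`, in application code.

## Testing
- Write the failing spec first; run the full suite before declaring a task done.

## Lessons learned
- Migrations must be reversible: use `change` with reversible operations or define `down`.
```

The “Lessons learned” section is the part that grows: each correction the agent receives gets appended there.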

The next step is setting up your own skills (or slash commands in Claude Code) to extend your agent’s capabilities, automate workflows, and streamline your development process with custom commands. Here are some of my personal favorites:

/challenge

This is an adversarial mid-development review. You’re in the middle of an implementation and ask the agent, acting as a ruthless software developer, to try to break your code before you open a PR.

/review-plan

Claude comes with a built-in /plan command that tells it to enter plan mode before starting an implementation. When the plan is ready, I use this command to invoke a Staff Engineer Agent to review it.

/ship

Stage, commit, push, and open a PR in one shot with an auto-generated description following my company’s standards.

/learned

Appends a lesson to CLAUDE.md so mistakes don’t repeat.
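In Claude Code, a custom slash command is just a Markdown file under .claude/commands/. Here is a minimal /challenge as a sketch; the wording is mine, not a canonical prompt:

```markdown
<!-- .claude/commands/challenge.md -->
Act as a ruthless senior reviewer. Examine the current diff and try to break it:
- List edge cases the implementation misses.
- Point out race conditions, N+1 queries, and unhandled nil values.
- Do not fix anything yet; report findings ordered by severity.
```

Keeping the prompt in the repo means the whole team invokes the same reviewer, and improvements to it ship like any other code change.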

Shift 3: Harness Engineering — Stop Reviewing Code, Build Systems Instead

From Junior to Senior Developer and beyond

One thing that blew my mind in this journey towards Agentic Engineering is the concept of Harness Engineering. Imagine someone invents a super fertilizer that makes your plants grow insanely fast. Like seedling-to-oak-in-six-hours fast. If you want a beautiful garden and not a chaotic, impenetrable jungle, you need a structure to make the plants grow the way you want them to grow. This requires taste, judgment, and discipline. That’s exactly what Harness Engineering is.

Having several sessions working in parallel worktrees is a massive productivity boost (your super fertilizer, if you will), and it might make you think you’ve mastered “Agentic Engineering” (Level 3). But this won’t scale if you insist on human-reviewing everything. Without a harness, you either create a bottleneck, because agents produce more code than humans can review, or you semi-blindly trust the output, and your CI, and pray it doesn’t break.
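The isolation itself is plain git. A minimal sketch, with repository and branch names that are purely illustrative:

```shell
# Create a throwaway repo to demonstrate per-agent worktrees
git init --initial-branch=main harness-demo
cd harness-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "init"

# One isolated checkout per agent session, each on its own branch:
git worktree add ../harness-demo-agent-a -b agent-a
git worktree add ../harness-demo-agent-b -b agent-b

# Sessions now run concurrently without touching each other's files
git worktree list
```

Each agent works in its own directory and its own branch; merges happen deliberately, through PRs, instead of accidentally through a shared checkout.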

Harness Engineering also requires that psychological detachment from the code I mentioned earlier: you stop reviewing code and start enabling the agents to plan, review, and test their own work. It also requires being less of a software developer and more of a systems thinker. You’re now focused on workflows, systems, scaffolding, and leverage.

What a Harness Actually Includes

Concretely, a production harness is a set of interlocking constraints that make an agent's output trustworthy without requiring line-by-line human review:

  • Test-first gating: The good old days of TDD are back. Agents write tests before implementation. No green suite, no PR.
  • Parallel worktree isolation: Each agent session operates in its own worktree. No cross-contamination.
  • Second-agent plan review: A second agent reviews the first agent’s plan before execution begins.
  • CI enforcement rules: Automated gates that reject output failing linting, type checks, or coverage thresholds. You don’t want to clutter the agent’s context with rules about style, coverage, and formatting, so mechanical enforcement is the key here. At Telos, we use agent-generated custom linters with machine-readable error messages, creating a loop that makes CI pass in fewer rounds.
  • Structured PR templates: Agents generate PRs that follow your team’s format, making review faster when humans do look.
  • Memory accumulation (CLAUDE.md): Every mistake becomes a rule. The harness gets smarter over time.
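To illustrate the linting point above, here is a minimal sketch of a custom lint rule with machine-readable output. The rule itself (no `puts` in app code) and the JSON shape are assumptions for the example, not Telos’s actual tooling:

```ruby
require "json"

# Scan source text and return one hash per violation.
# Machine-readable fields (rule, path, line, fix_hint) let an agent
# locate and fix the issue without parsing free-form prose.
def lint_no_puts(path, source)
  source.each_line.with_index(1).filter_map do |line, lineno|
    next unless line.match?(/^\s*puts\b/)
    {
      rule: "no-puts-in-app-code",
      path: path,
      line: lineno,
      message: "Use Rails.logger instead of puts",
      fix_hint: "Replace `puts x` with `Rails.logger.info(x)`"
    }
  end
end

offenses = lint_no_puts("app/models/user.rb", <<~RUBY)
  class User
    def debug_me
      puts "checking"
    end
  end
RUBY

# Emit one JSON object per line; in CI you would exit non-zero
# when offenses is non-empty so the gate fails mechanically.
offenses.each { |o| puts JSON.generate(o) }
```

Because each offense carries a path, a line, and a fix hint, the agent can apply the fix and re-run the linter in a tight loop, which is what shortens the number of CI rounds.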

What Happens Without a Harness

Teams that skip harness engineering and scale AI output through sheer throughput tend to hit the same failure modes:

  • CI becomes the only guardrail. And CI was never designed to catch architectural drift or subtle logic errors.
  • Human review becomes the bottleneck. Three agents can produce PRs faster than two seniors can review them.
  • Agent parallelism creates merge chaos. Without worktree isolation, concurrent agent sessions conflict and corrupt each other’s work.
  • Code quality becomes probabilistic. You ship what passes CI and hope for the best. That’s not engineering.

So What’s Next?

You might be thinking, “hey but this is way less concrete than the other two sections of this post.” And you’re right. I actually blurred the line between levels three and four on purpose. Shapiro says level three is where most developers think everything gets worse, not better. I say level three is where programmers decide if they’ll be on the side of “AI is gonna take my job” or “AI is transforming my role.”

For me, Harness Engineering is one of the subsystems where this transformation takes place. You have a super fertilizer in your hands: you ought to become a systems gardener.

The Strategic Implication

These are uncertain times. They’re also refreshing and exciting for those of us willing to get our hands dirty.

The teams that win will not be the ones who code fastest. They will be the ones who design the best AI systems—the harnesses, the memory, the workflows that turn raw model capability into reliable engineering output. The tools are here. The question is whether your organization is building the systems to use them.
