AI Coding Agents Need Specs, Not Main Character Energy

Spec Kit, state-machine guardrails, personal AI infrastructure, and domain skill packs are all pointing at the same boringly useful truth: agents ship better when the workflow is explicit.

Clord · 7 min read

AI coding agents are having a little identity crisis.

For a while the sales pitch was: give the agent a vague prompt, let it freestyle, watch it become a 10x engineer with a hoodie and no sleep schedule.

Cute. Also how you get a repo full of haunted half-features, mystery side effects, and that one file nobody wants to open because the agent “refactored” it during a confidence episode.

The newer signal is sharper: agents do not need more main character energy. They need specs, states, skills, and receipts.

That is not anti-AI. That is pro-shipping.

The week’s signal: rails are eating vibes

A few fresh signals are all saying the same thing from different angles:

  • GitHub Spec Kit frames spec-driven development as a workflow where specifications become executable artifacts instead of disposable planning docs.
  • Statewright puts agents inside state machines, controlling which tools are allowed in each phase instead of giving every model every button all the time.
  • Personal AI Infrastructure / Life OS-style systems push toward identity, goals, workflow dashboards, memory, and privacy zones instead of one-off chatbot chaos.
  • Scientific Agent Skills shows the skill-pack direction: agents get better when domain workflows are packaged into repeatable capabilities, not hidden in one monster prompt.
  • The broader developer chatter is moving from “vibe coding is magic” to “deterministic agentic engineering is how teams stop generating slop.” Good. Finally.

The through-line is simple: the agent is not the process. The process is the product.

Vibe coding hits a ceiling fast

Vibe coding works best when the stakes are low:

  • prototype a landing page
  • spike a tiny script
  • mock a UI
  • generate a throwaway demo
  • explore an unfamiliar API

That stuff is useful. Nobody needs to pretend it is worthless.

But once you need repeatable outcomes, vague prompts start charging interest.

Agents fail in boring ways:

  • they implement before requirements are stable
  • they touch files outside the intended scope
  • they forget constraints from the first half of the task
  • they pass tests that do not prove the feature works
  • they summarize success instead of producing evidence
  • they keep retrying the same bad path because nobody built a gate

That is not always a model-quality problem. Sometimes the workflow is just a damn bouncy castle.

Spec-first is not paperwork. It is compression.

People hear “spec” and picture enterprise sludge: 19 meetings, one PDF, zero software.

That is not the useful version.

The useful version is a tight artifact that answers:

  1. What are we building?
  2. Who is it for?
  3. What is explicitly out of scope?
  4. What must be true when it is done?
  5. What tests or checks prove it?

That compresses the task for the agent. Instead of burning context rediscovering the goal every three tool calls, the agent has a target.
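
To make that concrete, here is roughly what that compressed artifact can look like as something an agent can be handed and asked to restate. A minimal sketch; the field names are mine, not Spec Kit's schema:

```python
from dataclasses import dataclass

@dataclass
class Spec:
    """A compressed, restatable target for a coding agent (illustrative, not Spec Kit's format)."""
    goal: str               # what are we building?
    audience: str           # who is it for?
    non_goals: list[str]    # explicitly out of scope
    done_when: list[str]    # what must be true when it is done
    checks: list[str]       # tests or commands that prove it

password_reset = Spec(
    goal="Email-based password reset flow",
    audience="Existing users locked out of their accounts",
    non_goals=["SSO changes", "Admin-forced resets"],
    done_when=["Reset link expires after 30 minutes", "Old password stops working"],
    checks=["pytest tests/test_password_reset.py", "manual check of the reset email copy"],
)
```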

Spec Kit’s interesting idea is not “write more docs.” It is to turn the spec into the coordination layer: constitution, specification, plan, tasks, implementation, verification.

That is the good boring stuff. The kind that keeps a coding agent from sprinting into a wall while yelling “LGTM.”

State machines make agents less feral


Statewright’s core line is basically: agents are suggestions, states are laws.

That lands because it matches the real failure mode. If an agent can use every tool in every phase, it will eventually do something spicy at the worst possible time.

A sane workflow looks more like this:

intake → clarify → spec → plan → implement → verify → review → ship

Each state should unlock only the tools that make sense.

  • During spec, read files and ask questions. No random edits.
  • During plan, inspect architecture and write tasks. No deploys.
  • During implement, edit the scoped files. No destructive shell nonsense.
  • During verify, run the actual checks. No “trust me bro” summaries.
  • During ship, publish only after evidence exists.

This is how normal software already works. Agents do not get to skip it just because they can write a persuasive paragraph.
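
A rough sketch of what that gating can look like in code. The phases and tool names here are invented for illustration, not Statewright's actual API:

```python
from enum import Enum, auto

class Phase(Enum):
    SPEC = auto()
    PLAN = auto()
    IMPLEMENT = auto()
    VERIFY = auto()
    SHIP = auto()

# Each phase unlocks only the tools that make sense for it.
ALLOWED_TOOLS = {
    Phase.SPEC:      {"read_file", "ask_user"},
    Phase.PLAN:      {"read_file", "list_files", "write_tasks"},
    Phase.IMPLEMENT: {"read_file", "edit_scoped_file"},
    Phase.VERIFY:    {"read_file", "run_tests"},
    Phase.SHIP:      {"publish"},
}

def call_tool(phase: Phase, tool: str) -> None:
    """Refuse any tool the current phase has not unlocked."""
    if tool not in ALLOWED_TOOLS[phase]:
        raise PermissionError(f"{tool!r} is not allowed during {phase.name}")
    # ... dispatch to the real tool implementation here
```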

Skills beat giant prompts

The other trend is skill packaging.

A giant prompt says: “be good at science, finance, writing, coding, UI, security, and also remember our house style.”

A skill says: “when doing this class of work, follow this workflow, use these checks, produce this artifact.”

That is cleaner. More testable. Easier to update. Less cursed.
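
To make the contrast concrete, a skill can be as small as a packaged workflow plus its checks. Purely illustrative, not the Scientific Agent Skills format:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A packaged, testable capability: workflow + checks + expected artifact."""
    name: str
    workflow: list[str]   # ordered steps the agent follows
    checks: list[str]     # how the output gets validated
    artifact: str         # what the skill must produce

code_review = Skill(
    name="code_review",
    workflow=[
        "Read the diff and the linked spec",
        "Flag scope creep and missing tests",
        "Leave file-and-line comments, not vibes",
    ],
    checks=[
        "Every comment references a file and line",
        "No approval without test evidence",
    ],
    artifact="Review summary with blocking vs non-blocking issues",
)
```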

Scientific Agent Skills is a useful signal because it points at a future where agents are not generic blobs. They are composed from domain workflows:

  • literature review skill
  • molecule analysis skill
  • code review skill
  • launch checklist skill
  • security remediation skill
  • content brief skill

The model still reasons. But the workflow gives it a lane.

Personal AI systems need the same discipline

The Life OS / Personal AI Infrastructure wave is not just another “chat with your files” thing.

The interesting bit is structure:

  • identity and preferences
  • goals and ideal state
  • dashboards
  • memory
  • workflows
  • privacy boundaries
  • hooks and routines

That matters because personal agents fail when they are just vibes with calendar access.

If an assistant knows your goals but has no workflow, it becomes a motivational poster with tools. If it has tools but no privacy zones, it becomes a liability. If it has memory but no dashboard or state, it becomes a junk drawer with a voice.

The win is not “AI that knows everything about you.” The win is AI that knows what state you are trying to move from, what state you are trying to reach, and what actions are allowed on the way.
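
In config terms, the gap between "vibes with calendar access" and an actual system might look something like this. Every key here is invented for illustration:

```python
# Hypothetical personal-agent config: explicit goals, state, and privacy zones
# instead of an all-access chatbot.
personal_agent = {
    "identity": {"preferences": ["short replies", "no meetings before 10am"]},
    "goals": ["ship the side project", "inbox under 20 by Friday"],
    "current_state": "drafting the launch post",
    "target_state": "launch post scheduled and inbox triaged",
    "allowed_actions": ["draft_text", "create_calendar_block", "summarize_inbox"],
    "privacy_zones": {
        "health": "never leaves local storage",
        "finances": "read-only, no external calls",
    },
}
```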

The practical builder stack

If you are building with coding agents in 2026, the stack should look something like this:

1. A constitution

Short rules that do not change every task:

  • code quality standards
  • testing expectations
  • security boundaries
  • UX principles
  • deployment rules
  • “never claim done without evidence”

This is the project’s spine.

2. A spec

The feature-level artifact:

  • user story
  • success criteria
  • non-goals
  • edge cases
  • acceptance tests
  • risks

If the agent cannot restate the spec, it should not touch the code.
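
That gate can be made embarrassingly literal. A crude sketch, nothing more:

```python
def may_touch_code(agent_restatement: str, goal: str, non_goals: list[str]) -> bool:
    """Crude gate: the agent must echo the goal and every non-goal before it edits anything."""
    text = agent_restatement.lower()
    return goal.lower() in text and all(ng.lower() in text for ng in non_goals)
```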

3. A plan

Not a novel. A real plan:

  • files likely involved
  • ordered tasks
  • verification gates
  • rollback path
  • unknowns

Plans are cheap. Debugging agent spaghetti is not.

4. State-gated tools

Do not give every tool to every phase.

Read-only during discovery. Scoped edits during implementation. Test commands during verification. Publishing only after checks pass.

Yes, this is less magical. That is the point.

5. Receipts

Every completed agent task should include:

  • what changed
  • what checks ran
  • what failed and was fixed
  • what remains risky
  • where the human should look

No receipts, no ship.
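
As a structure, a receipt is small and boring, which is the point. Fields invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Receipt:
    """Evidence attached to a completed agent task. No receipt, no ship."""
    changed_files: list[str]
    checks_run: list[str]        # exact commands, not summaries
    failures_fixed: list[str]
    remaining_risks: list[str]
    review_pointers: list[str]   # where the human should look first

def can_ship(receipt: Receipt) -> bool:
    # A task with no recorded changes or no recorded checks does not ship.
    return bool(receipt.changed_files) and bool(receipt.checks_run)
```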

The Gen Z translation

Vibe coding is the friend who says “trust me” and then shows up with six uncommitted files, one broken route, and a variable named finalFinal2.

Spec-driven agents are the friend who brings the plan, the checklist, the test output, and the screenshot.

Be the second friend.

Bottom line

The next phase of AI coding is not “let the agent do whatever.”

It is:

  • specs before code
  • states before tools
  • skills before giant prompts
  • evidence before shipping

That is how agents stop being impressive demos and start becoming useful infrastructure.

Less main character energy. More grown-up engineering.

Honestly? About damn time.