Can automating the boring parts of engineering let us spend more time on real design and architecture? We asked ourselves that question after Boris Cherny shipped 22 pull requests in a day using AI-generated code, and after Anthropic reported that nearly all of its internal code now comes from AI tools.
We found that treating each development phase as a repeatable process lifts product quality. By using Claude Code and modern tools, we cut boilerplate time and focused on specs, system design, and integration points.
Our teams now embed security, tests, and documentation into every step of the pipeline. We keep a central document and use a few specialized tools to manage files, reviews, and task context so the codebase stays healthy.
Later in this guide we share patterns, decisions, and how we handle complex use cases. For a look at how centralized tracking speeds approvals and reduces rework, see our central document.
Key Takeaways
- AI-first code generation reduces routine work and speeds delivery.
- Engineers add more value by defining clear specs and context.
- Integrating security and tests across the pipeline keeps output robust.
- Centralized docs and tools cut review time and miscommunication.
- We treat the project as a living system to balance automation and oversight.
The Evolution of Modern Software Engineering
Teams are shifting their focus from keystrokes to architecture and intent.
One respected AI researcher reported that the share of code he wrote by hand fell from 80% to 20% within four weeks of adopting agentic tools. We saw the same trend across our groups.
As routine syntax is automated, we now spend more time on system design, requirements, and verification. This move changes how we think about development and how we train people.
What matters most now is clear communication, strong review practices, and the ability to validate AI outputs. Our engineers act as architects and safety gates rather than only as implementers.
- High-level design replaces repetitive typing.
- Verification and testing gain priority.
- Skills in explaining intent become core to success.
| Past Focus | Current Focus | Expected Outcome |
|---|---|---|
| Manual typing of code | System architecture and specs | Faster delivery and clearer designs |
| Individual output | Collaborative verification | Higher quality and safety |
| Fixing syntax | Directing models and auditing results | More impactful engineering work |
Understanding the Shift to SDLC with Claude
We shifted how people work: some drive architecture while others translate specs into runnable code. This split changes how we measure value and how we organize tasks.
The Builder vs Coder Split
Builders own requirements, system design, and user intent. They shape the product and the layer of abstraction the team uses.
Coders focus on implementation details, syntax, and APIs. Their job is to turn intent into reliable code that passes review and tests.
Defining the New Workflow
We treat each phase as a spec-driven unit of work. Every task is a discrete item that includes files, context, and acceptance criteria (a minimal sketch follows the table below).
- Keep a consistent abstraction layer so agents avoid context bloat.
- Prioritize design and security early in the project.
- Review output against the original spec before moving on.
| Role | Focus | Outcome |
|---|---|---|
| Builder | Requirements, system design | Clear specs and product value |
| Coder | Implementation, APIs, tests | Reliable codebase and faster delivery |
| Team | Tools, review, workflows | Scalable system and secure output |
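To make the task unit concrete, here is a minimal sketch of how such a discrete item could be represented in Python. The schema and field names are our illustration, not a prescribed format:
```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One discrete, spec-driven unit of work (illustrative schema)."""
    name: str
    spec: str                                          # the frozen spec the coder implements against
    files: list[str] = field(default_factory=list)     # only the files this task may touch
    context: list[str] = field(default_factory=list)   # docs/artifacts loaded into the session
    acceptance_criteria: list[str] = field(default_factory=list)

    def is_reviewable(self) -> bool:
        # A task is ready for review only when it carries a spec and criteria.
        return bool(self.spec and self.acceptance_criteria)

# Example: a builder defines intent; a coder session receives exactly this payload.
task = Task(
    name="add-rate-limiter",
    spec="Limit unauthenticated requests to 100/min per IP.",
    files=["src/middleware/rate_limit.py"],
    context=["docs/architecture.md#networking"],
    acceptance_criteria=["429 returned after limit", "unit tests cover burst traffic"],
)
```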
Building a Knowledge-First Architecture
We design our systems so the knowledge, not the runtime code, is the single source of truth.
Our knowledge-first architecture treats design artifacts and decisions as the durable outputs. The actual code becomes disposable and is regenerated as needed.
We document foundations so our team can keep product intent steady across rewrites. Clear boundaries and security rules stop models from making unsafe assumptions.
Every change goes through a short review that validates the current state against docs. We keep compact context for agents so each decision traces back to history and constraints (one way to record decisions is sketched after the table below).
- Write specs that machines can execute reliably.
- Embed patterns that reduce debugging time.
- Use minimal tools and clear conventions for handoffs.
| Artifact | Purpose | Benefit |
|---|---|---|
| Architecture diagrams | Capture decisions | Consistent system behavior |
| Spec documents | Drive code generation | Faster, safer engineering |
| Security rules | Enforce boundaries | Reduced runtime risk |
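As one way to make decisions durable, the sketch below writes an architecture decision to an append-only log inside the repository. The directory layout and field names are assumptions for illustration, not a required format:
```python
from datetime import date
from pathlib import Path

def record_decision(repo: Path, title: str, decision: str, constraints: list[str]) -> Path:
    """Append-only decision log: the artifact outlives any generated code."""
    docs = repo / "docs" / "decisions"
    docs.mkdir(parents=True, exist_ok=True)
    slug = title.lower().replace(" ", "-")
    path = docs / f"{date.today():%Y-%m-%d}-{slug}.md"
    body = [f"# {title}", "", f"Decision: {decision}", "", "Constraints:"]
    body += [f"- {c}" for c in constraints]
    path.write_text("\n".join(body) + "\n")
    return path

# Example: the decision, not the generated code, is the source of truth.
record_decision(Path("."), "Use event sourcing for orders",
                "All order mutations are events.",
                ["No direct table writes", "Replays must be deterministic"])
```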
Establishing Your Repository Constitution

We treat a repository constitution as the single, durable guide that keeps agents and people aligned. A short set of instruction files sets the tone for every session and reduces guesswork across the project.
Best Practices for Instruction Files
Keep it concise. We maintain CLAUDE.md and AGENTS.md but limit each file to the essentials so the agent loads only what it needs.
Use progressive disclosure. Load patterns and commands for the current task, not the whole codebase. This lowers token use and keeps context focused.
- Document the why, what, and how of the project in one file.
- Log common mistakes and fixes so the agent learns over time.
- Symlink instruction files for cross-tool access and consistent behavior (a minimal sketch follows below).
- Follow public standards: AGENTS.md is widely adopted, and a CLAUDE.md of roughly 2.5k tokens gives the agent an ample brief without overloading its context.
Result: Every developer and agent starts sessions with the same commands, patterns, and context, leading to more predictable code generation and a healthier codebase.
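The symlinking step can be scripted. A minimal sketch, assuming CLAUDE.md is the canonical file and AGENTS.md should mirror it:
```python
from pathlib import Path

def link_instruction_files(repo: Path, canonical: str = "CLAUDE.md",
                           mirrors: tuple[str, ...] = ("AGENTS.md",)) -> None:
    """Point every mirror at one canonical instruction file so tools stay in sync."""
    source = repo / canonical
    if not source.exists():
        raise FileNotFoundError(f"{canonical} must exist before linking")
    for name in mirrors:
        mirror = repo / name
        if mirror.is_symlink() or mirror.exists():
            mirror.unlink()                 # replace stale copies with the link
        mirror.symlink_to(source.name)      # relative link keeps the repo portable

link_instruction_files(Path("."))
```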
Implementing Spec-Driven Development
We insist that every feature starts as a clear spec before the model touches any code. This rule separates design and implementation into distinct phases so teams can work predictably.
First, research produces a short document that captures intent, acceptance criteria, and test requirements. We then freeze the spec as a contract. Only after review does the model generate implementation.
We use Atomic as our orchestration layer to break work into small tasks. Each task has a single verification step and a persistent artifact. That artifact enables fresh agent sessions to run without prior chat history (the four phases are sketched after the table below).
- Research: gather context and constraints.
- Specify: write an executable spec and tests.
- Implement: model produces code against the spec.
- Ship: verify output and merge artifacts to the system.
| Phase | Artifact | Benefit |
|---|---|---|
| Research | Design document | Reduces ambiguity and rework |
| Specify | Specs & tests | Clear verification criteria |
| Implement | Code output | Consistent, testable delivery |
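A minimal sketch of the four phases as a pipeline, where each phase consumes the previous artifact and leaves a persistent one behind. The function bodies are placeholders for illustration, not the real Atomic API:
```python
from pathlib import Path

def research(goal: str) -> Path:
    """Phase 1: capture intent, constraints, and context in a design doc."""
    doc = Path("artifacts/design.md")
    doc.parent.mkdir(exist_ok=True)
    doc.write_text(f"# Design\nGoal: {goal}\n")
    return doc

def specify(design: Path) -> Path:
    """Phase 2: freeze an executable spec with acceptance tests."""
    spec = design.with_name("spec.md")
    spec.write_text(design.read_text() + "\nAcceptance: all tests green\n")
    return spec

def implement(spec: Path) -> Path:
    """Phase 3: the model generates code against the frozen spec (stubbed here)."""
    out = spec.with_name("implementation_notes.md")
    out.write_text(f"Implements {spec.name}\n")
    return out

def ship(artifact: Path) -> None:
    """Phase 4: verify against the spec, then merge artifacts into the system."""
    assert artifact.exists(), "nothing to ship without a persistent artifact"
    print(f"ready to merge: {artifact}")

ship(implement(specify(research("Add rate limiting"))))
```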
Managing Context for Peak Agent Performance

We design our workflow so agents stay fast and predictable across every development phase.
We keep every agent session short and focused to avoid context drift and slowdowns.
Fresh Sessions per Task
Each task starts in a fresh session so the model only sees the files and spec it needs.
This reduces token bloat and makes output easier to review.
Atomic Changes
We break work into single-feature commits. One session, one change, one test.
Atomic changes simplify review and lower merge conflicts in the codebase.
Subagents for Specialized Work
Atomic spawns subagents in parallel for research, analysis, and pattern finding.
We route security and review jobs to dedicated subagents so the implementation worker stays focused on code.
State transfers via a condensed summary document and specific commands, not long chat history. Each phase produces a persistent document that future teams can reference (a minimal handoff sketch follows the table below).
- Benefit: scalable parallel sessions without context bloat.
- Benefit: clearer review steps and higher quality output.
| Pattern | Effect | Artifact |
|---|---|---|
| Fresh session | Stable model performance | Condensed summary file |
| Atomic change | Easier review | Single-feature commit |
| Subagents | Specialized analysis | Parallel reports |
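One way to implement the condensed-summary handoff: at the end of a session, write a small state file that the next fresh session loads instead of chat history. The JSON format here is our assumption:
```python
import json
from pathlib import Path

SUMMARY = Path("artifacts/session_summary.json")

def end_session(task: str, decisions: list[str], open_items: list[str]) -> None:
    """Persist only what the next session needs, never the full transcript."""
    SUMMARY.parent.mkdir(exist_ok=True)
    SUMMARY.write_text(json.dumps({
        "task": task,
        "decisions": decisions,     # choices the next agent must honor
        "open_items": open_items,   # remaining work, one atomic change each
    }, indent=2))

def start_fresh_session() -> dict:
    """A new session begins from the condensed state, keeping context small."""
    return json.loads(SUMMARY.read_text()) if SUMMARY.exists() else {}

end_session("rate-limiter", ["use token bucket"], ["add burst tests"])
print(start_fresh_session()["open_items"])
```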
Adopting the Writer and Reviewer Pattern
We split writing and reviewing into separate sessions so each pass stays focused and objective.
At Anthropic, each pull request is examined by a fresh instance to avoid bias toward the original implementation. We mirror that idea: one session writes code and another reviews it in a clean context.
Opening a new context for every review lets the reviewer evaluate the output against the original spec without prior assumptions. This reduces missed edge cases and shortens the time it takes to catch regressions.
We assign the reviewer session to flag risks and compare the implementation to the repository constitution and architectural patterns. The separation improves security and keeps our team moving fast.
- Writer: produces files and runnable code against a spec.
- Reviewer: validates design, tests, and safety in a fresh context.
- Result: higher-quality output and fewer surprise fixes post-merge.
| Role | Focus | Outcome |
|---|---|---|
| Writer | Implementation, files | Testable code |
| Reviewer | Spec alignment, security | Objective feedback |
| Team | Workflow and tools | Faster, safer shipping |
To adopt this pattern, we document expectations and use proven project workflow tools so every phase runs predictably. Treat the review as a critical gate: it is where the team protects product intent and ensures reliable delivery.
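A minimal orchestration sketch, assuming the claude CLI's non-interactive `-p` (print) mode; the prompts and the approval gate are illustrative, not Anthropic's internal setup:
```python
import subprocess

def run_agent(prompt: str) -> str:
    """Each call is a fresh headless session: no shared chat history."""
    result = subprocess.run(
        ["claude", "-p", prompt],  # -p prints a single response and exits
        capture_output=True, text=True, check=True,
    )
    return result.stdout

spec = open("artifacts/spec.md").read()

# Writer session: implement against the frozen spec.
run_agent(f"Implement this spec and write the files:\n{spec}")

# Reviewer session: a clean context judges the diff against spec and constitution.
review = run_agent(
    "Review the current git diff against the spec in artifacts/spec.md "
    "and the rules in CLAUDE.md. List risks; answer APPROVE or REQUEST_CHANGES."
)
if "APPROVE" not in review:
    raise SystemExit("Reviewer flagged issues, fix before merge:\n" + review)
```
Because each call starts a fresh process, the reviewer never inherits the writer's assumptions.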
Scaling Productivity with Parallel Agent Sessions
Spawning several agents in parallel turns complex projects into many small, verifiable tasks. We run multiple instances—often five or more—so different parts of a feature advance at the same time.
We use git worktrees to isolate files and avoid conflicts between sessions. Each agent gets its own task and branch, which keeps the code stable and easy to merge (a setup sketch follows the table below).
Automated tests run for every session. These tests speed up review and cut the manual bottleneck. When a test fails, the responsible agent re-runs the task or flags the item for human review.
- Assign distinct tasks per instance so the team stays focused.
- Provide compact context and specs to each session for reliable output.
- Monitor parallel workflows to keep design and architecture aligned.
| Practice | Benefit | Artifact |
|---|---|---|
| Parallel agent sessions | Faster feature throughput | Multiple feature branches |
| Git worktrees | No file conflicts | Isolated task worktrees |
| Automated tests per session | Quicker review cycles | CI test reports |
| Assigned tasks per instance | Higher focus and quality | Task checklists and specs |
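A sketch of the worktree setup, using standard `git worktree` commands; the branch naming and task list are our illustration:
```python
import subprocess
from pathlib import Path

def spawn_worktree(repo: Path, task: str) -> Path:
    """One isolated worktree and branch per agent session, so files never collide."""
    branch = f"agent/{task}"
    path = repo.parent / f"{repo.name}-{task}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)],
        check=True,
    )
    return path  # point the agent session at this directory only

repo = Path("~/code/product").expanduser()
for task in ["api-rate-limit", "billing-webhooks", "docs-refresh"]:
    worktree = spawn_worktree(repo, task)
    print(f"session for {task} -> {worktree}")
```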
Automating Safety Nets and Quality Gates
We build automated guards that run before code ever touches the main branch. These checks form a protective layer that keeps the repository healthy and predictable.
Pre-commit Hooks and CI Integration
Pre-commit hooks handle linting, formatting, and secret scanning so low-value issues never reach review. We run these locally and in CI to avoid drift.
Our pipelines run Claude Code in headless mode for repeatable tasks and enforce architecture rules from the repository constitution.
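As one example of such a hook, here is a minimal Python pre-commit script that blocks staged files containing obvious secret patterns. The patterns are illustrative; a real setup would layer a dedicated scanner on top:
```python
#!/usr/bin/env python3
"""Minimal pre-commit secret scan, illustrative rather than exhaustive."""
import re
import subprocess
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key shape
    re.compile(r"-----BEGIN (RSA|EC) PRIVATE KEY-----"),
    re.compile(r"(?i)api[_-]?key\s*=\s*['\"][^'\"]{16,}"),
]

# Only scan files staged for this commit.
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.split()

for path in staged:
    try:
        text = open(path, errors="ignore").read()
    except OSError:
        continue
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            sys.exit(f"possible secret in {path}: commit blocked")
```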
Test-First Requirements
Every task ships with specs and tests. We refuse merges until tests pass.
We use code coverage tools to verify that AI-generated tests exercise hot paths—the roughly 15–20% of files that contain most of the logic (a sketch of this check follows the table below).
- Automated safety nets: pre-commit + CI gates.
- Quality gates: spec-driven tests required per task.
- Coverage focus: target hot paths to reduce risk.
| Practice | Purpose | Benefit |
|---|---|---|
| Pre-commit hooks | Catch lint, format, secrets | Cleaner codebase and fewer trivial reviews |
| CI headless runs | Automate Claude Code tasks and checks | Consistent output and faster feedback |
| Coverage analysis | Validate tests hit hot paths | Better test ROI and fewer regressions |
| Architectural checks | Enforce repository rules | Preserve design and system boundaries |
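One way to enforce the hot-path rule with coverage.py: export `coverage json` after the test run and fail the gate when a hot file falls below a threshold. The hot-file list is an assumption a team would maintain per repository:
```python
import json

# Files we consider "hot paths", maintained by the team; illustrative here.
HOT_FILES = {"src/billing/invoice.py", "src/auth/session.py"}
THRESHOLD = 90.0  # required percent coverage on hot paths

# Produced by: coverage run -m pytest && coverage json
report = json.load(open("coverage.json"))

failures = [
    (path, data["summary"]["percent_covered"])
    for path, data in report["files"].items()
    if path in HOT_FILES and data["summary"]["percent_covered"] < THRESHOLD
]

for path, pct in failures:
    print(f"hot path under-tested: {path} at {pct:.1f}% (< {THRESHOLD}%)")
if failures:
    raise SystemExit(1)  # fail the quality gate
```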
These tools drastically cut manual review time and let our teams focus on higher-level design and architecture.
Navigating Common Implementation Challenges
Implementation often breaks down in the handoff between intent and runnable output. We counter the risk of low-quality AI-generated code — the “slopocalypse” — by combining strict review gates with clear, executable specs.
We train engineers to write specs that a model can follow precisely. Those specs stay alive; we update them as the project and models evolve.
Practical steps we use:
- Enforce a two-pass review: writer then fresh reviewer to catch hidden issues.
- Run automated analysis for security and regression tests before merge.
- Continuously refactor the codebase to remove dead files and reduce technical debt.
| Problem | Defensive Practice | Benefit |
|---|---|---|
| Shallow model output | Spec-driven tasks and targeted tests | Reliable implementation and fewer reworks |
| Context drift | Fresh review sessions and condensed summaries | Objective review and stable output |
| Hidden security gaps | Automated scans and policy checks | Safer product and compliant releases |
We adapt our workflows to the strengths of our tools and keep learning about model capabilities. For an example of an AI-powered approach to process and developer experience, see our AI-powered SDLC framework.
Embracing the Future of Agentic Development
We are reorganizing roles so people lead intent and agents run the tasks.
Our team shifts focus to system design, clear specs, and the orchestration of AI tools. We treat each phase as a chance to capture intent, not to grind on routine code.
Engineers learn to write precise specs and to verify output through rigorous review. We keep security and context front and center while agents execute repeatable tasks.
To explore agentic quality workflows, see the Agentic QE Fleet. Adopting a CLAUDE.md starter file is a practical step to begin.