Agents write the code. You build the system that verifies. Everything an engineering leader needs to navigate the shift from coding to orchestration.
Delegate mechanical work to agents. Review quality and alignment. Own architecture, strategy, and novel decisions. The three-part model that redefines what engineers actually do when AI handles implementation.
Implementation is no longer the bottleneck — specification is. How to write Product Canvas and Tech Spec documents that AI agents can execute against. Upfront planning becomes your highest-leverage activity.
Microsoft's AI-native flow data shows engineers now spend 22% of their time co-planning, 35% building by prompt, 38% reviewing, and only 5% writing code manually. How to restructure your team's workflow around this new reality.
Two weeks to scope and prototype. Two weeks to build and iterate. Two weeks to ship and harden. A milestone-driven cadence that replaces sprint planning when AI shrinks implementation estimates by 60%.
Small teams with AI agents outperform large teams without them. The data is clear: 3-4 engineers with full ownership, minimal meetings, and AI-augmented workflows ship faster than teams of 10. How to restructure.
Someone needs to orchestrate the agents, manage context windows, and ensure specifications are agent-ready. Enter the Producer — a new role emerging in AI-native teams that bridges product, engineering, and AI coordination.
Fixed hierarchies create bottlenecks. AI-native teams rotate project leadership — engineers take turns running ceremonies, owning tech specs, and making architectural calls. How to build collective problem-solving capability.
Organise around customer personas, not technical components. Each pod: 1 PM, 1 QA lead, 4-6 engineers anchored by senior staff. This structure keeps AI in service of customer problems: a solution, not just a feature of your workflow.
No production code without a failing test first. In an AI-native workflow, tests are your specification language. End-to-end first (Playwright), then integration, then unit. The test pyramid inverts when agents write the implementation.
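A minimal sketch of what end-to-end first looks like in practice: a Playwright test written before any implementation exists, acting as the executable spec the agent must satisfy. The route, labels, and feature here are hypothetical.

```ts
// e2e/saved-cart.spec.ts: written BEFORE implementation; this is the agent's target.
import { test, expect } from '@playwright/test';

// Hypothetical feature: a "save cart for later" flow that must survive logout.
test('saved cart survives a new session', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Save for later' }).click();

  // Simulate a fresh session: clear cookies, log back in.
  await page.context().clearCookies();
  await page.goto('/login');
  await page.getByLabel('Email').fill('test@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // The assertion the implementation must make true.
  await page.goto('/cart');
  await expect(page.getByText('Saved for later')).toBeVisible();
});
```

The test fails on day one by design; the agent's job is to make it pass without touching the spec.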
Your team merges 3x more pull requests. How deep should reviews go? Curate "gold-standard PRs" as evaluation benchmarks. Use AI reviewers for baseline checks. Reserve human attention for architecture, edge cases, and intent alignment.
AI-generated code introduces new attack surfaces. Semgrep, CodeQL, and Bandit on every commit. Dependency auditing with Snyk. Security constraints baked into AGENTS.md. Human review with a security lens at the planning phase.
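One way to wire the per-commit scans together: a small Node script used as a pre-commit gate. A sketch, assuming semgrep, bandit, and snyk are installed and on the PATH; CodeQL, which needs a compiled database, is usually better left to CI.

```ts
// scripts/security-gate.ts: run static analysis before every commit.
import { spawnSync } from 'node:child_process';

// Each entry is a standard CLI invocation; tune rulesets to your stack.
const scanners: [string, string[]][] = [
  ['semgrep', ['--config', 'auto', '--error', '.']], // exit nonzero on findings
  ['bandit', ['-r', 'src', '-ll']],                  // Python code, medium+ severity
  ['snyk', ['test']],                                // known-vulnerable dependencies
];

let failed = false;
for (const [cmd, args] of scanners) {
  const result = spawnSync(cmd, args, { stdio: 'inherit' });
  if (result.status !== 0) {
    console.error(`✗ ${cmd} reported findings`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);
```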
Skipping TDD. Proceeding with unfixed review issues. Running multiple agents simultaneously without isolation. Accepting partial spec compliance. Skipping pre-commit hooks. These anti-patterns will sink your team faster than no AI at all.
AGENTS.md files encode conventions, guardrails, architectural decisions, and security constraints that guide agent behaviour. They're the most important documentation you'll ever write. How to structure them for maximum agent effectiveness.
A Product Manager Advisor. A System Architecture Reviewer. A GitOps CI Specialist. A Responsible AI Agent. How to design specialised personas that collaborate on the same codebase with distinct expertise and concerns.
Personality layer, audience layer, platform layer, topic layer. Progressive tool disclosure. Aggressive caching and file-based context offloading. How to build prompt systems that scale without becoming unmaintainable.
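A sketch of the layering idea, assuming one file per layer: layers are ordered most-stable to most-volatile so the stable prefix can be cached aggressively, and only the topic layer changes per task. File names and the cache-key scheme are illustrative.

```ts
// prompts/assemble.ts: compose a system prompt from cacheable layers.
import { readFileSync } from 'node:fs';
import { createHash } from 'node:crypto';

// Ordered most-stable to most-volatile, so prefix caching pays off.
const LAYERS = ['personality', 'audience', 'platform', 'topic'] as const;

export function assemblePrompt(dir: string): { prompt: string; cacheKey: string } {
  const parts = LAYERS.map((layer) => readFileSync(`${dir}/${layer}.md`, 'utf8'));
  const prompt = parts.join('\n\n---\n\n');
  // Hash only the stable prefix (everything but the topic layer) as the cache key.
  const cacheKey = createHash('sha256')
    .update(parts.slice(0, -1).join('\n'))
    .digest('hex');
  return { prompt, cacheKey };
}
```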
Agents need access to compilers, test runners, logging systems, design systems, and git history. MCP servers provide structured tool execution. How to configure the right access scopes without creating security risks.
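A minimal sketch using the MCP TypeScript SDK, exposing one narrow, read-only git-history tool rather than blanket shell access. The tool name and scope are illustrative; the pattern is the point: enumerate capabilities, never pass through arbitrary commands.

```ts
// mcp/git-history.ts: expose git log read-only, not a general shell.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { execFileSync } from 'node:child_process';
import { z } from 'zod';

const server = new McpServer({ name: 'git-history', version: '0.1.0' });

// One narrow tool: recent commits for a path. No writes, no arbitrary arguments.
server.tool(
  'recent_commits',
  { path: z.string(), limit: z.number().int().max(50).default(10) },
  async ({ path, limit }) => {
    const log = execFileSync(
      'git',
      ['log', `--max-count=${limit}`, '--oneline', '--', path],
      { encoding: 'utf8' },
    );
    return { content: [{ type: 'text' as const, text: log }] };
  },
);

await server.connect(new StdioServerTransport());
```

Scoping access per tool this way means a confused or compromised agent can read history but never rewrite it.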
Juniors now get immediate exposure to production patterns through AI. The risk: they ship without understanding. The opportunity: stronger foundational literacy from day one. How to mentor juniors who've never written code without AI.
Mid-level engineers become the primary quality gatekeepers. They bridge technical constraints and business requirements, orchestrate multi-agent collaboration, and catch the systemic drift that AI can't see.
Senior engineers focus on context engineering and architectural validation. They identify systemic drift, maintain quality frameworks, and mentor — not by writing code, but by designing the systems that make AI-generated code reliable.
The whiteboard is dead. The new interview evaluates judgment, system thinking, and the ability to direct AI. Adaptability matters more than years of experience. Empathy and people skills become the real differentiators.
Major restructuring used to take months. Now it's feasible multiple times per cycle because AI handles the implementation churn. This changes how you think about tech debt, architecture evolution, and when to rewrite vs iterate.
Real documents, real conversations, real instruction-completion pairs. How to systematically build proprietary training data — from hundreds of documents to a fine-tuned model that becomes your competitive moat.
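A sketch of the pair-building step, assuming a hand-curated seed file that maps real questions to the documents that answered them. The JSONL chat shape matches common fine-tuning formats, though exact field names vary by provider.

```ts
// data/build-pairs.ts: turn real documents into instruction-completion JSONL.
import { readFileSync, writeFileSync } from 'node:fs';

interface Pair { instruction: string; source: string } // source: path to the document

// Hypothetical seed list: real questions your team answered, mapped to the
// documents that answered them.
const pairs: Pair[] = JSON.parse(readFileSync('data/pairs.json', 'utf8'));

const lines = pairs.map(({ instruction, source }) =>
  JSON.stringify({
    messages: [
      { role: 'user', content: instruction },
      { role: 'assistant', content: readFileSync(source, 'utf8').trim() },
    ],
  }),
);
writeFileSync('data/train.jsonl', lines.join('\n') + '\n');
```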
Not every task needs your most expensive model. How to architect tiered AI systems that route requests to the right model based on complexity, latency, and cost. Includes cost modelling across model tiers.
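A sketch of the routing layer with hypothetical tier names, prices, and thresholds: pick the cheapest tier that clears both the capability floor and the latency budget.

```ts
// router.ts: route requests to a model tier by complexity, latency, and cost.
interface Tier { name: string; costPerMTok: number; p50LatencyMs: number }

// Hypothetical tiers; substitute your actual model endpoints and prices.
const TIERS: Tier[] = [
  { name: 'small',  costPerMTok: 0.15, p50LatencyMs: 300 },
  { name: 'medium', costPerMTok: 1.5,  p50LatencyMs: 900 },
  { name: 'large',  costPerMTok: 10.0, p50LatencyMs: 2500 },
];

interface Request { complexity: number; maxLatencyMs: number } // complexity in [0, 1]

export function route(req: Request): Tier {
  // Capability floor: harder requests need a bigger model.
  const minTier = req.complexity > 0.7 ? 2 : req.complexity > 0.3 ? 1 : 0;
  // Cheapest tier at or above the floor that fits the latency budget.
  const candidates = TIERS.slice(minTier).filter(
    (t) => t.p50LatencyMs <= req.maxLatencyMs,
  );
  // Fall back to the capability floor if nothing meets the latency budget.
  return candidates[0] ?? TIERS[minTier];
}
```

In practice the complexity score itself can come from a cheap classifier on the smallest tier, so routing overhead stays marginal.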
Getting AI to write like a specific person, not a generic bot. Personality profiling engines, voice-matching models, and the architecture behind AI that sounds human — applied to content generation at scale.
When to use off-the-shelf APIs, when to fine-tune, when to build from scratch. A decision matrix for engineering leaders. Success metrics should emphasise cycle time and shipped capabilities — not headcount reduction.
The Software Development Life Cycle assumed humans write code. The Agent Development Life Cycle assumes humans specify, agents implement, and verification systems validate. A new mental model for engineering leaders making the transition.
3-4 engineers per pod. A Producer to coordinate agent workflows. Rotating tech leads. Pair sessions for design review, not code review. Zero daily standups. The complete structural blueprint for an AI-native team.
Replace the old plan-code-test-deploy with a new loop: write detailed specs, generate implementation with agents, verify through automated testing and human review, ship with confidence. The time each phase deserves will surprise you.
Cycle time, not headcount. Shipped capabilities, not lines of code. Customer outcomes, not velocity charts. How to frame AI-native team performance in terms executives and boards actually care about — and prove the ROI.
Share prompts like code snippets. Question every ceremony weekly. Protect focus time ruthlessly. Treat AI fluency as a core competency, not a nice-to-have. The cultural defaults that separate thriving AI-native teams from teams that just use ChatGPT.
You can't flip a switch. A phased approach: weeks 1-4 audit and pilot, weeks 5-8 restructure and retool, weeks 9-12 measure and iterate. How to take a traditional engineering team and make it AI-native without breaking what works.