AI Coding Assistants — Petter's wiki

AI coding assistants like GitHub Copilot and Claude Code are reshaping software development along three axes: how developers spend their time, how code quality evolves at scale, and which workflows become viable. The evidence from large empirical studies and practitioner case studies converges on one meta-finding: AI does not fix weak engineering cultures. It amplifies whatever practices teams already have, or lack, and that holds equally for how they handle large codebases, CI debt, and agent-driven maintenance.

AI amplifies existing engineering culture

AI adoption is not a neutral productivity lever. Google's DORA 2025 report, surveying nearly 5,000 technology professionals (90% of whom use AI at work), finds that AI adoption now improves software delivery throughput — a key shift from 2024 — but still increases delivery instability. The central insight: AI magnifies existing organizational strengths and weaknesses rather than compensating for dysfunction. Teams with strong engineering practices, testing discipline, and coordination see AI amplify their output; teams with weak fundamentals see their problems compounded.

DORA's new AI Capabilities Model identifies seven foundational capabilities that determine whether AI adoption produces positive outcomes: a clear and communicated AI stance, healthy data ecosystems, AI-accessible internal data, quality internal platforms, strong version control practices, working in small batches, and a user-centric focus. When these are present, AI's benefits are amplified across individual effectiveness, organizational performance, and software delivery throughput. When absent, AI creates "localized pockets of productivity that are often lost to downstream chaos." See organizational-ai-enablement for the full model.

A senior engineering retreat (February 2026) with practitioners from major technology companies reached the same conclusion independently: AI is an amplifier, not a fix. The retreat's sharpest framing, taken from Gene Kim's DORA foreword, was a control-theory analogy: if you suddenly accelerate from walking speed to 50 mph, your control systems must also speed up, or you crash. Teams need faster feedback loops, architectures that let them act more independently, and a stronger learning culture to match AI-accelerated code generation.

The practical consequence: returns on AI investment come from the underlying organizational system, not the tools themselves. Introducing Copilot into a team with poor code review and no automated tests speeds up the production of problems.

Work shifts from collaboration to solo coding

A 2026 Harvard study of 187,000 developers given free GitHub Copilot access documented a sharp reallocation of the working day:

Coding time increased by 12.4% — developers spent more of their day writing code
Project management time dropped by 24.9% — less coordination, planning, administration
Developer-to-developer collaboration fell ~80% — AI served as a continuous sparring partner, displacing peer interactions
Junior developers gained the most — more coding time and more experimentation with new languages

The researchers concluded that "generative AI is not just a productivity improvement, but changes the very work developers do." The collapse in peer collaboration is the effect worth watching: short-term individual throughput rises while the team's shared context thins out, which is the condition under which cognitive-debt accumulates. Researcher Frank Nagle warns specifically against reading the junior-developer gains as a reason to stop hiring juniors, calling that a "profound strategic error."

Product velocity is bottlenecked by taste, not code production

The sharpest counter to the "we cleared six years of backlog" narrative is that faster code production does not translate into faster product improvement. Ethan Ding's argument runs as a falsification: if Claude Code gave a genuine compounding product-velocity advantage, Anthropic — which had it exclusively for seven months and whose own Claude Code is "completely claude-coded" — should have left every competitor unbridgeably behind. Instead Codex launched months later and is already competitive, and people still debate which tool is better. The compound advantage never showed up, which means code production was never the bottleneck.

The mental model that explains this: the best engineering cultures treat lines of code as something you spend, not something you produce. The codebase is a liability on the balance sheet, not an asset — every line is a surface for bugs, every feature creates neighbours (a Slack integration begets a Teams integration begets an email fallback), and complexity compounds fractally rather than scaling linearly. Practitioners building coding tools say the same thing independently: David Cramer (Sentry) argues LLMs "remove the barrier to get started but create increasingly complex software" that slows long-term velocity — "it's mostly bloat"; dax (opencode.ai) frames the industry's confident speed claims against his own team's lack of clarity; Karri Saarinen (Linear) concurs.

Coding agents genuinely help 0-to-1 products reach the quality frontier faster — the speed is real for early-stage and CRUD work
But they buy that speed on credit: the codebase grows faster than the quality, and the technical debt compounds (see cognitive-debt)
At the frontier, the constraint is compression — delivering an experience in fewer lines, under load, at latency — which agents cannot evaluate because they have no theory of the system
A backlog full of CRUD and internal tooling is exactly what agents accelerate, and exactly what does not push the product frontier; shipping faster is not the same as users caring more

This reframes the value question: if you are going zero-to-Camry, coding agents are extraordinarily helpful and will drive down the cost of mid-tier software; if you are the artisans at Ferrari, your bottleneck is tastemakers, not tokens. See good-taste-as-competitive-advantage for why subtractive taste is the durable moat.

Productivity gains split along seniority (the k-shape)

Labor-economics data shows coding-agent productivity gains are not evenly distributed but split along a k-shape: senior engineers show measurable output growth since the 2023 LLM inflection, while junior output has gone flat or declined (Ding, citing commit-graph analysis Q1 2015–Q1 2025). This does not contradict the Harvard study's finding that juniors gain coding time, because more time spent coding is not the same as more output, but it adds a cautionary edge: raw output growth concentrates among engineers who already have the judgment to direct the tool. It sharpens the operator model below: the gains accrue to those who keep the keys, not those who hand over the wheel.

Quality degrades without countervailing practices

AI makes writing new code cheap without making reuse of existing code any more likely, an asymmetry that shows up clearly at scale. GitClear's analysis of 211 million changed lines of code from 2020–2024 (Google, Microsoft, Meta, and enterprise repos) is the most comprehensive empirical view:

8× increase in duplicated code blocks — AI generates structurally similar code rather than reusing existing abstractions
Refactoring collapsed from 25% to under 10% of all code changes — developers (and their AI) write new code instead of improving existing code
Copy/paste code rose from 8.3% to 12.3% — exceeding "moved" (reused) code for the first time in the dataset's history
Code churn increased from 5.5% to 7.9% — more code is written and then quickly reverted or rewritten
AI-heavy repos carried 34% higher Cumulative Refactor Deficit — a compounding measure of deferred maintenance
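
The percentage-point shifts in the list understate how large the relative changes are. A minimal sketch of that arithmetic, using only the copy/paste and churn figures quoted above (the helper name `relative_change` is illustrative, not something from the GitClear report):

```python
def relative_change(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Shares of changed lines (in %) as quoted in the list above.
metrics = {
    "copy/paste": (8.3, 12.3),
    "churn": (5.5, 7.9),
}

for name, (before, after) in metrics.items():
    points = after - before
    rel = relative_change(before, after)
    print(f"{name}: +{points:.1f} points, {rel:+.1f}% relative")
```

Read this way, copy/paste grew by roughly 48% and churn by roughly 44% relative to their starting shares, even though each moved by only a few percentage points.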

Developer trust tracks the data. A Sonar report found 96% of developers report challenges trusting AI-generated code, and 38% say reviewing AI-generated code takes more work than reviewing a colleague's.

The mechanism is consistent across sources: AI enables local improvements (writing a function faster) without global reasoning (should this function exist, does it duplicate something, does the architecture support it). When teams drop TDD, refactoring, and thorough code review to chase the speed gains, technical debt accelerates. This is the same dynamic DORA names as amplification, measured at the line-of-code level. See cognitive-debt and good-taste-as-competitive-advantage for the reasons judgment and shared understanding matter more, not less, when generation is cheap.

Remote and mobile workflows are back

Because many AI coding tools run in the terminal, developers are rediscovering remote-workstation workflows reminiscent of the early-2000s SSH era. A common pattern documented by Harper Reed:

Network: Tailscale or similar mesh VPN for phone-to-workstation connectivity without firewall configuration
Terminal client: Blink, Prompt, or Termius on the phone
Session persistence: tmux (or screen) keeps Claude Code sessions alive across disconnections; mosh handles flaky links
Multi-agent workflows: tmux windows for switching between multiple Claude Code instances running in parallel

The workflow is simple: SSH into a workstation from a phone, attach to a tmux session, interact with the agent. Any phone becomes a development terminal, and skills like SSH, tmux, and remote server management become relevant again for a new generation of developers.
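
The connect-and-attach step can be sketched as a small helper that builds the command line to run from the phone's terminal client. This is an illustration under stated assumptions, not Harper Reed's exact setup: the hostname `workstation` and the session name `agents` are placeholders, and the helper `attach_command` is hypothetical. It relies on `tmux new-session -A`, which attaches to the named session if it already exists and creates it otherwise.

```python
import shlex


def attach_command(host: str, session: str = "agents", use_mosh: bool = False) -> list[str]:
    """Build the command that connects to `host` and attaches to a tmux session."""
    # -A: attach if the session exists, otherwise create it. -s: session name.
    tmux = ["tmux", "new-session", "-A", "-s", session]
    if use_mosh:
        # mosh tolerates roaming and flaky links; `--` separates the remote command.
        return ["mosh", host, "--", *tmux]
    # -t forces a TTY so tmux can draw its interface over plain SSH.
    return ["ssh", "-t", host, *tmux]


if __name__ == "__main__":
    # "workstation" is a placeholder, for example a machine name on the mesh VPN.
    print(shlex.join(attach_command("workstation")))
    print(shlex.join(attach_command("workstation", use_mosh=True)))
```

Reusing one session name means a dropped connection costs nothing: running the same command again lands back in the session where the agent is still working.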

Large codebases and CI remediation

ClickHouse's experience shows where coding agents become materially useful after the novelty wears off. The team started with boilerplate and small internal tools, then expanded into the main C++ codebase once model quality improved: the threshold moved from "nice for scripts" to "usable for daily work" after newer Claude models landed. The practical lesson is that agents are most valuable when the task is repetitive, well-scoped, and expensive to do by hand.

Repetitive changes across many files are a strong fit because agents reduce manual copy-paste errors
Merge conflicts, stale branches, and PR cleanup are high-value use cases because the work is tedious but easy to review
Agents can port features across related codebases or languages when the target is well specified
Log-driven investigation works best when an experienced engineer uses the agent to test hypotheses instead of accepting its first theory
CI and flaky-test remediation can be scaled aggressively when the team is willing to review and merge the output
ClickHouse reported using agents to submit hundreds of PRs for CI and test fixes, reducing daily findings from roughly 200 to a small handful per 10 million test executions
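
To put the ClickHouse figure in proportion, the quoted counts can be turned into rates. This is a sketch of the arithmetic only. It assumes both counts are per 10 million test executions, and because the source says only "a small handful", the value 5 below is a placeholder assumption, not a reported number.

```python
EXECUTIONS = 10_000_000  # denominator quoted in the text


def one_in(findings: int, executions: int = EXECUTIONS) -> int:
    """Return N such that roughly one execution in N produces a finding."""
    return executions // findings


before = one_in(200)  # "roughly 200" from the text
after = one_in(5)     # ASSUMPTION: placeholder for "a small handful"

print(f"before: about 1 finding per {before:,} executions")
print(f"after:  about 1 finding per {after:,} executions (placeholder input)")
```

With the quoted 200, that is about one finding per 50,000 executions. The starting rate was already low, so the remediation work was about removing a long tail of rare failures, which is the kind of tedious, reviewable work described above.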

The same pattern applies outside ClickHouse: coding agents are most effective when they operate inside a strong review loop, not as a substitute for it. The output quality improves because "agent does, you review" gives humans a fresh eye on code they did not type themselves.

Agentic engineering raises the ceiling

Andrej Karpathy's framing is that vibe coding raises the floor, while agentic engineering raises the ceiling. That distinction matters because the second phase is not just about faster drafting; it is about coordinating more ambitious work across a larger surface area than a single human could comfortably keep in short-term memory.

Vibe coding is the low-friction entry point: ask for a draft, inspect it, and iterate
Agentic engineering is the next step: specify outcomes, let multiple agents work, and use review to keep the result coherent
The ceiling rises when engineers spend less time typing boilerplate and more time directing, validating, and integrating output
The limiting factor becomes judgment, not keystrokes, so taste and architecture matter even more than before

Orchestration is becoming a first-class coding practice. Claude Code's Dynamic Workflows (research preview, May 2026) let the model write a JavaScript orchestration script for a task that a separate runtime then executes in the background, spawning subagents — up to 16 concurrent and 1,000 total per run — that do the reading, writing, shell work, research, and review. The design philosophy is that parallel subagents act as an internal review layer: Claude checks its own work before surfacing anything. This targets work too large for a single conversational loop — codebase audits, service-wide bug hunts, large migrations. The flagship demonstration was Jarred Sumner (Bun) porting ~750,000 lines from Zig to Rust with 99.8% of the existing test suite passing, in eleven days. The practice it rewards is the same operator discipline: a strong, machine-enforced review loop scaled across many agents, not a substitute for review.
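
The two limits mentioned above, a concurrency ceiling and a per-run total, are a standard bounded fan-out pattern. The sketch below illustrates that pattern in Python with `asyncio`. It is not the Dynamic Workflows API (which, per the description above, uses a JavaScript script), and `run_subagent` is a hypothetical stand-in for whatever actually dispatches work to an agent.

```python
import asyncio

MAX_CONCURRENT = 16  # concurrency limit quoted above
MAX_TOTAL = 1_000    # per-run limit quoted above


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a real subagent call."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return f"done: {task}"


async def orchestrate(tasks: list[str]) -> list[str]:
    """Run every task, never exceeding the concurrency or total caps."""
    if len(tasks) > MAX_TOTAL:
        raise ValueError(f"{len(tasks)} tasks exceeds the per-run cap of {MAX_TOTAL}")
    gate = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(task: str) -> str:
        async with gate:  # at most MAX_CONCURRENT subagents run at once
            return await run_subagent(task)

    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*(guarded(t) for t in tasks))


if __name__ == "__main__":
    results = asyncio.run(orchestrate([f"audit module {i}" for i in range(40)]))
    print(len(results), results[0])
```

Rejecting an oversized run up front, rather than truncating it silently, keeps the operator aware of what was and was not covered.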

The operator keeps the keys

Rohit's paraphrase of Andrej Karpathy's YC AI Startup School line captures the emerging operator model: "build Iron Man suits, not Iron Man robots." The people shipping fastest are still coding, but now they wear the suit — directing a fleet of agents while keeping the keys in their hands. The stalled mode is the opposite: stop coding, hand over judgment, and drift into passive oversight instead of active direction.

The useful identity is still coder/operator, not spectator
Agents work best when a human keeps intent, review, and final responsibility
This is the same control problem DORA and Microsoft surface at the organizational level: speed without direction creates drift

Coding is becoming a loop

A recent discussion with Lauren Reeder and Boris Cherny frames coding as effectively solved for a growing class of tasks at Anthropic. The useful unit of work is no longer a linear typing session but a loop: state the intent, let the model draft, review the result, steer the next pass, and repeat. That loop-centric workflow explains why Claude Code and similar tools reward strong supervision more than fast fingers.
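
The loop described above can be written down as a short control structure. This is a minimal sketch with hypothetical names throughout: `draft` stands in for the model, `review` for the human or automated check, and the pass limit is an assumption added so the loop cannot run forever.

```python
from typing import Callable, Optional


def coding_loop(
    intent: str,
    draft: Callable[[str, Optional[str]], str],
    review: Callable[[str], Optional[str]],
    max_passes: int = 5,
) -> str:
    """Alternate drafting and review until the reviewer has no feedback.

    draft(intent, feedback) returns a candidate change.
    review(candidate) returns steering feedback, or None to accept.
    """
    feedback: Optional[str] = None
    for _ in range(max_passes):
        candidate = draft(intent, feedback)
        feedback = review(candidate)
        if feedback is None:
            return candidate
    raise RuntimeError(f"no accepted draft after {max_passes} passes")


# Toy stand-ins so the loop can be run end to end.
def toy_draft(intent: str, feedback: Optional[str]) -> str:
    return intent if feedback is None else f"{intent} (with tests)"


def toy_review(candidate: str) -> Optional[str]:
    return None if "tests" in candidate else "add tests"


if __name__ == "__main__":
    print(coding_loop("add retry to fetch()", toy_draft, toy_review))
```

The review step is the only place a draft can be accepted, which is where the supervision the paragraph describes actually happens.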

Practitioner-led demos are a better on-ramp than generic courses

One recurring pattern in the coding-agents world is that short practitioner talks outperform polished "learn AI" courses because they show the actual workflow, not just the vocabulary. A 30-minute talk by Anthropic's Head of Coding Agents is presented as a better way to understand vibe coding than a stack of paid tutorials.

Vibe coding is easiest to learn when you can watch the loop: intent, generation, review, correction, repeat
The useful unit is not prompt crafting alone, but steering an agent toward a concrete outcome while keeping checkpoints explicit
Practical demos from people building coding agents tend to teach the operational habits that matter in real work
A 47-minute interview with Boris Cherny, the creator of Claude Code, is another strong on-ramp because it shows AI-native development as a closed loop of intent, generation, review, and correction rather than a magic prompt trick

AI driving is a learned engineering skill

The social signal in practitioner commentary is that using AI well is not just "asking better questions"; it is an operational discipline with its own technique, tooling, and judgment. Uncle Bob Martin's reaction to Anthropic's prompting workshop captures the point: driving an AI is a form of engineering, and the skill ceiling is high enough that casual users notice the gap immediately.

Short workshops from the people building the tools are often more useful than generic prompt courses because they expose the actual control loop
The hard part is not producing a response, but steering the model toward the right outcome under constraints
This fits the broader pattern in good-taste-as-competitive-advantage: as output becomes cheap, judgment and operator skill become more valuable