AI coding assistants like GitHub Copilot and Claude Code are reshaping software development along three axes: how developers spend their time, how code quality evolves at scale, and which workflows become viable. The evidence from large empirical studies and practitioner case studies converges on one meta-finding: AI does not fix weak engineering cultures. It amplifies whatever practices teams already have, or lack, and that extends to how they handle large codebases, CI debt, and agent-driven maintenance.
AI amplifies existing engineering culture
AI adoption is not a neutral productivity lever. Google's DORA 2025 report, surveying nearly 5,000 technology professionals (90% of whom use AI at work), finds that AI adoption now improves software delivery throughput — a key shift from 2024 — but still increases delivery instability. The central insight: AI magnifies existing organizational strengths and weaknesses rather than compensating for dysfunction. Teams with strong engineering practices, testing discipline, and coordination see AI amplify their output; teams with weak fundamentals see their problems compounded.
DORA's new AI Capabilities Model identifies seven foundational capabilities that determine whether AI adoption produces positive outcomes: a clear and communicated AI stance, healthy data ecosystems, AI-accessible internal data, quality internal platforms, strong version control practices, working in small batches, and a user-centric focus. When these are present, AI's benefits are amplified across individual effectiveness, organizational performance, and software delivery throughput. When absent, AI creates "localized pockets of productivity that are often lost to downstream chaos." See organizational-ai-enablement for the full model.
A senior engineering retreat (February 2026) with practitioners from major technology companies reached the same conclusion independently: AI is an amplifier, not a fix. The sharpest framing discussed there, taken from Gene Kim's DORA foreword, was a control-theory analogy: if you suddenly accelerate from walking speed to 50 mph, your control systems must also speed up, or you crash. To keep pace with AI-accelerated code generation, teams need faster feedback loops, architectures that let them act more independently, and a stronger learning culture.
The practical consequence: returns on AI investment come from the underlying organizational system, not the tools themselves. Introducing Copilot into a team with poor code review and no automated tests speeds up the production of problems.
Work shifts from collaboration to solo coding
A 2026 Harvard study of 187,000 developers given free GitHub Copilot access documented a sharp reallocation of the working day:
- Coding time increased by 12.4% — developers spent more of their day writing code
- Project management time dropped by 24.9% — less coordination, planning, administration
- Developer-to-developer collaboration fell ~80% — AI served as a continuous sparring partner, displacing peer interactions
- Junior developers gained the most — more coding time and more experimentation with new languages
The researchers concluded that "generative AI is not just a productivity improvement, but changes the very work developers do." The collapse in peer collaboration is the effect worth watching: short-term individual throughput rises while the team's shared context thins out, which is the condition under which cognitive-debt accumulates. Researcher Frank Nagle warns specifically against reading the junior-developer gains as a reason to stop hiring juniors, calling that a "profound strategic error."
Quality degrades without countervailing practices
AI makes writing new code cheap but does little to encourage reuse of existing code, an asymmetry that shows up clearly at scale. GitClear's analysis of 211 million changed lines of code from 2020–2024 (Google, Microsoft, Meta, and enterprise repos) is among the most comprehensive empirical views available:
- 8× increase in duplicated code blocks — AI generates structurally similar code rather than reusing existing abstractions
- Refactoring collapsed from 25% to under 10% of all code changes — developers (and their AI) write new code instead of improving existing code
- Copy/paste code rose from 8.3% to 12.3% — exceeding "moved" (reused) code for the first time in the dataset's history
- Code churn increased from 5.5% to 7.9% — more code is written and then quickly reverted or rewritten
- AI-heavy repos carried 34% higher Cumulative Refactor Deficit — a compounding measure of deferred maintenance
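The shares above are easier to compare as relative changes than as percentage-point moves. The following is a minimal arithmetic sketch using only the figures quoted in the list; the function name and the treatment of "under 10%" as exactly 10.0 (an upper bound) are assumptions made for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Shares of all changed lines, in percent, as quoted in the list above.
metrics = {
    "copy/paste": (8.3, 12.3),
    "churn": (5.5, 7.9),
    # "under 10%" is treated as 10.0, so the computed drop is a lower bound.
    "refactoring": (25.0, 10.0),
}

for name, (before, after) in metrics.items():
    points = after - before
    relative = relative_change(before, after)
    print(f"{name}: {points:+.1f} points, {relative:+.0f}% relative")
```

Read this way, the copy/paste share grew by roughly 48% and churn by roughly 44% in relative terms, while the refactoring share fell by at least 60%.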
Developer trust tracks the data. A Sonar report found 96% of developers report challenges trusting AI-generated code, and 38% say reviewing AI-generated code takes more work than reviewing a colleague's.
The mechanism is consistent across sources: AI enables local improvements (writing a function faster) without global reasoning (should this function exist, does it duplicate something, does the architecture support it). When teams drop TDD, refactoring, and thorough code review to chase the speed gains, technical debt accelerates. This is the same dynamic DORA names as amplification, measured at the line-of-code level. See cognitive-debt and good-taste-as-competitive-advantage for the reasons judgment and shared understanding matter more, not less, when generation is cheap.
Remote and mobile workflows are back
Because most AI coding tools run in the terminal, developers are rediscovering remote-workstation workflows reminiscent of the early 2000s SSH era. A common pattern documented by Harper Reed:
- Network: Tailscale or similar mesh VPN for phone-to-workstation connectivity without firewall configuration
- Terminal client: Blink, Prompt, or Termius on the phone
- Session persistence: tmux (or screen) keeps Claude Code sessions alive across disconnections; mosh handles flaky links
- Multi-agent workflows: tmux windows or panes for switching between multiple Claude Code instances running in parallel
The workflow is simple: SSH into a workstation from a phone, attach to a tmux session, interact with the agent. Any phone becomes a development terminal, and skills like SSH, tmux, and remote server management become relevant again for a new generation of developers.
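The attach step can be sketched as a small helper that builds the command line. This is an illustration under stated assumptions, not a documented setup: the host name `workstation` and the session name `agents` are placeholders, and the helper assumes `ssh`, `mosh`, and `tmux` are installed and the host is already reachable over the mesh VPN.

```python
import shlex


def attach_command(host: str, session: str = "agents", use_mosh: bool = False) -> list[str]:
    """Build the argv that attaches to (or creates) a named tmux session on `host`."""
    # `tmux new-session -A -s NAME` attaches if NAME exists, otherwise creates it.
    tmux = ["tmux", "new-session", "-A", "-s", session]
    if use_mosh:
        # mosh takes the remote command after `--`; useful on flaky links.
        return ["mosh", host, "--", *tmux]
    # `ssh -t` forces a pseudo-terminal, which tmux needs to draw its UI.
    return ["ssh", "-t", host, *tmux]


if __name__ == "__main__":
    # "workstation" is a placeholder host name, not taken from the source.
    print(shlex.join(attach_command("workstation")))
    print(shlex.join(attach_command("workstation", use_mosh=True)))
```

Passing the returned list to `subprocess.run` would open the session from a machine with a terminal. On a phone, the same command string can be typed or saved in the terminal client.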
Large codebases and CI remediation
ClickHouse's experience shows where coding agents become materially useful after the novelty wears off. The team started with boilerplate and small internal tools, then expanded into its main C++ codebase once model quality improved. The threshold moved from "nice for scripts" to "usable for daily work" after newer Claude models landed. The practical lesson is that agents become especially valuable when the task is repetitive, well-scoped, and expensive to do by hand.
- Repetitive changes across many files are a strong fit because agents reduce manual copy-paste errors
- Merge conflicts, stale branches, and PR cleanup are high-value use cases because the work is tedious but easy to review
- Agents can port features across related codebases or languages when the target is well specified
- Log-driven investigation works best when an experienced engineer uses the agent to test hypotheses instead of accepting its first theory
- CI and flaky-test remediation can be scaled aggressively when the team is willing to review and merge the output
- ClickHouse reported using agents to submit hundreds of PRs for CI and test fixes, reducing daily findings from roughly 200 to a small handful per 10 million test executions
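Flaky-test remediation needs a triage step that decides which failures are worth handing to an agent. The sketch below shows one generic heuristic, not ClickHouse's tooling: a test is treated as flaky if it both passed and failed on the same commit. The `RunRecord` type, the `find_flaky` function, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunRecord(NamedTuple):
    test: str     # test identifier
    commit: str   # commit the test ran against
    passed: bool  # outcome of this single execution


def find_flaky(runs: Iterable[RunRecord]) -> dict[str, float]:
    """Map each flaky test to its overall failure rate.

    A test counts as flaky here if it both passed and failed on the same
    commit, so the code under test cannot explain the difference.
    """
    by_commit = defaultdict(list)  # (test, commit) -> list of outcomes
    by_test = defaultdict(list)    # test -> list of outcomes across commits
    for run in runs:
        by_commit[(run.test, run.commit)].append(run.passed)
        by_test[run.test].append(run.passed)

    flaky = {test for (test, _), results in by_commit.items()
             if True in results and False in results}
    return {test: by_test[test].count(False) / len(by_test[test])
            for test in sorted(flaky)}


if __name__ == "__main__":
    sample = [
        RunRecord("test_merge", "abc123", True),
        RunRecord("test_merge", "abc123", False),  # same commit, different result
        RunRecord("test_parse", "abc123", False),
        RunRecord("test_parse", "abc123", False),  # consistently failing, not flaky
    ]
    print(find_flaky(sample))  # {'test_merge': 0.5}
```

A consistently failing test is excluded on purpose: it points to a real regression, which is a different kind of fix from a nondeterministic test.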
The same pattern applies outside ClickHouse: coding agents are most effective when they operate inside a strong review loop, not as a substitute for it. The output quality improves because "agent does, you review" gives humans a fresh eye on code they did not type themselves.
Agentic engineering raises the ceiling
Andrej Karpathy's framing is that vibe coding raises the floor, while agentic engineering raises the ceiling. That distinction matters because the second phase is not just about faster drafting; it is about coordinating more ambitious work across a larger surface area than a single human could comfortably keep in short-term memory.
- Vibe coding is the low-friction entry point: ask for a draft, inspect it, and iterate
- Agentic engineering is the next step: specify outcomes, let multiple agents work, and use review to keep the result coherent
- The ceiling rises when engineers spend less time typing boilerplate and more time directing, validating, and integrating output
- The limiting factor becomes judgment, not keystrokes, so taste and architecture matter even more than before
The operator keeps the keys
Rohit's paraphrase of Andrej Karpathy's YC AI Startup School line captures the emerging operator model: "build Iron Man suits, not Iron Man robots." The people shipping fastest are still coding, but now they wear the suit — directing a fleet of agents while keeping the keys in their hands. The stalled mode is the opposite: stop coding, hand over judgment, and drift into passive oversight instead of active direction.
- The useful identity is still coder/operator, not spectator
- Agents work best when a human keeps intent, review, and final responsibility
- This is the same control problem DORA and Microsoft surface at the organizational level: speed without direction creates drift
Coding is becoming a loop
A recent discussion with Lauren Reeder and Boris Cherny frames coding as effectively solved for a growing class of tasks at Anthropic. The useful unit of work is no longer a linear typing session but a loop: state the intent, let the model draft, review the result, steer the next pass, and repeat. That loop-centric workflow explains why Claude Code and similar tools reward strong supervision more than fast fingers.
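The loop can be written down as a short control structure. This is a sketch under stated assumptions, not how Claude Code works internally: `draft` stands in for a model call, `review` stands in for a human reviewer, and both are hypothetical callables supplied by the caller.

```python
from typing import Callable, Optional


def run_loop(
    intent: str,
    draft: Callable[[str], str],
    review: Callable[[str], Optional[str]],
    max_passes: int = 5,
) -> tuple[str, int]:
    """Iterate draft -> review -> steer until the reviewer has no feedback.

    `draft` takes a prompt and returns a candidate result.
    `review` returns feedback text, or None to accept the candidate.
    Returns the accepted candidate and the number of passes it took.
    """
    prompt = intent
    for passes in range(1, max_passes + 1):
        candidate = draft(prompt)
        feedback = review(candidate)
        if feedback is None:
            return candidate, passes
        # Steer the next pass: restate the intent and attach the feedback.
        prompt = f"{intent}\n\nReviewer feedback on the previous draft:\n{feedback}"
    # The human stays responsible: an unconverged loop is escalated, not accepted.
    raise RuntimeError(f"no accepted draft after {max_passes} passes")


if __name__ == "__main__":
    # Toy stand-ins: the first draft has a bug and the second one fixes it.
    drafts = iter(["def add(a, b): return a - b", "def add(a, b): return a + b"])
    accepted, passes = run_loop(
        "write add(a, b)",
        draft=lambda prompt: next(drafts),
        review=lambda code: None if "a + b" in code else "add should sum its arguments",
    )
    print(passes, accepted)
```

The bounded pass count and the explicit failure are deliberate choices. If the loop does not converge, the intent or the review criteria probably need rethinking, not another automatic retry.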
Practitioner-led demos are a better on-ramp than generic courses
One recurring pattern in the coding-agents world is that short practitioner talks outperform polished "learn AI" courses because they show the actual workflow, not just the vocabulary. A 30-minute talk by Anthropic's Head of Coding Agents, for example, is billed as a better way to understand vibe coding than a stack of paid tutorials.
- Vibe coding is easiest to learn when you can watch the loop: intent, generation, review, correction, repeat
- The useful unit is not prompt crafting alone, but steering an agent toward a concrete outcome while keeping checkpoints explicit
- Practical demos from people building coding agents tend to teach the operational habits that matter in real work
- A 47-minute interview with Boris Cherny, the creator of Claude Code, is another strong on-ramp because it shows AI-native development as a closed loop of intent, generation, review, and correction rather than a magic prompt trick
AI driving is a learned engineering skill
Practitioner commentary signals that using AI well is not just "asking better questions"; it is an operational discipline with its own technique, tooling, and judgment. Robert C. "Uncle Bob" Martin's reaction to Anthropic's prompting workshop captures the point: driving an AI is a form of engineering, and the skill ceiling is high enough that casual users notice the gap immediately.
- Short workshops from the people building the tools are often more useful than generic prompt courses because they expose the actual control loop
- The hard part is not producing a response, but steering the model toward the right outcome under constraints
- This fits the broader pattern in good-taste-as-competitive-advantage: as output becomes cheap, judgment and operator skill become more valuable