Lines of Code as a Productivity Metric: Are We Really Going There?


X post by ken [at aquariusacquah]:

With 3 engineers, Kaizen went from 10K LOC committed per week in September, to 100k LOC in December, to 1M LOC this past week all with a background agent.

No one on our team has opened an IDE in months.
(Screenshot caption: a post about “productivity” in LLMs, measured in LOC)

This X post is making the rounds on various platforms: a team went from 10,000 lines of code committed per week to 1 million lines committed per week, all via AI agents, and no one on the team has opened an IDE in months.

If you’re celebrating that, think about what you’re celebrating for a moment: The systematic accumulation of technical debt at a rate that would have been physically impossible with a dev team of 100 people a year or so ago.

Haven’t we CHKDSKing been here before?

(CHKDSK -> fsck -> you get the picture)

If you’re new to this industry, lines of code as a metric has been considered harmful since at least the 1980s. Bill Gates [allegedly] said, “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” If you’re going to measure productivity by lines of code, anyone who’s learned how to write a source file in a programming language can game the metric. I’ve seen people unroll for loops of static length [I’m assuming to “help” a late-1980s/early-1990s era compiler that presumably couldn’t optimize well?]. Or maybe they didn’t understand loops in general. (Yes, there was plenty of evidence of that being the reason in some cases… sites like TheDailyWTF grew up around commiserating over these horror stories.)
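
To make the gaming concrete, here’s a contrived sketch (mine, not from any of those horror stories): two functions with identical behavior, where the second one “wins” on any LOC metric.

    def total(values):
        return sum(values)

    def total_unrolled(values):
        # Hand-unrolled for a fixed length of 8: same result,
        # roughly 10x the lines, zero added value.
        result = 0
        result += values[0]
        result += values[1]
        result += values[2]
        result += values[3]
        result += values[4]
        result += values[5]
        result += values[6]
        result += values[7]
        return result

    assert total(list(range(8))) == total_unrolled(list(range(8)))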

The best code is often the code you don’t write. If there’s a generally accepted library for something, don’t reinvent the wheel. If your LLMs are churning out a new caching library or web framework, is that “productivity”? If you’ve unrolled code that could be managed by control structures, are you really being more “efficient”?

And who gets to maintain that code? Yes, the LLMs will theoretically still be there for that. But the same cost calculations apply, just with more headroom, because the maintainer is now a machine with a lot of memory.

Maintenance Iceberg Right Ahead

Ok, so let’s assume 1 million lines of code per week is actually happening. In year one, that’s 52 million lines of code added to your codebase (and the LLMs didn’t even take vacation!!)

Assume the industry average of 1-25 defects per 1,000 lines of code, but say the AI writes very clean code and consistently hits the optimistic end of that range: 1 per 1,000. ThAt’S OnLy 52,000 BuGs iNtRoDuCeD In yEaR OnE. Even if most of those bugs are trivial, who is triaging them? At some point, either the bottleneck shifts to the humans doing triage, or our level of control and concern quietly drops.
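
For the record, here’s that back-of-the-envelope math, with the charitable defect rate spelled out as the assumption it is:

    # Back-of-the-envelope bug math. DEFECTS_PER_KLOC is the assumption:
    # the optimistic end of the oft-cited 1-25 defects per 1,000 LOC range.
    LOC_PER_WEEK = 1_000_000
    WEEKS_PER_YEAR = 52        # the LLMs didn't take vacation
    DEFECTS_PER_KLOC = 1       # deliberately charitable

    loc_per_year = LOC_PER_WEEK * WEEKS_PER_YEAR
    bugs_per_year = loc_per_year // 1_000 * DEFECTS_PER_KLOC

    print(f"{loc_per_year:,} LOC/year -> {bugs_per_year:,} bugs introduced")
    # 52,000,000 LOC/year -> 52,000 bugs introduced
    # At even 10 minutes of triage each, that's roughly 8,700 person-hours.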

The Context Window Economics Are Brutal

Here’s where it gets really expensive in the LLM era. When you need to modify, debug, or understand any part of this generated codebase, you’re feeding it into context windows. Even with the latest models sporting 200k+ token windows, you’re looking at real costs:

A million lines of code is roughly 30-50 million tokens depending on the language. At current API pricing, just loading your weekly output into a context window for analysis costs hundreds of dollars. Want to do comprehensive codebase analysis? You’re talking thousands.
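
A rough cost model makes the point. Every constant below is a placeholder assumption (the 30-50 million token figure above implies roughly 30-50 tokens per line; the price is a stand-in), so plug in your own provider’s numbers:

    # Rough context-window cost sketch. TOKENS_PER_LOC and the price
    # are assumptions; substitute your model's actual figures.
    TOKENS_PER_LOC = 40
    USD_PER_MILLION_INPUT_TOKENS = 3.00   # placeholder input-token price

    def cost_to_read(loc: int, passes: int = 1) -> float:
        """Estimated cost of feeding `loc` lines into a context window
        `passes` times. Debugging rarely takes a single pass."""
        tokens = loc * TOKENS_PER_LOC
        return tokens / 1_000_000 * USD_PER_MILLION_INPUT_TOKENS * passes

    weekly_output = 1_000_000
    print(f"One read of a week's output: ${cost_to_read(weekly_output):,.2f}")
    print(f"Ten analysis/debug passes:   ${cost_to_read(weekly_output, passes=10):,.2f}")
    # One read of a week's output: $120.00
    # Ten analysis/debug passes:   $1,200.00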

And that’s just for reading it. Every refactoring, every debugging session, every attempt to understand what the AI actually built—those are metered operations now.

The team that’s proud of never opening an IDE? They’re going to be very surprised when they get their LLM bills for the first major refactoring effort.

The Knowledge Debt

But the real cost isn’t financial—it’s cognitive. When humans write code, even mediocre code, the act of writing creates understanding. You know why that function exists, why it’s structured that way, what edge cases it handles.

When an AI generates a million lines of code, you get:

  • Code that probably works for the happy path
  • Zero team knowledge about design decisions
  • No intuition about what might break
  • No one who can explain why it was done that way

You’ve effectively outsourced your team’s understanding of their own product.

The Testing Illusion

“But the AI writes tests too!” Sure it does. And those tests verify that the code does what the AI thought it should do. They don’t verify that the code solves the actual business problem correctly. They don’t catch the semantic bugs, the wrong assumptions, the misunderstood requirements.
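
A contrived Python illustration of the gap (the requirement and the code are both mine): the test passes, because the test and the implementation share the same misreading.

    # Requirement: "orders OVER $100 get a 10% discount."
    def apply_discount(order_total: float) -> float:
        # Generated code (and its generated test) read "over $100" as >= 100.
        if order_total >= 100:
            return order_total * 0.90
        return order_total

    def test_apply_discount():
        # Verifies the code does what the generator thought it should do...
        assert apply_discount(100) == 90.0
        assert apply_discount(50) == 50.0

    test_apply_discount()  # passes, yet a $100.00 order is discounted in error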

You’ve automated the easy part—syntax—while making the hard part—correctness—even harder.

What Should We Measure Instead?

If not lines of code, then what? Here are metrics that actually correlate with value:

  • Problems solved per week: Did you ship features users wanted?
  • Bugs resolved vs. bugs introduced: Is code quality improving?
  • Mean time to recovery: How fast do you fix production issues?
  • Deployment frequency: How often do you deliver value?
  • Code churn: How often do you have to rewrite the same code?

Notice that all of these measure outcomes, not output. They measure value delivered, not volume produced.
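
Most of these fall out of data you already have. As one hedged sketch, here’s a rough churn probe over git history; the 90-day window and the “commits touching a file” proxy are arbitrary choices of mine, not a standard:

    # Churn hotspots: files rewritten again and again in the last 90 days.
    # Uses only standard git flags; the window is an arbitrary choice.
    import subprocess
    from collections import Counter

    def churn_by_file(since: str = "90 days ago") -> Counter:
        """Count commits touching each file since `since`."""
        log = subprocess.run(
            ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
            capture_output=True, text=True, check=True,
        ).stdout
        return Counter(line for line in log.splitlines() if line.strip())

    for path, touches in churn_by_file().most_common(10):
        print(f"{touches:4d}  {path}")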

Using LLMs Responsibly

LLMs are powerful tools for software development. I use them daily. But treating them as code-generation vending machines is like using a forklift to bring groceries in from the car. Sure, you can do it, but you’re optimizing the wrong thing.

Better approaches:

  • Use LLMs to accelerate understanding of existing code
  • Generate boilerplate, but have humans design the architecture
  • Let AI handle the tedious parts while humans focus on the critical thinking
  • Treat generated code as a starting point, not a finished product

The Coming Reckoning

Teams celebrating million-line weeks are building a house of cards. Eventually, something will break in that AI-generated maze. A security vulnerability will emerge. A performance problem will surface. A regulatory requirement will demand changes.

And when that day comes, a few years in at that pace, you’ll discover that your team of three engineers is responsible for maintaining roughly 150 million lines of code that none of them wrote, with LLM analysis bills that rival your infrastructure costs, and no institutional knowledge about any of it.

The tweet concludes with “No one on our team has opened an IDE in months” as if this is an achievement.

It’s not. It’s a red flag the size of a football field.

Conclusion

The ability to generate massive amounts of code doesn’t make you more productive—it makes you more committed. Every line of code is a liability you’re agreeing to maintain, understand, and support. LLMs haven’t changed this fundamental truth; they’ve just made it easier to accumulate liabilities faster than ever before.

The real engineering skill in the LLM era isn’t generating more code. It’s knowing what code not to write, having the discipline to keep codebases comprehensible, and maintaining the team knowledge required to sustain systems over time.

Otherwise, you’re not building software. You’re building a maintenance crisis in slow motion—one million lines at a time.