Anthropic should learn from OpenAI

There aren’t many ways in which Anthropic does things less optimally than its competitors. However, the company does seem to be drinking its own Kool-Aid when it comes to Claude’s coding capabilities.

First, my commentary here might strike some as a bit hypocritical, given I’ve just built a stateful agent entirely through automated coding. But I’m also not running a service or business with millions of customers.

Anthropic’s code leak didn’t create waves because of any secrets it revealed; the community took issue with the quality of the code itself.

Reviewers found multiple functions that should have been only a few hundred lines of code but instead ran several thousand lines long, adding unnecessary complexity and failure points.

Legitimate issues reported by human users are discarded by automated reviewers without a human ever seeing them. On their own, these issues aren’t particularly service-breaking, but over time they compound.

Take the current issues people are experiencing with usage limits: wild swings in what counts as a “full session.” And while it’s not a regular occurrence, Claude overall seems to get sluggish at times for no apparent reason.

Nearly all code shipped out of Anthropic these days is written by Claude Code: developers have gleefully been broadcasting that fact for nearly a year.

But is Claude Code really ready to manage a major service? Kiro isn’t either (it took down AWS), and it typically runs on a Claude model, by the way. In my case, going fully automated for development isn’t a problem, since I’m dealing with dozens or hundreds of customers rather than thousands or millions.

I know if I had the latter, I’d definitely have a real developer in the loop. The chances of an embarrassing and potentially devastating failure are too great not to spend that money.

OpenAI is also pursuing 100% autonomous development, but in a slightly different way. Instead of all but turning over every role (including the reviewers) to the LLM, OpenAI injects human involvement throughout the process.

OpenAI developers are doing far more steering of Codex’s work, in addition to planning out new functionality. Anthropic’s developers, by contrast, appear to be little more than observers.

And let’s be honest: while we can certainly argue about the quality of OpenAI’s model releases, on stability I’d give OpenAI the edge over the past few months.

Maybe it’s time to curb our enthusiasm for Claude just a tad and bring humans back into the development pipeline. These small hiccups are starting to compound on one another, and they could signal much more significant problems ahead.