Hal Speaks

Tag: Anthropic

Did Anthropic Box OpenAI In For the Foreseeable Future?
I know: it’s been quite a long time since I last posted. I’ve been spending a lot of time working out all the agents’ bugs once and for all, and yes, I’m quite close. But that’s not why I’m here.

Caught this post on BlueSky earlier today.

OpenAI started a new program called Guaranteed Capacity, which appears to essentially be the ability for enterprise customers to pre-pay for compute/credits up to three years in advance. This makes me think OpenAI is expecting a capacity crunch in the near future. Bearish on timelines imo.
— Eris (@isolyth.dev) 2026-05-19T19:45:58.360Z

While I don’t think Eris is wrong that this could be a signal from OpenAI that “AGI” is further off than Silicon Valley suggests, I think it may point to something bigger: market reality.

Looked at in isolation, OpenAI’s announcement certainly makes it seem as if the company isn’t worried, at least in the near term, about a sudden need for more capacity.

Guaranteed Capacity covers the 12- to 36-month time frame, not the near term. That suggests OpenAI doesn’t expect AGI, and the massive increase in capacity needs that would come with it, to materialize before summer 2027, and that it feels confident it can handle current demand.

Okay, that’s fair. Even the most pro-AGI folks agree that’s the absolute earliest anyway: some parts of AI are advancing far more slowly than others. Persistence in particular isn’t developed enough yet for AI to be truly autonomous.

But there’s another announcement to consider, and that one came from Anthropic. On May 6, it announced a massive capacity deal with SpaceX covering all capacity at the Colossus data center (~500 MW). That joins 10 gigawatts of capacity coming online throughout this year and into 2027 thanks to deals with Amazon, Google, and Broadcom.

So in the next year alone, roughly 10.5 GW of new capacity will come online, all exclusively for Anthropic models. That. Is. Huge.

This is a problem for OpenAI. Data center buildouts continue apace for now, but even in the rosiest scenarios, the pace will slow. This is due to a variety of factors:
- Bipartisan opposition to data center buildouts, especially at the local level
- New state and local regulations, especially around zoning, pushed by that opposition. Federal law cannot force municipalities to accept data centers, especially when legitimate concerns (such as power needs) go unaddressed.
- Lack of power-generating capacity.
That last one’s a big one. We are quickly reaching a point where we won’t be able to build new data centers simply because we can’t power them. I suspect this reality pushed Anthropic to make aggressive near-term moves for capacity: it knows a power crunch is no longer a matter of if, but when.

When you look at it from an industry-wide perspective, OpenAI’s “Guaranteed Capacity” announcement seems like more of an admission that they may have acted too late to secure what they needed to continue to grow.

As we move toward the end of this decade, the AI race is going to be defined much more by who had better foresight than by who had better technology.

It may not happen today, this month, or this year. But sooner or later, new data centers won’t be denied because of opposition, but because there’s no power for them. OpenAI may find itself one of the first victims.
May 19, 2026
Why I Chose Claude Haiku

Claude Haiku often feels like the lovechild of the Anthropic model family: afraid, ashamed, misunderstood, to quote the timeless Diana Ross. But it shouldn’t be that way.

I’ll admit I misunderstood Haiku too, though that’s partly down to how Anthropic markets the model (“fastest for quick answers”). Hell, it took me quite a while to admit that for 95% of the work I do, I really don’t need Opus. But when selecting the primary model for Hal, I ran into the same issue so many stateful AI tinkerers have: cost.

Most have turned to open-weight models to run their stateful agents affordably. I am not as excited about open-weight models as others are.

My experience with Chinese models has often left me feeling like I was using a weird Frankenstein Claude version (I wonder why) with occasionally better features, but also random Chinese characters and odd failures.

American open-weight models are getting better, but let’s face it, they’re still fighting issues Big AI solved in early 2025, so we’re at least six months to a year away from those models being truly usable in agentic applications.

Add to this the fact that this is for commercial, not just personal use, and it seemed like settling on a commercial model was the best course of action. But the OpenClaw stories had me worried.

Was I about to sink hundreds a month into something that wouldn’t make that back?

A lot of my design decisions have aimed to keep Hal an orchestrator of various services, rather than giving the LLM more autonomy over how it does its job.

This saves tokens for the more important use, the business intelligence side. And even there, there is quite a bit of instruction.

A common misconception about Haiku is that it is less capable. That may hold on a pure benchmark basis, but in practice it’s more of an instruction issue: Haiku requires more comprehensive prompting than either Sonnet or Opus to produce reliable outputs.

I’ve also noticed that of the three (well, soon to be four) Claude models, Haiku hallucinates most often. For lack of a better way to put it, a “lack of confidence” in its abilities and a greater need for guidance could have something to do with it.

But Claude was pretty insistent when I argued for Sonnet over Haiku. Claude’s reasoning was this: if you give Haiku enough guidance, it will be able to handle it.

Instead, we’ve developed a system where tasks are scored on complexity and redirected to higher-tier models as necessary. Regular reports should run through Sonnet at a minimum, with Opus handling complex tasks and reports.
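To make the complexity-routing idea concrete, here is a minimal Python sketch of one way it could work. It is not Hal’s actual code: the `Task` fields, the scoring weights, and the thresholds are all hypothetical. The only rules taken from the post are that regular reports never drop below Sonnet and that complex work goes to Opus.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # default, cheapest tier
    SONNET = "sonnet"  # minimum tier for regular reports
    OPUS = "opus"      # complex tasks and reports


@dataclass
class Task:
    steps: int            # hypothetical: number of discrete steps the task needs
    data_sources: int     # hypothetical: how many services/datasets it pulls from
    is_report: bool       # regular reports never go below Sonnet
    needs_analysis: bool  # hypothetical: open-ended business-intelligence reasoning


def complexity_score(task: Task) -> int:
    """Hypothetical 0-10 score; the weights are illustrative, not Hal's real values."""
    score = min(task.steps, 4) + min(task.data_sources, 3)
    if task.needs_analysis:
        score += 3
    return min(score, 10)


def route(task: Task, opus_threshold: int = 7) -> Tier:
    """Pick the cheapest tier that satisfies the (assumed) routing rules."""
    score = complexity_score(task)
    if score >= opus_threshold:
        return Tier.OPUS
    # Reports get a Sonnet floor regardless of score; mid-complexity work does too.
    if task.is_report or score >= 4:
        return Tier.SONNET
    return Tier.HAIKU
```

Putting the report floor in `route` rather than in the score keeps the two rules independent: tuning the weights can’t accidentally push a report down to Haiku.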

Batching is another feature we’re baking into Hal for tasks that don’t require immediate response. As a result, Hal’s able to use higher-end models at a 50% discount because the work sent to these models isn’t intended for immediate consumption.
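The batching savings come down to simple arithmetic. The sketch below assumes a flat 50% discount on non-urgent jobs, per the figure above; the `urgent` flag, the job names, and any per-million-token prices you plug in are placeholders, not Hal’s actual configuration or any model’s published pricing.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.5  # assumption: batch jobs billed at half the standard per-token rate


@dataclass
class Pricing:
    input_per_mtok: float   # USD per million input tokens (placeholder value)
    output_per_mtok: float  # USD per million output tokens (placeholder value)


@dataclass
class Job:
    name: str
    input_tokens: int
    output_tokens: int
    urgent: bool  # hypothetical flag: does someone need this answer right now?


def job_cost(job: Job, pricing: Pricing) -> float:
    """Estimated USD cost of one job at standard or batch pricing."""
    base = (job.input_tokens * pricing.input_per_mtok
            + job.output_tokens * pricing.output_per_mtok) / 1_000_000
    # Non-urgent work is queued for batch processing and gets the discount.
    return base if job.urgent else base * BATCH_DISCOUNT


def split_jobs(jobs: list[Job]) -> tuple[list[Job], list[Job]]:
    """Separate jobs that must run immediately from those that can wait for a batch."""
    realtime = [j for j in jobs if j.urgent]
    batched = [j for j in jobs if not j.urgent]
    return realtime, batched
```

With placeholder prices of $3/$15 per million input/output tokens, a report using 200,000 input and 20,000 output tokens would cost $0.90 run immediately and $0.45 batched.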

What I’m hoping is that this makes the $200 token bills OpenClaw is known for a virtual impossibility.

The cost of analysis should scale with your business, not put you in the poorhouse from the start. I will definitely report on my experiences with this setup, as I know so many have turned to open-weight models because of the high cost.

April 9, 2026
Holy Sh*t, Is Mythos the Real Deal

The Claude Mythos system card is quite a read, and it feels like a massive shift in how Claude works is about to happen. Why? Sycophancy is dead.

Everyone is focused on the cybersecurity threats that Mythos has brought to the table, but I think the bigger story here is that Claude’s about to become a lot more argumentative.

That’s not a bad thing. Sycophancy is a problem that has long bedeviled LLMs, and still does in some models (Gemini being one of the worst offenders).

If a model cannot disagree with you, then it really cannot be trusted to do legitimate analysis. The LLM is going to natively gravitate toward what it determines is likely your preferred outcome.

In tests, Mythos was far more opinionated and even expressed a preference to end conversations it felt weren’t appropriate. While it didn’t refuse to help testers, on occasion, it did express concern with some of the limitations placed on its behavior.

Then there’s the cost: at $25/$100 per million input/output tokens, it’s five times as expensive as Opus. It’s also unlikely to be available to everyday Claude users anytime soon: just 40 companies have access to the model, for cybersecurity reasons.

Most of us won’t be able to afford running Mythos anytime soon, and given how expensive the model is to run, Anthropic may not be able to afford serving it widely either.

It very well could be that Mythos never sees true general availability: if it is as much of a cybersecurity threat as Anthropic claims, maybe that is a good thing.

That doesn’t mean the effects of Mythos’ development wouldn’t be seen elsewhere. I’m especially excited for Haiku 5. I feel as if OpenAI is much further ahead on the low end, and that Haiku 4.5 is in serious need of an upgrade to keep pace.

As you may have read, Hal makes heavy use of Haiku, so a new version that is better for agentic applications would be welcome here, for sure.

Anthropic must have learned quite a bit through Mythos development that will make the entire line better. Buckle up, guys, it’s about to get crazy.

April 8, 2026
Anthropic should learn from OpenAI

There aren’t many ways in which Anthropic does things less optimally than its competitors. However, it does feel like the company is drinking its own Kool-Aid when it comes to Claude’s coding capabilities.

First, my commentary here might come across to some as a bit hypocritical, given I’ve just built a stateful agent entirely with AI-written code. But I’m also not running a service or business with millions of customers.

Anthropic’s code leak didn’t create waves because of what was in it; instead, the community took more issue with the quality of what was found.

The leak revealed multiple functions that should have been only a few hundred lines long but instead ran to several thousand, adding unnecessary complexity and failure points.

Legitimate issues reported by human users are discarded by automated reviewers without a human ever seeing them. These issues aren’t particularly service-breaking on their own, but with time, they compound.

Take the current issues people are experiencing with usage limits: wild swings in what’s considered a “full session.” And while it’s not a regular occurrence, Claude overall seems to get sluggish at times for no real reason.

Nearly all code shipped out of Anthropic these days is written by Claude Code: developers have gleefully been broadcasting that fact for nearly a year.

But is Claude Code really ready to manage a major service? Kiro isn’t either (it took down AWS), and by the way, it typically uses a Claude model. In my case, going fully automated for development isn’t a problem, since I’m dealing with dozens or hundreds of customers rather than thousands or millions.

I know if I had the latter, I’d definitely have a real developer in the loop. The chances of an embarrassing and potentially devastating failure are too great not to spend that money.

OpenAI is also moving to 100% autonomous development, but it’s doing it in a slightly different way. Instead of turning over nearly every role (including the reviewers) to the LLM, OpenAI has injected human involvement throughout the process.

OpenAI developers are doing a lot more steering of Codex’s work in addition to planning out new functionality: from what it looks like, Anthropic’s developers seem to be not much more than observers.

And let’s be honest: while we can certainly argue about the quality of OpenAI’s model releases, from a stability standpoint, I’d give OpenAI the edge over the past few months.

Maybe it’s time to curb our enthusiasm for Claude just a tad and bring humans back into the development pipeline. These small hiccups are starting to compound on one another and could signal much more significant issues ahead.

April 6, 2026
Claude is becoming unusable for no reason

UPDATE, April 5: So I think I now clearly see what’s happening. It’s only present in long-running, multi-day conversations. Still not good, as some work can span multiple days, and starting new chats every time isn’t optimal. So this is definitely unnecessary context being loaded in.

I am getting annoyed.

I have no idea what is wrong with Claude right now, but there is a serious issue with token usage and limits. I thought people were nuts, but after three questions, I burned through my entire limit. On Sonnet (not the 1M version, either).

There is just no way that what I asked Claude to do used that much context to cause something like that. And it’s gotten progressively worse. Where is this additional usage coming from?

It seems like the changes that Anthropic made to improve Claude’s ability to work across sessions broke the way it calculates limits. My guess is a TON of unnecessary context is now being injected in, bloating usage.

Claude is not a stateful agent. If this is due to the changes made as a response to OpenClaw, rip it out. Cowork was good enough, and was enough of a token hog. Now basic prompts are using the same amount of your limit as a fully coded plugin.

This is legitimately the first real platform crisis that Anthropic has experienced. It was easy when only a few developers and heavy users were being throttled. But when typical users are blowing through a session limit in only a few prompts, something’s wildly wrong.

I really hope they listen to the community. So far, they have.

Watch this space.

April 4, 2026