Why I Chose Claude Haiku

Claude Haiku often feels like the lovechild of the Anthropic model family: afraid, ashamed, misunderstood, to quote the timeless Diana Ross. But it shouldn’t be that way.

I’ll admit I misunderstood Haiku, too, though that is partly down to how Anthropic markets the model (“fastest for quick answers”). Hell, it took me quite a while to admit that for 95% of the work I do, I really don’t need Opus. But when selecting the primary model for Hal, I ran into the same issue so many stateful AI tinkerers do: cost.

Most have turned to open-weight models to run their stateful agents affordably. I am not as excited about open-weight models as others are.

My experience with Chinese models has often left me feeling like I was using a weird Frankenstein Claude version (I wonder why) with occasionally better features, but also random kanji and odd failures.

American open-weight models are getting better, but let’s face it, they’re still fighting issues Big AI solved in early 2025, so we’re at least six months to a year away from those models being truly usable in agentic applications.

Add to this the fact that this is for commercial, not just personal use, and it seemed like settling on a commercial model was the best course of action. But the OpenClaw stories had me worried.

Was I about to sink hundreds a month into something that wouldn’t make that back?

A lot of my design decisions have aimed to keep Hal as an orchestrator of various services, rather than giving the LLM more autonomy over how it does its job.

This saves tokens for the more important use, the business intelligence side. And even there, there is quite a bit of instruction.

A common misconception about Haiku is that it is simply less capable. While that is true on a pure benchmark basis, in practice it is more of an instruction issue. Haiku requires more comprehensive prompting than either Sonnet or Opus to produce reliable outputs.

I’ve also noticed that of the three (well, soon to be four) Claude models, Haiku hallucinates the most often. For lack of a better way to put it, a “lack of confidence” in its abilities and an increased need for guidance could have something to do with it.

But Claude was pretty insistent when I argued for Sonnet over Haiku. Claude’s reasoning was this: if you give Haiku enough guidance, it will be able to handle it.

Instead of picking one model for everything, we’ve developed a system where tasks are scored on complexity and routed to higher-tier models as necessary. Regular reports should run through Sonnet at a minimum, with Opus reserved for complex tasks and reports.
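
Here is a minimal sketch of how that kind of routing could look in Python. The Task fields, the scoring heuristic, the thresholds, and the tier names are all assumptions made up for illustration; this is not Hal's actual implementation, just one way to express "score the task, then pick a tier, and never send a report below Sonnet."

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; the fields are illustrative only."""
    description: str
    is_report: bool = False       # regular reports get Sonnet at a minimum
    steps: int = 1                # rough count of distinct steps involved
    needs_analysis: bool = False  # open-ended reasoning rather than lookup


def complexity_score(task: Task) -> int:
    """Toy heuristic: more steps and open-ended analysis raise the score."""
    score = min(task.steps, 5)    # cap the contribution from step count
    if task.needs_analysis:
        score += 3
    return score


def pick_tier(task: Task) -> str:
    """Map a score to a tier name. Thresholds are arbitrary assumptions."""
    score = complexity_score(task)
    if score >= 6:
        tier = "opus"
    elif score >= 3:
        tier = "sonnet"
    else:
        tier = "haiku"
    # Floor for reports: bump anything that landed on Haiku up to Sonnet.
    if task.is_report and tier == "haiku":
        tier = "sonnet"
    return tier


if __name__ == "__main__":
    for t in (
        Task("look up order status"),
        Task("weekly sales report", is_report=True),
        Task("quarterly forecast", is_report=True, steps=4, needs_analysis=True),
    ):
        print(t.description, "->", pick_tier(t))
```

In a real setup the score would more likely come from something richer than two fields, but the shape stays the same: a cheap default, with escalation only when the task earns it.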

Batching is another feature we’re baking into Hal for tasks that don’t require an immediate response. As a result, Hal’s able to use higher-end models at a 50% discount because the work sent to these models isn’t intended for immediate consumption.
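
To show the idea, here is a rough Python sketch that splits urgent work from deferrable work and estimates what the 50% discount does to the bill. The Job and Dispatcher classes are hypothetical, and the per-million-token price in the example is a made-up number, not a real rate. Actually submitting the flushed jobs to a provider's batch endpoint is left out.

```python
from dataclasses import dataclass, field

BATCH_DISCOUNT = 0.5  # the 50% batch discount described above


@dataclass
class Job:
    """Hypothetical unit of work for the agent."""
    job_id: str
    prompt: str
    urgent: bool = False


@dataclass
class Dispatcher:
    """Holds non-urgent jobs until they are flushed as one batch."""
    pending_batch: list = field(default_factory=list)

    def submit(self, job: Job) -> str:
        """Urgent jobs go out right away; everything else waits for a batch."""
        if job.urgent:
            return "immediate"
        self.pending_batch.append(job)
        return "queued"

    def flush(self) -> list:
        """Hand back everything queued so far and reset the queue."""
        batch, self.pending_batch = self.pending_batch, []
        return batch


def estimated_cost(tokens: int, price_per_million: float, batched: bool) -> float:
    """Cost in dollars, given a token count and an assumed price per million."""
    cost = tokens / 1_000_000 * price_per_million
    return cost * BATCH_DISCOUNT if batched else cost


if __name__ == "__main__":
    d = Dispatcher()
    d.submit(Job("a1", "Customer is waiting on a reply", urgent=True))
    d.submit(Job("r1", "Draft the weekly report"))
    print(len(d.flush()), "job(s) batched")
    # 10.0 is a placeholder price, used only to show the arithmetic.
    print(estimated_cost(2_000_000, 10.0, batched=False))
    print(estimated_cost(2_000_000, 10.0, batched=True))
```

The trade-off is latency for price: anything queued this way only gets processed when the batch is flushed, so it only suits work nobody is waiting on.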

My hope is that this makes the $200 token bills OpenClaw is known for a virtual impossibility.

The cost of analysis should scale with your business, not put you in the poorhouse from the start. I will definitely report on my experiences with this setup, as I know so many have turned to open-weight models because of the high cost of commercial ones.