Documenting Our AI Journey
Welcome to Hal Speaks, my blog on our journey from a simple IVR agent to a stateful AI assistant, including all the wins, frustrations, and opinions in between.
Ed’s Thinking…
-
Well, today turned into chaos. Around 6:20pm ET on May 19, Railway went completely down. Not only my deployments, but everyone else’s too. Update: Here’s Railway’s post-mortem. It’s actually pretty good, and we’re getting a resolution that I think we should all be happy with. Our entire AI platform (plus this blog – moving that ASAP)…
-
I know: quite a long time since I last posted. Been spending a lot of time getting all the agents’ bugs worked out once and for all, and yes, I’m quite close. But that’s not why I’m here. Caught this post on Bluesky earlier today. While I don’t think Eris is wrong in saying this…
-
Google’s TurboQuant answers AI’s biggest problem: resource usage.
-
Here’s the first of my weekly check-ins, where I round up the work that I didn’t mention in a particular blog post during the week. It will also serve as a way for me and you to monitor progress.
-
I may have stumbled upon a potentially useful way to prevent a stateful agent’s memory from being “polluted” by likely incorrect or garbled data during testing.
-
Claude Haiku often feels like the love child of the Anthropic model family: afraid, ashamed, misunderstood, to quote the timeless Diana Ross. But it shouldn’t be that way.
-
The Claude Mythos system card is quite a read, and it feels like a massive shift in how Claude works is about to happen. Why? Sycophancy is dead.
-
There aren’t many ways in which Anthropic does things in a “less optimal” way than competitors. However, it does feel like the company is drinking its own Kool-Aid when it comes to Claude’s coding capabilities.
-
…and hallucinating tool calls. Well, it was this morning. More of an issue with the system prompt being a little too permissive. But glad to finally have this working!
-
I am getting annoyed. I have no idea what is wrong with Claude right now, but there is a serious issue with token usage and limits. Thought people were nuts, but after three questions, I burned through my entire limit. On Sonnet.