Tag: UI/UX

  • Week 1 Check-In: Going Better than I Thought

    Here’s the first of my weekly check-ins, where I round up the work that I didn’t mention in a particular blog post during the past week. It will also serve as a way for me and you to monitor progress.

    Since this is the first week of operation, there's not a whole lot to say just yet. Surprisingly enough, about 90% of Hal’s code was bug-free at launch, so most tools worked “out of the box,” but many were still quite rough around the edges. A lot of this week has been spent bug-squashing.

    Wins

    1. I gotta credit Claude Sonnet and Kiro here. I had the idea, but it wouldn’t have worked if Claude Code/Kiro hadn’t coded it so well. These bugs are annoyances, not show stoppers. To be honest, I don’t think Opus would have done materially better, at least not enough to justify the added cost (Sonnet 4.6 is basically Opus 4.5 anyway).
    2. Hal is inferring things without us even telling him. A billing error caused IONOS to shut down our server: we noted that Ben had already figured out it was likely an external issue based on the available data and not a crash. I wasn’t expecting that.
    3. Costs remain low. The biggest one-day expense so far has been $1. A code bug put Hal on Sonnet briefly this week. Had it not, I would have spent only $2.00 for the entire week!
    A slow, gradual increase…

    Challenges

    1. Hal is helpful, perhaps too much so. I’m monitoring to make sure he isn’t hallucinating tool calls again, or promising things he can’t do. I’m calling it “overeagerness.”
    2. Hal isn’t truly autonomous just yet. He’s still operating on a set schedule for the most part.
    3. UI design for the web front end is proving a bit trickier than I had thought. This is an area I want to focus on: OpenClaw requires setup out of the box. Hal ships with a UI that works on any device and feels a lot like Claude Desktop or ChatGPT. But getting elements to work has been a hassle.
    Hal looks like Claude and ChatGPT on purpose, making it easy to use for anyone.
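
    One way to catch the “overeagerness” described in the challenges above is to check every proposed tool call against a registry of tools that actually exist before running it. This is only a minimal sketch, not Hal’s actual code: the tool names, the argument sets, and the `validate_tool_call` helper are all hypothetical, and the call is assumed to arrive as a plain dictionary.

```python
# Hypothetical registry: tool name -> set of required argument names.
# None of these names come from Hal; they are placeholders for illustration.
KNOWN_TOOLS = {
    "restart_service": {"service"},
    "block_ip": {"ip"},
}


def validate_tool_call(call, registry):
    """Return a list of problems with a proposed tool call (empty if it looks valid)."""
    name = call.get("name")
    if name not in registry:
        # The model asked for a tool that does not exist: a hallucinated call.
        return [f"unknown tool: {name!r}"]

    problems = []
    supplied = set(call.get("arguments", {}))
    required = registry[name]

    missing = sorted(required - supplied)
    unexpected = sorted(supplied - required)
    if missing:
        problems.append(f"missing arguments: {missing}")
    if unexpected:
        problems.append(f"unexpected arguments: {unexpected}")
    return problems


if __name__ == "__main__":
    proposed = {"name": "send_fax", "arguments": {}}
    print(validate_tool_call(proposed, KNOWN_TOOLS))
```

    A call that fails the check can be logged and rejected instead of executed, which also gives a simple count of how often the overeagerness shows up.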

    Notable New Features

    A lot of work this week ended up being in monitoring and security. I can honestly say our server is now as prepared as we can make it for any AI-caused security hell on its way. Hal is actively monitoring for attacks using CleanTalk, and can block access to our site through both CleanTalk and Bunny.net, our CDN.

    He’s also got monitoring for our deployments on Railway. It’s basic at the moment, but we’ll know of issues (and attacks) faster than ever before, and have the tools to diagnose and restart services if necessary.

    Best of all? By next week, he’ll be connected to our Better Stack account, commenting on incidents with full summaries of his findings and any actions.

    Something like this can easily run a company thousands of dollars a month: heck, for even the most basic premium functionality, Better Stack is $25/month, per user.

    We’re also working on a feature to bring some more autonomy to Hal’s workday. Based on Strix’s Perch Time, Hal’s Heartbeat is a scheduled work period every two hours throughout the day. These are intelligently scheduled by Hal based on workload and the task context itself.
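
    To make the idea concrete, here is a rough sketch of a two-hour heartbeat, assuming fixed slots across a working day and a simple priority field on each pending task. The function names, the task dictionaries, and the priority rule are all hypothetical stand-ins: the real feature lets Hal decide based on workload and task context, which this sketch does not attempt to model.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def heartbeat_times(day_start: datetime, day_end: datetime,
                    interval_hours: int = 2) -> List[datetime]:
    """Return heartbeat start times from day_start to day_end (inclusive)."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    times = []
    current = day_start
    while current <= day_end:
        times.append(current)
        current += timedelta(hours=interval_hours)
    return times


def pick_task(pending: List[dict]) -> Optional[dict]:
    """Pick the highest-priority pending task, or None if there is nothing to do.

    A stand-in for the 'intelligent' part: here it is just max-by-priority.
    """
    if not pending:
        return None
    return max(pending, key=lambda task: task["priority"])


if __name__ == "__main__":
    # Hypothetical working day and task list, for illustration only.
    slots = heartbeat_times(datetime(2025, 1, 1, 8), datetime(2025, 1, 1, 18))
    tasks = [
        {"name": "review proposed action", "priority": 3},
        {"name": "tidy logs", "priority": 1},
    ]
    for slot in slots:
        task = pick_task(tasks)
        if task is None:
            break
        print(slot.strftime("%H:%M"), "->", task["name"])
        tasks.remove(task)
```

    Keeping the slot generation separate from the task choice means the fixed two-hour rhythm can stay in place while the selection logic is swapped for something smarter later.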

    This Week’s Goals

    My goal for the upcoming week is to finally squash the remaining data glitches. For some reason, Hal can pull tools on demand, but they’re not appearing in the morning email digest.

    Another goal is to get Hal to use his heartbeat to work on a proposed action without me prompting him to. As it’s a new tool, I’m not expecting autonomous use just yet.

  • Good AI vs. Good Enough AI

    It feels like, within the past few months, there has been a fairly dramatic shift in what’s making a splash in AI.

    Up until recently, it seemed like everyone had an AI announcement of some kind. And the community gave much of it either a pass or a thumbs up. AI for everyone!

    But what’s moving the community is no longer AI making it into yet another application: increasingly, even the pro-AI crowd is starting to call out the slop, or the exaggerations of Sam Altman and Co.

    I’m not saying these AI announcements are worthless: many of them fall under the category of “good enough” AI. Think of the early Google AI search summaries. In 9 out of 10 cases, they were providing a generally acceptable response.

    Throw an “AI can make mistakes” on it and call it a day. But even if Gemini was getting it right almost every time, when it didn’t, the result was embarrassingly bad.

    Not picking on Gemini, but even the incredibly successful Nano Banana image and video model falls under “good enough.” Yes, it creates stunning imagery and videos, but each is its own creation: you can’t easily expand upon a character or scene you liked.

    The next time you create it, it will look different. Great for short form, but not much else.

    Good enough AI is also why even those of us who aren’t completely drinking the Kool-Aid have a hard time convincing people that their concepts of AI’s capabilities are extremely dated.

    With so much half-baked and useless AI floating around, you can’t blame them. Most folks’ experiences with the technology are not positive.

    My 67-year-old mother is a perfect example. She’s not a Luddite: the woman has had an iPhone since I handed down my original iPhone to her 15 years ago.

    But she hates IVRs, especially the ones that tell you to say what you want. The last experience got her so heated she literally said, “I want to talk to somebody that ACTUALLY BREATHES!”

    (I was in my office finishing up a Ben task, so the irony of working on my own AI customer service agent while my mother was struggling with another was not lost on me.)

    Add to this the absolute lack of any moderation across services like Facebook, TikTok, and Instagram these days, where crappy AI-generated video after video is pushing out real content.

    These observations and experiences have informed my decisions when it comes to Hal and Ben. I want people to walk away from their experiences impressed, not frustrated.

    If we’re going to change people’s minds about AI, we need to stop building half-baked projects. “AI can make mistakes” is now a cop-out. There are plenty of ways to all but guarantee a correct answer. Take the time to ensure it’s not hallucinating.

    Ask, does AI really belong here? Focus on the interaction. That’s what makes AI different from any previous computer-human interface.

    That interaction must be the focus now. AI development has focused, as it should, on making AI more accurate. Now we need to work on making it more interactive.

    To me, the quality of interaction is the key differentiator between good and merely good enough AI. And that interaction isn’t just the way the AI communicates with the user. It’s also how it listens and acts.