This. Is. Huge.

Curbing artificial intelligence’s resource usage is probably the single biggest technological challenge of the late 2020s. AI’s capabilities aren’t “leveling off” as many had expected: LLMs appear to have plenty of runway left.

But as pro-AI as I am, I have to admit we cannot continue on the current path. There is not enough power available. In tandem with AI development, we must also develop better ways to make use of resources.

I missed this news, but late last month, Google announced TurboQuant. The technology targets the “key-value cache,” where an LLM stores bits of information so it doesn’t have to recompute them.

For basic LLM tasks, this works well and doesn’t dramatically increase the size of the cache itself. However, as you ask the LLM to take on longer, more complex tasks, the cache grows with every token of context it has to keep track of. Its size can explode, in turn requiring more memory.
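To put rough numbers on that growth, here is a back-of-the-envelope sketch. The model dimensions (32 layers, 8 key-value heads, 128 values per head, 16-bit storage) are my own assumptions for illustration, not figures from Google or from any particular model, and the formula is the standard rough estimate rather than anything specific to TurboQuant.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate key-value cache size, in bytes, for a single sequence.

    All default dimensions are hypothetical, chosen only for illustration.
    """
    # Factor of 2: each token stores one key vector and one value vector
    # per layer and per key-value head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len


for tokens in (1_024, 8_192, 131_072):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.3f} GiB")
```

With these assumed dimensions, each token costs 128 KiB of cache, so the total scales in a straight line with how much context the model is holding.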

I’ll ask you to go to the article above for a bit more detail on what Google is doing, so I don’t butcher it. However, by simplifying the instructions for the LLM on how to find these data bits, researchers were able to cut memory usage by a factor of six.

That isn’t a small difference. If that’s hard to visualize, think of it this way: using TurboQuant, you’d be able to store six times the data in the cache using the same amount of memory!
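Here's that in rough numbers. This is only a sketch: the sixfold figure is the one reported above, while the 16 GiB memory budget and the 128 KiB-per-token cost are hypothetical values I picked for illustration. It also ignores any bookkeeping overhead a real compression scheme would add.

```python
def max_cached_tokens(budget_bytes, bytes_per_token, compression=1):
    """How many tokens of cache fit in a fixed memory budget.

    `compression` is the factor by which each token's cache entry shrinks.
    """
    return budget_bytes * compression // bytes_per_token


BUDGET = 16 * 2**30       # hypothetical: 16 GiB set aside for the cache
PER_TOKEN = 128 * 2**10   # hypothetical: 128 KiB of cache per token

baseline = max_cached_tokens(BUDGET, PER_TOKEN)
compressed = max_cached_tokens(BUDGET, PER_TOKEN, compression=6)
print(f"uncompressed: {baseline:,} tokens")
print(f"6x smaller:   {compressed:,} tokens")
```

Same memory, six times as many tokens held in the cache.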

Undoubtedly, other researchers will discover other ways to further reduce AI resource consumption. AGI is closer than we think, but reaching it is all but impossible with today’s technologies, especially given how many resources they consume.

The broader AI community needs to push for this as well. For too long, AI has pushed ahead without thought for the significant amount of resources required. That’s going to catch up with us sooner or later.