All you need for a best-in-class AI API call.

It’s been 16 months since GPT-4 launched, and we find ourselves in a good but unexpected situation:

1. Quality has hit a wall, and models are converging on it.

Foundation models, both proprietary and open, are all approaching or matching GPT-4-class quality. There are plenty of options to choose from, each with their own strengths and weaknesses, but overall performance for general models is stable.

2. The best models are getting cheaper, smaller, and faster.

Amongst the leading model builders, while quality has plateaued, efficiency has skyrocketed. GPT-4 is currently $30/1M input tokens, compared to GPT-4o, which is $5/1M input tokens. And GPT-4o is so good that GPT-4 is categorized under “Older Models” on OpenAI’s pricing page (seriously, scroll down). And now: GPT-4o mini, which I drop-in replaced for a few pipelines yesterday with no noticeable issues, is $0.15/1M input tokens!
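
A minimal sketch of what that kind of drop-in swap looks like, assuming the official `openai` Python SDK (v1+) and an `OPENAI_API_KEY` in the environment; the summarization prompt is just a placeholder, not any particular pipeline:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(text: str, model: str = "gpt-4o-mini") -> str:
    """Call the chat completions endpoint; swapping models is a one-word change."""
    # Previously model="gpt-4o" (or "gpt-4"); nothing else in the pipeline changes.
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize the text in one sentence."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```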

Anthropic’s models have followed a similar path, as have open foundation models from other makers (I continue to be impressed by Microsoft’s minuscule Phi-3 Mini).

3. We’re getting smarter about training.

Everyone was shocked by how effective throwing ALL the text into a single model was when GPT-3 arrived. For a while, that was THE move. But we’ve since learned how to be more selective with the data used to build effective, smaller models (maybe don’t throw ALL of the internet into your training data…).
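
To make the idea concrete, here’s a toy sketch of data selection using a couple of hand-written heuristics; real pipelines rely on learned quality classifiers and deduplication, and the thresholds below are purely illustrative:

```python
def is_high_quality(doc: str) -> bool:
    """Toy quality heuristic: drop fragments and highly repetitive text."""
    words = doc.split()
    if len(words) < 50:                        # too short to be a useful document
        return False
    unique_ratio = len(set(words)) / len(words)
    return unique_ratio > 0.3                  # mostly-repeated boilerplate fails

def filter_corpus(docs: list[str]) -> list[str]:
    """Keep only documents that pass the heuristic before they reach pretraining."""
    return [d for d in docs if is_high_quality(d)]
```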

4. Enterprises are forgoing general models for open ones trained for single tasks.

And it’s not just foundation training; it’s fine-tuning, too. Teams are learning that small models trained for a single purpose can outperform the best general models out-of-the-box. Who cares how good GPT-4o is at answering random questions when all we need a model to do is one specific task in a pipeline? By training smaller models, enterprises get to differentiate themselves and run cheaper, more accurate pipelines.
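
As a rough illustration of what “training smaller models” looks like in practice, here’s a sketch that fine-tunes a small open model on a single narrow task. The Hugging Face `transformers` and `datasets` libraries, the DistilBERT checkpoint, and SST-2 sentiment classification are all stand-in assumptions, not anything specific to the teams mentioned above:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# One narrow task (sentiment), one small open model (DistilBERT).
dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="single-task-model",
    per_device_train_batch_size=32,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```

The resulting model does exactly one thing, but it can be evaluated, versioned, and served far more cheaply than a general-purpose API call.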


So we take all these points and ask: is the AI field getting smaller and more practical? Or is this the quiet before the storm of a step-change in foundation model quality? After hearing so many breathless rumors and promises about GPT-5 being right around the corner (and close to AGI!), with little to show for it so far, I’m inclined to take the former position.

And with costs falling so quickly, new applications open up. Just yesterday, Batchmon noted that GPT-4o mini is now cheap enough to be supported by advertising. This may seem ominous, but as open models approach GPT-4o mini’s quality and cost (and they will!), we’ll see a diversity of use cases emerge from the flexibility and customization they allow.
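
Back-of-envelope arithmetic makes the point. The figures below are assumptions (GPT-4o mini’s list prices at the time of writing, a modest query size, and a rough display-ad RPM), not Batchmon’s exact numbers:

```python
# Rough cost of one GPT-4o mini answer vs. a rough display-ad payout.
# Token counts and the $2 RPM figure are illustrative assumptions.
INPUT_PRICE = 0.15 / 1_000_000    # $ per input token
OUTPUT_PRICE = 0.60 / 1_000_000   # $ per output token

prompt_tokens, completion_tokens = 200, 300
query_cost = prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE

ad_rpm = 2.00                     # assumed revenue per 1,000 page views, in $
ad_revenue_per_view = ad_rpm / 1_000

print(f"Cost per query:      ${query_cost:.6f}")            # ~ $0.000210
print(f"Ad revenue per view: ${ad_revenue_per_view:.6f}")   # ~ $0.002000
```

Under those assumptions, a single ad impression covers the model call roughly ten times over.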

So the big investments can continue to hold their breath for AGI, building larger and larger data centers. But all the while, small, cheap, fast, sober AI will remake the ground beneath them.