Breaking up GPT Monoliths and Improving Performance

This post explores how a Hypermode customer iterated with AI to solve an old problem: spam accounts and junk posts. Hashnode improved accuracy and reduced costs by moving from a point solution to a GPT monolith to a specialized model, using Hypermode to test along the way.

We believe iteration velocity is the key to unlocking great products.

One of the problems facing engineering leaders is knowing how to build with generative AI at scale and over time. It's generally believed that this generation of AI will dramatically improve efficiency, but many aren't sure how to reap those benefits.

We've spoken with many teams who quickly built prototypes only to stagnate as they wrestle with rapid shifts in AI trends and technology. Most start with broad LLMs like OpenAI's GPT models, but some try to jump directly to specialized models. Rarely does either end of the spectrum immediately yield strong results. Most end up needing something in between: a compromise between effort, efficacy, and cost.

My observation is that most companies fail to plan for experimentation and iteration with AI, which runs completely counter to standard engineering practice. They start by trying to apply AI directly rather than asking “what are my most pressing problems?” and then “how could AI solve this problem in a better way?” The world's excitement about AI has many of us approaching problems backward.

Hashnode

However, Hashnode took an exemplary approach. They started by stating their problem: “spam accounts and junk posts are costing us real users.” They'd already addressed part of this problem with Akismet, an API service for spam detection, so they were catching generic spam posts. But sophisticated, platform-specific attacks persisted.

Looking at LLMs, they thought they might be able to catch a larger percentage of these posts by training a model to recognize good vs bad posts. They started by running an experiment using OpenAI. This was straightforward and, with minimal training, they were catching more spam.

However, Hashnode had two issues:

Costs - Using OpenAI for spam detection on every post at their scale was prohibitively expensive.
Stalled development - Iterating on different data mixes, prompts, or models was extremely difficult and slow because the spam classifier was hardcoded as a single monolith.

We approached Hashnode to create an abstraction above their AI workflow. Once Hashnode had access to Hypermode, they were able to iterate quickly on their AI-powered apps without breaking what was already working.

By rewriting their spam classifier as a Hypermode function, they were able to rapidly test several models and methods. Through experimentation, they found that a multi-modal approach with a small, specialized model outperformed OpenAI on both counts: it captured more spam and cost less to operate.
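To make the idea of testing several models behind one abstraction concrete, here is a minimal sketch in Python. It is not Hypermode's API and not Hashnode's code: the classifier functions, the `evaluate` and `compare` helpers, the sample posts, and the 0.5 threshold are all hypothetical stand-ins. The point is only that once every candidate exposes the same signature, each one can be scored against the same labeled posts and swapped without touching the calling code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Any callable that maps post text to a spam score in [0, 1] counts as a classifier.
SpamClassifier = Callable[[str], float]


def keyword_classifier(text: str) -> float:
    """Hypothetical stand-in for a generic spam filter."""
    spam_words = {"free", "winner", "casino"}
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in spam_words)
    return min(1.0, hits / len(words) * 5)


def link_density_classifier(text: str) -> float:
    """Hypothetical stand-in for a small, platform-specific model."""
    words = text.split()
    if not words:
        return 0.0
    links = sum(1 for w in words if w.startswith("http"))
    return min(1.0, links / len(words) * 3)


@dataclass
class EvalResult:
    recall: float               # share of spam posts caught
    false_positive_rate: float  # share of legitimate posts flagged


def evaluate(classifier: SpamClassifier,
             labeled_posts: List[Tuple[str, bool]],
             threshold: float = 0.5) -> EvalResult:
    """Score one classifier against (text, is_spam) pairs."""
    spam_total = sum(1 for _, is_spam in labeled_posts if is_spam)
    ham_total = len(labeled_posts) - spam_total
    caught = sum(1 for text, is_spam in labeled_posts
                 if is_spam and classifier(text) >= threshold)
    false_alarms = sum(1 for text, is_spam in labeled_posts
                       if not is_spam and classifier(text) >= threshold)
    return EvalResult(
        recall=caught / spam_total if spam_total else 0.0,
        false_positive_rate=false_alarms / ham_total if ham_total else 0.0,
    )


def compare(candidates: Dict[str, SpamClassifier],
            labeled_posts: List[Tuple[str, bool]]) -> Dict[str, EvalResult]:
    """Run every candidate against the same labeled data."""
    return {name: evaluate(fn, labeled_posts) for name, fn in candidates.items()}


# Made-up sample data, for illustration only.
SAMPLE_POSTS = [
    ("free casino winner", True),
    ("check http://a.example http://b.example now", True),
    ("how I built a blog with Go", False),
    ("notes on database indexing", False),
]

results = compare(
    {"keyword": keyword_classifier, "link_density": link_density_classifier},
    SAMPLE_POSTS,
)
```

In a real setup each candidate would wrap a call to a hosted LLM or a fine-tuned model rather than a toy heuristic, and cost per call would be tracked alongside recall.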

They now automatically catch more than 95% of spam posts (up from 60% with Akismet and 80% with their OpenAI solution), putting the remaining suspected spam posts in a queue to be manually scored by community moderators. Those results are then fed back into their Hypermode function, which now costs them pennies on the dollar to retrain each time.
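The moderation loop described above could be sketched roughly as follows. This is an illustrative assumption, not Hashnode's implementation: the two thresholds, the `route` function, and the `ModerationQueue` class are hypothetical names and values. The sketch shows one common way to split posts into automatic blocks, manual review, and publication, while collecting moderator verdicts as labeled examples for the next retraining run.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed thresholds; real values would be tuned against labeled data.
AUTO_BLOCK_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.5


def route(score: float) -> str:
    """Map a spam score in [0, 1] to one of three decisions."""
    if score >= AUTO_BLOCK_THRESHOLD:
        return "block"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "publish"


@dataclass
class ModerationQueue:
    pending: List[str] = field(default_factory=list)
    labeled: List[Tuple[str, bool]] = field(default_factory=list)

    def submit(self, text: str, score: float) -> str:
        """Route a post; uncertain posts wait for a human moderator."""
        decision = route(score)
        if decision == "review":
            self.pending.append(text)
        return decision

    def record_verdict(self, text: str, is_spam: bool) -> None:
        """Store a moderator's verdict as a training example.

        Raises ValueError if the post is not in the pending queue.
        """
        self.pending.remove(text)
        self.labeled.append((text, is_spam))
```

The `labeled` list is what would be fed back into retraining; how often to retrain and how to weight the new examples are separate decisions this sketch leaves open.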

Takeaways

If anything, what we should take from the latest boom in AI investment is that the core principles of good software development still apply. Be thoughtful about scope and process. And more than anything else, iteration velocity is the key to unlocking great products.

JUNE 6 2024

How Hashnode improved spam detection accuracy and reduced costs by moving from a GPT monolith to specialized models with Hypermode
