The AI Outlook for the Balance of ‘24
Forecasting anything in tech can be a tricky business. Yet, as we focus more of our work on AI technology in marketing, I find myself keeping notes about what we should anticipate next, because it's nearly impossible to keep pace with new developments. Add to that the challenge, born of client work, of answering (or in some cases deferring) questions about what's coming, when the shelf life of those answers can be measured in days. At the same time, there are some predictable aspects of evolving AI that we can depend on and need to monitor.
So, I decided that it was worth doing this exercise once for all our clients and readers. Here is my educated “guestimate” of some specific developments we can expect for the balance of 2024. We had a tough time coming up with a typical “top-10” list, so I decided to focus on 8.
It will be interesting to look back on this at the end of December. First, I want to extend a big thanks to our part-time CTO, some amazing folks I follow at IBM, and our Sr. AI Tech lead for helping decide what we think will be most likely and impactful; I cannot take credit for the details they helped me summarize for each of the items. Now that it's done, looking across the AI projects we're engaged in, these eight items are the most obvious things to watch as the second half of '24 gets underway today.
1. Reality Check
Let's start with my sense that the most important thing we're going to witness is some cooling of the AI hyperbole in the 2nd half of 2024. In other words, we should see more realistic expectations emerge. My tech colleagues are followers of Gary Marcus (you should check out his Substack), and not that he's intentionally an AI curmudgeon (he's clearly not), but like our CTO and our AI tech guru, he's seen a lot rise and fall in tech over the years, and given his roots in cognitive science and AI, he's well-positioned to understand what's behind the breathless claims, promises, and forecasts. The bottom line is that all the predictions about AGI (artificial general intelligence, not "adjusted gross income"🤓) are currently far more hype than substance. So, look for more focus on what can actually be done (especially beyond GPT bots and content generators). Our bet, of course, is that the real action will be in closed-domain natural language agents: interactive, conversational AI digital retail concierges.
2. Multimodal AI
This one is about the intake of multiple kinds of data. The 2nd thing I'm watching for is the ability of AI agents to accept and process multiple types of content (text, pictures, audio, video, etc.).
We already have multimodal models like OpenAI's GPT-4o and Google's Gemini, which can move freely between natural language processing and (for instance) computer vision tasks. Users can ask, for example, about an image's content and receive a natural language response. Or they could ask out loud for instructions to do something and receive visual aids alongside textual instructions. For product and service concierges this will become imperative as the AI agent guides its best customers through a product evaluation process, for instance.
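To make that concrete, here's a minimal sketch of what a multimodal request looks like using OpenAI's Python SDK. The image URL and the question are placeholders, and you'd need your own API key; treat this as an illustration of the pattern, not a production recipe.

```python
# A minimal sketch of a multimodal request via OpenAI's Python SDK.
# Assumes OPENAI_API_KEY is set; the image URL and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What product is shown here, and what color options do you see?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/product-photo.jpg"}},
        ],
    }],
)

# The model answers a question about an image in plain natural language.
print(response.choices[0].message.content)
```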
New models are also introducing video into the mix. Where this gets really interesting is when multimodal AI models can process even more diverse data inputs, expanding the information available for training and inference. As an example, consider ingesting data from video cameras. So, my guess is there's a bunch more to come this year for multimodal AI models.
3. Smaller Models
Massive models jump-started the generative AI rush, but not without drawbacks as we've discussed in prior articles. However, in all the excitement, people have been largely overlooking the reality that this stuff costs. A lot. And as adoption of actual AI services begins, I’m anticipating a migration to smaller models. And the cost factor will be a great motivator.
According to one study from the University of Washington, training a single GPT-3-size model requires the yearly electricity consumption of over 1,000 households (!). So you might be thinking, "Sure, that's training, and we know that's compute-expensive, but what about inferencing?" Well, a standard day's worth of ChatGPT queries consumes more electricity than the daily usage of something like 33,000 households (!)
Geek alert 🤓! We’re going to nerd out a bit…
Smaller models are far less resource-intensive. And it turns out that much of the ongoing research and innovation in LLMs is focused on yielding greater output from fewer parameters. To give you some idea, GPT-4 is rumored to have about 1.76 trillion (that's twelve zeros, so "1,760,000,000,000") parameters; meanwhile, many open-source models are having success with sizes in the 3B to 17B parameter range – that is, billions instead of trillions.
In December of last year, a company called Mistral released "Mixtral," a "mixture of experts" or "MoE" model, which integrates 8 neural networks, each with 7B parameters. Mistral claims that Mixtral not only outperforms the 70B-parameter variant of Meta's open-source Llama-2 model on most benchmarks (at 6 times faster inference speeds), but that it even matches OpenAI's far larger GPT-3.5 on most standard benchmarks.
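For the curious, here's roughly what running one of these smaller open models locally looks like with Hugging Face's transformers library. The model ID shown is one published Mistral variant and is just illustrative; you'd swap in whatever small model fits your hardware (a GPU helps, and the `device_map` option assumes the accelerate package is installed).

```python
# A rough sketch of running a ~7B-parameter open-source model locally with
# Hugging Face transformers. The model ID is illustrative; substitute any
# small model that fits your hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # ~7B parameters, not trillions
    device_map="auto",  # use a GPU if available (requires `accelerate`)
)

result = generator(
    "Explain what a mixture-of-experts model is, briefly.",
    max_new_tokens=120,
)
print(result[0]["generated_text"])
```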
The triple-net is that smaller-parameter models can run at lower cost and even on personal devices (as you may have heard in Apple's recent announcements at its worldwide developer conference). That suggests laptops and even generously configured mobile devices (tablets) could run their own LLM without resorting to one of the tech titans' LLMs in the cloud. That brings us to the next 2024 guestimate I'm watching...
4. GPU + Cloud Costs
We're seeing that necessity, as much as entrepreneurial spirit, is driving innovation in smaller language models. The larger the model, the higher its demands on AI compute power (GPUs) for training and inference.
Today, few AI adopters (and developers) maintain their own computing infrastructure, which puts upward pressure on cloud computing costs. Cloud providers must continuously improve their infrastructure to meet that demand, in a market where all providers are also competing to acquire the GPUs needed to power their infrastructure and AI compute offerings. Suddenly the recent market valuation of GPU chipmaker Nvidia makes sense.
So, if models could be better optimized, they would require less compute power, and that could reduce costs and/or improve margins. Which brings me to the next thing I think we’re going to see more of for the balance of 2024.
5. Model Optimization
Geek Alert 🤓! We’re going all nerd again for a moment…
From all we're learning at breakneck speed here at C[IQ], it's clear that making better, more efficient models is also a serious business driver. So, from last year into now, we've watched new training techniques emerge and more effort invested in fine-tuning "pre-trained" models.
The best way I can think to explain it is to consider how you shrink an audio or video file. The classic "compression" method reduces the recording or image's "bitrate" (the amount of data used to reproduce the sound or image), trading some resolution or fidelity for a smaller file, where the lower quality still yields a generally acceptable result. Akin to adjusting bitrate, the mathematical technique in model optimization that does effectively the same thing is called "quantization," and it lowers the precision used to represent a model's data points. This change of representation reduces memory consumption and accelerates inferencing. So, be on the lookout for news about more "optimized" models in the balance of this year.
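To see the idea in miniature, here's a toy Python sketch, not any production framework's method, just the core arithmetic: it represents a block of float32 "weights" as int8 values plus a scale factor, cutting memory 4x while introducing only a small error. Real toolchains (PyTorch's quantization utilities, for example) do this per-layer with calibration, but the principle is the same.

```python
# Toy illustration of quantization: store float32 "weights" as int8 plus a
# scale factor, then dequantize and measure the error introduced.
import numpy as np

weights = np.random.randn(1_000_000).astype(np.float32)  # pretend model weights

scale = np.abs(weights).max() / 127.0            # map the float range onto int8
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

print(f"float32 size: {weights.nbytes / 1e6:.1f} MB")    # ~4.0 MB
print(f"int8 size:    {quantized.nbytes / 1e6:.1f} MB")  # ~1.0 MB (4x smaller)
print(f"mean abs error: {np.abs(weights - dequantized).mean():.5f}")
```

Lower precision, much less memory, and (usually) an acceptable loss in fidelity: exactly the bitrate trade-off described above.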
6. Custom Local Models
If you've been reading my posts here about how to leverage natural language agents to provide new levels of digital concierge service in retail, you'll recall that we're acutely focused on the concept of bounded data spaces or knowledge bases that contain all of, and only, the data about a particular product, product line, or service offering. This is to avoid the unwanted side effects of LLMs drifting and hallucinating because of the morass of data scraped from the digital world for training. Left unbounded, your agent could in some cases provide highly unflattering feedback about one of your products.
Well, in addition to the notion of a closed, bounded knowledge base comes the idea of far more targeted, custom AI models. And I think that’s going to be yet another development this year.
It starts with public technology, or "open source." Open-source models (like Mixtral or Llama) offer the opportunity to build custom models for very specific applications and uses. For example, this means models that train on an organization's proprietary data (knowledge base), fine-tuned for its specific needs (e.g., an infinitely scalable AI concierge service).
The benefits are probably obvious: restricting AI training and inferencing to your local, bounded data space means you avoid the risk of your proprietary data or customers' personal information being used to train other proprietary models or passing into the possession of 3rd parties.
Then we use a data management technique called "retrieval-augmented generation," or "RAG," to access your knowledge base in combination with an LLM that provides the conversational capability of the AI agent, without allowing any of your data to be absorbed into that LLM's training. Incidentally, this approach also reduces the required model size. I think there is little doubt that as news continues to drip out about the unreliability of traditional LLMs, the custom local model and knowledge base will emerge as the viable (and preferable) path to real and useful applications of AI.
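For the technically inclined, here's a bare-bones sketch of the RAG pattern. The tiny knowledge base and the `ask_llm` call are hypothetical stand-ins, the embedding model is a common open-source choice, and a real deployment would use a vector database; but the shape of it is this: retrieve the most relevant passages from your bounded data, then hand only those to the LLM as context.

```python
# Bare-bones sketch of retrieval-augmented generation (RAG) over a bounded
# knowledge base. The knowledge base and ask_llm() are hypothetical stand-ins.
import numpy as np
from sentence_transformers import SentenceTransformer

knowledge_base = [
    "The X100 blender has a 1200W motor and a 5-year warranty.",
    "The X100 ships in black, white, and brushed steel.",
    "Returns are accepted within 30 days with a receipt.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(knowledge_base, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base passages most similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity (vectors are normalized)
    return [knowledge_base[i] for i in np.argsort(scores)[::-1][:k]]

question = "What warranty does the X100 come with?"
context = "\n".join(retrieve(question))
prompt = (f"Answer using ONLY the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {question}")
# ask_llm(prompt)  # hypothetical: send the grounded prompt to your LLM
print(prompt)
```

And this brings me to my next forecast: "It's the data quality, stupid." 🤣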
7. Data Quality
But seriously, that's right: your systems are never one bit better than the integrity and quality of your data, and we all know "GIGO" – garbage in, garbage out. So, without a doubt, focusing on high-quality, well-managed data will be essential for effective AI outcomes. Rather than concentrating only on fine-tuning the algorithms and models themselves, the focus will shift to simply ensuring quality data, and that will be more of an undertaking than assumed because, unfortunately, many organizations' data quality is not good.
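As a taste of what that undertaking looks like, here's a minimal sketch of the kind of first-pass checks that come before any fine-tuning or knowledge-base work, using Python and pandas. The file and column names are hypothetical.

```python
# Minimal first-pass data-quality checks before any AI training or RAG work.
# The file and column names ("sku", "description") are hypothetical.
import pandas as pd

df = pd.read_csv("product_catalog.csv")  # hypothetical export of your data

report = {
    "rows": len(df),
    "duplicate_rows": int(df.duplicated().sum()),
    "missing_by_column": df.isna().sum().to_dict(),
}
print(report)

# GIGO: drop exact duplicates and rows missing the fields the agent depends on.
clean = df.drop_duplicates().dropna(subset=["sku", "description"])
```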
8. Rise of AI Service Agents
This gets to the heart of our work at C[IQ] and what I think is the definitive next cutting edge before the end of this year. We'll see if I call this correctly in a few months, but "virtual agents" will go far beyond the current pre-programmed customer service chatbot, because this new class of AI agent enables and leverages task automation. I'm talking about agents (concierges) that can actually do things for you – on a permissioned basis, of course, and on your behalf; for example, making airline reservations, processing applications, or helping you shop. And importantly, these agents will be able to connect to other services (or agents).
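Under the hood, this usually means "tool calling": the model decides when to invoke a function your own code implements. Here's a minimal sketch using OpenAI's Python SDK, where `book_flight` is a hypothetical function; the model only requests the call, and your code (after checking permissions) actually executes it.

```python
# Minimal sketch of an agent requesting a tool call via OpenAI's Python SDK.
# book_flight is hypothetical; your own code would validate and execute it.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "book_flight",
        "description": "Book a flight on the customer's behalf (permissioned).",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Book me SEA to JFK on 2024-09-15."}],
    tools=tools,
)

# The model returns a structured request; your code performs the real action.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```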
This is precisely where we're working right now, and my hope is that we can help build one for your business too!
That’s a Wrap
And those are my 8 forecasts, "guestimates," or dare I claim "predictions." Let's see how close I am, come December. In the meantime, I'd love to learn what you think!