#63 On Autonomous agents

A retrieval-augmented generation (RAG) system has two components: a retrieval system and a generation system. A human brain needs to retrieve information from its memory, and only when it has the right information does it understand what something means and become able to react to it, either by producing something (maybe a piece of art) or by applying it in context (say, applied financial acumen). But does an agent that relies on RAG to crawl through the right information and synthesize something with an algorithm-driven reasoning engine really understand what it generates or reasons over? There is a lot of philosophical discussion on what ‘understanding’ really means. After all, aren’t we all a bunch of cells rapidly performing the actions encoded in our DNA?
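To make the two pieces concrete, here is a minimal Python sketch of what a RAG pipeline boils down to. The keyword-overlap retrieval and the llm() placeholder are illustrative stand-ins under my own assumptions, not how any particular product implements it.

```python
# A minimal sketch of the two RAG components: a retrieval system and a
# generation system. The corpus, the scoring, and llm() are placeholders.
from typing import Callable

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Retrieval system: rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(words & set(doc.lower().split())), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str], llm: Callable[[str], str]) -> str:
    """Generation system: condition the language model on the retrieved context."""
    prompt = "Answer using only the context below.\n\n"
    prompt += "\n".join(f"- {doc}" for doc in context)
    prompt += f"\n\nQuestion: {query}"
    return llm(prompt)

# Usage (with your own corpus and llm callable):
# answer = generate("What is RAG?", retrieve("What is RAG?", corpus), llm)
```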

With increasing investments in AI, and with a huge amount of cognitive capital now devoted to this space, human-machine interfaces will continue to evolve beyond the keyboard, the mouse, the touch screen, and even natural language. No one knows what lies beyond natural language. But seeds have been planted, and frontiers have been mapped based on what is known today. And it seems the arena is gearing up for an era of autonomous agents.

You want to make a healthy dinner for your family but aren’t sure what to cook. You tell ChefBot, “Make a healthy dinner for four using tofu and broccoli.” ChefBot uses generative AI to understand your request: it knows you want a dinner recipe for four people, focusing on tofu and broccoli. Using RAG, ChefBot searches through a vast database of recipes, scans the inventory in your kitchen, creates a shopping list for the missing items, invokes Alexa to order them over Amazon Fresh, and produces a step-by-step guide for you to prepare the tofu stir-fry. With further advances in robotics, the preparation itself will, in the coming few decades, also be taken care of by ChefBot.
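Strip away the robotics and ChefBot is, at heart, a model chaining a handful of tools. Here is a hypothetical sketch of that chain, with every tool stubbed out and all names invented purely for illustration:

```python
# Hypothetical ChefBot flow: each step is a "tool" the agent can call.
# All functions are stubs; a real agent would call a recipe index,
# a smart-kitchen inventory, and a shopping API.

def search_recipes(ingredients: list[str], servings: int) -> dict:
    """Retrieve a candidate recipe from a recipe database (stubbed)."""
    return {"name": "Tofu and broccoli stir-fry",
            "ingredients": ingredients + ["garlic", "soy sauce", "rice"],
            "servings": servings}

def scan_kitchen_inventory() -> set[str]:
    """Return what is already in the kitchen (stubbed)."""
    return {"tofu", "rice", "garlic"}

def order_groceries(items: list[str]) -> None:
    """Order the missing items (stubbed; imagine an Amazon Fresh call here)."""
    print(f"Ordering: {', '.join(items)}")

def chefbot(request_ingredients: list[str], servings: int) -> list[str]:
    """Chain the steps: retrieve a recipe, diff it against the pantry,
    order what is missing, and return a step-by-step guide."""
    recipe = search_recipes(request_ingredients, servings)
    pantry = scan_kitchen_inventory()
    missing = [item for item in recipe["ingredients"] if item not in pantry]
    if missing:
        order_groceries(missing)
    guide = [f"Prep the {item}" for item in recipe["ingredients"]]
    return guide + ["Stir-fry everything together", f"Serve {servings} portions"]

print(chefbot(["tofu", "broccoli"], servings=4))
```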

It seems like science fiction, but every technology starts out feeling ludicrous. And yet, a large proportion of what passed for science fiction in the previous century seems like pedestrian, forgotten back-end technology today. Who knows what the next century will bring.

There’s been a lot of talk about the age of autonomous agents. With generative AI still in its infancy, we are already seeing agents being trained to perform a series of steps entirely independently on the basis of simple natural-language prompts. These are integrated systems that rely on the outputs of RAG-assisted AI models: information-processing engines that can take real decisions and actions on the web.

As these models expand, and as the infrastructure (think picks and shovels) becomes more commonplace, we are likely to see an expansion of these so-called autonomous agents, which will make IFTTT look like the ‘Clippy’ of the late 1990s.

What are these agents?

Agents are touted to be the next big platform after the era of Android and iOS. Just as those mobile platforms made writing apps easier, agents will make it easy for anyone to build web apps without knowing how to write code. Imagine what this democratization of app development could do to the world of software.

Agents are currently positioned as a helper, a co-pilot, an intern, or an assistant. The more these agents learn about our history, our choices, our decisions, our preferences, and our experiences, the closer they will get to the actions we would want to take in the vast majority of cases in our day-to-day lives. And this is where agents will be able to shine. For many of our time-intensive tasks, like planning a trip, researching a tax code, writing an email, drafting marketing copy, or outlining a business model, these agents will get increasingly close to being a trained intern. And as they do, the feedback we provide will translate into even better retrievals and more accurate, richer generations.

Some notes on agents:

  • No single player can or will be able to monopolize the business of agent-building. A general-purpose platform for AI will take time to become even feasible, let alone economically viable.
  • Agents will transcend modalities, across text, speech, vision, and more; it will increasingly be a multi-modal world of AI.
  • User interface design, as we evolve from CLIs to GUIs to GAI (generative AI interfaces, anyone?), will be an interesting space to watch.
  • Agents will change how apps are consumed first. Chat-based UIs will likely not last long, especially for the iterative, branched, asynchronous prompting that normal human workflows and conversations resemble.
  • Then, agents will change how apps are written, across both design and programming.
  • We are yet to fully understand how the world will change around this new general-purpose technology. It’s like trying to predict the rise of the suburban economy from the Ford Model T.
  • Vector databases may not be the definitive way of storing the data that agents retrieve (a toy version of today’s approach is sketched after this list). We will likely see database technology advance to support the use cases these models open up; the modern data stack will likely see this as its next big challenge.
  • We will diverge before we converge. There will likely be different agents for different use cases until we are able to train a more general-purpose agent, and a wave of consolidation will ensue as a result.
  • Interaction models will be a big area of research. Thinking about user experience, especially to manage expectations and effectively communicate capabilities, will be a strategic differentiator.
  • Technology confluence (AI-generated and populated metaverses, autonomous agents on blockchains) will be more hype than substance. Meaningful progress with the various ‘wannabe’ general-purpose technologies will stagnate and enter multiple cold waves.
  • Hallucinations (statistically probable but factually incorrect output) will overwhelm the internet, superseded only by the fake news generated by armies of trolls equipped with generative technologies.
  • Monitoring AI outputs will become a new industry, with a new observability paradigm required.
  • Enterprise use cases of generative AI will explode, but only when data security standards get defined and become table stakes.
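On the vector-database note above: today’s retrieval layer is, stripped down, something like the toy store below, where documents are kept as embedding vectors and queried by similarity. The embed() callable stands in for an embedding model of your choice; this is a sketch under my own assumptions, not any specific product’s design.

```python
# A toy in-memory vector store: texts are stored alongside their embedding
# vectors and queried by cosine similarity. embed() is a stand-in for a
# real embedding model.
import math
from typing import Callable

class ToyVectorStore:
    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        """Store the text alongside its embedding vector."""
        self.items.append((text, self.embed(text)))

    def query(self, text: str, k: int = 3) -> list[str]:
        """Return the k stored texts whose embeddings are most similar to the query."""
        q = self.embed(text)

        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self.items, key=lambda item: cosine(item[1], q), reverse=True)
        return [stored_text for stored_text, _ in ranked[:k]]
```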

We are, in a sense, applying our current-day concepts to a new technology, and it increasingly seems anachronistic, although we cannot really pinpoint why. Chat interfaces seem rudimentary, merely a beginning. As the richness of the software ecosystem percolates, we are going to see lots of experimentation and bold adventures with modalities and interfaces. The more experimentation the better, for in promoting the diversity of these approaches, we will naturally hedge humanity’s bet against itself.

Further reading:

https://www.gatesnotes.com/AI-agents

https://www.ben-evans.com/benedictevans/2023/10/5/unbundling-ai

https://stratechery.com/2022/the-ai-unbundling/

https://sundaylettersfromsam.substack.com/p/where-is-the-autonomy-in-ai