Groundbreaking multi-agent systems (MAS, for short) are transforming the way AI models collaborate to tackle complex challenges.
For a bit of timely, high-profile context: two weeks ago, OpenAI unveiled its latest model, GPT-4o. Mira Murati, the company’s chief technology officer, hailed it as the “future of interaction between ourselves and the machines.” What sets GPT-4o apart is its ability to engage in expressive, humanlike conversations with users in real time. You can now speak to a state-of-the-art large language model that not only understands your words but is also engineered to respond in a natural, intuitive way. This isn’t so much an LLM innovation as a stitching together of an LLM with existing tech, but in terms of usability it’s an awesome step forward… one that, for some, evokes Spike Jonze’s famous A.I. film “Her”.
Not to be outdone, just a day later, Google DeepMind head Demis Hassabis showcased Project Astra. This early version of what Hassabis describes as the company’s endeavor to “develop universal AI agents that can be helpful in everyday life” marks another significant step forward in the AI revolution. You can check out the link to see demos of the Project Astra agent being used (via Google Pixel phones or prototype glasses that aim to succeed where Google Glass flopped a decade ago) to analyze real-time video in order to explain physics, literature and landmarks… even solve math problems on a whiteboard!
These launches are part of a larger trend across the tech industry to create chatbots and AI products that are more useful and engaging. Show GPT-4o or Astra pictures or videos of art or food that you enjoy, and they can provide you with a list of museums, galleries, and restaurants tailored to your preferences.
As impressive as these AI agents are, however, they still have plenty of limitations when it comes to executing complex tasks. For instance, if you ask them to plan a trip to Berlin based on your leisure preferences and budget, including which attractions to see, in what order, and what train tickets to buy to get between them, they are likely to disappoint.
This is where multi-agent systems come into play. By enabling LLMs to work together, researchers are unlocking new possibilities for AI to perform intricate jobs. Recent experiments have shown that teams of LLMs in a multi-agent system can assign each other tasks, build upon one another’s work, and even engage in deliberation to find solutions that would be out of reach for any single AI model, all without the need for constant human direction.
In one remarkable example from DARPA, a team of three agents named Alpha, Bravo, and Charlie worked together to find and defuse virtual bombs. Alpha took the lead, instructing its partners on what to do next, making the problem-solving process more efficient. Critically, this emergent behavior between Alpha, Bravo and Charlie wasn’t explicitly programmed; it arose from the agents’ collaboration.
Researchers at MIT have also demonstrated that two chatbots in dialogue perform better at solving math problems than a single agent. By feeding each other’s proposed solutions and updating their answers based on their partner’s work, the agents were more likely to converge on the correct answer. In the real world, this kind of “debate” between agents could be applied to medical consultations or peer-review feedback on academic papers.
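To make the debate pattern concrete, here’s a minimal sketch of the loop structure. This is not the MIT system’s actual code: in a real implementation each “revision” step would be an LLM prompt along the lines of “given these other agents’ solutions, update your answer,” while here the agents are deterministic stubs that simply adopt the majority view, so the control flow is visible.

```python
# Toy sketch of the multi-agent "debate" pattern. Real systems replace
# the stub revision step with an LLM call; all numbers are illustrative.
from collections import Counter

def debate(initial_answers, rounds=3):
    """Each round, every agent sees all proposed answers and revises
    its own. Here the stub revision is 'adopt the majority answer'."""
    answers = list(initial_answers)
    for _ in range(rounds):
        majority = Counter(answers).most_common(1)[0][0]
        answers = [majority] * len(answers)  # stub revision step
        if len(set(answers)) == 1:  # consensus reached, stop debating
            break
    return answers

# Three agents propose answers to the same math problem; one dissents.
print(debate([42, 42, 41]))  # the dissenter converges to the majority
```

The interesting design choice in real debate systems is what the revision prompt looks like; the round structure and the stop-on-consensus check, though, are exactly as sketched here.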
The power of MAS lies in the ability to split jobs into smaller, specialized tasks, with each agent possessing distinct skills and roles. At Microsoft Research, a team created a software-writing MAS consisting of a “commander” that delegates sub-tasks, a “writer” that writes the code, and a “safeguard” agent that reviews the code for security flaws. This approach resulted in code being written three times faster than with a single agent, without sacrificing accuracy.
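The commander/writer/safeguard split above can be sketched as a simple pipeline. The role names come from the article; the function bodies here are hypothetical stand-ins (each would be an LLM call in the real Microsoft Research system), so treat this as an illustration of the division of labor, not their implementation.

```python
# Hedged sketch of a role-based MAS pipeline: commander plans,
# writer generates, safeguard reviews. Function bodies are stand-ins
# for what would be LLM calls in a real system.

def commander(task):
    """Break the task into ordered sub-tasks (an LLM would plan here)."""
    return [f"{task}: step {i}" for i in (1, 2)]

def writer(subtask):
    """Produce code for one sub-task (an LLM would generate here)."""
    return f"# code for {subtask}"

def safeguard(code):
    """Review code for security flaws (an LLM would audit here);
    this stand-in just rejects code containing eval()."""
    return "eval(" not in code

def run_team(task):
    approved = []
    for sub in commander(task):
        code = writer(sub)
        if safeguard(code):  # only reviewed, approved code ships
            approved.append(code)
    return approved

print(run_team("parse a config file"))
```

The speedup the article cites comes from this decomposition: sub-tasks can be generated and reviewed independently rather than in one monolithic pass.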
However, there are potential downsides to MAS. LLMs can sometimes generate illogical solutions, and in a multi-agent system, these so-called “hallucinations” can cascade through the entire team. Agents have also been known to occasionally get stuck in loops, for example by repeatedly bidding each other farewell without breaking free.
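One common guard against the farewell-loop failure mode is to wrap the conversation in a termination check: stop when a message repeats or a turn budget runs out. The sketch below is generic (not any specific framework’s API), with two stub agents that would otherwise say goodbye to each other forever.

```python
# Generic sketch of loop-breaking for agent-to-agent conversation:
# end the dialogue on a repeated message or an exhausted turn budget.

def converse(agent_a, agent_b, opening, max_turns=10):
    seen = set()
    message, transcript = opening, [opening]
    speakers = [agent_a, agent_b]
    for turn in range(max_turns):  # hard cap on conversation length
        message = speakers[turn % 2](message)
        if message in seen:  # repeated message: a loop, so cut it off
            break
        seen.add(message)
        transcript.append(message)
    return transcript

# Two stub agents stuck in the classic farewell loop:
polite = lambda msg: "Goodbye!"
also_polite = lambda msg: "Goodbye!"
print(converse(polite, also_polite, "Task done."))
```

Production frameworks typically expose both knobs (a max-turn limit and a termination condition) as configuration rather than leaving them to the loop author.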
Despite these challenges, the commercial interest in AI teams is growing. Microsoft’s Satya Nadella has emphasized the importance of AI agents’ ability to converse and coordinate, and the company has released AutoGen, an open-source framework for building LLM-based multi-agent systems. Other frameworks, like Camel, offer no-code functionality, allowing users to input tasks in plain English and watch the agents get to work.
As MAS technology advances, like any AI advance, it brings potential risks. Malicious actors could exploit these systems by conditioning agents with “dark personality traits,” enabling them to bypass safety mechanisms and carry out harmful tasks. The same techniques used for multi-agent collaboration could also be used to attack commercial LLMs through “jailbreaking.” Hopefully the positive applications of MAS will far outweigh the negative ones going forward, aided by research on systems for defending against misuse.
Very exciting… and the applications of MAS are truly limitless. What in your industry could be automated or improved by a MAS where single-LLM approaches aren’t sufficiently effective?
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.