Getting Up to Speed on Multi-Agent Systems, Part 2: The Vocabulary

25 Apr 2026

If you try to read multi-agent systems papers without the vocabulary, you will get nowhere. The field has settled on a shared set of words for the pieces of a system, and every paper now slots into those categories even when it pretends to be doing something novel. This post is about those words. Once you know them, you can read any paper in the field and know what it is and isn’t claiming.

Three surveys have done the work of consolidating the vocabulary. Each one cuts the space slightly differently, but together they give you the conceptual toolkit.

Tran et al.: Actors, Types, Structures, Strategies

The most useful single survey is Tran et al. (2025). It defines a multi-agent system formally as a tuple of agents, collaboration channels, collective goals, and an environment. Then it taxonomizes the space along four axes.

Tran's Four Axes

- Types: Cooperation (aligned goals), Competition (conflicting goals), Coopetition (mixed)
- Structures: Centralized (hub), Decentralized (peer-to-peer), Hierarchical (layered)
- Strategies: Rule-based (voting, consensus), Role-based (SOPs, expertise), Model-based (Theory of Mind)
- Architecture: Static (pre-defined), Dynamic (runtime adjustment)

Most of the famous wave-1 papers are in one box: cooperative, hierarchical, role-based, static. Everyone is doing roughly the same thing, with small variations in how agents pass messages and what they produce at each step. The survey’s most useful claim is that the optimal structure varies with the task. There is no universal topology.
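Because the four axes are a closed set of categories, they translate naturally into types. Here is a minimal sketch in Python; the enum and class names are my own, not the survey's:

```python
from dataclasses import dataclass
from enum import Enum

# Tran et al.'s four axes as enums. Values mirror the taxonomy above.
class CollabType(Enum):
    COOPERATION = "cooperation"      # aligned goals
    COMPETITION = "competition"      # conflicting goals
    COOPETITION = "coopetition"      # mixed

class Structure(Enum):
    CENTRALIZED = "centralized"      # hub
    DECENTRALIZED = "decentralized"  # peer-to-peer
    HIERARCHICAL = "hierarchical"    # layered

class Strategy(Enum):
    RULE_BASED = "rule-based"        # voting, consensus
    ROLE_BASED = "role-based"        # SOPs, expertise
    MODEL_BASED = "model-based"      # Theory of Mind

class Architecture(Enum):
    STATIC = "static"                # topology fixed at design time
    DYNAMIC = "dynamic"              # topology adjusted at runtime

@dataclass(frozen=True)
class Classification:
    type: CollabType
    structure: Structure
    strategy: Strategy
    architecture: Architecture

# The crowded wave-1 quadrant, e.g. ChatDev:
chatdev = Classification(
    CollabType.COOPERATION, Structure.HIERARCHICAL,
    Strategy.ROLE_BASED, Architecture.STATIC,
)
```

The point of the exercise: once a paper is reduced to a `Classification`, "novel framework" claims become comparable at a glance.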

Zhou et al.: The Five-Component Agent

Zhou et al. (2024) take a different cut. Instead of asking how agents coordinate, they ask what each agent actually has inside it. They propose a five-component model that applies to any LLM-based agent.

Zhou's Five Components

1. Profile: how the agent is created, with role and expertise
2. Perception: how the agent observes its environment
3. Self-Action: memory, reasoning, and planning
4. Mutual Interaction: communication paradigm, structure, and content
5. Evolution: self-reflection and progressive enhancement

As a distributed systems person, I recognize these labels from any actor system. Profile is identity. Perception is input. Self-Action is local state plus computation. Mutual Interaction is message passing. Evolution is the weakest piece, because nobody has really figured out what "agent learning from its own history" looks like in production.
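The five components read like an interface, so it can be sketched as one. The method names and signatures below are my own invention, not from the survey; the stub implementation is only there to show the shape:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Agent(Protocol):
    """Zhou et al.'s five components as a structural interface (names mine)."""

    def profile(self) -> str:
        """1. Profile: the role and expertise the agent is instantiated with."""
        ...

    def perceive(self, observation: str) -> None:
        """2. Perception: ingest input from the environment."""
        ...

    def act(self) -> str:
        """3. Self-Action: use memory, reasoning, and planning to produce output."""
        ...

    def interact(self, message: str, sender: str) -> str:
        """4. Mutual Interaction: message passing with other agents."""
        ...

    def evolve(self) -> None:
        """5. Evolution: self-reflection over past behavior. The least settled piece."""
        ...

# A trivial stub satisfying the interface -- no LLM, just the plumbing.
class ScriptedAgent:
    def __init__(self, role: str) -> None:
        self.role = role
        self.memory: list[str] = []

    def profile(self) -> str:
        return self.role

    def perceive(self, observation: str) -> None:
        self.memory.append(observation)

    def act(self) -> str:
        return f"[{self.role}] acting on {len(self.memory)} observations"

    def interact(self, message: str, sender: str) -> str:
        self.perceive(f"{sender}: {message}")
        return self.act()

    def evolve(self) -> None:
        pass  # exactly as empty as it is in most wave-1 systems
```

Notice that `evolve` is the only method a stub can leave empty without the system visibly breaking, which is one way to state the "Evolution is the weakest piece" observation.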

Chen et al.: Applications and Unsolved Challenges

The third survey, Chen et al. (2024), is the one I’d skim rather than read in full. The applications chapter is useful, but what you actually want is the challenges section.

Chen's Challenge Levels

- Agent-level: alignment for simulation, hallucination propagation, long-context limits
- Interaction-level: efficiency explosion, accumulative error
- Evaluation-level: no standardized benchmarks, no objective metrics, no frameworks for individual vs. aggregate evaluation

The interaction-level challenges are the ones that most concern me. Efficiency explosion is the observation that multi-agent systems scale worse than linearly because each agent’s autoregressive generation multiplies the token cost. Accumulative error is what it sounds like: errors made in round one propagate and amplify in rounds two, three, four.
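A back-of-the-envelope model makes the efficiency explosion concrete. Assume each turn emits a fixed number of tokens and, in an all-to-all debate, every agent re-reads the whole accumulated transcript before its own turn. The numbers are illustrative assumptions of mine, not figures from the survey:

```python
def debate_token_cost(n_agents: int, rounds: int, tokens_per_msg: int = 500) -> int:
    """Rough total token count for an all-to-all debate.

    Per round, each agent reads the accumulated transcript (input tokens,
    re-processed on every turn) and generates one message (output tokens).
    """
    total = 0
    context = 0  # tokens of transcript each agent must re-read
    for _ in range(rounds):
        total += n_agents * (context + tokens_per_msg)  # input + output per agent
        context += n_agents * tokens_per_msg            # transcript grows each round
    return total

single = debate_token_cost(1, 3)  # one agent, three turns: 3000 tokens
debate = debate_token_cost(3, 3)  # three agents, three rounds: 18000 tokens
```

Three agents over three rounds cost six times what one agent costs over the same rounds, not three times, because the re-read context grows with every round. That is the "worse than linear" scaling in one function.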

Mapping Papers Into These Taxonomies

The payoff of the vocabulary is that you can now categorize any paper in the field at a glance.

| System | Type | Structure | Strategy | Architecture |
|---|---|---|---|---|
| CAMEL | Cooperation | Decentralized pair | Role-based | Static |
| ChatDev | Cooperation | Hierarchical pipeline | Role-based | Static |
| MetaGPT | Cooperation | Centralized pool | Role + Rule-based | Static |
| Debate (Du) | Competition | Decentralized all-to-all | Rule-based rounds | Static |
| Generative Agents | Coopetition | Decentralized open env | Model-based retrieval | Dynamic |
| Anthropic Research | Cooperation | Centralized orchestrator | Role-based | Dynamic |
| AutoGen | Configurable | Configurable | Configurable | Static or Dynamic |

Most of the canonical papers sit in the cooperative, role-based, static quadrant. The interesting ones are the exceptions. Du et al. is the rare competitive debate paper. Generative Agents is the rare fully dynamic system. AutoGen tries to be everything at once, which is its whole thesis.

Why Vocabulary Matters
When two papers claim they "disagree," the vocabulary lets you ask: are they actually addressing the same problem? ChatDev and MetaGPT both call themselves "multi-agent software engineering frameworks" but they have different structures, different strategies, and different failure modes. You need the words to see that they are solving slightly different versions of the same problem.

The Gap the Vocabulary Exposes

The taxonomies do something else besides categorize papers. They make gaps visible.

Zhou’s “Evolution” component is the weakest across every system. Nobody has a real story for how agents learn from their own history in production. MetaGPT’s “test-driven retry” is the closest wave-1 paper to Evolution, and it’s still just a bounded retry loop with no memory of past attempts.

Tran’s “dynamic architecture” category is almost empty. The wave-1 papers all fix their topology at design time. AutoGen makes topology configurable, but it’s configured by the developer, not adjusted at runtime. The only system that truly adjusts at runtime is Generative Agents, and that’s a simulation, not a production framework.

Chen’s “evaluation-level” challenges are unsolved in a way that’s embarrassing for the field. When ChatDev claims 88 percent executability and MetaGPT claims 41 percent on a comparable benchmark, you’re not looking at a performance difference. You’re looking at two papers measuring different things with different tools and calling them the same.

Next post: the wave-1 theory papers in detail. CAMEL, Generative Agents, ChatDev, MetaGPT, AutoGen. What each one actually builds, what each one trusts, and where each one breaks.
