Workshop

Build a Multi-Agent Team with Handoffs Using PraisonAI

Wire up cooperating AI agents that plan, execute, and pass tasks to each other in under 35 minutes
35 min · multi-agent · handoffs · orchestration · low-code · PraisonAI

What's happening

Multi-agent orchestration has moved from research curiosity to production necessity. As of early 2026, PraisonAI has emerged as one of the most actively developed frameworks in this space — trending consistently on GitHub and cited by prominent figures in the AI industry. Its appeal is straightforward: define cooperating agents in a handful of Python declarations, wire them together with handoffs and workflow patterns, and deploy them to messaging platforms with minimal ceremony.

The core insight behind multi-agent systems is division of cognitive labour. A single monolithic prompt tasked with planning, researching, and executing will degrade in quality as complexity grows. By decomposing work into specialised agents — each with a focused role, constrained instructions, and clear handoff protocols — we achieve better output quality, easier debugging, and more predictable behaviour. PraisonAI operationalises this insight with a low-code API that supports agent handoffs, workflow patterns (routing, parallel execution, looping), guardrails for input and output validation, MCP tool integration, and built-in memory.

In this session, we will construct a three-agent team: a Planner that decomposes tasks, a Researcher that gathers information, and a Coder that produces implementations. These agents will hand off work to each other in a structured pipeline. Along the way, we will examine the architectural decisions that make multi-agent systems reliable — and the failure modes that make them fragile.

Step 1

Establish the Foundation: Install PraisonAI and Define Your First Agent

Before orchestrating a team, we must understand the fundamental unit: the Agent. In PraisonAI, an agent is a lightweight object that wraps an LLM call with a defined role, goal, and set of instructions. Think of it as a job description for an AI worker — it constrains what the model attends to and how it frames its responses.

PraisonAI achieves remarkably fast agent instantiation (under 4 microseconds per agent), which means the overhead of defining multiple specialised agents rather than one general-purpose agent is negligible. The framework supports over 100 LLM providers, so you can target OpenAI, Anthropic, Google, or local models interchangeably.

Our first task is to get PraisonAI installed and produce a single working agent — a Planner whose job is to decompose a high-level objective into discrete subtasks.

Ask your agent
Ask your AI agent to generate a Python script that installs PraisonAI and creates a single Planner agent capable of decomposing a user-provided goal into numbered subtasks.
Think about it
  • What Python package needs to be installed, and what environment variable must be set for your chosen LLM provider?
  • What role, goal, and instructions should the Planner agent have? How specific should the instructions be to constrain the output format?
  • How should the agent's output be structured so that downstream agents can parse it — free-form prose, numbered list, or structured data?
  • What is the minimal PraisonAI import and Agent constructor call needed?
What the agent gives back

The agent should produce a short Python script that imports Agent from praisonaiagents, sets up a Planner agent with a descriptive role (e.g., 'Task Decomposition Specialist'), a clear goal statement, and instructions that mandate numbered subtask output. The script should call agent.start() with a sample task string. The key construction looks approximately like:

```python
from praisonaiagents import Agent

planner = Agent(
    name="Planner",
    role="Task Decomposition Specialist",
    goal="Break down complex objectives into clear, actionable subtasks",
    instructions=(
        "Given a high-level goal, produce a numbered list of 3-7 specific "
        "subtasks. Each subtask must be self-contained and actionable."
    ),
)

planner.start("Build a web scraper that monitors competitor pricing")
```

When run, this agent will invoke the configured LLM and return a structured list of subtasks.

API Key Note
PraisonAI requires an API key for your chosen LLM provider. Set OPENAI_API_KEY for OpenAI models, or the appropriate environment variable for Anthropic, Google, etc. Without this, agent instantiation will succeed but start() will fail at inference time.
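A small pre-flight check (a hypothetical helper, not part of PraisonAI) guards against exactly this failure mode — a missing key surfaces immediately rather than mid-pipeline:

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Fail fast with a clear message instead of failing later at inference time."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before calling agent.start()")
    return key
```

Call `require_api_key()` at the top of the script, before constructing any agents.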
Tip
PraisonAI defaults to gpt-4o if no model is specified. You can override this per-agent with the llm parameter — useful when you want a fast, cheap model for planning and a more capable model for complex reasoning.
Why separate planning from execution?

The Planner agent exists because LLMs perform measurably better when they reason about task decomposition before attempting execution. This is the same principle behind chain-of-thought prompting, but externalised into an architectural boundary. By isolating planning into its own agent, we gain three advantages: (1) we can inspect and validate the plan before committing resources to execution, (2) we can use a different model or temperature setting optimised for analytical reasoning, and (3) the plan becomes a shareable artefact that other agents consume as structured input rather than implicit context buried in a long conversation.
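Because the plan is a shareable artefact, downstream code can parse and validate it before any execution happens. A minimal sketch in plain Python, assuming the Planner honours the numbered-list contract from its instructions:

```python
import re

def parse_plan(plan_text: str) -> list[str]:
    """Extract numbered subtasks ('1. ...' or '2) ...') from the Planner's output."""
    subtasks = []
    for line in plan_text.splitlines():
        m = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if m:
            subtasks.append(m.group(1))
    return subtasks

def validate_plan(subtasks: list[str], lo: int = 3, hi: int = 7) -> bool:
    """Check the plan respects the 3-7 subtask contract before committing resources."""
    return lo <= len(subtasks) <= hi
```

This is advantage (1) from the paragraph above made concrete: a malformed or empty plan is rejected before the Researcher or Coder ever runs.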

Step 2

Add Research and Coding Agents: Building the Team

A single agent is useful; a coordinated team is powerful. We now introduce two additional agents: a Researcher that gathers relevant information given a subtask, and a Coder that produces implementation code based on research findings.

The critical design question is not how to define these agents — the API is identical to Step 1 — but how to scope their responsibilities. A well-designed multi-agent system enforces the single-responsibility principle: each agent does one thing well and delegates everything else. If the Researcher starts writing code, or the Coder starts searching the web, the system becomes harder to debug and more prone to hallucination.

PraisonAI supports this separation through two mechanisms: instructions (which constrain the agent's behaviour through its system prompt) and tools (which give the agent specific capabilities like web search or file I/O). An agent without the web_search=True flag simply cannot browse the internet, regardless of what it is asked to do.

Ask your agent
Ask your AI agent to extend the script from Step 1 by adding a Researcher agent (with web search enabled) and a Coder agent, then compose all three into a team using PraisonAI's `PraisonAIAgents` orchestrator with associated `Task` objects.
Think about it
  • What distinguishes the Researcher from the Coder in terms of capabilities? Which one needs web access, and which needs to produce executable code?
  • PraisonAI uses `Task` objects to bind an agent to a specific piece of work. What fields does a Task need — at minimum a description and an agent assignment?
  • How does the `PraisonAIAgents` orchestrator determine execution order? Does it run tasks sequentially by default, or does it require explicit ordering?
  • What should each agent's instructions say to prevent role confusion — the Researcher producing code, or the Coder making unsupported claims?
What the agent gives back

The agent should produce a script that defines three agents and three corresponding tasks, then passes them to PraisonAIAgents for sequential execution. The Researcher agent should have web_search=True; the Coder agent should have instructions emphasising clean, documented code output. The orchestrator ties them together:

```python
from praisonaiagents import Agent, Task, PraisonAIAgents

researcher = Agent(
    name="Researcher",
    role="Information Gatherer",
    goal="Find accurate, relevant information for a given subtask",
    web_search=True,
)

coder = Agent(
    name="Coder",
    role="Implementation Specialist",
    goal="Produce clean, working Python code based on provided specifications",
)
```

Tasks are then defined referencing these agents, and PraisonAIAgents(tasks=[...]).start() executes the pipeline. The key insight is that tasks execute in list order, with each task's output available to subsequent tasks as context.
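The list-order semantics can be illustrated without PraisonAI at all. This plain-Python mock (not the framework's actual implementation) shows how each task's output joins the context seen by later tasks:

```python
def run_sequential(tasks):
    """Run (name, fn) pairs in list order; each fn sees all earlier outputs."""
    context = []
    for name, fn in tasks:
        output = fn(context)
        context.append((name, output))
    return context

# Stand-in callables for the Planner, Researcher, and Coder.
pipeline = [
    ("plan", lambda ctx: "1. research  2. code"),
    ("research", lambda ctx: f"findings for: {ctx[-1][1]}"),
    ("code", lambda ctx: f"implementation using: {ctx[-1][1]}"),
]
results = run_sequential(pipeline)
```

The Researcher stand-in reads the Planner's output from the accumulated context, and the Coder reads the Researcher's — the same dependency chain the orchestrator enforces.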

Tip
The web_search=True parameter gives an agent native browsing capability without any external tool configuration. For production systems, consider using the tools=MCP("npx ...") pattern instead, which provides more granular control over which external services an agent can access.
Warning
Avoid giving all agents maximum capabilities. An agent with web search, code execution, and file system access is essentially unconstrained — it can do anything, which means it can fail in unpredictable ways. Constrain each agent to the minimum capabilities required for its role.
Sequential vs. parallel task execution

By default, PraisonAIAgents executes tasks in the order they appear in the list. This is appropriate for pipelines where each step depends on the previous one's output. However, PraisonAI also supports parallel execution via the process='workflow' parameter combined with workflow pattern annotations. We will explore this in Step 4. For now, sequential execution is the correct choice because our Researcher needs the Planner's output, and our Coder needs the Researcher's findings.

At this point, you should have a working three-agent pipeline: a Planner that decomposes goals, a Researcher that investigates subtasks with web search, and a Coder that produces implementations. Running the script should produce a structured plan, research findings, and code output in sequence.

Step 3

Enable Agent Handoffs: Let Agents Delegate Dynamically

Sequential task execution is effective but rigid — the execution order is hardcoded at definition time. Handoffs introduce dynamic delegation: an agent can, mid-conversation, decide to pass control to another agent based on the content of the current request. This transforms a static pipeline into an adaptive workflow.

Consider the difference: in our sequential pipeline, the Planner always runs first, then the Researcher, then the Coder. With handoffs, a user could ask a question that the Planner recognises as a coding question and immediately hands off to the Coder, bypassing research entirely. Or the Coder might realise it needs more information and hand back to the Researcher.

PraisonAI implements handoffs by passing agent references directly into another agent's constructor. When Agent A lists Agent B in its configuration, Agent A can invoke Agent B as if it were a tool — transferring the conversation context and receiving the result. This is conceptually similar to a function call, but the 'function' is another autonomous agent.

Ask your agent
Ask your AI agent to modify the three-agent system so that the Planner agent can hand off directly to the Researcher or Coder based on the nature of a subtask, and the Researcher can hand off to the Coder when research is complete.
Think about it
  • In PraisonAI, handoffs are configured by passing other agent objects into an agent's constructor. What parameter name would you expect for this — and does it accept a list of agents?
  • How should the Planner's instructions change now that it can delegate? It needs to know when to hand off vs. when to respond directly.
  • What happens to conversation context during a handoff — does the receiving agent see the full history, or only the handoff message?
  • Should the Coder be able to hand back to the Researcher if it determines the specifications are incomplete? What are the implications of circular handoffs?
What the agent gives back

The agent should produce a modified version where agents reference each other. The critical change is that each agent's constructor now includes other agents as potential handoff targets. The Planner's instructions are updated to include delegation logic — 'If the subtask requires information gathering, delegate to the Researcher; if it requires implementation, delegate to the Coder.' The core pattern:

```python
coder = Agent(name="Coder", role="Implementation Specialist", ...)
researcher = Agent(name="Researcher", agents=[coder], web_search=True, ...)
planner = Agent(name="Planner", agents=[researcher, coder], ...)
```

Note the bottom-up construction order: the Coder is defined first because the Researcher references it, and both are defined before the Planner. Starting the system with planner.start(task) now produces an adaptive workflow where delegation happens based on content analysis.

Warning
Circular handoffs (A → B → A) are technically possible and sometimes desirable for iterative refinement, but they require a termination condition. Without one, agents can enter infinite delegation loops. Always include explicit instructions about when to stop delegating and return a final answer.
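One way to make the termination condition concrete is a depth counter kept outside the agents — sketched here as a hypothetical wrapper, not a PraisonAI feature:

```python
MAX_HANDOFFS = 5  # illustrative limit; tune to your workflow's depth

def record_handoff(chain: list[str], target: str) -> list[str]:
    """Append the next agent to the delegation chain, refusing past a fixed depth."""
    if len(chain) >= MAX_HANDOFFS:
        raise RuntimeError(
            f"Handoff limit ({MAX_HANDOFFS}) reached at {' -> '.join(chain)}; "
            "return a final answer instead of delegating"
        )
    return chain + [target]
```

The error message names the full chain, which makes an A → B → A cycle immediately visible in logs.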
Tip
When debugging handoff chains, enable PraisonAI's verbose logging to see which agent is active at each step. This is invaluable for diagnosing cases where an agent unexpectedly delegates or fails to delegate.
Handoffs vs. tool calls: what's the difference?

Under the hood, PraisonAI implements handoffs using the LLM's function-calling mechanism — when Agent A hands off to Agent B, it is technically making a tool call where the tool happens to be another agent. The distinction matters conceptually, however. A tool call is stateless and returns a discrete result (like a web search returning snippets). A handoff transfers conversational context and control — the receiving agent can engage in multi-turn reasoning before returning. This makes handoffs more powerful but also more expensive and harder to predict. Use tool calls for discrete, bounded operations; use handoffs for open-ended reasoning tasks that benefit from another agent's specialised perspective.

Quick Check

You want the Coder agent to verify its own output before returning it. Which PraisonAI mechanism is most appropriate: an output guardrail, self-reflection, or a dedicated Reviewer agent?
✗ Output guardrail: Not quite. Guardrails are excellent for structural validation (checking format, blocking prohibited content, enforcing schemas), but verifying code correctness requires reasoning, not pattern matching. A guardrail that runs `compile()` can catch syntax errors but cannot assess whether the code actually solves the stated problem.
✓ Self-reflection: Correct! Self-reflection (configured in PraisonAI's workflow patterns) causes the agent to evaluate its own output against the original objective and revise if necessary. This is the appropriate mechanism when verification requires the same kind of reasoning the agent used to produce the output. It adds one additional LLM call but substantially improves output quality for complex tasks.
✗ A dedicated Reviewer agent: Not quite. While a dedicated Reviewer agent is a valid architectural choice for complex systems, it introduces unnecessary overhead for this use case. Self-reflection achieves the same outcome — critical evaluation of output — without the cost of defining, configuring, and maintaining an additional agent. Reserve separate reviewer agents for cases where the review requires different capabilities or a different model than the original agent.
Step 4

Apply Workflow Patterns and Guardrails for Production Robustness

A working handoff chain is a good start, but production systems require two additional layers: workflow patterns that control execution topology, and guardrails that validate inputs and outputs at each boundary.

PraisonAI supports four workflow patterns:

  • Route: A dispatcher agent examines the input and directs it to exactly one downstream agent based on content analysis. Useful when different input types require fundamentally different processing.
  • Parallel: Multiple agents process the same input simultaneously, and their outputs are aggregated. Useful when a task benefits from diverse perspectives or independent subtasks.
  • Loop: An agent's output is fed back as its own input for iterative refinement, with a termination condition. Useful for self-correction and progressive improvement.
  • Sequential: The default — tasks execute in order, each consuming the previous output.

Guardrails operate at the boundaries between agents. An input guardrail validates what an agent receives; an output guardrail validates what it produces. They are ordinary Python functions that return a boolean or raise an exception. If a guardrail fails, the pipeline halts with a descriptive error rather than propagating corrupt data downstream.

Ask your agent
Ask your AI agent to enhance the multi-agent system with: (1) a routing pattern where the Planner dispatches subtasks to the appropriate specialist, (2) an output guardrail on the Coder that rejects responses not containing valid Python code blocks, and (3) self-reflection on the Coder for quality improvement.
Think about it
  • For the routing pattern, think about what criteria the Planner uses to choose between the Researcher and Coder. How do you express routing logic — through the agent's instructions, a workflow annotation, or both?
  • A guardrail is a Python function that inspects the agent's output. What should it check for? A simple heuristic might look for code fence markers or attempt to parse the output as Python.
  • Self-reflection adds an internal review loop. How many iterations are reasonable before diminishing returns set in? What instructions should guide the self-reflection?
  • How do guardrails interact with self-reflection — does the guardrail run before or after the reflection loop?
What the agent gives back

The agent should produce an enhanced system where the Planner uses a routing pattern to dispatch work and the Coder has both self-reflection and an output guardrail enabled. The guardrail is a standalone function that checks for Python code presence. The self-reflection is configured as part of the agent's workflow. Key elements include:

A guardrail function that inspects the Coder's output for code blocks and returns a validation result. The Coder agent is configured with self_reflect=True (or equivalent) and the guardrail is attached to its output. The Planner's instructions are updated to explicitly describe routing logic: analyse each subtask and delegate to the agent best suited for it.

The guardrail pattern is straightforward — a function receiving the output string and returning True/False — but the key architectural decision is where to place it: on the Coder's output (catching bad code before it reaches the user) rather than on the Planner's input (which would over-constrain the system).

Tip
Guardrails should be fast and deterministic. Avoid calling an LLM inside a guardrail — that defeats the purpose. Use string matching, regex, schema validation, or syntax checking. Reserve LLM-based quality assessment for the self-reflection mechanism.
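Following that advice, the Coder's output guardrail can combine a regex for code fences with Python's own parser. A sketch of the heuristic described above — an ordinary function, not a PraisonAI built-in:

```python
import ast
import re

# Triple backtick, built indirectly so this example itself stays valid markdown.
FENCE = "`" * 3

def python_code_guardrail(output: str) -> bool:
    """Fast, deterministic check: output must contain a parseable Python code block."""
    pattern = FENCE + r"(?:python)?\n(.*?)" + FENCE
    blocks = re.findall(pattern, output, re.DOTALL)
    if not blocks:
        return False
    for block in blocks:
        try:
            ast.parse(block)  # syntax only; says nothing about whether the code is right
        except SyntaxError:
            return False
    return True
```

Attached to the Coder's output, this catches missing or malformed code with no extra LLM call, leaving semantic quality to the self-reflection loop.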
API Key Note
If using the MCP protocol for tool integration (e.g., tools=MCP("npx @anthropic/mcp-server-fetch")), ensure the MCP server binary is installed and accessible. MCP tools extend an agent's capabilities beyond what PraisonAI provides natively — database queries, API calls, file system operations, and more.
When to use parallel execution

The parallel workflow pattern is particularly valuable when you need diverse perspectives on the same input. For example, you might run three Researcher agents simultaneously — one searching academic papers, one searching technical documentation, and one searching community forums — then aggregate their findings. PraisonAI handles the fan-out and collection automatically. The cost is straightforward: parallel execution multiplies your LLM API calls by the number of parallel agents. Use it when breadth of coverage matters more than token efficiency.
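The fan-out/aggregate shape is easy to see in miniature. This sketch uses a thread pool with plain callables standing in for the three Researchers (PraisonAI handles the equivalent for you through its workflow configuration):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(prompt: str, agents: list) -> list[str]:
    """Run every 'agent' (here, a plain callable) on the same input concurrently."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # map preserves input order, so results line up with the agent list
        return list(pool.map(lambda agent: agent(prompt), agents))

perspectives = fan_out(
    "web scraping best practices",
    [
        lambda q: f"academic view on {q}",
        lambda q: f"docs view on {q}",
        lambda q: f"forum view on {q}",
    ],
)
```

Note the cost implication from the paragraph above is visible in the code: three callables means three calls per request.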

Guardrail design principles

Effective guardrails follow three principles. First, they should be specific: check for one condition rather than attempting comprehensive validation. Second, they should be fast: a guardrail that takes seconds to execute negates the benefit of catching errors early. Third, they should produce actionable error messages: 'Output must contain at least one Python code block enclosed in triple backticks' is vastly more useful than 'Validation failed.' In production, chain multiple narrow guardrails rather than writing one complex validator — this makes failures easier to diagnose and rules easier to update independently.
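The "chain narrow validators" principle might look like this in practice — each check does one thing and names itself on failure (illustrative helpers, not framework API):

```python
def run_guardrails(output: str, guardrails) -> str:
    """Apply narrow checks in order; fail with an actionable, named error."""
    for name, check in guardrails:
        if not check(output):
            raise ValueError(f"Guardrail '{name}' failed for output: {output[:60]!r}")
    return output

# Three narrow checks instead of one complex validator.
checks = [
    ("non_empty", lambda s: bool(s.strip())),
    ("has_code_marker", lambda s: "def " in s or "class " in s),
    ("under_length", lambda s: len(s) < 10_000),
]
```

Because each failure carries the name of the specific rule, diagnosing a rejected output is a one-line log read, and any rule can be updated independently.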

You should now have a multi-agent system with dynamic handoffs, a routing Planner, a web-enabled Researcher, a self-reflecting Coder with output validation, and an understanding of when to apply route, parallel, and loop patterns. This is a production-capable architecture.

Your Turn

Extend the system with a fourth agent — a **Reviewer** — that uses the loop workflow pattern to iteratively improve the Coder's output until it passes a quality threshold. The Reviewer should evaluate code for correctness, readability, and completeness, then either approve it or send it back to the Coder with specific revision instructions. Limit the loop to a maximum of three iterations.
In real-world multi-agent pipelines, the most common failure mode is not that agents produce wrong output — it is that they produce *almost right* output that passes superficial validation but fails under closer inspection. An iterative review loop addresses this by applying progressively more demanding scrutiny. The challenge is formulating a prompt that creates a genuinely critical reviewer rather than a rubber-stamp approver.
Think about it
  • What criteria should the Reviewer evaluate against? Vague instructions like 'check if the code is good' will produce vague reviews. Consider specifying concrete quality dimensions.
  • How do you prevent the loop from running indefinitely? PraisonAI supports maximum iteration counts, but the Reviewer's instructions should also include explicit criteria for when to approve.
  • Should the Reviewer have access to the original task description, or only the Coder's output? What contextual information improves review quality?
  • How do you handle the case where the Coder cannot satisfy the Reviewer's demands within three iterations — does the system fail, return the best attempt, or escalate?
See a sample prompt
One way you could prompt it
Create a Reviewer agent in PraisonAI that evaluates Python code output from a Coder agent. The Reviewer should assess three dimensions: (1) syntactic correctness — does the code parse without errors, (2) completeness — does it address all requirements from the original task, and (3) readability — are there docstrings, meaningful variable names, and logical structure. Configure a loop workflow between the Coder and Reviewer with a maximum of 3 iterations. If after 3 iterations the code still does not pass review, return the best version with a summary of remaining issues. The Reviewer's instructions should reference the original task description for completeness checking. Include an output guardrail that ensures the final output contains both the approved code and a brief quality assessment.

Step 5

Connect External Tools via MCP and Prepare for Deployment

The agents we have built so far rely entirely on the LLM's parametric knowledge and web search. For production applications, agents need access to external tools — databases, APIs, file systems, and specialised services. PraisonAI integrates with the Model Context Protocol (MCP), an open standard for connecting AI models to external data sources and tools.

MCP operates through a server-client architecture. An MCP server exposes a set of tools (e.g., 'read file', 'query database', 'fetch URL') via a standardised protocol. PraisonAI agents connect to these servers using the tools=MCP(...) parameter, which supports multiple transport mechanisms: stdio (for local command-line tools), HTTP, WebSocket, and Server-Sent Events.

The elegance of MCP integration is that it decouples tool capability from agent definition. You can swap out the underlying tool implementation — replacing a mock database with a production one, for example — without modifying any agent code. The agent simply sees a set of available functions and their descriptions.

Beyond tool integration, PraisonAI supports deployment to messaging platforms (Telegram, Discord, WhatsApp), persistent memory across sessions, and prompt caching for reduced latency and cost.

Ask your agent
Ask your AI agent to add MCP tool integration to the Researcher agent — specifically, connect it to a file-fetching MCP server so it can read local documents — and configure the system with memory enabled and prompt caching for efficiency.
Think about it
  • The MCP connection string follows the pattern `MCP("command args")` for stdio servers. What command would launch a file-fetching MCP server?
  • Memory in PraisonAI is enabled with a single parameter. What are the implications of persistent memory across sessions — when is it helpful, and when might stale memory degrade performance?
  • Prompt caching reduces cost by reusing computations for identical prompt prefixes. Which agents in our system would benefit most from caching — those with stable system prompts or those with highly variable inputs?
  • For deployment, what additional configuration would be needed to connect this agent team to a Telegram bot? Think about the interface boundary between the messaging platform and the Planner agent.
What the agent gives back

The agent should produce the final enhanced system with MCP tools attached to the Researcher, memory enabled on all agents, and prompt caching activated. The MCP integration is a single parameter addition to the agent constructor:

```python
from praisonaiagents import Agent
from praisonaiagents.mcp import MCP  # import path may vary by version

researcher = Agent(
    name="Researcher",
    tools=MCP("npx @anthropic/mcp-server-fetch"),
    memory=True,
    prompt_caching=True,
    ...
)
```

The agent should also explain how to enable memory and prompt caching globally across all agents, note that MCP servers must be installed separately (npx fetches them on demand for Node-based servers), and outline the high-level steps for Telegram deployment — setting a bot token, configuring the entry agent, and mapping message events to agent invocations.

Tip
MCP servers are not limited to pre-built options. You can write custom MCP servers that expose any Python function, REST API, or database query as a tool. This is the recommended pattern for integrating proprietary business logic into your agent pipeline.
Warning
Memory persists across sessions by default when enabled. In multi-tenant systems, ensure agents do not leak context between users. PraisonAI supports session-scoped memory via the session management API — use it in any deployment where multiple users interact with the same agent team.
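The isolation requirement is easy to picture: memory keyed by session ID, so one user's history is invisible to another. A toy model of the idea (not PraisonAI's actual session API):

```python
from collections import defaultdict

class SessionMemory:
    """Per-session message store: nothing is shared across session IDs."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = defaultdict(list)

    def append(self, session_id: str, message: str) -> None:
        self._store[session_id].append(message)

    def history(self, session_id: str) -> list[str]:
        # Return a copy so callers cannot mutate another session's state
        return list(self._store[session_id])
```

Whatever mechanism you use in production, the test is the same: a query from one session must never surface content written by another.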
The MCP ecosystem and available servers

The MCP ecosystem is growing rapidly. As of early 2026, commonly used MCP servers include the reference fetch server (mcp-server-fetch) for URL fetching, @modelcontextprotocol/server-filesystem for local file operations, @modelcontextprotocol/server-github for GitHub API access, and community-built servers for databases (PostgreSQL, SQLite), search engines, and cloud services (AWS, GCP). PraisonAI can connect to multiple MCP servers simultaneously — a single agent can have tools from several different servers, giving it composite capabilities without any custom integration code.

Cost and latency considerations for production

Each agent invocation costs one or more LLM API calls. In our five-component system (Planner, Researcher, Coder, self-reflection, guardrails), a single user request might trigger 5-10 API calls. At GPT-4o pricing, this is roughly $0.05-0.15 per request. Prompt caching can reduce this by 50-80% for agents with stable system prompts. For cost-sensitive applications, consider using a cheaper model (GPT-4o-mini, Claude Haiku) for the Planner and Researcher, reserving the most capable model for the Coder where output quality matters most. PraisonAI's per-agent llm parameter makes this trivial to configure.
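The arithmetic above can be packaged as a quick estimator. The per-call figures below are the section's rough assumptions, not quoted provider pricing:

```python
def estimate_request_cost(calls: int, cost_per_call: float, cache_discount: float = 0.0) -> float:
    """Estimated dollars per user request after prompt-cache savings."""
    return calls * cost_per_call * (1.0 - cache_discount)

worst_case = estimate_request_cost(10, 0.015)        # ten calls, no caching
with_cache = estimate_request_cost(10, 0.015, 0.8)   # same, with 80% cache savings
```

Plugging in the section's numbers shows why caching matters: an 80% cache hit rate turns a $0.15 request into $0.03.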

Recap

We have constructed a production-capable multi-agent system in five progressive steps. Starting from a single Planner agent, we composed a three-agent team with specialised roles, introduced dynamic handoffs for adaptive delegation, applied workflow patterns (routing, looping, self-reflection) and guardrails for robustness, and connected external tools via the MCP protocol.

The central lesson of this session is architectural, not syntactic. The power of multi-agent systems lies not in the API calls — which, as we have seen, are remarkably concise — but in the design decisions: how to decompose responsibilities, where to place validation boundaries, when to use handoffs versus sequential execution, and how to balance capability against predictability.

PraisonAI's contribution is removing the implementation friction so that these design decisions become the primary focus. With agent instantiation under 4 microseconds, 100+ LLM providers, and built-in support for memory, caching, and messaging platform deployment, the framework handles the infrastructure so you can focus on the architecture.

Where to go next

  • Implement parallel workflow patterns — run multiple Researcher agents simultaneously to gather diverse information and aggregate findings.
  • Build a custom MCP server that exposes your own business logic as tools available to any PraisonAI agent.
  • Deploy the agent team to a messaging platform (Telegram or Discord) and explore PraisonAI's session management for multi-user environments.
  • Add RAG (Retrieval-Augmented Generation) to give agents access to a private knowledge base, combining memory with document retrieval for domain-specific applications.
  • Explore PraisonAI's external agent orchestration to coordinate between different AI systems — for example, using a PraisonAI planner to dispatch tasks to Claude Code, Gemini CLI, or Codex agents.
