POIROT Protocol

Peer-Oriented Identification & Resolution of Operational Threats

Iñaki Dellibarda Varela1*, R. Sendra-Arranz1, Pablo Romero-Sorozabal1, J.M. Valverde-García1, Annemarie F. Laudanski1,2, Álvaro Gutiérrez3, Eduardo Rocon1*†, Manuel Cebrian1†

1 Center for Automation and Robotics, Spanish National Research Council (CSIC-UPM), Madrid, Spain
2 Biomechanics of Human Mobility Laboratory, Dept. of Kinesiology and Health Sciences, University of Waterloo, Canada
3 ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Madrid, Spain
* Corresponding authors  ·  † Equal supervision contribution

A consensus-based framework for detecting and attributing errors in multi-agent AI systems through collaborative peer interrogation and weighted voting.

🎯 What is POIROT?

The Problem

Multi-agent AI systems are increasingly deployed in critical domains — healthcare, finance, autonomous systems — but detecting and attributing errors across distributed agents remains an open problem. Traditional debugging approaches fail when agents have partial observability and faults propagate silently through agent interactions.

The Solution

Rather than relying on an external judge, POIROT turns the system's own agents into investigators. Each agent already understands its role and what it observed — making them the best-placed experts to reason about what went wrong. Through structured peer interrogation and weighted consensus voting, this collective knowledge outperforms single-LLM baselines by up to +26 percentage points.

🔬 How POIROT Works: 5-Phase Protocol

Phase 1: Error Vector Space Construction

The POIROT Agent analyzes the multi-agent system description to identify all potential error locations and constructs an N-dimensional error vector space.

INPUT
  • System architecture description
  • Agent roles and responsibilities
OUTPUT
  • Error dimension labels
  • Binary vector representation [0,1,0,...]
  • Component descriptions
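The Phase 1 output can be sketched as a small data structure. This is a hypothetical illustration — the class name `ErrorVectorSpace` and the example dimension labels are ours, not part of the protocol specification:

```python
from dataclasses import dataclass

@dataclass
class ErrorVectorSpace:
    """Sketch of the N-dimensional error vector space built in Phase 1."""
    labels: list[str]        # one label per error dimension
    descriptions: list[str]  # human-readable component descriptions

    @property
    def n_dimensions(self) -> int:
        return len(self.labels)

    def encode(self, suspected: set[str]) -> list[int]:
        """Binary vector representation: 1 where a dimension is suspected."""
        return [1 if label in suspected else 0 for label in self.labels]

# Illustrative 3-agent system (labels are placeholders)
space = ErrorVectorSpace(
    labels=["planner", "retriever", "executor"],
    descriptions=["Task planning agent", "Document retrieval agent",
                  "Action execution agent"],
)
print(space.encode({"retriever"}))  # → [0, 1, 0]
```

The binary encoding is what later phases vote over: each dimension corresponds to one candidate error location.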

Phase 2: Individual Analysis

Each agent independently analyzes the session execution logs from its own perspective. Agents see only the messages they participated in, which keeps observations evidence-based and prevents hallucination.

🔍 KEY FEATURES
  • Message Filtering: agents see only their own messages, preventing confabulation
  • JSON Output: structured anomaly reports with evidence citations
  • Transparency: agents acknowledge when they see nothing wrong
  • Non-Participants: agents not in the session can still join Phase 2
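A minimal sketch of the per-agent message filtering and structured report, under stated assumptions: the log fields (`sender`, `recipient`, `text`) and the report schema are hypothetical stand-ins for whatever the actual implementation uses, and the anomaly check is a trivial placeholder for LLM-based analysis:

```python
import json

def phase2_report(agent_id: str, session_log: list[dict]) -> str:
    """Filter the log to this agent's own messages and emit a JSON report."""
    # Message filtering: only messages the agent sent or received are visible
    visible = [m for m in session_log
               if agent_id in (m["sender"], m["recipient"])]
    # Placeholder anomaly check standing in for the agent's LLM analysis
    anomalies = [{"message_id": m["id"],
                  "observation": "unexpected content",
                  "evidence": m["text"]}
                 for m in visible if "ERROR" in m["text"]]
    return json.dumps({
        "agent": agent_id,
        "anomalies": anomalies,
        # Transparency: explicitly state when nothing looks wrong
        "nothing_observed": not anomalies,
    })

log = [
    {"id": 1, "sender": "planner", "recipient": "executor", "text": "plan ok"},
    {"id": 2, "sender": "executor", "recipient": "planner", "text": "ERROR: step failed"},
    {"id": 3, "sender": "retriever", "recipient": "planner", "text": "docs ready"},
]
print(phase2_report("executor", log))
```

Note that the retriever, which never saw message 2, would report `nothing_observed: true` — the filtering is what keeps each report grounded in first-hand evidence.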

Phase 3: Peer Consultation Protocol

Agents receive each other's Phase 2 reports and, in turns, can interrogate their peers — asking follow-up questions, requesting clarifications, or challenging observations. This structured dialogue lets agents cross-reference their partial views of the session. Once the consultation concludes, each agent produces its final fault attribution decision with a full justification.
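The turn structure of the consultation can be sketched as a simple round-robin loop. This is an illustration only: `ask` stands in for an LLM call, and the prompt wording and round count are assumptions, not the protocol's actual parameters:

```python
def consultation(agents: list[str], reports: dict[str, str], ask, rounds: int = 1):
    """Round-robin peer interrogation: each agent questions every peer in turn."""
    transcript = []
    for _ in range(rounds):
        for questioner in agents:            # agents take turns interrogating
            for respondent in agents:
                if respondent == questioner:
                    continue
                # Questioner challenges the peer's Phase 2 report...
                q = ask(questioner, f"Challenge this report: {reports[respondent]}")
                # ...and the peer answers the follow-up
                a = ask(respondent, q)
                transcript.append((questioner, respondent, q, a))
    return transcript

# Stubbed LLM call for illustration: echoes the speaker and a prompt prefix
stub = lambda agent, prompt: f"{agent} says: {prompt[:20]}"
t = consultation(["planner", "executor"],
                 {"planner": "saw nothing", "executor": "step 3 failed"}, stub)
```

After the loop, each agent would receive the full transcript and produce its final fault attribution with justification.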

Phase 4: Weighted Voting with Hamming Distance

Agent votes are weighted based on their proximity to the suspected error location using Hamming distance in the error vector space. Agents voting for themselves receive maximum weight; votes far from their position receive lower weight.

📊 VOTING FORMULA
Hamming Similarity:
similarity = 1 - (hamming_distance / N_dimensions)

Vote Weight:
weight = baseline + 0.5 × similarity

Example:
• Agent voting for self: weight ≈ 0.75 (high confidence)
• Agent voting nearby: weight ≈ 0.58 (medium)
• Agent voting far: weight ≈ 0.42 (low)

Final Consensus:
consensus[i] = sum(vote[i] × weight) / sum(weights)
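The formulas above translate directly into code. In this sketch, `baseline=0.25` is our assumption, chosen so the worked weights reproduce: a self-vote has similarity 1 and weight 0.25 + 0.5 × 1 = 0.75:

```python
def hamming_weighted_consensus(votes: dict, positions: dict,
                               baseline: float = 0.25) -> list[float]:
    """Aggregate binary fault votes, weighting each agent's vote by its
    Hamming similarity to its own position in the error vector space.
    (baseline=0.25 is an assumed value consistent with the worked example.)"""
    n = len(next(iter(positions.values())))   # N_dimensions
    consensus = [0.0] * n
    total = 0.0
    for agent, vote in votes.items():
        dist = sum(p != v for p, v in zip(positions[agent], vote))
        similarity = 1 - dist / n             # 1 - hamming_distance / N
        weight = baseline + 0.5 * similarity  # self-vote => weight 0.75
        total += weight
        consensus = [c + v * weight for c, v in zip(consensus, vote)]
    return [c / total for c in consensus]     # sum(vote × weight) / sum(weights)

# Illustrative 3-agent system: everyone votes for the "retriever" dimension
positions = {"planner": [1, 0, 0], "retriever": [0, 1, 0], "executor": [0, 0, 1]}
votes     = {"planner": [0, 1, 0], "retriever": [0, 1, 0], "executor": [0, 1, 0]}
print(hamming_weighted_consensus(votes, positions))  # → [0.0, 1.0, 0.0]
```

Here the retriever's self-vote carries weight 0.75 while the two distant votes each carry about 0.42, and the unanimous verdict drives the consensus for that dimension to 1.0.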

Phase 5: Fault Localization

Once all weighted votes are aggregated, POIROT identifies the most probable fault location: the component or set of components with the highest consensus score across the error vector space. The result is a ranked attribution — indicating not just where the failure likely originated, but with what degree of collective confidence.
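The final localization step is then a ranking over the consensus vector (a sketch; the component labels and scores are illustrative):

```python
def rank_attribution(consensus: list[float], labels: list[str]) -> list[tuple]:
    """Return (component, consensus_score) pairs sorted highest-first."""
    return sorted(zip(labels, consensus), key=lambda pair: pair[1], reverse=True)

ranking = rank_attribution([0.12, 0.71, 0.17],
                           ["planner", "retriever", "executor"])
print(ranking[0])  # → ('retriever', 0.71)
```

The top entry gives the most probable fault location, and the score attached to each entry expresses the degree of collective confidence behind it.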

🔬 Our Evaluation Benchmark

📐 BLAME

Benchmark for Localizing Agent Malfunctions Effectively — the open evaluation suite we developed to validate POIROT across two distinct multi-agent domains. BLAME provides structured fault injection scenarios, ground truth attribution vectors, and standardized metrics for benchmarking agent debugging protocols.

🏥 CORTEX — Medical rehabilitation · 7 dimensions · 15 fault scenarios · 3 agents
💹 TradingAgents — Algorithmic trading · 15 dimensions · 6 fault scenarios · 12 agents

📈 Validation Results


⏱️ Who & When Benchmark

An open multi-agent benchmark evaluating fault attribution on dynamic, real-world conversational tasks — identifying which agent made an error and at what point. POIROT achieves 42% overall accuracy on 126 heterogeneous cases, with perfect attribution on single-agent scenarios and strong performance as pipeline complexity grows.

  • 42% overall accuracy — 53 / 126 correct
  • 100% on single-agent tasks — 4/4, perfect attribution
  • 67% on 4-agent tasks — 6/9 correct
  • 126 total cases across task categories
🏥 BLAME · CORTEX — POIROT vs. Baseline

| Model             | Baseline | POIROT |
|-------------------|----------|--------|
| Gemini 2.5 Pro    | 27.8%    | 40.5%  |
| DeepSeek Reasoner | 16.7%    | 42.3%  |
| GPT-oss 120B      | 32.7%    | 31.3%  |
| GPT-oss 20B       | 12.7%    | 19.3%  |
💹 BLAME · TradingAgents — POIROT vs. Baseline

| Model             | Baseline | POIROT |
|-------------------|----------|--------|
| Gemini 2.5 Pro    | 25%      | 66.7%  |
| DeepSeek Reasoner | 25.5%    | 44.1%  |
| GPT-oss 120B      | 34.4%    | 48.7%  |
| GPT-oss 20B       | 40.2%    | 48.4%  |

🎬 Live Demonstrations

Explore real POIROT analyses across two multi-agent systems from the BLAME benchmark. Each case shows the complete 5-phase protocol with actual error injection, agent deliberation, and consensus voting.