multi-agent systems | Markus Dreyer

Deep Research Comparator: A Platform for Fine-Grained Human Annotations of Deep Research Agents

Effectively evaluating deep research agents that autonomously search the web, analyze information, and generate reports remains a major challenge, particularly in assessing long reports and giving detailed feedback on their intermediate …