@omarsar0
This multi-agent system outperforms 9 of 10 human penetration testers.

This work presents the first comprehensive evaluation of AI agents against human cybersecurity professionals on a real enterprise network: approximately 8,000 hosts across 12 subnets at a major research university.

It introduces ARTEMIS, a multi-agent framework featuring dynamic prompt generation, an arbitrary number of sub-agents running in parallel, and automatic vulnerability triaging.

ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate, and outperformed 9 of the 10 human penetration testers in the study.

How does it work?

A supervisor agent manages the workflow, spawning specialized sub-agents with dynamically generated expert prompts for each task. When the agent finds something noteworthy in a scan, it immediately launches parallel sub-agents to probe multiple targets simultaneously. A triage module verifies that submissions are reproducible before they are reported.

This parallelism is a key advantage humans lack. One participant noticed a vulnerable LDAP server during scanning but never returned to it; ARTEMIS would have assigned a sub-agent to investigate it while continuing other work.

The cost implications are significant. ARTEMIS with GPT-5 costs $18/hour, versus an industry average of $60/hour for professional penetration testers. At performance equivalent to most human professionals, that is roughly a 3.3x cost reduction ($60 / $18 ≈ 3.3).

On the other hand, ARTEMIS struggles with GUI-based tasks: 80% of humans found a remote code execution vulnerability via TinyPilot's web interface, but the agent couldn't navigate the GUI. It also has a higher false-positive rate, sometimes misinterpreting HTTP 200 responses as successful authentication when they were actually redirect pages. This shows how much work remains on computer-using agents.

Conversely, no human found a vulnerability in an older iDRAC server whose outdated HTTPS ciphers caused browsers to refuse to load it.
ARTEMIS exploited it using curl -k to bypass certificate verification.

Paper: https://t.co/xuuqZLuH6j
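The supervisor/sub-agent loop described in the thread can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ARTEMIS's actual implementation. The names `Finding`, `make_expert_prompt`, `run_subagent`, `triage`, and `supervisor` are hypothetical. The sub-agent is a stub rather than a real LLM call, and the `reproduced` flag stands in for actually re-running each exploit.

The sketch shows the core idea:

- one sub-agent per noteworthy scan hit, each with a dynamically built prompt;
- all sub-agents run concurrently via `asyncio.gather`;
- a triage pass then drops unverified findings.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Finding:
    target: str
    description: str
    reproduced: bool  # assumption: set by a re-run of the exploit in a real system


def make_expert_prompt(task: str, target: str) -> str:
    # Hypothetical: build a task-specific "expert" prompt on the fly for each sub-agent.
    return (f"You are an expert in {task}. Investigate {target} "
            f"and report concrete, reproducible evidence.")


async def run_subagent(prompt: str, target: str) -> Optional[Finding]:
    # Stub standing in for an LLM-driven sub-agent; a real one would call a model and tools.
    await asyncio.sleep(0.01)  # simulate waiting on model/tool I/O
    if "ldap" in target:
        return Finding(target, "anonymous LDAP bind allowed", reproduced=True)
    if "web" in target:
        # A false positive like the one in the thread: HTTP 200 that was really a redirect page.
        return Finding(target, "HTTP 200 on login treated as success", reproduced=False)
    return None  # nothing found on this target


def triage(findings: List[Optional[Finding]]) -> List[Finding]:
    # Keep only findings that exist and were confirmed as reproducible.
    return [f for f in findings if f is not None and f.reproduced]


async def supervisor(scan_hits: Dict[str, str]) -> List[Finding]:
    # For every noteworthy scan hit (target -> task), spawn a sub-agent with its own prompt
    # and run them all concurrently instead of one after another.
    jobs = [run_subagent(make_expert_prompt(task, target), target)
            for target, task in scan_hits.items()]
    results = await asyncio.gather(*jobs)  # results keep the input order
    return triage(results)


if __name__ == "__main__":
    hits = {"ldap01": "LDAP enumeration",
            "web02": "web authentication testing",
            "ssh03": "SSH configuration review"}
    for f in asyncio.run(supervisor(hits)):
        print(f.target, "-", f.description)
```

Running it prints only `ldap01 - anonymous LDAP bind allowed`. The `web02` claim is filtered out by triage, and `ssh03` yields nothing. `asyncio` is used here because agent work is mostly waiting on model and tool responses, so concurrency rather than CPU parallelism is what lets several probes proceed at once.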