🐦 Twitter Post Details

@omarsar0

What's missing to build useful deep research agents?

Deep research agents promise analyst-level reports through automated search and synthesis. However, current systems fall short of genuinely useful research.

The question is: where exactly do they fail?

This new paper introduces FINDER, a benchmark of 100 human-curated research tasks with 419 structured checklist items for evaluating report quality. Unlike QA benchmarks, FINDER focuses on comprehensive report generation.

The researchers analyzed approximately 1,000 reports from mainstream deep research agents. Their findings challenge assumptions about where these deep research systems struggle.

Current agents don't struggle with task comprehension. They fail at evidence integration, verification, and reasoning-resilient planning. They understand what you're asking. They just can't synthesize the answer reliably.

The paper introduces DEFT, the first failure taxonomy for deep research agents. It identifies 14 distinct failure modes across three categories: reasoning failures, retrieval failures, and generation failures.

This systematic breakdown reveals that the gap between current capabilities and useful research isn't about smarter search or better language models. It's about the reasoning architecture that connects retrieval to synthesis.

(bookmark it)

Paper: https://t.co/gAA7feYHm1
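As a rough illustration of the checklist-based evaluation and failure categories the post describes, the sketch below scores a report by the fraction of checklist items it satisfies and tallies observed failures under the three DEFT categories. The pass-rate metric, the `ReportReview` structure, and the sample data are all assumptions made for illustration; the post does not say how FINDER scores reports or name the 14 failure modes, and this is not the paper's implementation.

```python
from collections import Counter
from dataclasses import dataclass, field

# The three category names come from the post; everything else here is hypothetical.
CATEGORIES = ("reasoning", "retrieval", "generation")


@dataclass
class ReportReview:
    """One reviewed report (hypothetical structure, not from the paper)."""
    task_id: str
    checklist: dict[str, bool]  # checklist item -> was it satisfied?
    failures: list[str] = field(default_factory=list)  # one category label per observed failure


def checklist_pass_rate(review: ReportReview) -> float:
    """Fraction of checklist items satisfied (an assumed metric)."""
    if not review.checklist:
        return 0.0
    return sum(review.checklist.values()) / len(review.checklist)


def failure_counts(reviews: list[ReportReview]) -> Counter:
    """Count observed failures per category across all reviews."""
    counts = Counter({category: 0 for category in CATEGORIES})
    for review in reviews:
        for category in review.failures:
            if category not in CATEGORIES:
                raise ValueError(f"unknown failure category: {category!r}")
            counts[category] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample data, only to show how the helpers are called.
    sample = [
        ReportReview("task-1", {"cites sources": True, "covers scope": False}, ["retrieval"]),
        ReportReview("task-2", {"cites sources": True}, ["reasoning", "generation"]),
    ]
    for review in sample:
        print(review.task_id, checklist_pass_rate(review))
    print(dict(failure_counts(sample)))
```

Keeping the checklist score separate from the failure tally reflects the post's framing: one number says how complete a report is, while the category counts indicate where in the pipeline it broke down.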

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1995915929973403827/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1995915929973403827/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-12-04T20:37:50.753288",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1995915929973403827",
  "url": "https://x.com/omarsar0/status/1995915929973403827",
  "twitterUrl": "https://twitter.com/omarsar0/status/1995915929973403827",
  "text": "What's missing to build useful deep research agents?\n\nDeep research agents promise analyst-level reports through automated search and synthesis. However, current systems fall short of genuinely useful research.\n\nThe question is: where exactly do they fail?\n\nThis new paper introduces FINDER, a benchmark of 100 human-curated research tasks with 419 structured checklist items for evaluating report quality. Unlike QA benchmarks, FINDER focuses on comprehensive report generation.\n\nThe researchers analyzed approximately 1,000 reports from mainstream deep research agents. Their findings challenge assumptions about where these deep research systems struggle.\n\nCurrent agents don't struggle with task comprehension. They fail at evidence integration, verification, and reasoning-resilient planning. They understand what you're asking. They just can't synthesize the answer reliably.\n\nThe paper introduces DEFT, the first failure taxonomy for deep research agents. It identifies 14 distinct failure modes across three categories: reasoning failures, retrieval failures, and generation failures.\n\nThis systematic breakdown reveals that the gap between current capabilities and useful research isn't about smarter search or better language models. It's about the reasoning architecture that connects retrieval to synthesis.\n\n(bookmark it)\n\nPaper: https://t.co/gAA7feYHm1",
  "source": "Twitter for iPhone",
  "retweetCount": 47,
  "replyCount": 20,
  "likeCount": 280,
  "quoteCount": 6,
  "viewCount": 20709,
  "createdAt": "Tue Dec 02 18:00:14 +0000 2025",
  "lang": "en",
  "bookmarkCount": 271,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1995915929973403827",
  "displayTextRange": [
    0,
    273
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "omarsar0",
    "url": "https://x.com/omarsar0",
    "twitterUrl": "https://twitter.com/omarsar0",
    "id": "3448284313",
    "name": "elvis",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "description": "",
    "location": "DAIR.AI Academy",
    "followers": 277867,
    "following": 724,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Fri Sep 04 12:59:26 +0000 2015",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 33719,
    "hasCustomTimelines": true,
    "isTranslator": true,
    "mediaCount": 4356,
    "statusesCount": 16656,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1996595107924263287"
    ],
    "profile_bio": {
      "description": "Building agents @dair_ai • Ex Meta AI, Elastic, PhD • Sharing research & insights on AI Agents • New cohort: https://t.co/tn8LKG5d20",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "dair-ai.thinkific.com/courses/claude…",
              "expanded_url": "https://dair-ai.thinkific.com/courses/claude-code",
              "indices": [
                109,
                132
              ],
              "url": "https://t.co/tn8LKG5d20"
            }
          ],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                16,
                24
              ],
              "name": "",
              "screen_name": "dair_ai"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "dair-ai.thinkific.com",
              "expanded_url": "https://dair-ai.thinkific.com/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/JBU5beHQNs"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/KD0dy2awcu",
        "expanded_url": "https://twitter.com/omarsar0/status/1995915929973403827/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "1995915925896527875",
        "indices": [
          274,
          297
        ],
        "media_key": "3_1995915925896527875",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARuy6vW0m4ADCgACG7Lq9qebsLMAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABG7Lq9bSbgAMKAAIbsur2p5uwswAA",
            "media_key": "3_1995915925896527875"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/G7Lq9bSbgAM2v9f.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 904,
              "w": 1614,
              "x": 0,
              "y": 0
            },
            {
              "h": 1614,
              "w": 1614,
              "x": 0,
              "y": 0
            },
            {
              "h": 1758,
              "w": 1542,
              "x": 63,
              "y": 0
            },
            {
              "h": 1758,
              "w": 879,
              "x": 395,
              "y": 0
            },
            {
              "h": 1758,
              "w": 1614,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1758,
          "width": 1614
        },
        "sizes": {
          "large": {
            "h": 1758,
            "w": 1614
          }
        },
        "type": "photo",
        "url": "https://t.co/KD0dy2awcu"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "urls": [
      {
        "display_url": "arxiv.org/abs/2512.01948",
        "expanded_url": "https://arxiv.org/abs/2512.01948",
        "indices": [
          1343,
          1366
        ],
        "url": "https://t.co/gAA7feYHm1"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}