@dair_ai
LLMs can't see. How can we build effective multi-agent systems with vision capabilities?

Building multimodal models from scratch is expensive. Training joint vision-language architectures requires massive compute, specialized datasets, and careful optimization. But there's another way.

This new research introduces "Be My Eyes," a framework where vision models become literal eyes for LLMs.

The key idea: multi-agent collaboration through natural language. Vision agents analyze images and describe what they see. Language agents receive these descriptions and reason about them. Communication happens entirely through text (see the sketch below).

No joint training. No architectural modifications. Just agents talking to each other.

The system is modular. Swap in better vision models as they emerge. Upgrade the LLM independently. Each component improves without retraining the whole system.

Results on MMMU, MMMU-Pro, and video understanding benchmarks show performance competitive with specialized multimodal models.

What makes this powerful: it challenges the assumption that multimodal AI requires a unified architecture. Agent collaboration through language offers an efficient alternative.

Paper: https://t.co/VqnwooAGkJ

Learn to build with AI Agents in our academy: https://t.co/Y5kVy5iKiQ
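Here's what the text-only handoff can look like in practice. This is a minimal Python sketch of the pattern, assuming a generic image captioner and chat LLM; the `BeMyEyesPipeline` class, agent names, and prompt format are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the text-mediated collaboration pattern described above.
# Names, prompts, and the stub agents are illustrative assumptions; any
# captioner/LLM pair can be swapped in -- no joint training required.
from dataclasses import dataclass
from typing import Callable

# A vision agent is anything that maps an image to a text description;
# a language agent is anything that maps a text prompt to a text answer.
VisionAgent = Callable[[bytes], str]
LanguageAgent = Callable[[str], str]

@dataclass
class BeMyEyesPipeline:
    """Vision agent describes the image; language agent reasons over text.

    Each component is independent and swappable: upgrade either model
    without retraining or modifying the other.
    """
    vision_agent: VisionAgent
    language_agent: LanguageAgent

    def answer(self, image: bytes, question: str) -> str:
        # Step 1: the vision agent turns pixels into natural language.
        description = self.vision_agent(image)
        # Step 2: the language agent reasons over the description alone --
        # all communication between agents happens through text.
        prompt = (
            f"An image is described as follows:\n{description}\n\n"
            f"Question: {question}\nAnswer:"
        )
        return self.language_agent(prompt)

if __name__ == "__main__":
    # Stub agents so the sketch runs standalone; in practice these would
    # wrap a real captioning model and a real chat LLM.
    captioner: VisionAgent = lambda img: "A red stop sign at a rainy intersection."
    llm: LanguageAgent = lambda prompt: "It's a stop sign, so the driver must halt."

    pipe = BeMyEyesPipeline(vision_agent=captioner, language_agent=llm)
    print(pipe.answer(b"<image bytes>", "What should a driver do here?"))
```

Because the interface between agents is plain text, swapping in a stronger captioner or a newer LLM is just passing a different callable; the rest of the system is untouched.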