🐦 Twitter Post Details

@omarsar0

When you build AI agents, don't treat prompts like config strings.

Treat them like executable business logic. Because that's what they really are.

@arshdilbagi's blog and this Stanford CS 224G lecture lay out one of the clearest mental models I have seen for LLM evaluation.

Stop treating evals like unit tests.

That works for deterministic software.

For LLM products, it creates false confidence because real-world usage changes over time.

Example: an insurance prompt passed 20 eval cases. The team shipped. In production, a new class of requests showed up and failed quietly. No crash, no alert, just wrong answers at scale.

The fix is not "write more eval cases," which is what many teams do.

It is building evals as a living feedback loop. Start with a small set, ship, watch what breaks in production, add those failures back, and re-run on every prompt or model change.

What eval failure caught your team off guard?

Blog: https://t.co/HCVhcow5rA
Stanford CS 224G lecture: https://t.co/q667gGwckt
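The feedback loop described in the post (small seed set, ship, capture production failures, feed them back, re-run on every prompt or model change) can be sketched as a tiny regression suite. This is a minimal illustration under stated assumptions, not the method from the linked blog or lecture: `EvalCase`, `EvalSuite`, `prompt_v1`, and `prompt_v2` are hypothetical names, the two "prompt versions" are toy functions standing in for an LLM call, and exact-match grading is a simplification (real LLM evals often need rubric-based or model-based graders).

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: a "model" is any function from a user query to an answer.
Model = Callable[[str], str]


@dataclass
class EvalCase:
    query: str
    expected: str         # assumption: exact-match grading, for simplicity only
    source: str = "seed"  # "seed" for the starting set, "production" for fed-back failures


@dataclass
class EvalSuite:
    cases: list[EvalCase] = field(default_factory=list)

    def add_production_failure(self, query: str, expected: str) -> None:
        """Fold a failure observed in production back into the suite (deduplicated by query)."""
        if all(c.query != query for c in self.cases):
            self.cases.append(EvalCase(query, expected, source="production"))

    def run(self, model: Model) -> list[EvalCase]:
        """Re-run every case against the model and return the failing ones."""
        return [c for c in self.cases if model(c.query).strip() != c.expected]

    def passes(self, model: Model) -> bool:
        """Gate for shipping a prompt or model change: every case must pass."""
        return not self.run(model)


# Hypothetical toy stand-ins for two prompt versions of an insurance assistant.
def prompt_v1(query: str) -> str:
    return "500" if "deductible" in query else "covered"


def prompt_v2(query: str) -> str:
    if "deductible" in query:
        return "500"
    if "flood" in query:
        return "not covered"
    return "covered"


if __name__ == "__main__":
    # 1. Start with a small seed set; v1 passes it and ships.
    suite = EvalSuite([
        EvalCase("What is my deductible?", "500"),
        EvalCase("Is windshield damage covered?", "covered"),
    ])
    print(suite.passes(prompt_v1))  # True

    # 2. A new class of request shows up in production and fails quietly.
    suite.add_production_failure("Is flood damage covered?", "not covered")
    print(len(suite.run(prompt_v1)))  # 1

    # 3. Every later prompt or model change is re-run against the grown suite.
    print(suite.passes(prompt_v2))  # True
```

The design choice worth noting is that production failures become permanent regression cases: the suite only grows, so a later prompt or model change cannot silently reintroduce a failure the team has already seen.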

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2029225624825659668/media_0.jpg?",
      "filename": "media_0.jpg"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2029225624825659668/media_1.jpg?",
      "filename": "media_1.jpg"
    }
  ],
  "processed_at": "2026-03-06T14:18:59.825475",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2029225624825659668",
  "url": "https://x.com/omarsar0/status/2029225624825659668",
  "twitterUrl": "https://twitter.com/omarsar0/status/2029225624825659668",
  "text": "When you build AI agents, don't treat prompts like config strings.\n\nTreat them like executable business logic. Because that's what they really are.\n\n@arshdilbagi's blog and this Stanford CS 224G lecture lay out one of the clearest mental models I have seen for LLM evaluation.\n\nStop treating evals like unit tests.\n\nThat works for deterministic software.\n\nFor LLM products, it creates false confidence because real-world usage changes over time.\n\nExample: an insurance prompt passed 20 eval cases. The team shipped. In production, a new class of requests showed up and failed quietly. No crash, no alert, just wrong answers at scale.\n\nThe fix is not \"write more eval cases,\" which is what many teams do.\n\nIt is building evals as a living feedback loop. Start with a small set, ship, watch what breaks in production, add those failures back, and re-run on every prompt or model change.\n\nWhat eval failure caught your team off guard?\n\nBlog: https://t.co/HCVhcow5rA\nStanford CS 224G lecture: https://t.co/q667gGwckt",
  "source": "Twitter for iPhone",
  "retweetCount": 9,
  "replyCount": 13,
  "likeCount": 73,
  "quoteCount": 2,
  "viewCount": 8631,
  "createdAt": "Wed Mar 04 16:01:04 +0000 2026",
  "lang": "en",
  "bookmarkCount": 95,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2029225624825659668",
  "displayTextRange": [
    0,
    276
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "omarsar0",
    "url": "https://x.com/omarsar0",
    "twitterUrl": "https://twitter.com/omarsar0",
    "id": "3448284313",
    "name": "elvis",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "description": "",
    "location": "DAIR.AI Academy",
    "followers": 292402,
    "following": 778,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Fri Sep 04 12:59:26 +0000 2015",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 34957,
    "hasCustomTimelines": true,
    "isTranslator": true,
    "mediaCount": 4535,
    "statusesCount": 17409,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2029612360273420447"
    ],
    "profile_bio": {
      "description": "Building @dair_ai • Prev: Meta AI, Elastic, PhD • New AI learning portal: https://t.co/1e8RZKs4uX",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [
            {
              "display_url": "academy.dair.ai",
              "expanded_url": "https://academy.dair.ai/",
              "indices": [
                74,
                97
              ],
              "url": "https://t.co/1e8RZKs4uX"
            }
          ],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                9,
                17
              ],
              "name": "",
              "screen_name": "dair_ai"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "dair.ai",
              "expanded_url": "https://www.dair.ai/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/XQto5ypSIk"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/wj3ZdgUUna",
        "expanded_url": "https://twitter.com/omarsar0/status/2029225624825659668/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {
            "faces": []
          },
          "orig": {
            "faces": []
          }
        },
        "id_str": "2029225620811726848",
        "indices": [
          277,
          300
        ],
        "media_key": "3_2029225620811726848",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARwpQfNpmsAACgACHClB9FjagRQAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABHClB82mawAAKAAIcKUH0WNqBFAAA",
            "media_key": "3_2029225620811726848"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/HClB82mawAA3PeD.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 1613,
              "w": 2880,
              "x": 0,
              "y": 0
            },
            {
              "h": 1620,
              "w": 1620,
              "x": 0,
              "y": 0
            },
            {
              "h": 1620,
              "w": 1421,
              "x": 0,
              "y": 0
            },
            {
              "h": 1620,
              "w": 810,
              "x": 0,
              "y": 0
            },
            {
              "h": 1620,
              "w": 2880,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1620,
          "width": 2880
        },
        "sizes": {
          "large": {
            "h": 1152,
            "w": 2048
          }
        },
        "type": "photo",
        "url": "https://t.co/wj3ZdgUUna"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "go.adaline.ai/bflXl9t",
        "expanded_url": "https://go.adaline.ai/bflXl9t",
        "indices": [
          939,
          962
        ],
        "url": "https://t.co/HCVhcow5rA"
      },
      {
        "display_url": "go.adaline.ai/D9TmfoF",
        "expanded_url": "https://go.adaline.ai/D9TmfoF",
        "indices": [
          989,
          1012
        ],
        "url": "https://t.co/q667gGwckt"
      }
    ],
    "user_mentions": [
      {
        "id_str": "227639633",
        "indices": [
          149,
          161
        ],
        "name": "Arsh Shah Dilbagi",
        "screen_name": "arshdilbagi"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}