@dair_ai
Are LLMs any good for web API integrations? While we see a lot of fancy demos, the reality is that LLMs still largely struggle with them. The default assumption is that code models can handle API calls reliably; after all, they excel at general code completion. But web APIs have unique challenges that break this assumption.

This new research introduces WAPIIBench, a benchmark for evaluating LLM-generated web API invocation code across four real-world APIs: Asana, Google Calendar, Google Sheets, and Slack. None of the evaluated open-source models solved more than 40% of tasks. Even when given the correct endpoint, models still generated illegal arguments 6-31% of the time, and URLs were hallucinated 14-39% of the time.

Why is this so hard? Web API invocations differ from regular function calls in critical ways. Operations are identified by an HTTP method plus a long URL string, not a simple function name. Multiple argument lists exist across body, header, and query locations. Parameters have complex nested data types. And API specifications are documented externally, limiting what models can memorize.

The researchers propose a solution: constrained decoding. They automatically translate OpenAPI specifications into regex-based constraints that filter token predictions during generation. The constraints enforce compliance with the API spec without requiring model modifications or prompt adjustments.

Constrained decoding improves correctness by 90% on average for full completion and by 135% for argument completion. Illegal URLs, methods, and arguments drop to zero, and models that previously generated no executable code now reach rates similar to the other models.

Great read for AI devs.

Paper: https://t.co/OXFKJRMmJc

Learn to build effective AI agents in our academy: https://t.co/zQXQt0PMbG
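To make the "multiple argument locations" point concrete, here is a minimal sketch of a single web API invocation in Python using only the standard library. The endpoint, fields, and token are hypothetical placeholders, not taken from the paper or from any real Asana/Slack route:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical task-creation endpoint (illustrative only): the operation
# is identified by method + URL, and arguments are split across the
# query, header, and body locations the post describes.
base = "https://api.example.com/v1/tasks"

query = {"workspace": "123", "limit": "10"}        # query arguments
headers = {
    "Authorization": "Bearer <TOKEN>",             # header argument
    "Content-Type": "application/json",
}
body = {"data": {"name": "Write report",           # nested body argument
                 "assignee": {"gid": "456"}}}      # with complex types

# Build (but don't send) the request: every piece an LLM must get right.
req = Request(
    url=f"{base}?{urlencode(query)}",
    data=json.dumps(body).encode("utf-8"),
    headers=headers,
    method="POST",
)
```

A model has to produce the method, the exact URL path, and three separate argument lists correctly at once, which is exactly where the benchmark finds 6-31% illegal arguments and 14-39% hallucinated URLs.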
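The constrained-decoding idea can be sketched as a prefix filter over the model's token candidates. This toy version is an assumption-laden simplification: it replaces the paper's regex machinery (derived from OpenAPI specs) with a finite set of allowed invocations, and the "model" is a mock ranker rather than real LLM logits:

```python
# Allowed invocations, as they might be derived from an OpenAPI spec
# (hypothetical routes, not the paper's four APIs).
ALLOWED = ["GET /users/{id}", "POST /users", "DELETE /users/{id}"]

def valid_prefix(text):
    """Can `text` still be extended into some allowed invocation?"""
    return any(a.startswith(text) for a in ALLOWED)

def constrained_decode(rank_tokens, max_steps=10):
    """Greedy decoding that keeps only tokens preserving legality."""
    out = ""
    for _ in range(max_steps):
        # Filter the ranked candidates down to those whose addition
        # leaves the output a prefix of some legal invocation.
        legal = [t for t in rank_tokens(out) if valid_prefix(out + t)]
        if not legal:
            break
        out += legal[0]          # take the best-ranked legal token
        if out in ALLOWED:       # a complete invocation was produced
            return out
    return out

# Mock "model": prefers an illegal method and an illegal path first.
def mock_rank(prefix):
    return ["PUT", "/accounts", "POST", " /users", "/{id}", "GET", "DELETE"]
```

Calling `constrained_decode(mock_rank)` returns `"POST /users"`: the ranker's preferred but illegal tokens (`"PUT"`, `"/accounts"`) are filtered out at each step, mirroring how the paper's constraints drive illegal URLs, methods, and arguments to zero without touching the model itself.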