🐦 Twitter Post Details

@omarsar0

NEW research from Google on effective agent scaling.

More tool calls don't always mean better agents. The default approach to scaling tool-augmented agents today remains throwing more resources at the problem such as more search queries, API calls, and more budget.

But agents lack budget awareness and quickly hit a performance ceiling.

This new research introduces BATS (Budget Aware Test-time Scaling), a framework that makes agents explicitly aware of their resource constraints and dynamically adapts planning and verification strategies based on remaining budget.

Standard agents don't know how much budget they have left. Without explicit signals, they perform shallow searches and fail to utilize additional resources even when available. Simply granting more tool calls doesn't help because agents terminate early, believing they've found sufficient answers or concluding they're stuck.

Budget Tracker is a lightweight plug-in that surfaces real-time budget states inside the agent's reasoning loop. At each step, the agent sees exactly how many tool calls remain and adapts accordingly.

Results:

Budget Tracker achieves comparable accuracy to ReAct with 10x less budget (10 vs 100 tool calls), using 40.4% fewer search calls, 21.4% fewer browse calls, and reducing overall cost by 31.3%.

BATS goes further by making budget awareness shape the entire orchestration. A planning module adjusts exploration breadth and verification depth based on remaining resources. A self-verification module decides whether to dig deeper on a promising lead or pivot to alternative paths.

On BrowseComp, BATS with Gemini-2.5-Pro achieves 24.6% accuracy versus 12.6% for ReAct under identical 100-tool budgets. On BrowseComp-ZH, 46.0% versus 31.5%. On HLE-Search, 27.0% versus 20.5%. All without any task-specific training.

Budget-aware design produces more favorable scaling curves and pushes the cost-performance Pareto frontier, achieving higher performance while using fewer resources. It's all about wise-spending.

Paper: https://t.co/hqZQoXDnX7

Learn to build effective AI Agents in our academy: https://t.co/JBU5beIoD0
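
To make the Budget Tracker idea in the post concrete, here is a minimal Python sketch. It is not the paper's implementation: the names `BudgetTracker`, `plan_for`, and `run_agent`, the status-line wording, and the 50%/20% thresholds are all assumptions chosen for illustration, and a stub function stands in for a real search tool. It only shows the two ideas the post describes: surfacing the remaining tool-call count at every step, and narrowing exploration and verification as the budget shrinks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BudgetTracker:
    """Hypothetical tracker that counts tool calls against a fixed budget."""
    total: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.total - self.used

    def spend(self, n: int = 1) -> None:
        # Refuse calls that would exceed the budget.
        if n > self.remaining:
            raise RuntimeError("tool-call budget exhausted")
        self.used += n

    def status_line(self) -> str:
        # Text that would be injected into the agent's context each step.
        return f"[budget] {self.remaining} of {self.total} tool calls remaining"


def plan_for(tracker: BudgetTracker) -> Dict[str, object]:
    """Illustrative policy only: the thresholds are made up, not from the paper."""
    frac = tracker.remaining / tracker.total
    if frac > 0.5:
        return {"parallel_queries": 3, "verify": True}   # explore broadly
    if frac > 0.2:
        return {"parallel_queries": 2, "verify": True}   # narrow down
    return {"parallel_queries": 1, "verify": False}      # conserve what is left


def run_agent(question: str, tool: Callable[[str], str], budget: int) -> List[str]:
    """Toy loop: show the budget each step, then issue the planned tool calls."""
    tracker = BudgetTracker(total=budget)
    transcript = [f"Question: {question}"]
    while tracker.remaining > 0:
        plan = plan_for(tracker)
        transcript.append(tracker.status_line())
        # Never plan more calls than the budget still allows.
        n = min(int(plan["parallel_queries"]), tracker.remaining)
        for _ in range(n):
            tracker.spend()
            transcript.append(tool(f"{question} (query {tracker.used})"))
    return transcript


if __name__ == "__main__":
    for line in run_agent("example question", lambda q: f"result for: {q}", budget=10):
        print(line)
```

In a real agent, the status line would be appended to the model's prompt and the model itself would decide how to adapt; the hard-coded `plan_for` policy here only stands in for that decision.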

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2000225781927325863/media_0.png?",
      "filename": "media_0.png"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2000225781927325863/media_1.png?",
      "filename": "media_1.png"
    }
  ],
  "processed_at": "2025-12-14T15:30:58.103496",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2000225781927325863",
  "url": "https://x.com/omarsar0/status/2000225781927325863",
  "twitterUrl": "https://twitter.com/omarsar0/status/2000225781927325863",
  "text": "NEW research from Google on effective agent scaling.\n\nMore tool calls don't always mean better agents. The default approach to scaling tool-augmented agents today remains throwing more resources at the problem such as more search queries, API calls, and more budget.\n\nBut agents lack budget awareness and quickly hit a performance ceiling.\n\nThis new research introduces BATS (Budget Aware Test-time Scaling), a framework that makes agents explicitly aware of their resource constraints and dynamically adapts planning and verification strategies based on remaining budget.\n\nStandard agents don't know how much budget they have left. Without explicit signals, they perform shallow searches and fail to utilize additional resources even when available. Simply granting more tool calls doesn't help because agents terminate early, believing they've found sufficient answers or concluding they're stuck.\n\nBudget Tracker is a lightweight plug-in that surfaces real-time budget states inside the agent's reasoning loop. At each step, the agent sees exactly how many tool calls remain and adapts accordingly.\n\nResults:\n\nBudget Tracker achieves comparable accuracy to ReAct with 10x less budget (10 vs 100 tool calls), using 40.4% fewer search calls, 21.4% fewer browse calls, and reducing overall cost by 31.3%.\n\nBATS goes further by making budget awareness shape the entire orchestration. A planning module adjusts exploration breadth and verification depth based on remaining resources. A self-verification module decides whether to dig deeper on a promising lead or pivot to alternative paths.\n\nOn BrowseComp, BATS with Gemini-2.5-Pro achieves 24.6% accuracy versus 12.6% for ReAct under identical 100-tool budgets. On BrowseComp-ZH, 46.0% versus 31.5%. On HLE-Search, 27.0% versus 20.5%. All without any task-specific training.\n\nBudget-aware design produces more favorable scaling curves and pushes the cost-performance Pareto frontier, achieving higher performance while using fewer resources. It's all about wise-spending.\n\nPaper: https://t.co/hqZQoXDnX7\n\nLearn to build effective AI Agents in our academy: https://t.co/JBU5beIoD0",
  "source": "Twitter for iPhone",
  "retweetCount": 2,
  "replyCount": 0,
  "likeCount": 9,
  "quoteCount": 0,
  "viewCount": 354,
  "createdAt": "Sun Dec 14 15:26:03 +0000 2025",
  "lang": "en",
  "bookmarkCount": 4,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2000225781927325863",
  "displayTextRange": [
    0,
    279
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "omarsar0",
    "url": "https://x.com/omarsar0",
    "twitterUrl": "https://twitter.com/omarsar0",
    "id": "3448284313",
    "name": "elvis",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "description": "",
    "location": "DAIR.AI Academy",
    "followers": 279198,
    "following": 734,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Fri Sep 04 12:59:26 +0000 2015",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 33926,
    "hasCustomTimelines": true,
    "isTranslator": true,
    "mediaCount": 4378,
    "statusesCount": 16738,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2000225781927325863"
    ],
    "profile_bio": {
      "description": "Building @dair_ai • Prev: Meta AI, Elastic, PhD • New cohort: https://t.co/GZMhf39NRs",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "dair-ai.thinkific.com/courses/claude…",
              "expanded_url": "https://dair-ai.thinkific.com/courses/claude-code-for-everyone-2",
              "indices": [
                62,
                85
              ],
              "url": "https://t.co/GZMhf39NRs"
            }
          ],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                9,
                17
              ],
              "name": "",
              "screen_name": "dair_ai"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "dair.ai",
              "expanded_url": "https://www.dair.ai/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/XQto5ypkSM"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/tkfx8FSK75",
        "expanded_url": "https://twitter.com/omarsar0/status/2000225781927325863/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "2000225778412449792",
        "indices": [
          280,
          303
        ],
        "media_key": "3_2000225778412449792",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARvCOr9w2nAACgACG8I6wEJbMKcAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABG8I6v3DacAAKAAIbwjrAQlswpwAA",
            "media_key": "3_2000225778412449792"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/G8I6v3DacAAVwoY.png",
        "original_info": {
          "focus_rects": [
            {
              "h": 905,
              "w": 1616,
              "x": 0,
              "y": 0
            },
            {
              "h": 1364,
              "w": 1364,
              "x": 0,
              "y": 0
            },
            {
              "h": 1364,
              "w": 1196,
              "x": 0,
              "y": 0
            },
            {
              "h": 1364,
              "w": 682,
              "x": 0,
              "y": 0
            },
            {
              "h": 1364,
              "w": 1616,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1364,
          "width": 1616
        },
        "sizes": {
          "large": {
            "h": 1364,
            "w": 1616
          }
        },
        "type": "photo",
        "url": "https://t.co/tkfx8FSK75"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "urls": [
      {
        "display_url": "arxiv.org/abs/2511.17006",
        "expanded_url": "https://arxiv.org/abs/2511.17006",
        "indices": [
          2030,
          2053
        ],
        "url": "https://t.co/hqZQoXDnX7"
      },
      {
        "display_url": "dair-ai.thinkific.com",
        "expanded_url": "https://dair-ai.thinkific.com/",
        "indices": [
          2106,
          2129
        ],
        "url": "https://t.co/JBU5beIoD0"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}