🐦 Twitter Post Details


@iScienceLuvr

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

"In this work, we analyze specific properties which make a benchmark more reliable for such decisions, and interventions to design higher-quality evaluation benchmarks. We introduce two key metrics that show differences in current benchmarks: signal, a benchmark's ability to separate better models from worse models, and noise, a benchmark's sensitivity to random variability between training steps. We demonstrate that benchmarks with a better signal-to-noise ratio are more reliable when making decisions at small scale, and those with less noise have lower scaling law prediction error. These results suggest that improving signal or noise will lead to more useful benchmarks, so we introduce three interventions designed to directly affect signal or noise."
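The quoted abstract names the two metrics but gives no formulas, so the following is only a rough sketch of how a signal-to-noise ratio of this kind might be computed. It assumes, purely for illustration, that signal is the spread of final scores across models relative to their mean, and that noise is the standard deviation of one model's scores over its last few training checkpoints relative to their mean. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def signal(final_scores):
    """Assumed signal: range of final scores across models, divided by their mean."""
    return (max(final_scores) - min(final_scores)) / mean(final_scores)


def noise(checkpoint_scores):
    """Assumed noise: std of one model's last-checkpoint scores, divided by their mean."""
    return pstdev(checkpoint_scores) / mean(checkpoint_scores)


def signal_to_noise(final_scores, checkpoint_scores):
    """Ratio of the two assumed quantities; higher suggests a more decisive benchmark."""
    return signal(final_scores) / noise(checkpoint_scores)


# Made-up accuracies for three models on one benchmark.
final_scores = [0.40, 0.50, 0.60]
# Made-up accuracies of a single model at its last three checkpoints.
checkpoint_scores = [0.49, 0.50, 0.51]

snr = signal_to_noise(final_scores, checkpoint_scores)
print(f"signal={signal(final_scores):.3f} "
      f"noise={noise(checkpoint_scores):.4f} snr={snr:.1f}")
```

Under these assumptions, a benchmark whose scores differ widely between models but barely move between adjacent checkpoints gets a high ratio, which matches the intuition in the abstract that such benchmarks are more reliable for small-scale decisions.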


📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1958106688722243924/media_0.jpg?",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-08-20T12:48:01.585502",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1958106688722243924",
  "url": "https://x.com/iScienceLuvr/status/1958106688722243924",
  "twitterUrl": "https://twitter.com/iScienceLuvr/status/1958106688722243924",
  "text": "Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation\n\n\"In this work, we analyze specific properties which make a benchmark  more reliable for such decisions, and interventions to design  higher-quality evaluation benchmarks. We introduce two key metrics that  show differences in current benchmarks: signal, a benchmark's ability to  separate better models from worse models, and noise, a benchmark's  sensitivity to random variability between training steps. We demonstrate  that benchmarks with a better signal-to-noise ratio are more reliable  when making decisions at small scale, and those with less noise have  lower scaling law prediction error. These results suggest that improving  signal or noise will lead to more useful benchmarks, so we introduce  three interventions designed to directly affect signal or noise.\"",
  "source": "Twitter for iPhone",
  "retweetCount": 1,
  "replyCount": 2,
  "likeCount": 9,
  "quoteCount": 0,
  "viewCount": 1517,
  "createdAt": "Wed Aug 20 09:59:49 +0000 2025",
  "lang": "en",
  "bookmarkCount": 5,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1958106688722243924",
  "displayTextRange": [
    0,
    277
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "iScienceLuvr",
    "url": "https://x.com/iScienceLuvr",
    "twitterUrl": "https://twitter.com/iScienceLuvr",
    "id": "441465751",
    "name": "Tanishq Mathew Abraham, Ph.D.",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1913710019729821696/Qge4zx6u_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/441465751/1738204246",
    "description": "",
    "location": "",
    "followers": 80199,
    "following": 1243,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Tue Dec 20 03:45:50 +0000 2011",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 104894,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 2445,
    "statusesCount": 17835,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1952221233648718307"
    ],
    "profile_bio": {
      "description": "CEO @SophontAI |\nPhD at 19 (2023) |\nFounder, ex CEO @MedARC_AI |\nex Research Director Stability AI | \nBiomed. engineer @ 14 |\nTEDx talk➡https://t.co/xPxwKTq6Qb",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "bit.ly/3tpAuan",
              "expanded_url": "https://bit.ly/3tpAuan",
              "indices": [
                136,
                159
              ],
              "url": "https://t.co/xPxwKTq6Qb"
            }
          ],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                4,
                14
              ],
              "name": "",
              "screen_name": "SophontAI"
            },
            {
              "id_str": "0",
              "indices": [
                52,
                62
              ],
              "name": "",
              "screen_name": "MedARC_AI"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "sophontai.com",
              "expanded_url": "https://sophontai.com",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/uQ936JTZf1"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "allow_download_status": {
          "allow_download": true
        },
        "display_url": "pic.twitter.com/LuYUWhdeY6",
        "expanded_url": "https://twitter.com/iScienceLuvr/status/1958106688722243924/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "1958106529137319936",
        "indices": [
          278,
          301
        ],
        "media_key": "3_1958106529137319936",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARssl4OFmmAACgACGyyXqK2bEVQAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABGyyXg4WaYAAKAAIbLJeorZsRVAAA",
            "media_key": "3_1958106529137319936"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/GyyXg4WaYAAv3lU.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 790,
              "w": 1410,
              "x": 0,
              "y": 0
            },
            {
              "h": 1410,
              "w": 1410,
              "x": 0,
              "y": 0
            },
            {
              "h": 1607,
              "w": 1410,
              "x": 0,
              "y": 0
            },
            {
              "h": 1827,
              "w": 914,
              "x": 248,
              "y": 0
            },
            {
              "h": 1827,
              "w": 1410,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1827,
          "width": 1410
        },
        "sizes": {
          "large": {
            "h": 1827,
            "w": 1410
          }
        },
        "type": "photo",
        "url": "https://t.co/LuYUWhdeY6"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {},
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "article": null
}