@rohanpaul_ai
A classic paper, a collab between @AIatMeta, @GoogleDeepMind, and @NVIDIAAIDev. Language models keep personal facts in a measurable amount of "storage". This study shows how to count that storage, and when models swap memorization for real learning.

📡 The Question

Can we separate a model's rote recall of training snippets from genuine pattern learning, and measure both in bits? The paper says yes. Encode the dataset twice: once with the target model, once with a strong reference model that cannot have memorized this data. The extra bit-savings the target achieves beyond the reference count as rote memorization; the shared savings reflect genuine pattern learning.

🧮 The Measurement Trick

Treat the model like a compressor. If a data point can be encoded in fewer bits when the model is available, those saved bits reveal memorization. Subtract the savings that also appear under a strong reference model; the remainder is "unintended" memorization. (A rough code sketch of this bookkeeping is at the end of the post.)

📏 What the Numbers Say

GPT-style transformers store roughly 3.6 bits per parameter before running out of space. Once full, memorization flattens, and test loss begins the familiar "double descent" curve right when the dataset's information content overtakes model capacity.

🔄 Why It Matters

Loss-based membership inference obeys a clean sigmoid: bigger models or smaller datasets raise attack success, while large token-to-parameter ratios push success toward random chance. Practitioners can now predict privacy risk from simple size ratios instead of running attacks.

Most interesting nugget: a single scaling rule, bits per parameter, explains capacity limits, double descent, and membership-leak risk all in one shot.
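
A minimal sketch of the measurement trick, not the authors' code: count the bits needed to encode a sample under the target model and under a reference model, and treat the target's extra savings as unintended memorization. It assumes Hugging Face transformers and that both models share a tokenizer; the model names are placeholders, not the paper's setup.

```python
# Sketch only: bits saved by the target beyond a reference = "unintended" memorization.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_to_encode(model, tokenizer, text):
    """Shannon code length of `text` under `model`, in bits."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # out.loss = mean cross-entropy in nats per predicted token
    n_predicted = ids.shape[1] - 1     # labels are shifted inside the model
    return out.loss.item() * n_predicted / math.log(2)

def unintended_memorization_bits(target, reference, tokenizer, text):
    """Bits the target saves beyond the reference on one sample (floored at 0)."""
    saved = bits_to_encode(reference, tokenizer, text) - bits_to_encode(target, tokenizer, text)
    return max(saved, 0.0)

# Placeholder usage: "gpt2" stands in for both roles here just to show the call.
tok = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2").eval()
reference = AutoModelForCausalLM.from_pretrained("gpt2").eval()
print(unintended_memorization_bits(target, reference, tok, "Alice's phone number is 555-0100"))
```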
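The 3.6 bits/parameter figure also supports back-of-the-envelope capacity checks. The per-token entropy below is an assumed placeholder, not a number from the paper; the crossover point, where dataset information exceeds model capacity, is where the paper places memorization saturating and double descent kicking in.

```python
# Rough capacity arithmetic using the paper's ~3.6 bits/parameter estimate.
def model_capacity_bits(n_params, bits_per_param=3.6):
    return bits_per_param * n_params

def dataset_information_bits(n_tokens, entropy_bits_per_token=2.0):  # assumed entropy
    return n_tokens * entropy_bits_per_token

n_params = 125e6   # e.g. a GPT-2-small-sized model
n_tokens = 1e9     # training tokens

capacity = model_capacity_bits(n_params)
data_bits = dataset_information_bits(n_tokens)
print(f"capacity ~ {capacity:.3e} bits, dataset ~ {data_bits:.3e} bits")
print("memorization should saturate" if data_bits > capacity else "model can still memorize more")
```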
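And the privacy-risk claim can be caricatured the same way. The paper reports that loss-based membership-inference success follows a sigmoid in the size ratio and drops to chance when tokens far outnumber parameters; the functional form and coefficients below are illustrative assumptions, not the fitted law.

```python
# Illustrative only: a hypothetical logistic in log(params/tokens), not the paper's fit.
import math

def predicted_mia_success(n_params, n_tokens, a=1.0, b=0.0):
    """Returns a value in [0.5, 1.0]; 0.5 is chance level, 1.0 a perfect attack."""
    x = math.log(n_params / n_tokens)
    p = 1.0 / (1.0 + math.exp(-(a * x + b)))
    return 0.5 + 0.5 * p

# Bigger model on the same data -> higher predicted attack success.
print(predicted_mia_success(1e9, 1e10))   # ~0.545
print(predicted_mia_success(1e8, 1e10))   # ~0.505, close to chance
```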