🐦 Twitter Post Details

Viewing enriched Twitter post

@LiorOnAI

Google's latest paper on Compression is the future.

Here's why.

They compressed LLM memory 6x with zero accuracy loss.

When ChatGPT writes a reply, it remembers every word you've said. That memory is stored in a growing notebook (KV cache).

A 100,000-word conversation can eat 16 GB of GPU memory. That's half of what most high-end GPUs even have.

This is the #1 cost of running AI. Not the thinking. The remembering.

TurboQuant shrinks each number in that notebook from 32 bits to just 3.

That's like replacing a full paragraph with three words and losing nothing. No retraining. Works on any model instantly.

Compressing numbers usually destroys their meaning. Here's how they solved it:

1. Rotate the numbers randomly so they all land on a predictable curve (PolarQuant)
2. Use one extra bit to fix the tiny errors left behind (QJL)

Once numbers are predictable, you need far fewer bits to store them.

The results:
> 8x faster on Nvidia H100 GPUs
> 16 GB notebook shrinks to under 3 GB
> Search indexing drops from 500 seconds to 0.001
> Accuracy identical to the uncompressed model

There's a proven math limit on how good compression can get. TurboQuant is only 2.7x above that floor. We're near the ceiling.

Every company running LLMs spends most of its budget on memory. This cuts that cost by over 80%.

The race is no longer about bigger models. It's about cheaper inference.

Models that needed a $200K server cluster start fitting on a single $2K GPU. AI agents run 24/7 without burning budgets.

The companies that win won't just have the best models. They'll have the best compression.

Papers are open-access on arXiv, presented at ICLR.
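The post's memory figures can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming hypothetical Llama-3-8B-style dimensions (32 layers, 8 KV heads, head dimension 128, 2-byte fp16 values) — these dimensions are illustrative assumptions, not something the post states:

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_val=2):
    """Rough KV-cache size: the K and V caches each store
    layers * kv_heads * head_dim values for every token kept in context."""
    return 2 * layers * kv_heads * head_dim * bytes_per_val * tokens

full = kv_cache_bytes(128_000)                      # 16-bit values: ~16.8 GB
tiny = kv_cache_bytes(128_000, bytes_per_val=3/8)   # 3-bit values:  ~3.1 GB
print(f"{full / 1e9:.1f} GB -> {tiny / 1e9:.1f} GB")
```

Under these assumed dimensions a ~128k-token context lands near the 16 GB the post quotes, and dropping to 3 bits per value gives roughly 3 GB — consistent with the "16 GB notebook shrinks to under 3 GB" claim.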
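The "32 bits to just 3" step is, at its core, low-bit quantization. A toy sketch of a generic symmetric 3-bit quantizer — not the paper's actual TurboQuant kernels, and omitting the extra error-correcting bit the post attributes to QJL:

```python
import numpy as np

def quantize_3bit(x):
    # Map each float to an integer code in [-3, 3]:
    # representable in 3 bits instead of 32.
    scale = np.abs(x).max() / 3
    codes = np.round(x / scale).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale

x = np.random.default_rng(1).normal(size=1024).astype(np.float32)
codes, scale = quantize_3bit(x)
rel_err = np.linalg.norm(x - dequantize(codes, scale)) / np.linalg.norm(x)
print(f"relative error: {rel_err:.2f}")
```

A naive quantizer like this loses noticeable precision on its own; the post's point is that the rotation and correction-bit tricks it lists are what recover the accuracy.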
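Why rotating first helps: a random rotation spreads outlier energy evenly across coordinates, so the quantizer's scale is no longer dominated by one huge value. A toy illustration of that generic rotate-before-quantizing idea — the post only sketches PolarQuant, so this is not its actual method, and all sizes here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# A vector with a few large outliers, like raw attention activations
x = rng.normal(size=256)
x[:4] *= 50

# Random orthogonal rotation: QR decomposition of a Gaussian matrix
Q, _ = np.linalg.qr(rng.normal(size=(256, 256)))
y = Q @ x  # same information and same norm, since Q is orthogonal

def spread(v):
    # max/RMS ratio: how badly the largest entry dominates.
    # Uniform quantizers scale to the max value, so a smaller
    # ratio wastes fewer of the few available quantization levels.
    return float(np.abs(v).max() / np.sqrt(np.mean(v**2)))

print(f"before: {spread(x):.1f}, after: {spread(y):.1f}")
```

After rotation the values are well-spread — "predictable", in the post's wording — which is what makes storing them in very few bits viable.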

Media 1

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2037907703188619605/media_0.jpg",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2026-03-28T15:02:03.756256",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2037907703188619605",
  "url": "https://x.com/LiorOnAI/status/2037907703188619605",
  "twitterUrl": "https://twitter.com/LiorOnAI/status/2037907703188619605",
  "text": "Google's latest paper on Compression is the future.\n\nHere's why.\n\nThey compressed LLM memory 6x with zero accuracy loss.\n\nWhen ChatGPT writes a reply, it remembers every word you've said. That memory is stored in a growing notebook (KV cache). \n\nA 100,000-word conversation can eat 16 GB of GPU memory. That's half of what most high-end GPUs even have.\n\nThis is the #1 cost of running AI. Not the thinking. The remembering.\n\nTurboQuant shrinks each number in that notebook from 32 bits to just 3. \n\nThat's like replacing a full paragraph with three words and losing nothing. No retraining. Works on any model instantly.\n\nCompressing numbers usually destroys their meaning. Here's how they solved it:\n\n1. Rotate the numbers randomly so they all land on a predictable curve (PolarQuant)\n2. Use one extra bit to fix the tiny errors left behind (QJL)\n\nOnce numbers are predictable, you need far fewer bits to store them.\n\nThe results:\n> 8x faster on Nvidia H100 GPUs\n> 16 GB notebook shrinks to under 3 GB\n> Search indexing drops from 500 seconds to 0.001\n> Accuracy identical to the uncompressed model\n\nThere's a proven math limit on how good compression can get. TurboQuant is only 2.7x above that floor. We're near the ceiling.\n\nEvery company running LLMs spends most of its budget on memory. This cuts that cost by over 80%.\n\nThe race is no longer about bigger models. It's about cheaper inference. \n\nModels that needed a $200K server cluster start fitting on a single $2K GPU. AI agents run 24/7 without burning budgets.\n\nThe companies that win won't just have the best models. They'll have the best compression.\n\nPapers are open-access on arXiv, presented at ICLR.",
  "source": "Twitter for iPhone",
  "retweetCount": 1,
  "replyCount": 1,
  "likeCount": 0,
  "quoteCount": 0,
  "viewCount": 39,
  "createdAt": "Sat Mar 28 15:00:33 +0000 2026",
  "lang": "en",
  "bookmarkCount": 0,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2037907703188619605",
  "displayTextRange": [
    0,
    277
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "LiorOnAI",
    "url": "https://x.com/LiorOnAI",
    "twitterUrl": "https://twitter.com/LiorOnAI",
    "id": "931470139",
    "name": "Lior Alexander",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/2032256308196564993/ozddLZ2O_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/931470139/1761077189",
    "description": "Building the Bloomberg of AI @AlphaSignalAI (280K subs) • MIT lecturer • MILA researcher • 9 yrs in ML",
    "location": "San Francisco, CA",
    "followers": 114237,
    "following": 2238,
    "status": "",
    "canDm": true,
    "canMediaTag": false,
    "createdAt": "Wed Nov 07 07:19:36 +0000 2012",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {
        "urls": [
          {
            "display_url": "alphasignal.ai",
            "expanded_url": "https://alphasignal.ai",
            "indices": [
              0,
              23
            ],
            "url": "https://t.co/AyubevaLcb"
          }
        ]
      }
    },
    "fastFollowersCount": 0,
    "favouritesCount": 6826,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 665,
    "statusesCount": 3794,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [],
    "profile_bio": {},
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.x.com/cL6POieUW4",
        "expanded_url": "https://x.com/LiorOnAI/status/2037907703188619605/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {
            "faces": []
          },
          "medium": {
            "faces": []
          },
          "orig": {
            "faces": []
          },
          "small": {
            "faces": []
          }
        },
        "id_str": "2037907700223197184",
        "indices": [
          278,
          301
        ],
        "media_key": "3_2037907700223197184",
        "media_results": {
          "result": {
            "media_key": "3_2037907700223197184"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/HEgaQYzWQAAYFJy.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 725,
              "w": 1294,
              "x": 0,
              "y": 0
            },
            {
              "h": 1270,
              "w": 1270,
              "x": 12,
              "y": 0
            },
            {
              "h": 1270,
              "w": 1114,
              "x": 90,
              "y": 0
            },
            {
              "h": 1270,
              "w": 635,
              "x": 330,
              "y": 0
            },
            {
              "h": 1270,
              "w": 1294,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1270,
          "width": 1294
        },
        "sizes": {
          "large": {
            "h": 1270,
            "resize": "fit",
            "w": 1294
          },
          "medium": {
            "h": 1178,
            "resize": "fit",
            "w": 1200
          },
          "small": {
            "h": 667,
            "resize": "fit",
            "w": 680
          },
          "thumb": {
            "h": 150,
            "resize": "crop",
            "w": 150
          }
        },
        "type": "photo",
        "url": "https://t.co/cL6POieUW4"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "article": null
}