@JacksonAtkinsX
NVIDIA research just made LLMs 53x faster. 🤯

Imagine slashing your AI inference budget by ~98%. This breakthrough doesn't require training a new model from scratch; it upgrades your existing ones for hyper-speed while matching or beating SOTA accuracy.

Here's how it works:

The technique is called Post Neural Architecture Search (PostNAS), a process for retrofitting pre-trained models (minimal code sketch below):

- Freeze the knowledge: Start from a powerful model (like Qwen2.5) and lock down its core MLP layers, preserving its intelligence.
- Surgical replacement: Use a hardware-aware search to replace most of the slow, O(n²) full-attention layers with JetBlock, a new, hyper-efficient linear attention design (the generic math is sketched below).
- Optimize for throughput: Keep a few key full-attention layers in exactly the positions needed for complex reasoning, creating a hybrid model optimized for speed on H100 GPUs.

The result is Jet-Nemotron: an AI delivering 2,885 tokens per second with top-tier model performance and a 47x smaller KV cache (rough cache arithmetic below).

Why this matters to your AI strategy:

- Business leaders: A 53x speedup means serving the same traffic with ~1/53 of the compute, a ~98% cost reduction for inference at scale (1 - 1/53 ≈ 0.98). That fundamentally changes the ROI calculation for deploying high-performance AI.
- Practitioners: This isn't just for data centers. The massive efficiency gains and tiny memory footprint (a 154MB KV cache) make it possible to deploy SOTA-level models on memory-constrained and edge hardware.
- Researchers: PostNAS offers a new, capital-efficient paradigm. Instead of spending millions on pre-training, you can innovate on architecture by modifying existing models, dramatically lowering the barrier to entry for creating novel, efficient LMs.
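For the builders: here's a minimal sketch of what a PostNAS-style retrofit loop could look like, assuming a HuggingFace-style decoder where each block exposes `.mlp` and `.attn` submodules. The attribute names and the `make_linear_attn` factory are illustrative stand-ins, not NVIDIA's actual code:

```python
import torch.nn as nn

def postnas_retrofit(model: nn.Module, keep_full_attn: set, make_linear_attn) -> nn.Module:
    """PostNAS-style retrofit sketch: freeze MLPs, swap most attention layers.

    `model.layers`, `.mlp`, `.attn`, and `make_linear_attn` are hypothetical
    names for illustration. `keep_full_attn` holds the layer indices the
    hardware-aware search chose to leave as full attention.
    """
    for i, layer in enumerate(model.layers):
        # Freeze the knowledge: MLP weights never train during the retrofit.
        for p in layer.mlp.parameters():
            p.requires_grad = False
        # Surgical replacement: everywhere the search allows, replace
        # O(n^2) full attention with a linear-attention block.
        if i not in keep_full_attn:
            layer.attn = make_linear_attn(layer.attn)
    return model
```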
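Why does linear attention make decoding so much faster? Below is the generic kernelized form (Katharopoulos et al. style), not JetBlock itself (the paper builds on it with dynamic convolution): a fixed-size running state summarizes the whole past, so each new token costs O(d²) no matter how long the sequence gets.

```python
import torch
import torch.nn.functional as F

def linear_attention_step(state, z, q, k, v):
    """One decode step for a single head; q, k, v are d-dim vectors.

    Generic linear attention, not JetBlock itself: the (d x d) `state`
    and d-dim normalizer `z` replace the ever-growing KV cache, so the
    per-token cost is constant instead of growing with sequence length.
    """
    phi = lambda x: F.elu(x) + 1             # positive feature map
    state = state + torch.outer(phi(k), v)   # accumulate phi(K)^T V
    z = z + phi(k)                           # running normalizer
    out = (phi(q) @ state) / (phi(q) @ z + 1e-6)
    return out, state, z

# Toy usage: memory held between steps is d*d + d numbers, full stop.
d = 64
state, z = torch.zeros(d, d), torch.zeros(d)
for _ in range(1000):                        # sequence length doesn't matter
    q, k, v = torch.randn(d), torch.randn(d), torch.randn(d)
    out, state, z = linear_attention_step(state, z, q, k, v)
```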
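And the tiny KV cache follows from the same structure. A back-of-the-envelope, with made-up layer counts and head sizes (not the paper's actual config): only the few remaining full-attention layers need a cache that grows with context.

```python
def kv_cache_mb(full_attn_layers, seq_len, n_kv_heads=8, head_dim=128, bytes_per=2):
    # 2x for keys and values; bytes_per=2 assumes fp16 storage.
    return 2 * full_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per / 2**20

# Hypothetical 28-layer model at 64k context:
print(kv_cache_mb(28, 64_000))  # all layers full attention   -> 7000.0 MB
print(kv_cache_mb(2, 64_000))   # only 2 full-attn layers kept -> 500.0 MB
# Each linear-attention layer adds only a small fixed state (see the
# sketch above), which is how the total cache ends up tens of times smaller.
```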