๐Ÿฆ Twitter Post Details

Viewing enriched Twitter post

@s_batzoglou

RT @_avichawla: Big release from Kimi! They just released a new way to handle residual connections in Transformers. In a standard Transfo…

📊 Media Metadata

{
  "score": 0.34,
  "score_components": {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.04,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025
  },
  "scored_at": "2026-03-16T16:01:01.217401",
  "import_source": "api_import",
  "source_tagged_at": "2026-03-16T16:01:01.217412",
  "enriched": true,
  "enriched_at": "2026-03-16T16:01:01.217415"
}
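The overall "score" appears to be the plain sum of the "score_components" (an assumption read off the numbers, not documented by this viewer). A quick sketch of that check:

```python
# Hypothetical reconstruction: assumes score = sum(score_components).
# Values copied from the metadata above.
components = {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.04,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025,
}
score = round(sum(components.values()), 2)
print(score)  # 0.34, matching the "score" field above
```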

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2033571777360613501",
  "url": "https://x.com/s_batzoglou/status/2033571777360613501",
  "twitterUrl": "https://twitter.com/s_batzoglou/status/2033571777360613501",
  "text": "RT @_avichawla: Big release from Kimi!\n\nThey just released a new way to handle residual connections in Transformers.\n\nIn a standard Transfo…",
  "source": "Twitter for iPhone",
  "retweetCount": 107,
  "replyCount": 55,
  "likeCount": 1202,
  "quoteCount": 9,
  "viewCount": 160663,
  "createdAt": "Mon Mar 16 15:51:08 +0000 2026",
  "lang": "en",
  "bookmarkCount": 734,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2033571777360613501",
  "displayTextRange": [
    0,
    140
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "s_batzoglou",
    "url": "https://x.com/s_batzoglou",
    "twitterUrl": "https://twitter.com/s_batzoglou",
    "id": "1518735949458378752",
    "name": "Serafim Batzoglou",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1518736918527152128/hV7H_k58_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1518735949458378752/1731329061",
    "description": "Genomics-computation-ML-biotech-foundations of math-philosophy of mind; CDO @seer_bio; former prof @StanfordAILab; cofounder @dnanexus; opinions entirely my own",
    "location": "San Francisco and Miami",
    "followers": 3068,
    "following": 827,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Mon Apr 25 23:37:36 +0000 2022",
    "entities": {
      "description": {
        "urls": []
      }
    },
    "fastFollowersCount": 0,
    "favouritesCount": 39294,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 284,
    "statusesCount": 5878,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [],
    "profile_bio": {},
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "timestamps": [],
    "urls": [],
    "user_mentions": [
      {
        "id_str": "1175166450832687104",
        "indices": [
          3,
          14
        ],
        "name": "Avi Chawla",
        "screen_name": "_avichawla"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": {
    "type": "tweet",
    "id": "2033472650836914495",
    "url": "https://x.com/_avichawla/status/2033472650836914495",
    "twitterUrl": "https://twitter.com/_avichawla/status/2033472650836914495",
    "text": "Big release from Kimi!\n\nThey just released a new way to handle residual connections in Transformers.\n\nIn a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection.\n\nIf you consider this across 40+ layers, the hidden state at any layer is just the equal-weighted sum of all previous layer outputs.\n\nEvery layer contributes with weight=1, so every layer gets equal importance.\n\nThis creates a problem called PreNorm dilution, where as the hidden state accumulates layer after layer, its magnitude grows linearly with depth.\n\nAnd any new layer's contribution gets progressively buried in the already-massive residual. This means deeper layers are then forced to produce increasingly large outputs just to have any influence, which destabilizes training.\n\nHere's what the Kimi team observed and did:\n\nRNNs compress all prior token information into a single state across time, leading to problems with handling long-range dependencies. And residual connections compress all prior layer information into a single state across depth.\n\nTransformers solved the first problem by replacing recurrence with attention. This was applied along the sequence dimension.\n\nNow they introduced Attention Residuals, which applies a similar idea to depth.\n\nInstead of adding all previous layer outputs with a fixed weight of 1, each layer now uses softmax attention to selectively decide how much weight each previous layer's output should receive.\n\nSo each layer gets a single learned query vector, and it attends over all previous layer outputs to compute a weighted combination.\n\nThe weights are input-dependent, so different tokens can retrieve different layer representations based on what's actually useful.\n\nThis is Full Attention Residuals (shown in the second diagram below).\n\nBut here's the practical problem with this idea.\n\nFull AttnRes requires keeping all layer outputs in memory and communicating them across pipeline stages during distributed training.\n\nTo solve this, they introduce Block Attention Residuals (shown in the third diagram below).\n\nThe idea is to group consecutive layers into roughly 8 blocks.\n\nWithin each block, layer outputs are summed via standard residuals. But across blocks, the attention mechanism selectively combines block-level representations.\n\nThis drops memory from O(Ld) to O(Nd), where N is the number of blocks.\n\nLayers within the current block can also attend to the partial sum of what's been computed so far inside that block, so local information flow isn't lost.\n\nAnd the raw token embedding is always available as a separate source, which means any layer in the network can selectively reach back to the original input.\n\nResults from the paper:\n\n- Block AttnRes matches the loss of a baseline LLM trained with 1.25x more compute.\n\n- Inference latency overhead is less than 2%, making it a practical drop-in replacement\n\n- On a 48B parameter Kimi Linear model (3B activated) trained on 1.4T tokens, it improved every benchmark they tested: GPQA-Diamond +7.5, Math +3.6, HumanEval +3.1, MMLU +1.1\n\nThe residual connection has mostly been unchanged since ResNet in 2015.\n\nThis might be the first modification that's both theoretically motivated and practically deployable at scale with negligible overhead.\n\nMore details in the post below by Kimi👇\n____\nFind me →  @_avichawla\nEvery day, I share tutorials and insights on DS, ML, LLMs, and RAGs.",
    "source": "Twitter for iPhone",
    "retweetCount": 107,
    "replyCount": 55,
    "likeCount": 1202,
    "quoteCount": 9,
    "viewCount": 160663,
    "createdAt": "Mon Mar 16 09:17:14 +0000 2026",
    "lang": "en",
    "bookmarkCount": 734,
    "isReply": false,
    "inReplyToId": null,
    "conversationId": "2033472650836914495",
    "displayTextRange": [
      0,
      280
    ],
    "inReplyToUserId": null,
    "inReplyToUsername": null,
    "author": {
      "type": "user",
      "userName": "_avichawla",
      "url": "https://x.com/_avichawla",
      "twitterUrl": "https://twitter.com/_avichawla",
      "id": "1175166450832687104",
      "name": "Avi Chawla",
      "isVerified": false,
      "isBlueVerified": true,
      "verifiedType": null,
      "profilePicture": "https://pbs.twimg.com/profile_images/1868297128801390593/Ovl677JQ_normal.jpg",
      "coverPicture": "https://pbs.twimg.com/profile_banners/1175166450832687104/1734257238",
      "description": "Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI",
      "location": "Learn AI Engineering →",
      "followers": 62058,
      "following": 155,
      "status": "",
      "canDm": true,
      "canMediaTag": true,
      "createdAt": "Fri Sep 20 21:55:02 +0000 2019",
      "entities": {
        "description": {
          "urls": []
        },
        "url": {
          "urls": [
            {
              "display_url": "join.dailydoseofds.com",
              "expanded_url": "https://join.dailydoseofds.com/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/er9j5SIvQo"
            }
          ]
        }
      },
      "fastFollowersCount": 0,
      "favouritesCount": 2211,
      "hasCustomTimelines": false,
      "isTranslator": false,
      "mediaCount": 2392,
      "statusesCount": 4765,
      "withheldInCountries": [],
      "affiliatesHighlightedLabel": {},
      "possiblySensitive": false,
      "pinnedTweetIds": [
        "1911306413932163338"
      ],
      "profile_bio": {},
      "isAutomated": false,
      "automatedBy": null
    },
    "extendedEntities": {
      "media": [
        {
          "display_url": "pic.x.com/5i5AN9tzIm",
          "expanded_url": "https://x.com/_avichawla/status/2033472650836914495/photo/1",
          "ext_media_availability": {
            "status": "Available"
          },
          "features": {
            "large": {
              "faces": []
            },
            "medium": {
              "faces": []
            },
            "orig": {
              "faces": []
            },
            "small": {
              "faces": []
            }
          },
          "id_str": "2033472644277063681",
          "indices": [
            281,
            304
          ],
          "media_key": "3_2033472644277063681",
          "media_results": {
            "result": {
              "media_key": "3_2033472644277063681"
            }
          },
          "media_url_https": "https://pbs.twimg.com/media/HDhYmJ6b0AEDbMk.jpg",
          "original_info": {
            "focus_rects": [
              {
                "h": 2003,
                "w": 3576,
                "x": 0,
                "y": 0
              },
              {
                "h": 2107,
                "w": 2107,
                "x": 1469,
                "y": 0
              },
              {
                "h": 2107,
                "w": 1848,
                "x": 1728,
                "y": 0
              },
              {
                "h": 2107,
                "w": 1054,
                "x": 2237,
                "y": 0
              },
              {
                "h": 2107,
                "w": 3576,
                "x": 0,
                "y": 0
              }
            ],
            "height": 2107,
            "width": 3576
          },
          "sizes": {
            "large": {
              "h": 1207,
              "resize": "fit",
              "w": 2048
            },
            "medium": {
              "h": 707,
              "resize": "fit",
              "w": 1200
            },
            "small": {
              "h": 401,
              "resize": "fit",
              "w": 680
            },
            "thumb": {
              "h": 150,
              "resize": "crop",
              "w": 150
            }
          },
          "type": "photo",
          "url": "https://t.co/5i5AN9tzIm"
        }
      ]
    },
    "card": null,
    "place": {},
    "entities": {
      "hashtags": [],
      "symbols": [],
      "urls": [],
      "user_mentions": [
        {
          "id_str": "1175166450832687104",
          "indices": [
            3370,
            3381
          ],
          "name": "Avi Chawla",
          "screen_name": "_avichawla"
        }
      ]
    },
    "quoted_tweet": {
      "type": "tweet",
      "id": "2033378587878072424",
      "url": "https://x.com/Kimi_Moonshot/status/2033378587878072424",
      "twitterUrl": "https://twitter.com/Kimi_Moonshot/status/2033378587878072424",
      "text": "Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.\n\nResidual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.\n\n🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.\n🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.\n🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.\n🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.\n\n🔗Full report:\nhttps://t.co/u3EHICG05h",
      "source": "Twitter for iPhone",
      "retweetCount": 1188,
      "replyCount": 197,
      "likeCount": 8151,
      "quoteCount": 305,
      "viewCount": 2144650,
      "createdAt": "Mon Mar 16 03:03:28 +0000 2026",
      "lang": "en",
      "bookmarkCount": 5953,
      "isReply": false,
      "inReplyToId": null,
      "conversationId": "2033378587878072424",
      "displayTextRange": [
        0,
        261
      ],
      "inReplyToUserId": null,
      "inReplyToUsername": null,
      "author": {
        "type": "user",
        "userName": "Kimi_Moonshot",
        "url": "https://x.com/Kimi_Moonshot",
        "twitterUrl": "https://twitter.com/Kimi_Moonshot",
        "id": "1863959670169501696",
        "name": "Kimi.ai",
        "isVerified": false,
        "isBlueVerified": false,
        "verifiedType": "Business",
        "profilePicture": "https://pbs.twimg.com/profile_images/1910294000927645696/QseOV0uF_normal.png",
        "coverPicture": "https://pbs.twimg.com/profile_banners/1863959670169501696/1733238156",
        "description": "Built by Moonshot AI to empower everyone to be superhuman. ⚡️API: https://t.co/ggYlFf809H\n@KimiProduct where we share cool use cases and prompts.",
        "location": "",
        "followers": 127297,
        "following": 132,
        "status": "",
        "canDm": false,
        "canMediaTag": true,
        "createdAt": "Tue Dec 03 14:54:14 +0000 2024",
        "entities": {
          "description": {
            "urls": [
              {
                "display_url": "platform.moonshot.ai",
                "expanded_url": "https://platform.moonshot.ai/",
                "indices": [
                  66,
                  89
                ],
                "url": "https://t.co/ggYlFf809H"
              }
            ]
          },
          "url": {
            "urls": [
              {
                "display_url": "kimi.com",
                "expanded_url": "https://www.kimi.com/",
                "indices": [
                  0,
                  23
                ],
                "url": "https://t.co/mlnKFmsdLe"
              }
            ]
          }
        },
        "fastFollowersCount": 0,
        "favouritesCount": 255,
        "hasCustomTimelines": false,
        "isTranslator": false,
        "mediaCount": 111,
        "statusesCount": 298,
        "withheldInCountries": [],
        "affiliatesHighlightedLabel": {},
        "possiblySensitive": false,
        "pinnedTweetIds": [
          "2016024049869324599"
        ],
        "profile_bio": {},
        "isAutomated": false,
        "automatedBy": null
      },
      "extendedEntities": {
        "media": [
          {
            "allow_download_status": {
              "allow_download": true
            },
            "display_url": "pic.x.com/gcWyzhZVc0",
            "expanded_url": "https://x.com/Kimi_Moonshot/status/2033378587878072424/photo/1",
            "ext_media_availability": {
              "status": "Available"
            },
            "features": {
              "large": {
                "faces": []
              },
              "medium": {
                "faces": []
              },
              "orig": {
                "faces": []
              },
              "small": {
                "faces": []
              }
            },
            "id_str": "2033378144850530304",
            "indices": [
              262,
              285
            ],
            "media_key": "3_2033378144850530304",
            "media_results": {
              "result": {
                "media_key": "3_2033378144850530304"
              }
            },
            "media_url_https": "https://pbs.twimg.com/media/HDgCpkHb0AA0a7_.jpg",
            "original_info": {
              "focus_rects": [
                {
                  "h": 553,
                  "w": 987,
                  "x": 0,
                  "y": 0
                },
                {
                  "h": 987,
                  "w": 987,
                  "x": 0,
                  "y": 0
                },
                {
                  "h": 1125,
                  "w": 987,
                  "x": 0,
                  "y": 0
                },
                {
                  "h": 1280,
                  "w": 640,
                  "x": 159,
                  "y": 0
                },
                {
                  "h": 1280,
                  "w": 987,
                  "x": 0,
                  "y": 0
                }
              ],
              "height": 1280,
              "width": 987
            },
            "sizes": {
              "large": {
                "h": 1280,
                "resize": "fit",
                "w": 987
              },
              "medium": {
                "h": 1200,
                "resize": "fit",
                "w": 925
              },
              "small": {
                "h": 680,
                "resize": "fit",
                "w": 524
              },
              "thumb": {
                "h": 150,
                "resize": "crop",
                "w": 150
              }
            },
            "type": "photo",
            "url": "https://t.co/gcWyzhZVc0"
          }
        ]
      },
      "card": null,
      "place": {},
      "entities": {
        "hashtags": [],
        "symbols": [],
        "urls": [
          {
            "display_url": "github.com/MoonshotAI/Attโ€ฆ",
            "expanded_url": "https://github.com/MoonshotAI/Attention-Residuals/blob/master/Attention_Residuals.pdf",
            "indices": [
              847,
              870
            ],
            "url": "https://t.co/u3EHICG05h"
          }
        ],
        "user_mentions": []
      },
      "quoted_tweet": null,
      "retweeted_tweet": null,
      "article": null
    },
    "retweeted_tweet": null,
    "article": null
  },
  "article": null
}
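For intuition, the core mechanism described in the retweeted thread, replacing the equal-weighted residual sum with a softmax-attention mixture over previous layer outputs, can be sketched in a few lines of NumPy. This is an illustrative reconstruction from the tweet text only, not Kimi's implementation: the query vector would be learned during training, real models apply this per token inside a Transformer, and Block AttnRes would attend over block-level sums rather than every layer.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_residual(layer_outputs, query):
    """Combine previous layer outputs via softmax attention
    instead of an equal-weighted sum (standard residual).

    layer_outputs: (L, d) stack of previous sub-layer outputs
    query: (d,) learned per-layer query vector (random here)
    """
    scores = layer_outputs @ query   # (L,) affinity of the query to each layer
    weights = softmax(scores)        # input-dependent mixing weights, sum to 1
    return weights @ layer_outputs   # (d,) weighted combination

d, L = 8, 5
rng = np.random.default_rng(0)
outs = rng.normal(size=(L, d))

# Standard residual: equal weight of 1 on every previous output,
# so the hidden state's magnitude grows with depth (the dilution problem).
standard = outs.sum(axis=0)

# Attention residual: a convex combination, so its magnitude stays
# bounded by the largest individual layer output.
attn = attention_residual(outs, rng.normal(size=d))
```

Block AttnRes, per the thread, would first sum layers within each of roughly 8 blocks and then run this same attention over the N block representations, cutting memory from O(Ld) to O(Nd).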