@tri_dao
This was a wild bug hunt, weeks of effort from @MayankMish98 to track down. The wrong init of Mamba2 in many reimplementations causes the layer to decay its states too quickly, focusing on short context instead. Pretraining is mostly about getting these little things right
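For context, here is a minimal sketch of the mechanism being described, assuming the reference Mamba2 init from state-spaces/mamba (a scalar A per head with A_log = log(Uniform(1, 16)), used as A = -exp(A_log)). The "bad" init below is a hypothetical stand-in for the kind of mis-init the tweet refers to, not the exact bug that was tracked down:

```python
# Sketch only: compares the reference Mamba2 A_log init against a
# hypothetical mis-init to show how it shortens the state's memory.
import torch

torch.manual_seed(0)
nheads, dt = 8, 0.01  # dt is learned per head in practice; fixed here

# Reference init: scalar A per head, A ~ Uniform(1, 16), stored as A_log.
A_log_good = torch.log(torch.empty(nheads).uniform_(1.0, 16.0))

# Hypothetical mis-init: A_log re-initialized like a generic weight,
# pushing exp(A_log) far outside [1, 16].
A_log_bad = torch.randn(nheads) + 4.0

for name, A_log in [("good", A_log_good), ("bad", A_log_bad)]:
    A = -torch.exp(A_log)        # A is negative at use time
    decay = torch.exp(dt * A)    # per-step multiplier on the SSM state
    timescale = -1.0 / (dt * A)  # steps until the state shrinks by 1/e
    print(f"{name}: decay≈{decay.mean().item():.3f}, "
          f"memory≈{timescale.mean().item():.0f} steps")
# good: per-step decay near 1, state persists for tens of steps or more
# bad: decay well below 1, state forgotten within a few steps, so the
#      layer can only attend to short context
```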