🐦 Twitter Post Details

@MayankMish98

We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized).

This bug is related to 2 main issues:
1. init being incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has already been fixed: https://github.com/fla-org/flash-linear-attention/pull/739).
2. Skipping initialization due to meta device init for DTensors with FSDP-2 (https://github.com/fla-org/flash-linear-attention/pull/753 will fix this issue upon merging).

The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization.
Check out our experiments at the 7B MoE scale: https://wandb.ai/mayank31398/mamba-test?nw=nwusermayank31398&panelDisplayName=train/lm_loss&panelSectionName=train

Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏

Also thanks to @SonglinYang4 for quickly helping in merging the PR.
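To make issue 1 concrete, here is a minimal sketch, in plain Python so it runs without PyTorch, of what a dt_bias initialization is meant to compute. Mamba-2 passes the per-head dt_bias (plus an input-dependent term) through a softplus to obtain the step size dt. The sketch assumes the scheme we believe the reference mamba_ssm Mamba-2 code uses: sample dt log-uniformly in [dt_min, dt_max], clamp it to a floor, then store its inverse softplus so that softplus(dt_bias) recovers dt. The defaults (dt_min=0.001, dt_max=0.1, floor 1e-4) and the helper names softplus, inverse_softplus and init_dt_bias are illustrative assumptions, not the HuggingFace or FLA API. For comparison, a torch.ones initialization gives every head softplus(1) ≈ 1.313, well above that range.

```python
import math
import random
from typing import List, Optional

# Hypothetical helpers for illustration only; real Mamba-2 code works on
# PyTorch tensors, not Python floats. Defaults below are assumed values.


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inverse_softplus(y: float) -> float:
    """Inverse of softplus for y > 0: y + log(1 - exp(-y))."""
    return y + math.log(-math.expm1(-y))


def init_dt_bias(nheads: int, dt_min: float = 1e-3, dt_max: float = 1e-1,
                 dt_init_floor: float = 1e-4,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Assumed scheme: log-uniform dt per head, floored, then inverse softplus."""
    rng = rng if rng is not None else random.Random()
    log_min, log_max = math.log(dt_min), math.log(dt_max)
    biases = []
    for _ in range(nheads):
        # Sample dt log-uniformly in [dt_min, dt_max).
        dt = math.exp(log_min + rng.random() * (log_max - log_min))
        # Guard against extremely small step sizes.
        dt = max(dt, dt_init_floor)
        # Store the bias so that softplus(bias) == dt at initialization.
        biases.append(inverse_softplus(dt))
    return biases


if __name__ == "__main__":
    rng = random.Random(0)
    good = init_dt_bias(8, rng=rng)
    print([round(softplus(b), 4) for b in good])  # step sizes in [0.001, 0.1)
    print(round(softplus(1.0), 4))  # torch.ones-style bias -> 1.3133
```

Issue 2 concerns when this computation runs rather than what it computes: if it only happens while parameters sit on the meta device, then materializing them later (for example under FSDP-2) presumably skips it unless the same routine is re-run on the real tensors.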

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2026769614022259079/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2026769614022259079/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    },
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2026769614022259079/media_1.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2026769614022259079/media_1.jpg?",
      "type": "photo",
      "filename": "media_1.jpg"
    },
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2026769614022259079/media_2.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2026769614022259079/media_2.jpg?",
      "type": "photo",
      "filename": "media_2.jpg"
    }
  ],
  "processed_at": "2026-03-01T19:44:33.366720",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2026769614022259079",
  "url": "https://x.com/MayankMish98/status/2026769614022259079",
  "twitterUrl": "https://twitter.com/MayankMish98/status/2026769614022259079",
  "text": "We identified an issue with the Mamba-2 🐍 initialization in HuggingFace and FlashLinearAttention repository (dt_bias being incorrectly initialized).\n\nThis bug is related to 2 main issues:\n1. init being incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has been already fixed: https://t.co/oahfxjIsKb).\n2. Skipping initialization due to meta device init for DTensors with FSDP-2 (https://t.co/hLC8nnQFc3 will fix this issue upon merging).\n\nThe difference is substantial. Mamba-2 seems to be quite sensitive to the initialization.\nCheck out our experiments at the 7B MoE scale: https://t.co/n8iuUICRux\n\nSpecial thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏\n\nAlso thanks to @SonglinYang4 for quickly helping in merging the PR.",
  "source": "Twitter for iPhone",
  "retweetCount": 73,
  "replyCount": 17,
  "likeCount": 740,
  "quoteCount": 26,
  "viewCount": 358507,
  "createdAt": "Wed Feb 25 21:21:46 +0000 2026",
  "lang": "en",
  "bookmarkCount": 331,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2026769614022259079",
  "displayTextRange": [
    0,
    276
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "MayankMish98",
    "url": "https://x.com/MayankMish98",
    "twitterUrl": "https://twitter.com/MayankMish98",
    "id": "876045451140530176",
    "name": "Mayank Mishra",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1718510814842195968/5-DWxMbE_normal.jpg",
    "coverPicture": "",
    "description": "",
    "location": "Berkeley",
    "followers": 1219,
    "following": 423,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Sat Jun 17 11:54:46 +0000 2017",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 216,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 15,
    "statusesCount": 171,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2001773898497741203"
    ],
    "profile_bio": {
      "description": "non-member of a non-technical staff at a non-technical non-frontier lab trying to do technical work",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [],
          "user_mentions": []
        },
        "url": {
          "urls": [
            {
              "display_url": "mayank31398.github.io",
              "expanded_url": "https://mayank31398.github.io/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/ARxWiQf8vu"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": {
    "binding_values": [
      {
        "key": "photo_image_full_size_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=800x419",
            "width": 800
          }
        }
      },
      {
        "key": "thumbnail_image",
        "value": {
          "image_value": {
            "height": 200,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=400x400",
            "width": 400
          }
        }
      },
      {
        "key": "description",
        "value": {
          "string_value": "The init in FLA repo for mamba2 is incorrect. This PR fixes the issue. After the fix, Mamba2 outperforms GDN at 7B MoE scale (1B active params). The difference between wrong and fixed init is signi..."
        }
      },
      {
        "key": "domain",
        "value": {
          "string_value": "github.com"
        }
      },
      {
        "key": "thumbnail_image_large",
        "value": {
          "image_value": {
            "height": 300,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=600x600",
            "width": 600
          }
        }
      },
      {
        "key": "summary_photo_image_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=386x202",
            "width": 386
          }
        }
      },
      {
        "key": "thumbnail_image_original",
        "value": {
          "image_value": {
            "height": 600,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=orig",
            "width": 1200
          }
        }
      },
      {
        "key": "site",
        "value": {
          "scribe_key": "publisher_id",
          "user_value": {
            "id_str": "13334762",
            "path": []
          }
        }
      },
      {
        "key": "photo_image_full_size_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=386x202",
            "width": 386
          }
        }
      },
      {
        "key": "summary_photo_image_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=800x419",
            "width": 800
          }
        }
      },
      {
        "key": "thumbnail_image_small",
        "value": {
          "image_value": {
            "height": 72,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=144x144",
            "width": 144
          }
        }
      },
      {
        "key": "thumbnail_image_x_large",
        "value": {
          "image_value": {
            "height": 600,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=png&name=2048x2048_2_exp",
            "width": 1200
          }
        }
      },
      {
        "key": "photo_image_full_size_original",
        "value": {
          "image_value": {
            "height": 600,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=orig",
            "width": 1200
          }
        }
      },
      {
        "key": "photo_image_full_size_alt_text",
        "value": {
          "string_value": "The init in FLA repo for mamba2 is incorrect. This PR fixes the issue. After the fix, Mamba2 outperforms GDN at 7B MoE scale (1B active params). The difference between wrong and fixed init is signi..."
        }
      },
      {
        "key": "vanity_url",
        "value": {
          "scribe_key": "vanity_url",
          "string_value": "github.com"
        }
      },
      {
        "key": "photo_image_full_size",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=600x314",
            "width": 600
          }
        }
      },
      {
        "key": "summary_photo_image_alt_text",
        "value": {
          "string_value": "The init in FLA repo for mamba2 is incorrect. This PR fixes the issue. After the fix, Mamba2 outperforms GDN at 7B MoE scale (1B active params). The difference between wrong and fixed init is signi..."
        }
      },
      {
        "key": "thumbnail_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 90.23,
                "rgb": {
                  "blue": 255,
                  "green": 255,
                  "red": 255
                }
              },
              {
                "percentage": 3.7,
                "rgb": {
                  "blue": 124,
                  "green": 124,
                  "red": 124
                }
              },
              {
                "percentage": 3.27,
                "rgb": {
                  "blue": 166,
                  "green": 115,
                  "red": 53
                }
              },
              {
                "percentage": 1.09,
                "rgb": {
                  "blue": 45,
                  "green": 70,
                  "red": 154
                }
              },
              {
                "percentage": 0.89,
                "rgb": {
                  "blue": 176,
                  "green": 198,
                  "red": 231
                }
              }
            ]
          }
        }
      },
      {
        "key": "title",
        "value": {
          "string_value": "[MAMBA2] fix initialization for mamba2 by mayank31398 · Pull Request #739 · fla-org/flash-linear-..."
        }
      },
      {
        "key": "summary_photo_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 90.23,
                "rgb": {
                  "blue": 255,
                  "green": 255,
                  "red": 255
                }
              },
              {
                "percentage": 3.7,
                "rgb": {
                  "blue": 124,
                  "green": 124,
                  "red": 124
                }
              },
              {
                "percentage": 3.27,
                "rgb": {
                  "blue": 166,
                  "green": 115,
                  "red": 53
                }
              },
              {
                "percentage": 1.09,
                "rgb": {
                  "blue": 45,
                  "green": 70,
                  "red": 154
                }
              },
              {
                "percentage": 0.89,
                "rgb": {
                  "blue": 176,
                  "green": 198,
                  "red": 231
                }
              }
            ]
          }
        }
      },
      {
        "key": "summary_photo_image_x_large",
        "value": {
          "image_value": {
            "height": 600,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=png&name=2048x2048_2_exp",
            "width": 1200
          }
        }
      },
      {
        "key": "summary_photo_image",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=600x314",
            "width": 600
          }
        }
      },
      {
        "key": "photo_image_full_size_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 90.23,
                "rgb": {
                  "blue": 255,
                  "green": 255,
                  "red": 255
                }
              },
              {
                "percentage": 3.7,
                "rgb": {
                  "blue": 124,
                  "green": 124,
                  "red": 124
                }
              },
              {
                "percentage": 3.27,
                "rgb": {
                  "blue": 166,
                  "green": 115,
                  "red": 53
                }
              },
              {
                "percentage": 1.09,
                "rgb": {
                  "blue": 45,
                  "green": 70,
                  "red": 154
                }
              },
              {
                "percentage": 0.89,
                "rgb": {
                  "blue": 176,
                  "green": 198,
                  "red": 231
                }
              }
            ]
          }
        }
      },
      {
        "key": "photo_image_full_size_x_large",
        "value": {
          "image_value": {
            "height": 600,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=png&name=2048x2048_2_exp",
            "width": 1200
          }
        }
      },
      {
        "key": "card_url",
        "value": {
          "scribe_key": "card_url",
          "string_value": "https://t.co/oahfxjIsKb"
        }
      },
      {
        "key": "summary_photo_image_original",
        "value": {
          "image_value": {
            "height": 600,
            "url": "https://pbs.twimg.com/card_img/2026747326837174273/SaG-UwOP?format=jpg&name=orig",
            "width": 1200
          }
        }
      }
    ],
    "card_platform": {
      "platform": {
        "audience": {
          "name": "production"
        },
        "device": {
          "name": "iPhone",
          "version": "13"
        }
      }
    },
    "name": "summary_large_image",
    "url": "https://t.co/oahfxjIsKb",
    "user_refs_results": [
      {
        "rest_id": "13334762",
        "result": {
          "__typename": "User",
          "action_counts": {
            "favorites_count": 8632
          },
          "avatar": {
            "image_url": "https://pbs.twimg.com/profile_images/1633247750010830848/8zfRrYjA_normal.png"
          },
          "banner": {
            "image_url": "https://pbs.twimg.com/profile_banners/13334762/1765308302"
          },
          "core": {
            "created_at": "Mon Feb 11 04:41:50 +0000 2008",
            "name": "GitHub",
            "screen_name": "github"
          },
          "dm_permissions": {
            "can_dm": false
          },
          "exclusive_tweet_following": false,
          "follow_request_sent": false,
          "identity_profile_labels_highlighted_label": {},
          "location": {
            "location": "San Francisco, CA"
          },
          "media_permissions": {
            "can_media_tag": true
          },
          "notifications_settings": {
            "notifications_enabled": false
          },
          "pinned_items": {
            "tweet_ids_str": [
              "2019093909981257849"
            ]
          },
          "privacy": {
            "protected": false,
            "suspended": false
          },
          "private_super_following": false,
          "profile_bio": {
            "description": "The AI-powered developer platform to build, scale, and deliver secure software.",
            "entities": {
              "description": {
                "hashtags": [],
                "symbols": [],
                "urls": [],
                "user_mentions": []
              },
              "url": {
                "urls": [
                  {
                    "display_url": "github.com",
                    "expanded_url": "http://github.com",
                    "indices": [
                      0,
                      23
                    ],
                    "url": "https://t.co/bbJgfyzKzp"
                  }
                ]
              }
            }
          },
          "profile_image_shape": "Square",
          "profile_metadata": {
            "profile_interstitial_type": "",
            "profile_link_color": "981CEB"
          },
          "profile_translation": {
            "translator_type_enum": "None"
          },
          "properties": {
            "has_extended_profile": true
          },
          "relationship_counts": {
            "followers": 2616686,
            "following": 332
          },
          "relationship_perspectives": {
            "blocked_by": false,
            "blocking": false,
            "followed_by": false,
            "following": false,
            "live_following": false,
            "muting": false
          },
          "rest_id": "13334762",
          "smart_blocked_by": false,
          "smart_blocking": false,
          "super_follow_eligible": false,
          "super_followed_by": false,
          "super_following": false,
          "tweet_counts": {
            "media_tweets": 2908,
            "tweets": 10382
          },
          "verification": {
            "is_blue_verified": true,
            "verified": false,
            "verified_type": "Business"
          },
          "website": {
            "url": "https://t.co/bbJgfyzKzp"
          }
        }
      }
    ]
  },
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "github.com/fla-org/flash-…",
        "expanded_url": "https://github.com/fla-org/flash-linear-attention/pull/739",
        "indices": [
          337,
          360
        ],
        "url": "https://t.co/oahfxjIsKb"
      },
      {
        "display_url": "github.com/fla-org/flash-…",
        "expanded_url": "https://github.com/fla-org/flash-linear-attention/pull/753",
        "indices": [
          440,
          463
        ],
        "url": "https://t.co/hLC8nnQFc3"
      },
      {
        "display_url": "wandb.ai/mayank31398/ma…",
        "expanded_url": "https://wandb.ai/mayank31398/mamba-test?nw=nwusermayank31398&panelDisplayName=train/lm_loss&panelSectionName=train",
        "indices": [
          637,
          660
        ],
        "url": "https://t.co/n8iuUICRux"
      }
    ],
    "user_mentions": [
      {
        "id_str": "1452403988096188418",
        "indices": [
          680,
          690
        ],
        "name": "Kevin Li",
        "screen_name": "kevinyli_"
      },
      {
        "id_str": "1667598715",
        "indices": [
          692,
          706
        ],
        "name": "Bharat",
        "screen_name": "bharatrunwal2"
      },
      {
        "id_str": "769279457387540480",
        "indices": [
          708,
          717
        ],
        "name": "Han Guo",
        "screen_name": "HanGuo97"
      },
      {
        "id_str": "568879807",
        "indices": [
          719,
          727
        ],
        "name": "Tri Dao",
        "screen_name": "tri_dao"
      },
      {
        "id_str": "1076265378118959104",
        "indices": [
          732,
          742
        ],
        "name": "Albert Gu",
        "screen_name": "_albertgu"
      },
      {
        "id_str": "1345247812666081280",
        "indices": [
          761,
          774
        ],
        "name": "Songlin Yang",
        "screen_name": "SonglinYang4"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}