🐦 Twitter Post Details

@realDanFu

📢 Super excited to announce Parcae! We've been thinking about scaling laws and the "right" way to get more FLOPs.

Turns out layer looping - with the right parameterization - gives you a new axis to scale!

Parcae matches Transformers 2x their size (w/ the same data), and outperforms prior formulations of looped models.

But - you need the right parameterization to get these gains against strong Transformer baselines. Looped models are famously unstable to train, with tons of loss spikes and hyperparameter sensitivity.

The main technical challenge with looped models is residual explosion - if you're passing the activations through the same layers over and over, some otherwise benign parameterizations cause huge instability.

Our key idea: we can think of the residual stream of a model as a time-varying dynamical system - the same fundamentals behind SSMs like Mamba and S4. Then a few modest modifications to classic Transformers (stable diagonalization of injection params, LN before embeddings) can stabilize the looped models. The resulting models are more stable to train, but also reach higher quality.

It's strong enough to start to derive new scaling laws. Classically - we know you need to scale parameters with data to be FLOP-optimal. With Parcae, we find a third axis - given fixed parameters, you additionally want to scale FLOPs by looping as you scale data.

Super excited to see how these ideas hold, and what we can do with looped models!

Check out @hayden_prairie's great explainer thread below, and see links for our paper, blog, and models. Joint w/ @zacknovack and @BergKirkpatrick, and a fun collab between @togethercompute and my lab at @ucsd_cse. Enjoy!
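
The residual-explosion point in the post can be made concrete with a toy example. The sketch below is not Parcae's architecture or parameterization; it is a hypothetical per-channel linear recurrence `h <- a * h + b * x`, standing in for "the same layers applied over and over". The names `loop_diagonal`, `stable_decay`, and `demo`, and all numeric values, are assumptions made for illustration. It compares free coefficients slightly above 1 with coefficients squashed into (0, 1) by `exp(-softplus(raw))`, a constraint in the spirit of SSM-style stable parameterizations.

```python
import math


def loop_diagonal(a, b, x, num_loops):
    """Apply the diagonal recurrence h <- a*h + b*x num_loops times.

    Returns the L2 norm of the state after every loop iteration.
    This is a toy linear stand-in for a looped block, not a Transformer.
    """
    h = [0.0] * len(x)
    norms = []
    for _ in range(num_loops):
        h = [a_i * h_i + b_i * x_i for a_i, h_i, b_i, x_i in zip(a, h, b, x)]
        norms.append(math.sqrt(sum(v * v for v in h)))
    return norms


def stable_decay(raw):
    """Map unconstrained values to (0, 1) via exp(-softplus(raw)).

    Assumed illustrative constraint: any coefficient in (0, 1) makes the
    recurrence contract toward the fixed point b*x / (1 - a).
    """
    return [math.exp(-math.log1p(math.exp(r))) for r in raw]


def demo(num_loops=32):
    """Return the final state norms for the free and constrained variants."""
    x = [1.0, -0.5, 2.0]   # fixed input re-injected at every loop
    b = [1.0, 1.0, 1.0]    # injection weights (hypothetical values)
    free_a = [1.2, 1.1, 1.3]                    # looks benign, but |a| > 1
    constrained_a = stable_decay([0.5, -1.0, 2.0])
    free_norms = loop_diagonal(free_a, b, x, num_loops)
    constrained_norms = loop_diagonal(constrained_a, b, x, num_loops)
    return free_norms[-1], constrained_norms[-1]


if __name__ == "__main__":
    exploding, bounded = demo()
    print(f"free coefficients, final norm:        {exploding:.3e}")
    print(f"constrained coefficients, final norm: {bounded:.3e}")
```

In this toy setting the free coefficients grow geometrically with the loop count, while the constrained ones settle near a fixed point regardless of how many loops are run. A real looped Transformer is nonlinear and input-dependent, so this only illustrates why constraining the recurrence matters, not how the paper does it.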

📊 Media Metadata

{
  "score": 0.42,
  "score_components": {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.12,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025
  },
  "scored_at": "2026-04-16T18:22:36.096714",
  "import_source": "api_import",
  "source_tagged_at": "2026-04-16T18:22:36.096724",
  "enriched": true,
  "enriched_at": "2026-04-16T18:22:36.096726"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2044459930149941304",
  "url": "https://x.com/realDanFu/status/2044459930149941304",
  "twitterUrl": "https://twitter.com/realDanFu/status/2044459930149941304",
  "text": "📢 Super excited to announce Parcae! We've been thinking about scaling laws and the \"right\" way to get more FLOPs.\n\nTurns out layer looping - with the right parameterization - gives you a new axis to scale!\n\nParcae matches Transformers 2x their size (w/ the same data), and outperforms prior formulations of looped models.\n\nBut - you need the right parameterization to get these gains against strong Transformer baselines. Looped models are famously unstable to train, with tons of loss spikes and hyperparameter sensitivity.\n\nThe main technical challenge with looped models is residual explosion - if you're passing the activations through the same layers over and over, some otherwise benign parameterizations cause huge instability.\n\nOur key idea: we can think of the residual stream of a model as a time-varying dynamical system - the same fundamentals behind SSMs like Mamba and S4. Then a few modest modifications to classic Transformers (stable diagonalization of injection params, LN before embeddings) can stabilize the looped models. The resulting models are more stable to train, but also reach higher quality.\n\nIt's strong enough to start to derive new scaling laws. Classically - we know you need to scale parameters with data to be FLOP-optimal. With Parcae, we find a third axis - given fixed parameters, you additionally want to scale FLOPs by looping as you scale data.\n\nSuper excited to see how these ideas hold, and what we can do with looped models!\n\nCheck out @hayden_prairie's great explainer thread below, and see links for our paper, blog, and models. Joint w/ @zacknovack and @BergKirkpatrick, and a fun collab between @togethercompute and my lab at @ucsd_cse. Enjoy!",
  "source": "Twitter for iPhone",
  "retweetCount": 21,
  "replyCount": 0,
  "likeCount": 100,
  "quoteCount": 4,
  "viewCount": 14192,
  "createdAt": "Wed Apr 15 16:56:46 +0000 2026",
  "lang": "en",
  "bookmarkCount": 55,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2044459930149941304",
  "displayTextRange": [
    0,
    272
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "realDanFu",
    "url": "https://x.com/realDanFu",
    "twitterUrl": "https://twitter.com/realDanFu",
    "id": "1173687463790829568",
    "name": "Dan Fu",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1905000576376844288/g3ZPKigb_normal.jpg",
    "coverPicture": "",
    "description": "",
    "location": "",
    "followers": 7550,
    "following": 237,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Mon Sep 16 19:58:03 +0000 2019",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 1613,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 190,
    "statusesCount": 870,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1825610767657525372"
    ],
    "profile_bio": {
      "description": "VP, Kernels @togethercompute\nAssistant Professor @ucsd_cse\n\nLooking for talented kernel engineers and performance engineers!",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                12,
                28
              ],
              "name": "",
              "screen_name": "togethercompute"
            },
            {
              "id_str": "0",
              "indices": [
                49,
                58
              ],
              "name": "",
              "screen_name": "ucsd_cse"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "danfu.org",
              "expanded_url": "http://danfu.org",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/pGWFSyjfng"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": [
      {
        "id_str": "1729651485959454720",
        "indices": [
          1480,
          1495
        ],
        "name": "Hayden Prairie",
        "screen_name": "hayden_prairie"
      },
      {
        "id_str": "1534894045805281280",
        "indices": [
          1584,
          1595
        ],
        "name": "Zachary Novack",
        "screen_name": "zacknovack"
      },
      {
        "id_str": "940797358295666688",
        "indices": [
          1600,
          1616
        ],
        "name": "Taylor Berg-Kirkpatrick",
        "screen_name": "BergKirkpatrick"
      },
      {
        "id_str": "1592266692528197632",
        "indices": [
          1643,
          1659
        ],
        "name": "Together AI",
        "screen_name": "togethercompute"
      },
      {
        "id_str": "889527316237369344",
        "indices": [
          1674,
          1683
        ],
        "name": "UCSD CSE",
        "screen_name": "ucsd_cse"
      }
    ]
  },
  "quoted_tweet": {
    "type": "tweet",
    "id": "2044453231913537927",
    "url": "https://x.com/hayden_prairie/status/2044453231913537927",
    "twitterUrl": "https://twitter.com/hayden_prairie/status/2044453231913537927",
    "text": "We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters.\n\nTurns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size.\n\nOur scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem!\n\n🧵👇",
    "source": "Twitter for iPhone",
    "retweetCount": 114,
    "replyCount": 27,
    "likeCount": 861,
    "quoteCount": 20,
    "viewCount": 98676,
    "createdAt": "Wed Apr 15 16:30:09 +0000 2026",
    "lang": "en",
    "bookmarkCount": 710,
    "isReply": false,
    "inReplyToId": null,
    "conversationId": "2044453231913537927",
    "displayTextRange": [
      0,
      273
    ],
    "inReplyToUserId": null,
    "inReplyToUsername": null,
    "author": {
      "type": "user",
      "userName": "hayden_prairie",
      "url": "https://x.com/hayden_prairie",
      "twitterUrl": "https://twitter.com/hayden_prairie",
      "id": "1729651485959454720",
      "name": "Hayden Prairie",
      "isVerified": false,
      "isBlueVerified": false,
      "verifiedType": null,
      "profilePicture": "https://pbs.twimg.com/profile_images/1766700845721145344/Arxoiq-S_normal.jpg",
      "coverPicture": "",
      "description": "",
      "location": "San Diego, CA",
      "followers": 418,
      "following": 101,
      "status": "",
      "canDm": false,
      "canMediaTag": true,
      "createdAt": "Wed Nov 29 00:00:56 +0000 2023",
      "entities": {
        "description": {
          "urls": []
        },
        "url": {}
      },
      "fastFollowersCount": 0,
      "favouritesCount": 120,
      "hasCustomTimelines": true,
      "isTranslator": false,
      "mediaCount": 4,
      "statusesCount": 23,
      "withheldInCountries": [],
      "affiliatesHighlightedLabel": {},
      "possiblySensitive": false,
      "pinnedTweetIds": [],
      "profile_bio": {
        "description": "CSE PhD @ UCSD advised by Dan Fu and Taylor Berg-Kirkpatrick | Kernels Research Intern @togethercompute | ML and Systems | 140M parameter model enjoyer",
        "entities": {
          "description": {
            "hashtags": [],
            "symbols": [],
            "urls": [],
            "user_mentions": [
              {
                "id_str": "0",
                "indices": [
                  87,
                  103
                ],
                "name": "",
                "screen_name": "togethercompute"
              }
            ]
          }
        }
      },
      "isAutomated": false,
      "automatedBy": null
    },
    "extendedEntities": {
      "media": [
        {
          "allow_download_status": {
            "allow_download": true
          },
          "display_url": "pic.twitter.com/1lOjmy6IWx",
          "expanded_url": "https://twitter.com/hayden_prairie/status/2044453231913537927/photo/1",
          "ext_media_availability": {
            "status": "Available"
          },
          "features": {
            "large": {
              "faces": []
            },
            "orig": {
              "faces": []
            }
          },
          "id_str": "2044447978900140032",
          "indices": [
            274,
            297
          ],
          "media_key": "3_2044447978900140032",
          "media_results": {
            "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARxfVprYG7AACgACHF9bYegbEYcAAA==",
            "result": {
              "__typename": "ApiMedia",
              "id": "QXBpTWVkaWE6DAABCgABHF9WmtgbsAAKAAIcX1th6BsRhwAA",
              "media_key": "3_2044447978900140032"
            }
          },
          "media_url_https": "https://pbs.twimg.com/media/HF9WmtgbsAA2Vxe.jpg",
          "original_info": {
            "focus_rects": [
              {
                "h": 468,
                "w": 836,
                "x": 0,
                "y": 0
              },
              {
                "h": 468,
                "w": 468,
                "x": 0,
                "y": 0
              },
              {
                "h": 468,
                "w": 411,
                "x": 26,
                "y": 0
              },
              {
                "h": 468,
                "w": 234,
                "x": 114,
                "y": 0
              },
              {
                "h": 468,
                "w": 1859,
                "x": 0,
                "y": 0
              }
            ],
            "height": 468,
            "width": 1859
          },
          "sizes": {
            "large": {
              "h": 468,
              "w": 1859
            }
          },
          "type": "photo",
          "url": "https://t.co/1lOjmy6IWx"
        }
      ]
    },
    "card": null,
    "place": {},
    "entities": {
      "hashtags": [],
      "symbols": [],
      "urls": [],
      "user_mentions": []
    },
    "quoted_tweet": null,
    "retweeted_tweet": null,
    "isLimitedReply": false,
    "communityInfo": null,
    "article": null
  },
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "communityInfo": null,
  "article": null
}