🐦 Twitter Post Details

@omarsar0

Universal Reasoning Model

Universal Transformers crush standard Transformers on reasoning tasks.

But why?

Prior work attributed the gains to elaborate architectural innovations like hierarchical designs and complex gating mechanisms.

But these researchers found a simpler explanation.

This new research demonstrates that the performance gains on ARC-AGI come primarily from two often-overlooked factors: recurrent inductive bias and strong nonlinearity.

Applying a single transformation repeatedly works far better than stacking distinct layers for reasoning tasks.

With only 4x parameters, a Universal Transformer achieves 40% pass@1 on ARC-AGI 1. Vanilla Transformers with 32x parameters score just 23.75%. Simply scaling depth or width in standard Transformers yields diminishing returns and can even degrade performance.

They introduce the Universal Reasoning Model (URM), which enhances this with two techniques. First, ConvSwiGLU adds a depthwise short convolution after the MLP expansion, injecting local token mixing into the nonlinear pathway. Second, Truncated Backpropagation Through Loops skips gradient computation for early recurrent iterations, stabilizing optimization.

Results: 53.8% pass@1 on ARC-AGI 1, up from 40% (TRM) and 34.4% (HRM). On ARC-AGI 2, URM reaches 16% pass@1, nearly tripling HRM and more than doubling TRM. Sudoku accuracy hits 77.6%.

Ablations:

- Removing short convolution drops pass@1 from 53.8% to 45.3%. Removing truncated backpropagation drops it to 40%.
- Replacing SwiGLU with simpler activations like ReLU tanks performance to 28.6%.
- Removing attention softmax entirely collapses accuracy to 2%.

The recurrent structure converts compute into effective depth. Standard Transformers spend FLOPs on redundant refinement in higher layers. Recurrent computation concentrates the same budget on iterative reasoning.

Complex reasoning benefits more from iterative computation than from scale. Small models with recurrent structure outperform large static models on tasks requiring multi-step abstraction.
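A minimal sketch (PyTorch; module names and hyperparameters are illustrative, not the paper's exact configuration) of the recurrent inductive bias the post describes: one weight-tied Transformer block applied in a loop, so the parameter count stays at a single block while effective depth equals the number of iterations.

import torch
import torch.nn as nn

class Block(nn.Module):
    """A standard pre-norm Transformer block (attention + MLP)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class RecurrentTransformer(nn.Module):
    """Universal-Transformer-style: ONE block applied n_loops times,
    versus a vanilla stack of n_loops distinct blocks."""
    def __init__(self, d_model=256, n_heads=8, n_loops=16):
        super().__init__()
        self.block = Block(d_model, n_heads)  # single set of weights
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):  # same transformation, repeated
            x = self.block(x)
        return x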

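A minimal sketch of a ConvSwiGLU-style feed-forward, assuming the depthwise short convolution sits on the gate branch immediately after the expansion; the kernel size and the choice of branch are assumptions, not details given in the post.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvSwiGLU(nn.Module):
    def __init__(self, d_model=256, d_hidden=1024, kernel_size=3):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        # Depthwise conv (groups == channels): each expanded channel mixes
        # only with its own neighbors along the sequence axis.
        self.conv = nn.Conv1d(d_hidden, d_hidden, kernel_size,
                              padding=kernel_size // 2, groups=d_hidden)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):  # x: (batch, seq, d_model)
        g = self.w_gate(x)  # MLP expansion
        g = self.conv(g.transpose(1, 2)).transpose(1, 2)  # local token mixing
        return self.w_down(F.silu(g) * self.w_up(x))  # gated nonlinearity

And a sketch of Truncated Backpropagation Through Loops as described: early recurrent iterations run without building an autograd graph, and only the last few iterations carry gradients. The split point k_grad and the detach placement are assumptions; block can be the weight-tied block from the sketch above.

import torch

def forward_truncated(block, x, n_loops=16, k_grad=4):
    with torch.no_grad():  # skip graph construction for early iterations
        for _ in range(n_loops - k_grad):
            x = block(x)
    x = x.detach()  # ensure no history flows back past this point
    for _ in range(k_grad):  # only these iterations receive gradients
        x = block(x)
    return x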
Media 1

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2005640015964250267/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2005640015964250267/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-12-31T02:48:04.153239",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2005640015964250267",
  "url": "https://x.com/omarsar0/status/2005640015964250267",
  "twitterUrl": "https://twitter.com/omarsar0/status/2005640015964250267",
  "text": "Universal Reasoning Model\n\nUniversal Transformers crush standard Transformers on reasoning tasks.\n\nBut why?\n\nPrior work attributed the gains to elaborate architectural innovations like hierarchical designs and complex gating mechanisms.\n\nBut these researchers found a simpler explanation.\n\nThis new research demonstrates that the performance gains on ARC-AGI come primarily from two often-overlooked factors: recurrent inductive bias and strong nonlinearity.\n\nApplying a single transformation repeatedly works far better than stacking distinct layers for reasoning tasks.\n\nWith only 4x parameters, a Universal Transformer achieves 40% pass@1 on ARC-AGI 1. Vanilla Transformers with 32x parameters score just 23.75%. Simply scaling depth or width in standard Transformers yields diminishing returns and can even degrade performance.\n\nThey introduce the Universal Reasoning Model (URM), which enhances this with two techniques. First, ConvSwiGLU adds a depthwise short convolution after the MLP expansion, injecting local token mixing into the nonlinear pathway. Second, Truncated Backpropagation Through Loops skips gradient computation for early recurrent iterations, stabilizing optimization.\n\nResults: 53.8% pass@1 on ARC-AGI 1, up from 40% (TRM) and 34.4% (HRM). On ARC-AGI 2, URM reaches 16% pass@1, nearly tripling HRM and more than doubling TRM. Sudoku accuracy hits 77.6%.\n\nAblations:\n\n- Removing short convolution drops pass@1 from 53.8% to 45.3%. Removing truncated backpropagation drops it to 40%.\n- Replacing SwiGLU with simpler activations like ReLU tanks performance to 28.6%.\n- Removing attention softmax entirely collapses accuracy to 2%.\n\nThe recurrent structure converts compute into effective depth. Standard Transformers spend FLOPs on redundant refinement in higher layers. Recurrent computation concentrates the same budget on iterative reasoning.\n\nComplex reasoning benefits more from iterative computation than from scale. Small models with recurrent structure outperform large static models on tasks requiring multi-step abstraction.",
  "source": "Twitter for iPhone",
  "retweetCount": 92,
  "replyCount": 34,
  "likeCount": 541,
  "quoteCount": 3,
  "viewCount": 52427,
  "createdAt": "Mon Dec 29 14:00:17 +0000 2025",
  "lang": "en",
  "bookmarkCount": 537,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2005640015964250267",
  "displayTextRange": [
    0,
    300
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "omarsar0",
    "url": "https://x.com/omarsar0",
    "twitterUrl": "https://twitter.com/omarsar0",
    "id": "3448284313",
    "name": "elvis",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "description": "",
    "location": "DAIR.AI Academy",
    "followers": 282031,
    "following": 752,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Fri Sep 04 12:59:26 +0000 2015",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 34254,
    "hasCustomTimelines": true,
    "isTranslator": true,
    "mediaCount": 4420,
    "statusesCount": 16895,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2006004138220605920"
    ],
    "profile_bio": {
      "description": "Building @dair_ai • Prev: Meta AI, Elastic, PhD • New cohort: https://t.co/GZMhf39NRs",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "dair-ai.thinkific.com/courses/claude…",
              "expanded_url": "https://dair-ai.thinkific.com/courses/claude-code-for-everyone-2",
              "indices": [
                62,
                85
              ],
              "url": "https://t.co/GZMhf39NRs"
            }
          ],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                9,
                17
              ],
              "name": "",
              "screen_name": "dair_ai"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "dair.ai",
              "expanded_url": "https://www.dair.ai/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/XQto5ypkSM"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/j5SAyq90Mu",
        "expanded_url": "https://twitter.com/omarsar0/status/2005640015964250267/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "2005640012713619456",
        "indices": [
          301,
          324
        ],
        "media_key": "3_2005640012713619456",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARvVdvcZGvAACgACG9V299rboJsAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABG9V29xka8AAKAAIb1Xb32tugmwAA",
            "media_key": "3_2005640012713619456"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/G9V29xka8AAyPaI.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 899,
              "w": 1606,
              "x": 0,
              "y": 0
            },
            {
              "h": 1606,
              "w": 1606,
              "x": 0,
              "y": 0
            },
            {
              "h": 1790,
              "w": 1570,
              "x": 36,
              "y": 0
            },
            {
              "h": 1790,
              "w": 895,
              "x": 492,
              "y": 0
            },
            {
              "h": 1790,
              "w": 1606,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1790,
          "width": 1606
        },
        "sizes": {
          "large": {
            "h": 1790,
            "w": 1606
          }
        },
        "type": "photo",
        "url": "https://t.co/j5SAyq90Mu"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {},
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}