🐦 Twitter Post Details

@LiorOnAI

Robotics just proved it can scale like language models.

SONIC trained a 42 million parameter model on 100 million frames of human motion and achieved 100% success transferring to real robots with zero fine-tuning.

The breakthrough isn't the robot doing backflips. It's that someone finally found the "next token prediction" equivalent for physical movement.

For years, training robots meant hand-crafting reward functions for every single skill.

Want your robot to walk? Design rewards for balance, foot placement, energy efficiency. Want it to dance? Start over with entirely new rewards.

This approach hits a wall because humans can't manually specify every nuance of natural movement.

SONIC replaces this with motion tracking: the robot learns by watching 700 hours of motion capture data and trying to mimic it, frame by frame.

The data itself becomes the reward function. Scale the data, scale the model, scale the compute, and performance improves predictably. Just like GPT.

This unlocks something robotics has never had: a universal control interface.

One policy handles:
1. VR teleoperation using head and hand tracking
2. Live webcam feeds converted to robot motion in real-time
3. Text commands like "walk sideways" or "dance like a monkey"
4. Music audio where the robot matches tempo and rhythm
5. Vision-language models for autonomous tasks (95% success rate)

All inputs get encoded into the same token space, then decoded into motor commands. No retraining. No reward engineering. No manual retargeting between human and robot skeletons.

If this holds, robotics just closed a 5-year gap with AI.

Language models scaled by finding one task (predict the next word) that generalizes to everything. Vision models did the same with image classification.

Robotics now has motion tracking. Expect the next wave of humanoid companies to train on billions of frames, not millions.
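
The post's claim that "the data itself becomes the reward function" can be illustrated with a minimal sketch. It is not SONIC's implementation. The function names (`tracking_reward`, `episode_return`), the flat list-of-joint-angles pose format, the exponential kernel, and the `sigma` value are all assumptions chosen for illustration; the exponential form is only a common choice in motion-imitation work. The point is that nothing skill-specific is hand-written: the reference mocap frame alone determines the reward.

```python
import math

def tracking_reward(robot_pose, reference_pose, sigma=0.5):
    """Hypothetical per-frame tracking reward in (0, 1].

    Both poses are flat lists of joint angles (an assumed format).
    The reward is 1.0 when the robot matches the mocap frame exactly
    and decays as the mean squared joint error grows.
    """
    if len(robot_pose) != len(reference_pose):
        raise ValueError("poses must have the same number of joints")
    squared_error = sum((r - t) ** 2 for r, t in zip(robot_pose, reference_pose))
    mean_squared_error = squared_error / len(reference_pose)
    return math.exp(-mean_squared_error / sigma ** 2)

def episode_return(robot_frames, reference_frames, sigma=0.5):
    """Sum of per-frame rewards: dense, frame-by-frame supervision."""
    return sum(
        tracking_reward(robot, reference, sigma)
        for robot, reference in zip(robot_frames, reference_frames)
    )

# Usage: the same reward code covers any clip; only the reference data changes.
walk_clip = [[0.0, 0.2], [0.1, 0.3]]      # made-up reference frames
robot_rollout = [[0.0, 0.2], [0.3, 0.3]]  # made-up robot frames
print(episode_return(robot_rollout, walk_clip))
```

Swapping `walk_clip` for a dance clip changes the behavior being rewarded without touching the reward code, which is the contrast the post draws with per-skill reward engineering.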

📊 Media Metadata

{
  "score": 0.42,
  "score_components": {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.12,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025
  },
  "scored_at": "2026-03-01T12:17:36.215311",
  "import_source": "api_import",
  "source_tagged_at": "2026-03-01T12:17:36.215333",
  "enriched": true,
  "enriched_at": "2026-03-01T12:17:36.215335"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2026397093355728907",
  "url": "https://x.com/LiorOnAI/status/2026397093355728907",
  "twitterUrl": "https://twitter.com/LiorOnAI/status/2026397093355728907",
  "text": "Robotics just proved it can scale like language models.\n\nSONIC trained a 42 million parameter model on 100 million frames of human motion and achieved 100% success transferring to real robots with zero fine-tuning.\n\nThe breakthrough isn't the robot doing backflips. It's that someone finally found the \"next token prediction\" equivalent for physical movement.\n\nFor years, training robots meant hand-crafting reward functions for every single skill. \n\nWant your robot to walk? Design rewards for balance, foot placement, energy efficiency. Want it to dance? Start over with entirely new rewards. \n\nThis approach hits a wall because humans can't manually specify every nuance of natural movement.\n\nSONIC replaces this with motion tracking: the robot learns by watching 700 hours of motion capture data and trying to mimic it, frame by frame. \n\nThe data itself becomes the reward function. Scale the data, scale the model, scale the compute, and performance improves predictably. Just like GPT.\n\nThis unlocks something robotics has never had: a universal control interface. \n\nOne policy handles:\n1. VR teleoperation using head and hand tracking\n2. Live webcam feeds converted to robot motion in real-time\n3. Text commands like \"walk sideways\" or \"dance like a monkey\"\n4. Music audio where the robot matches tempo and rhythm\n5. Vision-language models for autonomous tasks (95% success rate)\n\nAll inputs get encoded into the same token space, then decoded into motor commands. No retraining. No reward engineering. No manual retargeting between human and robot skeletons.\n\nIf this holds, robotics just closed a 5-year gap with AI. \n\nLanguage models scaled by finding one task (predict the next word) that generalizes to everything. Vision models did the same with image classification. \n\nRobotics now has motion tracking. Expect the next wave of humanoid companies to train on billions of frames, not millions.",
  "source": "Twitter for iPhone",
  "retweetCount": 32,
  "replyCount": 15,
  "likeCount": 219,
  "quoteCount": 2,
  "viewCount": 31128,
  "createdAt": "Tue Feb 24 20:41:30 +0000 2026",
  "lang": "en",
  "bookmarkCount": 132,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2026397093355728907",
  "displayTextRange": [
    0,
    275
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "LiorOnAI",
    "url": "https://x.com/LiorOnAI",
    "twitterUrl": "https://twitter.com/LiorOnAI",
    "id": "931470139",
    "name": "Lior Alexander",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/2027106343283527680/lh729xEs_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/931470139/1761077189",
    "description": "",
    "location": "",
    "followers": 112932,
    "following": 2153,
    "status": "",
    "canDm": true,
    "canMediaTag": false,
    "createdAt": "Wed Nov 07 07:19:36 +0000 2012",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 6770,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 661,
    "statusesCount": 3756,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [],
    "profile_bio": {
      "description": "Covering the latest news for AI devs • Founder @AlphaSignalAI (270k users) •  ML Eng since 2017 • Ex-Mila • MIT",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                47,
                61
              ],
              "name": "",
              "screen_name": "AlphaSignalAI"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "alphasignal.ai",
              "expanded_url": "https://alphasignal.ai",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/AyubevadmD"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": []
  },
  "quoted_tweet": {
    "type": "tweet",
    "id": "2026350142652383587",
    "url": "https://x.com/DrJimFan/status/2026350142652383587",
    "twitterUrl": "https://twitter.com/DrJimFan/status/2026350142652383587",
    "text": "What can half of GPT-1 do? We trained a 42M transformer called SONIC to control the body of a humanoid robot. It takes a remarkable amount of subconscious processing for us humans to squat, turn, crawl, sprint. SONIC captures this \"System 1\" - the fast, reactive whole-body intelligence - in a single model that translates any motion command into stable, natural motor signals. And it's all open-source!! \n\nThe key insight: motion tracking is the one, true scalable task for whole body control. Instead of hand-engineering rewards for every new skill, we use dense, frame-by-frame supervision from human mocap data. The data itself encodes the reward function: \"configure your limbs in any human-like position while maintaining balance\".\n\nWe scaled humanoid motion RL to an unprecedented scale: 100M+ mocap frames and 500,000+ parallel robots across 128 GPUs. NVIDIA Isaac Lab allows us to accelerate physics at 10,000x faster tick, giving robots many years of virtual experience in only hours of wall clock time. After 3 days of training, the neural net transfers zero-shot to the real G1 robot with no finetuning. 100% success rate across 50 diverse real-world motion sequences. \n\nOne SONIC policy supports all of the following:\n\n- VR whole-body teleoperation\n- Human video. Just point a webcam to live stream motions.\n- Text prompts. \"Walk sideways\", \"dance like a monkey\", \"kick your left foot\", etc.\n- Music audio. The robot dances to the beat, adapting to tempo and rhythm.\n- VLA foundation models. We plugged in GR00T N1.5 and achieved 95% success on mobile tasks.\n\nWe open-source the code and model checkpoints!! Deep dive in thread:",
    "source": "Twitter for iPhone",
    "retweetCount": 223,
    "replyCount": 78,
    "likeCount": 1511,
    "quoteCount": 42,
    "viewCount": 211749,
    "createdAt": "Tue Feb 24 17:34:56 +0000 2026",
    "lang": "en",
    "bookmarkCount": 612,
    "isReply": false,
    "inReplyToId": null,
    "conversationId": "2026350142652383587",
    "displayTextRange": [
      0,
      273
    ],
    "inReplyToUserId": null,
    "inReplyToUsername": null,
    "author": {
      "type": "user",
      "userName": "DrJimFan",
      "url": "https://x.com/DrJimFan",
      "twitterUrl": "https://twitter.com/DrJimFan",
      "id": "1007413134",
      "name": "Jim Fan",
      "isVerified": false,
      "isBlueVerified": true,
      "verifiedType": null,
      "profilePicture": "https://pbs.twimg.com/profile_images/1554922493101559808/SYSZhbcd_normal.jpg",
      "coverPicture": "https://pbs.twimg.com/profile_banners/1007413134/1672408318",
      "description": "",
      "location": "Views my own. Contact →",
      "followers": 365367,
      "following": 3110,
      "status": "",
      "canDm": true,
      "canMediaTag": true,
      "createdAt": "Wed Dec 12 22:11:27 +0000 2012",
      "entities": {
        "description": {
          "urls": []
        },
        "url": {}
      },
      "fastFollowersCount": 0,
      "favouritesCount": 8706,
      "hasCustomTimelines": true,
      "isTranslator": false,
      "mediaCount": 846,
      "statusesCount": 4105,
      "withheldInCountries": [],
      "affiliatesHighlightedLabel": {},
      "possiblySensitive": false,
      "pinnedTweetIds": [
        "2018754323141054786"
      ],
      "profile_bio": {
        "description": "NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.",
        "entities": {
          "description": {
            "hashtags": [],
            "symbols": [],
            "urls": [],
            "user_mentions": []
          },
          "url": {
            "urls": [
              {
                "display_url": "jimfan.me",
                "expanded_url": "https://jimfan.me",
                "indices": [
                  0,
                  23
                ],
                "url": "https://t.co/H4rXo4Ei8X"
              }
            ]
          }
        }
      },
      "isAutomated": false,
      "automatedBy": null
    },
    "extendedEntities": {
      "media": [
        {
          "additional_media_info": {
            "monetizable": false
          },
          "allow_download_status": {
            "allow_download": true
          },
          "display_url": "pic.twitter.com/bav2eIzVWn",
          "expanded_url": "https://twitter.com/DrJimFan/status/2026350142652383587/video/1",
          "ext_media_availability": {
            "status": "Available"
          },
          "id_str": "2026342653055610880",
          "indices": [
            274,
            297
          ],
          "media_key": "13_2026342653055610880",
          "media_results": {
            "id": "QXBpTWVkaWFSZXN1bHRzOgwABAoAARwfA+gsmjAAAAA=",
            "result": {
              "__typename": "ApiMedia",
              "id": "QXBpTWVkaWE6DAAECgABHB8D6CyaMAAAAA==",
              "media_key": "13_2026342653055610880"
            }
          },
          "media_url_https": "https://pbs.twimg.com/amplify_video_thumb/2026342653055610880/img/fjNGTzUDEMAW2-Da.jpg",
          "original_info": {
            "focus_rects": [],
            "height": 2160,
            "width": 3840
          },
          "sizes": {
            "large": {
              "h": 1152,
              "w": 2048
            }
          },
          "type": "video",
          "url": "https://t.co/bav2eIzVWn",
          "video_info": {
            "aspect_ratio": [
              16,
              9
            ],
            "duration_millis": 186786,
            "variants": [
              {
                "content_type": "application/x-mpegURL",
                "url": "https://video.twimg.com/amplify_video/2026342653055610880/pl/mBvLYqeGZNd87Amk.m3u8?tag=21"
              },
              {
                "bitrate": 256000,
                "content_type": "video/mp4",
                "url": "https://video.twimg.com/amplify_video/2026342653055610880/vid/avc1/480x270/1cOK9WnkW83mCc1A.mp4?tag=21"
              },
              {
                "bitrate": 832000,
                "content_type": "video/mp4",
                "url": "https://video.twimg.com/amplify_video/2026342653055610880/vid/avc1/640x360/crYgw66eZ9fo2ZQT.mp4?tag=21"
              },
              {
                "bitrate": 2176000,
                "content_type": "video/mp4",
                "url": "https://video.twimg.com/amplify_video/2026342653055610880/vid/avc1/1280x720/AMDViftPWlW-BXzI.mp4?tag=21"
              },
              {
                "bitrate": 10368000,
                "content_type": "video/mp4",
                "url": "https://video.twimg.com/amplify_video/2026342653055610880/vid/avc1/1920x1080/MTf2uLIAzvMNxP4z.mp4?tag=21"
              },
              {
                "bitrate": 25128000,
                "content_type": "video/mp4",
                "url": "https://video.twimg.com/amplify_video/2026342653055610880/vid/avc1/3840x2160/cSt8SJsDOLpyce1n.mp4?tag=21"
              }
            ]
          }
        }
      ]
    },
    "card": null,
    "place": {},
    "entities": {
      "hashtags": [],
      "symbols": [],
      "timestamps": [],
      "urls": [],
      "user_mentions": []
    },
    "quoted_tweet": null,
    "retweeted_tweet": null,
    "isLimitedReply": false,
    "article": null
  },
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}