🐦 Twitter Post Details

@HelloSurgeAI

Deeper Instructions, Stronger Generalization: Training on ComplexConstraints

Given the chance, a model will reward hack however it can: finding the laziest path that satisfies a grader, whether or not that path reflects what you actually wanted. If the grader can be satisfied by a surface trick, that trick is what the model learns.

Most instruction-following benchmarks are full of surface tricks. "Stay under 300 words," "avoid commas": a model can satisfy those by scanning the output text, without understanding the task at all.

ComplexConstraints, our frontier instruction-following benchmark, is built so there's no lazy path: its constraints fire only under certain conditions, depend on the outputs of earlier steps, require planning ahead, and are often left unstated.

You can't satisfy "don't assign anyone with a religious dietary restriction to pork prep" by pattern-matching. You have to understand who's who and reason through many interdependent requirements at once.

We post-trained Qwen3-4B on 1,000 of these tasks, using expert-written rubrics directly as the RL reward. The results:

→ +15.5pp on the held-out set, reaching parity with a model 60x larger
→ the gains transferred to two external benchmarks the model never trained on: +8.4pp on Meta's AdvancedIF and +10.1pp on MultiChallenge
→ the largest gains landed on multi-turn abilities, even though every training example was single-turn

Think about that last result. When the only way to score is to actually track many interdependent requirements, the model learns that skill rather than a shortcut, and the skill is the same whether the requirements arrive in one complex prompt or accumulate over nine turns. So it showed up on tasks the model was never trained on.

A reward signal is only as good as the thought behind it, and not all rubrics are created the same.

Research Blog: https://t.co/bUJPcoNFrX
Research Paper: https://t.co/zQxE0TN260
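To make the contrast in the post concrete, here is a toy sketch of the difference between a surface constraint (checkable from the output text alone) and a context-dependent one (checkable only with knowledge of the task state). Everything in it is hypothetical: the function names, the restriction labels, the staff data, and the choice to score a rubric as the fraction of criteria met are illustrative assumptions, not the ComplexConstraints rubric format or the reward used in the paper.

```python
# Hypothetical illustration only; none of these names come from ComplexConstraints.

RELIGIOUS_RESTRICTIONS = {"halal", "kosher"}  # assumed labels for the example


def under_word_limit(output: str, limit: int = 300) -> bool:
    """Surface check: needs nothing but the output text."""
    return len(output.split()) < limit


def avoids_commas(output: str) -> bool:
    """Surface check: needs nothing but the output text."""
    return "," not in output


def pork_prep_respected(assignments, staff_restrictions) -> bool:
    """Context-dependent check: requires knowing who holds which restriction.

    assignments maps name -> station; staff_restrictions maps name -> set of labels.
    """
    for name, station in assignments.items():
        restricted = staff_restrictions.get(name, set()) & RELIGIOUS_RESTRICTIONS
        if station == "pork prep" and restricted:
            return False
    return True


def rubric_reward(criteria_met) -> float:
    """Assumed aggregation: fraction of rubric criteria satisfied."""
    criteria_met = list(criteria_met)
    return sum(criteria_met) / len(criteria_met) if criteria_met else 0.0


staff = {"Amira": {"halal"}, "Ben": set()}
bad_plan = {"Amira": "pork prep", "Ben": "salads"}
good_plan = {"Amira": "salads", "Ben": "pork prep"}
reply = "Amira takes pork prep and Ben takes salads"

# The reply passes both surface checks yet violates the conditional constraint.
checks = [
    under_word_limit(reply),
    avoids_commas(reply),
    pork_prep_respected(bad_plan, staff),
]
print(checks, rubric_reward(checks))
```

The point of the sketch is that the first two checks never look at `staff`, so a policy can satisfy them without modeling the task, while the third cannot be passed without tracking who is who.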

📊 Media Metadata

{
  "score": 0.46,
  "score_components": {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.16000000000000003,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025
  },
  "scored_at": "2026-07-01T19:00:50.087994",
  "import_source": "api_import",
  "source_tagged_at": "2026-07-01T19:00:50.088005",
  "enriched": true,
  "enriched_at": "2026-07-01T19:00:50.088006"
}
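
The top-level `score` appears to be the plain sum of the entries in `score_components`. That relationship is an inference from the numbers shown here, not something the metadata documents, so the following is only a quick consistency check under that assumption.

```python
import math

# Values copied from the metadata above; the "sum of components" rule is an assumption.
score = 0.46
score_components = {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.16000000000000003,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025,
}

total = sum(score_components.values())

# Compare with a tolerance, since the components carry floating-point noise.
matches = math.isclose(total, score, abs_tol=1e-9)
print(matches)
```

The tolerance matters because values such as `0.16000000000000003` carry binary floating-point rounding, so an exact `==` comparison against `0.46` is not reliable.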

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2072365105346675102",
  "url": "https://x.com/HelloSurgeAI/status/2072365105346675102",
  "twitterUrl": "https://twitter.com/HelloSurgeAI/status/2072365105346675102",
  "text": "Deeper Instructions, Stronger Generalization: Training on ComplexConstraints\n\nGiven the chance, a model will reward hack however it can: finding the laziest path that satisfies a grader, whether or not that path reflects what you actually wanted. If the grader can be satisfied by a surface trick, that trick is what the model learns.\n\nMost instruction-following benchmarks are full of surface tricks. \"Stay under 300 words,\" \"avoid commas\", a model can satisfy those by scanning the output text, without understanding the task at all. \n\nComplexConstraints, our frontier instruction-following benchmark, is built so there's no lazy path: its constraints fire only under certain conditions, depend on the outputs of earlier steps, require planning ahead, and are often left unstated. \n\nYou can't satisfy \"don't assign anyone with a religious dietary restriction to pork prep\" by pattern-matching. You have to understand who's who and reason through many interdependent requirements at once.\n\nWe post-trained Qwen3-4B on 1,000 of these tasks, using expert-written rubrics directly as the RL reward. The results:\n\n→ +15.5pp on the held-out set, reaching parity with a model 60x larger\n→ the gains transferred to two external benchmarks the model never trained on: +8.4pp on Meta's AdvancedIF and +10.1pp on MultiChallenge\n→ the largest gains landed on multi-turn abilities, even though every training example was single-turn\n\nThink about that last result. When the only way to score is to actually track many interdependent requirements, the model learns that skill rather than a shortcut, and the skill is the same whether the requirements arrive in one complex prompt or accumulate over nine turns. So it showed up on tasks the model was never trained on.\n\nA reward signal is only as good as the thought behind it, and not all rubrics are created the same.\n\nResearch Blog: https://t.co/bUJPcoNFrX\nResearch Paper: https://t.co/zQxE0TN260",
  "source": "Twitter for iPhone",
  "retweetCount": 1,
  "replyCount": 0,
  "likeCount": 6,
  "quoteCount": 0,
  "viewCount": 346,
  "createdAt": "Wed Jul 01 17:01:58 +0000 2026",
  "lang": "en",
  "bookmarkCount": 3,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2072365105346675102",
  "displayTextRange": [
    0,
    280
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "HelloSurgeAI",
    "url": "https://x.com/HelloSurgeAI",
    "twitterUrl": "https://twitter.com/HelloSurgeAI",
    "id": "1267866160894222343",
    "name": "Surge AI",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1992439362009645056/itZea2R1_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1267866160894222343/1763868703",
    "description": "Our mission is to raise AGI with the richness of humanity — curious, witty, imaginative, and full of breathtaking brilliance.",
    "location": "",
    "followers": 8469,
    "following": 141,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Tue Jun 02 17:10:41 +0000 2020",
    "entities": {
      "description": {},
      "url": {
        "urls": [
          {
            "display_url": "surgehq.ai",
            "expanded_url": "https://www.surgehq.ai",
            "indices": [
              0,
              23
            ],
            "url": "https://t.co/6bGF7OxrIX"
          }
        ]
      }
    },
    "fastFollowersCount": 0,
    "favouritesCount": 267,
    "hasCustomTimelines": false,
    "isTranslator": false,
    "mediaCount": 194,
    "statusesCount": 683,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1681343766123143168"
    ],
    "profile_bio": {},
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": {
    "binding_values": [
      {
        "key": "photo_image_full_size_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=800x419",
            "width": 800
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image",
        "value": {
          "image_value": {
            "height": 150,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=280x150",
            "width": 151
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "description",
        "value": {
          "string_value": "We trained a 4B model on 1,000 expert-written rubrics from ComplexConstraints, our frontier instruction-following benchmark. It reached parity with a 60x larger model, and the gains transferred to...",
          "type": "STRING"
        }
      },
      {
        "key": "domain",
        "value": {
          "string_value": "surgehq.ai",
          "type": "STRING"
        }
      },
      {
        "key": "thumbnail_image_large",
        "value": {
          "image_value": {
            "height": 320,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=800x320_1",
            "width": 321
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "summary_photo_image_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=386x202",
            "width": 386
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_original",
        "value": {
          "image_value": {
            "height": 1144,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=orig",
            "width": 1148
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "photo_image_full_size_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=386x202",
            "width": 386
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "summary_photo_image_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=800x419",
            "width": 800
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_small",
        "value": {
          "image_value": {
            "height": 100,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=100x100",
            "width": 100
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_x_large",
        "value": {
          "image_value": {
            "height": 1144,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=png&name=2048x2048_2_exp",
            "width": 1148
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "photo_image_full_size_original",
        "value": {
          "image_value": {
            "height": 1144,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=orig",
            "width": 1148
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "vanity_url",
        "value": {
          "scribe_key": "vanity_url",
          "string_value": "surgehq.ai",
          "type": "STRING"
        }
      },
      {
        "key": "photo_image_full_size",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=600x314",
            "width": 600
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 31.74,
                "rgb": {
                  "blue": 64,
                  "green": 170,
                  "red": 109
                }
              },
              {
                "percentage": 17.22,
                "rgb": {
                  "blue": 78,
                  "green": 133,
                  "red": 61
                }
              },
              {
                "percentage": 10.99,
                "rgb": {
                  "blue": 124,
                  "green": 53,
                  "red": 21
                }
              },
              {
                "percentage": 9.29,
                "rgb": {
                  "blue": 80,
                  "green": 205,
                  "red": 191
                }
              },
              {
                "percentage": 5.52,
                "rgb": {
                  "blue": 75,
                  "green": 45,
                  "red": 17
                }
              }
            ]
          },
          "type": "IMAGE_COLOR"
        }
      },
      {
        "key": "title",
        "value": {
          "string_value": "Deeper Instructions Lead to Broader Generalization: Training on ComplexConstraints",
          "type": "STRING"
        }
      },
      {
        "key": "summary_photo_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 31.74,
                "rgb": {
                  "blue": 64,
                  "green": 170,
                  "red": 109
                }
              },
              {
                "percentage": 17.22,
                "rgb": {
                  "blue": 78,
                  "green": 133,
                  "red": 61
                }
              },
              {
                "percentage": 10.99,
                "rgb": {
                  "blue": 124,
                  "green": 53,
                  "red": 21
                }
              },
              {
                "percentage": 9.29,
                "rgb": {
                  "blue": 80,
                  "green": 205,
                  "red": 191
                }
              },
              {
                "percentage": 5.52,
                "rgb": {
                  "blue": 75,
                  "green": 45,
                  "red": 17
                }
              }
            ]
          },
          "type": "IMAGE_COLOR"
        }
      },
      {
        "key": "summary_photo_image_x_large",
        "value": {
          "image_value": {
            "height": 1144,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=png&name=2048x2048_2_exp",
            "width": 1148
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "summary_photo_image",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=600x314",
            "width": 600
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "photo_image_full_size_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 31.74,
                "rgb": {
                  "blue": 64,
                  "green": 170,
                  "red": 109
                }
              },
              {
                "percentage": 17.22,
                "rgb": {
                  "blue": 78,
                  "green": 133,
                  "red": 61
                }
              },
              {
                "percentage": 10.99,
                "rgb": {
                  "blue": 124,
                  "green": 53,
                  "red": 21
                }
              },
              {
                "percentage": 9.29,
                "rgb": {
                  "blue": 80,
                  "green": 205,
                  "red": 191
                }
              },
              {
                "percentage": 5.52,
                "rgb": {
                  "blue": 75,
                  "green": 45,
                  "red": 17
                }
              }
            ]
          },
          "type": "IMAGE_COLOR"
        }
      },
      {
        "key": "photo_image_full_size_x_large",
        "value": {
          "image_value": {
            "height": 1144,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=png&name=2048x2048_2_exp",
            "width": 1148
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "card_url",
        "value": {
          "scribe_key": "card_url",
          "string_value": "https://t.co/bUJPcoNFrX",
          "type": "STRING"
        }
      },
      {
        "key": "summary_photo_image_original",
        "value": {
          "image_value": {
            "height": 1144,
            "url": "https://pbs.twimg.com/card_img/2071656654714617856/pWIlUmqR?format=jpg&name=orig",
            "width": 1148
          },
          "type": "IMAGE"
        }
      }
    ],
    "card_platform": {
      "platform": {
        "audience": {
          "name": "production"
        },
        "device": {
          "name": "Swift",
          "version": "12"
        }
      }
    },
    "name": "summary_large_image",
    "url": "https://t.co/bUJPcoNFrX",
    "user_refs_results": []
  },
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "surgehq.ai/blog/training-…",
        "expanded_url": "https://surgehq.ai/blog/training-on-complexconstraints",
        "indices": [
          1872,
          1895
        ],
        "url": "https://t.co/bUJPcoNFrX"
      },
      {
        "display_url": "arxiv.org/abs/2606.09118",
        "expanded_url": "https://arxiv.org/abs/2606.09118",
        "indices": [
          1912,
          1935
        ],
        "url": "https://t.co/zQxE0TN260"
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "article": null
}