🐦 Twitter Post Details

@dair_ai

Highly-recommended read from MIT on the part of RL with verifiable rewards that everyone keeps hitting.

RLVR only optimizes what you can objectively score, so style, structure, and diversity quietly collapse and reward hacking creeps in. The fix here adds an adversarial discriminator trained on human demonstrations, which acts as a learned proxy for the human output distribution.

The generator maximizes both task accuracy and the discriminator's human-likeness signal, so verifiable rewards and imitation of humans get optimized together.

Why does it matter?

Across bug fixing, story generation, and a reward-hacking benchmark, this preserves RLVR's accuracy gains while restoring the fuzzy properties it usually destroys. Bug fixes come out with much lower edit distance, stories score higher win rates and stay diverse, and misbehavior nearly disappears.

Paper: https://t.co/kBZA66WGyC

Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
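
To make the idea in the post concrete, here is a minimal sketch of how a verifiable reward and a discriminator signal could be combined. The post does not give the paper's actual objective, so everything here is an assumption for illustration: the additive form, the log transform, the `weight` parameter, and the function names `combined_reward` and `discriminator_loss` are all hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-8  # keeps log() finite when a probability is exactly 0 or 1


def _clip(p: float) -> float:
    """Clamp a probability into the open interval (0, 1)."""
    return min(max(p, EPS), 1.0 - EPS)


def combined_reward(task_reward: float, disc_prob: float, weight: float = 0.5) -> float:
    """Hypothetical generator reward: verifiable score plus a human-likeness bonus.

    task_reward: objectively checkable score (e.g. 1.0 if the tests pass).
    disc_prob:   discriminator's estimated probability that the output is human-written.
    weight:      assumed trade-off coefficient; not taken from the paper.
    """
    return task_reward + weight * math.log(_clip(disc_prob))


def discriminator_loss(human_probs: Sequence[float], generated_probs: Sequence[float]) -> float:
    """Mean binary cross-entropy: human demonstrations are labelled 1, generator samples 0."""
    human_term = [-math.log(_clip(p)) for p in human_probs]
    generated_term = [-math.log(1.0 - _clip(p)) for p in generated_probs]
    losses = human_term + generated_term
    return sum(losses) / len(losses)


# Two outputs that both pass the verifiable check, but only one looks human-written.
human_like = combined_reward(task_reward=1.0, disc_prob=0.9)
robotic = combined_reward(task_reward=1.0, disc_prob=0.1)
print(human_like > robotic)
```

Under these assumptions, two outputs with the same verifiable score are no longer tied: the one the discriminator judges closer to the human demonstrations earns the higher reward, which is the pressure the post credits with keeping style and diversity from collapsing.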

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2073119635214602638/media_0.png",
      "filename": "media_0.png"
    }
  ],
  "processed_at": "2026-07-03T21:01:05.219837",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2073119635214602638",
  "url": "https://x.com/dair_ai/status/2073119635214602638",
  "twitterUrl": "https://twitter.com/dair_ai/status/2073119635214602638",
  "text": "Highly-recommended read from MIT on the part of RL with verifiable rewards that everyone keeps hitting.\n\nRLVR only optimizes what you can objectively score, so style, structure, and diversity quietly collapse and reward hacking creeps in. The fix here adds an adversarial discriminator trained on human demonstrations, which acts as a learned proxy for the human output distribution.\n\nThe generator maximizes both task accuracy and the discriminator's human-likeness signal, so verifiable rewards and imitation of humans get optimized together.\n\nWhy does it matter?\n\nAcross bug fixing, story generation, and a reward-hacking benchmark, this preserves RLVR's accuracy gains while restoring the fuzzy properties it usually destroys. Bug fixes come out with much lower edit distance, stories score higher win rates and stay diverse, and misbehavior nearly disappears.\n\nPaper: https://t.co/kBZA66WGyC\n\nLearn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c",
  "source": "Twitter for iPhone",
  "retweetCount": 3,
  "replyCount": 5,
  "likeCount": 14,
  "quoteCount": 0,
  "viewCount": 2957,
  "createdAt": "Fri Jul 03 19:00:12 +0000 2026",
  "lang": "en",
  "bookmarkCount": 19,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2073119635214602638",
  "displayTextRange": [
    0,
    271
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "dair_ai",
    "url": "https://x.com/dair_ai",
    "twitterUrl": "https://twitter.com/dair_ai",
    "id": "889050642903293953",
    "name": "DAIR.AI",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1643277398522187778/31dedbLo_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/889050642903293953/1773242460",
    "description": "",
    "location": "",
    "followers": 127319,
    "following": 1,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Sun Jul 23 09:12:45 +0000 2017",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 4719,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 269,
    "statusesCount": 3410,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2073065231790809214"
    ],
    "profile_bio": {
      "description": "Democratizing AI research, education, and technologies. Learn about AI Agents for FREE at https://t.co/HHXg8rryu4",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "academy.dair.ai/courses/elemen…",
              "expanded_url": "https://academy.dair.ai/courses/elements-of-ai-agents",
              "indices": [
                90,
                113
              ],
              "url": "https://t.co/HHXg8rryu4"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "dair.ai",
              "expanded_url": "https://www.dair.ai/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/lkqPZtMU5s"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/R4J6HED6sf",
        "expanded_url": "https://twitter.com/dair_ai/status/2073119635214602638/photo/1",
        "ext_master_playlist_only": [],
        "ext_media_availability": {
          "status": "Available"
        },
        "ext_playlists": [],
        "features": {
          "large": {
            "faces": [
              {
                "h": 88,
                "w": 88,
                "x": 202,
                "y": 1345
              }
            ]
          },
          "orig": {
            "faces": [
              {
                "h": 88,
                "w": 88,
                "x": 202,
                "y": 1345
              }
            ]
          }
        },
        "id_str": "2073119631821422592",
        "indices": [
          272,
          295
        ],
        "media_key": "3_2073119631821422592",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARzFM1I/msAACgACHMUzUwnakY4AAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABHMUzUj+awAAKAAIcxTNTCdqRjgAA",
            "media_key": "3_2073119631821422592"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/HMUzUj-awAAu77F.png",
        "original_info": {
          "focus_rects": [
            {
              "h": 796,
              "w": 1422,
              "x": 0,
              "y": 0
            },
            {
              "h": 1422,
              "w": 1422,
              "x": 0,
              "y": 0
            },
            {
              "h": 1621,
              "w": 1422,
              "x": 0,
              "y": 0
            },
            {
              "h": 1672,
              "w": 836,
              "x": 209,
              "y": 0
            },
            {
              "h": 1672,
              "w": 1422,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1672,
          "width": 1422
        },
        "sizes": {
          "large": {
            "h": 1672,
            "w": 1422
          }
        },
        "type": "photo",
        "url": "https://t.co/R4J6HED6sf"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "arxiv.org/abs/2607.01181",
        "expanded_url": "https://arxiv.org/abs/2607.01181",
        "indices": [
          873,
          896
        ],
        "url": "https://t.co/kBZA66WGyC"
      },
      {
        "display_url": "academy.dair.ai",
        "expanded_url": "https://academy.dair.ai/",
        "indices": [
          949,
          972
        ],
        "url": "https://t.co/LRnpZN7L4c"
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "communityInfo": null,
  "article": null
}