🐦 Twitter Post Details

@joelniklaus

New blog post on harness optimization. We hit Sonnet 4.6 performance with a 7x cost improvement.

Fable 5 was the first frontier model release that evaluated on legal tasks. It only scored 13%, the worst performance among all benchmarks evaluated.

@Harvey released this benchmark called Legal Agent Benchmark (LAB) just a month prior. It contains a set of realistic legal matters. Each task gives the agent a closed workspace of documents (contracts, emails, spreadsheets, slide decks) and asks for a concrete deliverable: a diligence memo, an issue list, a redline, a draft. An LLM judge grades the deliverable against a long rubric containing 61 distinct binary criteria each on average.

Many frontier models such as Gemini 3.1 Pro don't surpass 0% all-pass rate (all rubric criteria passed). With automatic harness optimization, we manage to push DeepSeek V4 Pro from 0% to 5% all-pass rate, achieving parity with Sonnet 4.6 for 1/7 of the price.

Read the blog post for the details: https://t.co/kBrWrQkgJW
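
The post defines the all-pass rate only as "all rubric criteria passed". The sketch below shows one plausible reading: the fraction of tasks whose binary criteria were all marked as passed by the judge. The function names, the dictionary layout, and the sample verdicts are hypothetical illustrations, not the benchmark's actual code or data.

```python
from typing import Dict, List


def all_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where every rubric criterion passed.

    `results` maps a task id to the judge's binary verdicts for that task.
    Assumption: a task with no criteria does not count as passed.
    """
    if not results:
        return 0.0
    passed = sum(1 for verdicts in results.values() if verdicts and all(verdicts))
    return passed / len(results)


def criterion_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of individual criteria passed, pooled across all tasks."""
    total = sum(len(verdicts) for verdicts in results.values())
    if total == 0:
        return 0.0
    return sum(sum(verdicts) for verdicts in results.values()) / total


# Made-up verdicts for two tasks, for illustration only.
example = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
}
print(all_pass_rate(example))
print(criterion_pass_rate(example))
```

Under this reading a single failed criterion fails the whole task, which would explain how a model can pass most individual criteria and still sit at a 0% all-pass rate when rubrics average 61 criteria each.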

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2072333963440640155/media_0.mp4",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2072333963440640155/media_0.mp4",
      "type": "video",
      "filename": "media_0.mp4"
    },
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2072333963440640155/media_1.jpg",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2072333963440640155/media_1.jpg",
      "type": "photo",
      "filename": "media_1.jpg"
    }
  ],
  "processed_at": "2026-07-02T02:02:13.585409",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2072333963440640155",
  "url": "https://x.com/joelniklaus/status/2072333963440640155",
  "twitterUrl": "https://twitter.com/joelniklaus/status/2072333963440640155",
  "text": "New blog post on harness optimization. We hit Sonnet 4.6 performance with a 7x cost improvement.\n\nFable 5 was the first frontier model release that evaluated on legal tasks. It only scored 13%, the worst performance among all benchmarks evaluated.\n\n@Harvey released this benchmark called Legal Agent Benchmark (LAB) just a month prior. It contains a set of realistic legal matters. Each task gives the agent a closed workspace of documents (contracts, emails, spreadsheets, slide decks) and asks for a concrete deliverable: a diligence memo, an issue list, a redline, a draft. An LLM judge grades the deliverable against a long rubric containing 61 distinct binary criteria each on average.\n\nMany frontier models such as Gemini 3.1 Pro don't surpass 0% all-pass rate (all rubric criteria passed). With automatic harness optimization, we manage to push DeepSeek V4 Pro from 0% to 5% all-pass rate, achieving parity with Sonnet 4.6 for 1/7 of the price.\n\nRead the blog post for the details: https://t.co/kBrWrQkgJW",
  "source": "Twitter for iPhone",
  "retweetCount": 30,
  "replyCount": 10,
  "likeCount": 247,
  "quoteCount": 3,
  "viewCount": 22862,
  "createdAt": "Wed Jul 01 14:58:13 +0000 2026",
  "lang": "en",
  "bookmarkCount": 341,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2072333963440640155",
  "displayTextRange": [
    0,
    280
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "joelniklaus",
    "url": "https://x.com/joelniklaus",
    "twitterUrl": "https://twitter.com/joelniklaus",
    "id": "390741977",
    "name": "Joël Niklaus",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1468993727926620161/I1lBhFJM_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/390741977/1762851950",
    "description": "",
    "location": "Berne, Switzerland",
    "followers": 1711,
    "following": 440,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Fri Oct 14 13:44:28 +0000 2011",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 924,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 255,
    "statusesCount": 809,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2030554880285585544"
    ],
    "profile_bio": {
      "description": "Data @huggingface",
      "entities": {
        "description": {
          "user_mentions": [
            {
              "id_str": "",
              "indices": [
                5,
                17
              ],
              "name": "",
              "screen_name": "huggingface"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "niklaus.ai",
              "expanded_url": "http://www.niklaus.ai",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/dQjHdo0BwF"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "additional_media_info": {
          "monetizable": false
        },
        "display_url": "pic.twitter.com/UTRP5qxrNS",
        "expanded_url": "https://twitter.com/joelniklaus/status/2072333963440640155/video/1",
        "ext_master_playlist_only": [],
        "ext_media_availability": {
          "status": "Available"
        },
        "ext_playlists": [],
        "id_str": "2072333911385116672",
        "indices": [
          281,
          304
        ],
        "media_key": "13_2072333911385116672",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwABAoAARzCaLZ0GxAAAAA=",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAAECgABHMJotnQbEAAAAA==",
            "media_key": "13_2072333911385116672"
          }
        },
        "media_url_https": "https://pbs.twimg.com/amplify_video_thumb/2072333911385116672/img/TkZr05HbCIDv2t6-.jpg",
        "original_info": {
          "focus_rects": [],
          "height": 1070,
          "width": 1482
        },
        "sizes": {
          "large": {
            "h": 1070,
            "w": 1482
          }
        },
        "type": "video",
        "url": "https://t.co/UTRP5qxrNS",
        "video_info": {
          "aspect_ratio": [
            741,
            535
          ],
          "duration_millis": 18551,
          "variants": [
            {
              "content_type": "application/x-mpegURL",
              "url": "https://video.twimg.com/amplify_video/2072333911385116672/pl/S-fNf0p37orPfKaw.m3u8?tag=14"
            },
            {
              "bitrate": 288000,
              "content_type": "video/mp4",
              "url": "https://video.twimg.com/amplify_video/2072333911385116672/vid/avc1/372x270/97OsJqUB2YjDQyOp.mp4?tag=14"
            },
            {
              "bitrate": 832000,
              "content_type": "video/mp4",
              "url": "https://video.twimg.com/amplify_video/2072333911385116672/vid/avc1/498x360/fo4Rl-iMXfvXvKGM.mp4?tag=14"
            },
            {
              "bitrate": 2176000,
              "content_type": "video/mp4",
              "url": "https://video.twimg.com/amplify_video/2072333911385116672/vid/avc1/996x720/UySc9Vq3lOfPtTLL.mp4?tag=14"
            }
          ]
        }
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "timestamps": [],
    "urls": [
      {
        "display_url": "huggingface.co/spaces/joelnik…",
        "expanded_url": "https://huggingface.co/spaces/joelniklaus/harness-optimization",
        "indices": [
          989,
          1012
        ],
        "url": "https://t.co/kBrWrQkgJW"
      }
    ],
    "user_mentions": [
      {
        "id_str": "1636132366753349632",
        "indices": [
          249,
          256
        ],
        "name": "Harvey",
        "screen_name": "Harvey"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "communityInfo": null,
  "article": null
}