@LiorOnAI
Can agents replace software engineers? Not according to this new benchmark.

Mercor and Cognition released APEX-SWE. It tests AI coding agents on real engineering work.

> GPT-5.3 Codex leads at 41.5%.
> Claude Opus 4.6 follows at 40.5%.

Nothing crosses the 50% mark.

Why? Old benchmarks are basically solved. HumanEval scores jumped from 67% to 90% in two years, and OpenAI flagged SWE-bench as contaminated: models were memorizing the answers.

Those benchmarks also never reflected the job in the first place. They only measured code writing, and developers spend just 16% of their time on that. The other 84% is debugging, infrastructure, and integration.

APEX-SWE tests the 84%. 200 tasks, split into two types:

1. Integration: build systems across live databases, APIs, and cloud services in Docker containers
2. Observability: find and fix real bugs using logs, dashboards, and chat history

Each task drops an agent into a live environment: real services, real credentials, and project boards with filler issues mixed in.

50 tasks are open-source on Hugging Face, and the eval harness is on GitHub, so you can run it yourself (see the loading sketch below).

AI already writes half the code at big companies, and 90% of developers use AI assistants. All of that covers just 16% of the job.
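A minimal sketch of what pulling the open-source split might look like with the Hugging Face `datasets` library. The dataset ID and split name below are placeholders, not confirmed by the release; check the APEX-SWE page for the actual repo name before running.

```python
# Minimal sketch: load the open-source APEX-SWE task split from Hugging Face.
# NOTE: "mercor/apex-swe" and the "train" split are hypothetical placeholders;
# substitute the dataset ID published with the release.
from datasets import load_dataset

tasks = load_dataset("mercor/apex-swe", split="train")

print(f"{len(tasks)} tasks loaded")
# Inspect the schema before wiring the tasks into the eval harness.
print("fields per task:", list(tasks[0].keys()))
```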