@omarsar0
Reasoning models now pass all three levels of the CFA exam.

In 2023, ChatGPT (GPT-3.5-turbo) failed CFA Levels I and II, and GPT-4 passed Level I but failed Level II. LLMs struggled with finance exams that demand numerical precision, qualitative analysis, and ethical judgment simultaneously. That ceiling has been shattered, which speaks to the potential of reasoning models.

Researchers evaluated state-of-the-art reasoning models on 980 CFA mock exam questions across all three levels. The results: Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1 all pass every level. Gemini 3.0 Pro achieves 97.6% on Level I. GPT-5 leads Level II with 94.3%. On Level III constructed-response questions, Gemini 3.0 Pro scores 92.0%.

The CFA exam tests an evolving hierarchy of skills. Level I covers foundational knowledge through multiple-choice questions. Level II tests application of that knowledge through case-based vignettes. Level III requires complex synthesis and portfolio construction with both multiple-choice and constructed-response formats.

Quantitative methods, previously a major weakness, now show near-zero error rates for top models. The persistent challenge is Ethics and Professional Standards, where even the best models show 17-21% error rates on Level II.

An interesting pattern emerges with prompting. Chain-of-thought reasoning helps baseline models substantially but shows inconsistent effects on reasoning models for multiple-choice questions. However, CoT remains highly effective for constructed-response questions: Gemini 3.0 Pro jumps from 86.6% to 92.0% on CRQs with explicit reasoning prompts.

Reasoning models now surpass the expertise required of entry-level to mid-level financial analysts. The question shifts from whether AI can pass professional exams to how these capabilities translate to real-world financial decision-making.

Paper: https://t.co/wdwtefM3EN

Learn to build effective AI Agents in our academy: https://t.co/JBU5beIoD0
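To make the prompting comparison concrete, here is a minimal sketch of the difference between a direct prompt and an explicit chain-of-thought prompt for a constructed-response question. The question text and prompt wording are illustrative assumptions, not taken from the paper.

```python
# Illustrative only: the question and prompt phrasings are hypothetical,
# not the exact prompts used in the paper's evaluation.

QUESTION = (
    "An analyst manages a portfolio with a 60/40 equity/bond split. "
    "Recommend an adjustment given a rising-rate outlook and justify it."
)

def direct_prompt(question: str) -> str:
    """Baseline: ask for the answer with no reasoning instruction."""
    return f"{question}\n\nAnswer:"

def cot_prompt(question: str) -> str:
    """Explicit reasoning: ask the model to work through the problem first."""
    return (
        f"{question}\n\n"
        "Think through the problem step by step, citing the relevant "
        "curriculum concepts, before giving your final recommendation."
    )

print(direct_prompt(QUESTION))
print("---")
print(cot_prompt(QUESTION))
```

The paper's finding is that adding a reasoning instruction like the second variant made little consistent difference for reasoning models on multiple-choice items, but still lifted constructed-response scores (e.g., Gemini 3.0 Pro from 86.6% to 92.0%).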