🐦 Twitter Post Details

@omarsar0

Improving Mathematical Reasoning in Open LLMs

One of my favorite papers this past week was the work of Shao et al., who introduced a 7B model called DeepSeekMath for improving the mathematical reasoning of LLMs.

TL;DR:
- continues pretraining a code base model on 120B math-related tokens
- introduces GRPO (a variant of PPO) to enhance mathematical reasoning and reduce training resources via a memory-usage optimization scheme
- DeepSeekMath 7B achieves 51.7% on MATH, which approaches the performance of Gemini-Ultra (53.2%) and GPT-4 (52.9%)
- when self-consistency is used, performance improves to 60.9%

What's outstanding about this specialized model is that it reaches performance competitive with general-purpose models (Gemini Ultra and GPT-4) and even large specialized models (Minerva 540B).

It focuses on pre-training with math data at scale (120B tokens), so there is an element of optimizing the data selection pipeline that leads to huge gains. Code training is also key to enhancing mathematical reasoning capabilities.

Another aspect of this paper that I like is the introduction of an efficient PPO variant that hints at more performant and efficient ways to leverage reinforcement learning for LLMs. There are lots of discussions, analyses, and explorations of RL in the paper.
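The post doesn't spell out how GRPO differs from PPO, but the core idea is that it drops the learned value network and instead estimates advantages relative to a group of sampled completions for the same prompt. A minimal sketch of that group-relative normalization step (the function name and binary rewards are illustrative assumptions, not from the post or paper):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage estimate (illustrative sketch):
    normalize each sampled completion's reward against the mean
    and standard deviation of its own sampling group, so no
    separate value network is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]

# Example: 0/1 correctness rewards for 4 sampled solutions to one math problem.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Because the baseline comes from the group statistics rather than a critic model, memory usage during RL training drops, which is the resource saving the post alludes to.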
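The self-consistency number (60.9%) comes from sampling several reasoning chains and keeping the most common final answer. A minimal sketch of that majority vote (the function name and example answers are hypothetical):

```python
from collections import Counter

def self_consistency(final_answers):
    """Self-consistency decoding (illustrative sketch): given the
    final answers extracted from independently sampled chains of
    thought, return the most frequent one."""
    return Counter(final_answers).most_common(1)[0][0]

# Example: 5 sampled solutions agree 3-to-2 on "42".
print(self_consistency(["42", "41", "42", "42", "7"]))  # → "42"
```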

🔧 Raw API Response

{
  "user": {
    "created_at": "2015-09-04T12:59:26.000Z",
    "default_profile_image": false,
    "description": "Building with LLMs, RAG, and AI Agents @dair_ai • Prev: Meta AI, Galactica LLM, PapersWithCode, PhD • Creator of the Prompting Guide (~3M learners)",
    "fast_followers_count": 0,
    "favourites_count": 24509,
    "followers_count": 181112,
    "friends_count": 461,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 3251,
    "location": "",
    "media_count": 1926,
    "name": "elvis",
    "normal_followers_count": 181112,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "screen_name": "omarsar0",
    "statuses_count": 10230,
    "translator_type": "regular",
    "url": "https://t.co/H9w2yq9w1L",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "3448284313"
  },
  "id": "1756792296333361370",
  "conversation_id": "1756792296333361370",
  "full_text": "Improving Mathematical Reasoning in Open LLMs\n\nOne of my favorite papers this past week was the work of Shao et al. who introduced a 7B model called DeepSeekMath for improving mathematical reasoning of LLMs. \n\nTLDR: \n- continues pretraining a code base model with 120B math-related tokens\n- introduces GRPO (a variant of PPO) to enhance mathematical reasoning and reduce training resources via a memory usage optimization scheme \n- DeepSeekMath 7B achieves 51.7% on MATH which approaches the performance level of Gemini-Ultra (53.2%) and GPT-4 (52.9%) \n-when self-consistency is used the performance improves to 60.9%.\n\nWhat's outstanding about this specialized model is that it can reach a performance that's competitive with general-purpose models (Gemini Ultra and GPT-4) and even large specialized models (Minerva 540B). \n\nIt focuses on pre-training with math information at scale (120B tokens), so there is an element of optimizing the data selection pipeline that leads to huge gains. Code training is also key to enhancing mathematical reasoning capabilities.\n\nAnother aspect of this paper that I like is the introduction of an efficient PPO variant that hints at more performant and efficient ways to leverage reinforcement learning for LLMs. Lots of discussions, analysis, and exploration about RL in the paper.",
  "reply_count": 4,
  "retweet_count": 125,
  "favorite_count": 565,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GGFg696XEAAafTj.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/omarsar0/status/1756792296333361370",
  "created_at": "2024-02-11T21:28:17.000Z",
  "#sort_index": "1756792296333361370",
  "view_count": 47871,
  "quote_count": 3,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://twitter.com/omarsar0/status/1756792296333361370"
}