🐦 Twitter Post Details

@kushal_tirumala

Excited to release our work in data selection for LLM pre-training! We introduce a new data selection method for large-scale web data (D4) which gets ~20% efficiency gains & +2% downstream acc @ 6.7B scale over the current standard of randomly sampling Minhash deduped web docs https://t.co/imH9K5rSfx

🔧 Raw API Response

{
  "user": {
    "created_at": "2022-05-17T06:28:28.000Z",
    "default_profile_image": false,
    "description": "Researcher @ FAIR (@MetaAI), formerly Math/CS ugrad @Caltech",
    "fast_followers_count": 0,
    "favourites_count": 99,
    "followers_count": 243,
    "friends_count": 66,
    "has_custom_timelines": false,
    "is_translator": false,
    "listed_count": 7,
    "location": "",
    "media_count": 9,
    "name": "Kushal Tirumala",
    "normal_followers_count": 243,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1526449505779847168/1652770052",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1526458500448079872/hOditV7T_normal.jpg",
    "screen_name": "kushal_tirumala",
    "statuses_count": 42,
    "translator_type": "none",
    "verified": false,
    "withheld_in_countries": [],
    "id_str": "1526449505779847168"
  },
  "id": "1696632999134273927",
  "conversation_id": "1696632999134273927",
  "full_text": "Excited to release our work in data selection for LLM pre-training!\n\nWe introduce a new data selection method for large-scale web data (D4) which gets ~20% efficiency gains & +2% downstream acc @ 6.7B scale over the current standard of randomly sampling Minhash deduped web docs https://t.co/imH9K5rSfx",
  "reply_count": 6,
  "retweet_count": 29,
  "favorite_count": 147,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/F4ueivBaoAMD8gB.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/kushal_tirumala/status/1696632999134273927",
  "created_at": "2023-08-29T21:16:43.000Z",
  "#sort_index": "1696632999134273927",
  "view_count": 17354,
  "quote_count": 2,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": false,
  "startUrl": "https://twitter.com/kushal_tirumala/status/1696632999134273927"
}