🐦 Twitter Post Details

@AlphaSignalAI

AI2 just released the largest open-source dataset for LLM pretraining: 3 trillion tokens of high quality data.

- Web data from Common Crawl.
- Quality filtered
- Deduplication within each source.
- Risk mitigation for harmful content.

https://t.co/vHu07YA5GT https://t.co/W9OHrL6QRj

📊 Media Metadata

{
  "media": [
    {
      "url": "https://pbs.twimg.com/media/F4YcuWRWwAAOJMx.jpg",
      "type": "photo",
      "original_url": "https://pbs.twimg.com/media/F4YcuWRWwAAOJMx.jpg",
      "format_converted_from_list": true
    }
  ],
  "conversion_date": "2025-08-13T00:32:45.934752",
  "format_converted": true,
  "original_structure": "had_media_only"
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2012-11-07T07:19:36.000Z",
    "default_profile_image": false,
    "description": "Covering the latest breakthroughs in AI. ML Engineer/Researcher now building AlphaSignal → A technical newsletter read by 120,000+ AI developers.",
    "fast_followers_count": 0,
    "favourites_count": 4230,
    "followers_count": 64949,
    "friends_count": 702,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 1408,
    "location": "Join 120,000+ readers →",
    "media_count": 359,
    "name": "Lior⚡",
    "normal_followers_count": 64949,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/931470139/1681303371",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1599792074336964608/CobSHV8l_normal.jpg",
    "screen_name": "AlphaSignalAI",
    "statuses_count": 2400,
    "translator_type": "none",
    "url": "https://t.co/AyubevadmD",
    "verified": false,
    "withheld_in_countries": [],
    "id_str": "931470139"
  },
  "id": "1695073895269741015",
  "conversation_id": "1695073895269741015",
  "full_text": "AI2 just released the largest open-source dataset for LLM pretraining: 3 trillion tokens of high quality data.\n\n- Web data from Common Crawl.\n- Quality filtered\n- Deduplication within each source.\n- Risk mitigation for harmful content. \n\nhttps://t.co/vHu07YA5GT https://t.co/W9OHrL6QRj",
  "reply_count": 4,
  "retweet_count": 7,
  "favorite_count": 43,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [
    {
      "url": "https://t.co/vHu07YA5GT",
      "expanded_url": "https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64",
      "display_url": "blog.allenai.org/dolma-3-trilli…"
    }
  ],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/F4YcuWRWwAAOJMx.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/AlphaSignalAI/status/1695073895269741015",
  "created_at": "2023-08-25T14:01:24.000Z",
  "#sort_index": "1695073895269741015",
  "view_count": 4912,
  "quote_count": 0,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": false,
  "startUrl": "https://twitter.com/alphasignalai/status/1695073895269741015"
}