🐦 Twitter Post Details

@OpenBMB

🔥 Ultra-FineWeb-en-v1.4 is coming! 2.2T tokens fully open-sourced!
The core training fuel for MiniCPM4 / 4.1, fully updated based on FineWeb v1.4.0:
🆕 What's New
1️⃣ Fresher Data: Added CommonCrawl snapshots from Apr 2024 - Jun 2025 to capture the latest world knowledge.
2️⃣ Easier Access: CC Dump Slices are here! No need to download the entire massive dataset anymore, fetch exactly what you need seamlessly.
⚡ Highlights & Performance
- Efficient Verification: Efficient Verification Strategy: Reduces data verification cost by 90%
- High-Efficiency Filtering Pipeline: Optimizes selection of both positive and negative samples
- Performance Gains: +3.613/+1.331 (Eng) & +1.98/+0.61 (Chn) vs. FineWeb/FineWeb-edu & Chinese FineWeb-edu-v2.
Still high-quality cleaning. Still true to the open-source spirit. Welcome to download and test! 🚀
🔗 Resources
🤗 Dataset: https://t.co/KluL5t2kUn
📄 Paper: https://t.co/Kg9LLUqZgB
🧩 Classifier:https://t.co/oUfxrN6AmP
🤖 MiniCPM4:https://t.co/IQ82jD1PTi

#UltraFineWeb #MiniCPM4 #AI #LLM #OpenBMB #UltraData

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1998761211777921144/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1998761211777921144/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    },
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1998761211777921144/media_2.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1998761211777921144/media_2.jpg?",
      "type": "photo",
      "filename": "media_2.jpg"
    },
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1998761211777921144/media_3.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1998761211777921144/media_3.jpg?",
      "type": "photo",
      "filename": "media_3.jpg"
    }
  ],
  "processed_at": "2025-12-11T18:33:50.491126",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1998761211777921144",
  "url": "https://x.com/OpenBMB/status/1998761211777921144",
  "twitterUrl": "https://twitter.com/OpenBMB/status/1998761211777921144",
  "text": "πŸ”₯ Ultra-FineWeb-en-v1.4 is coming! 2.2T tokens fully open-sourced!\nThe core training fuel for MiniCPM4 / 4.1, fully updated based on FineWeb v1.4.0:\nπŸ†• What's New\n1️⃣ Fresher Data: Added CommonCrawl snapshots from Apr 2024 - Jun 2025 to capture the latest world knowledge.\n2️⃣ Easier Access: CC Dump Slices are here! No need to download the entire massive dataset anymore, fetch exactly what you need seamlessly.\n⚑ Highlights & Performance\n- Efficient Verification: Efficient Verification Strategy: Reduces data verification cost by 90%\n- High-Efficiency Filtering Pipeline: Optimizes selection of both positive and negative samples\n- Performance Gains: +3.613/+1.331 (Eng) & +1.98/+0.61 (Chn) vs. FineWeb/FineWeb-edu & Chinese FineWeb-edu-v2.\nStill high-quality cleaning. Still true to the open-source spirit. Welcome to download and test! πŸš€\nπŸ”— Resources \nπŸ€— Dataset: https://t.co/KluL5t2kUn\nπŸ“„ Paper: https://t.co/Kg9LLUqZgB \n🧩 Classifier:https://t.co/oUfxrN6AmP\nπŸ€– MiniCPM4:https://t.co/IQ82jD1PTi\n\n#UltraFineWeb #MiniCPM4 #AI #LLM #OpenBMB #UltraData",
  "source": "Twitter for iPhone",
  "retweetCount": 19,
  "replyCount": 3,
  "likeCount": 194,
  "quoteCount": 5,
  "viewCount": 17750,
  "createdAt": "Wed Dec 10 14:26:22 +0000 2025",
  "lang": "en",
  "bookmarkCount": 116,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1998761211777921144",
  "displayTextRange": [
    0,
    276
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "OpenBMB",
    "url": "https://x.com/OpenBMB",
    "twitterUrl": "https://twitter.com/OpenBMB",
    "id": "1496119294844825600",
    "name": "OpenBMB",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1503637999535284230/ZflGAHZW_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1496119294844825600/1730975684",
    "description": "OpenBMB (Open Lab for Big Model Base) aims to build foundation models and systems towards AGI.",
    "location": "",
    "followers": 4592,
    "following": 194,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Tue Feb 22 13:47:10 +0000 2022",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {
        "urls": [
          {
            "display_url": "github.com/OpenBMB",
            "expanded_url": "https://github.com/OpenBMB",
            "url": "https://t.co/FxKaFMW2cu",
            "indices": [
              0,
              23
            ]
          }
        ]
      }
    },
    "fastFollowersCount": 0,
    "favouritesCount": 1356,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 120,
    "statusesCount": 562,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1998761211777921144"
    ],
    "profile_bio": {},
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": {
    "binding_values": [
      {
        "key": "photo_image_full_size_large",
        "value": {
          "image_value": {
            "height": 419,
            "width": 800,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=800x419"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image",
        "value": {
          "image_value": {
            "height": 150,
            "width": 278,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=280x150"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "domain",
        "value": {
          "string_value": "huggingface.co",
          "type": "STRING"
        }
      },
      {
        "key": "thumbnail_image_large",
        "value": {
          "image_value": {
            "height": 320,
            "width": 593,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=800x320_1"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "summary_photo_image_small",
        "value": {
          "image_value": {
            "height": 202,
            "width": 386,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=386x202"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_original",
        "value": {
          "image_value": {
            "height": 648,
            "width": 1200,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=orig"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "site",
        "value": {
          "scribe_key": "publisher_id",
          "type": "USER",
          "user_value": {
            "id_str": "778764142412984320",
            "path": []
          }
        }
      },
      {
        "key": "photo_image_full_size_small",
        "value": {
          "image_value": {
            "height": 202,
            "width": 386,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=386x202"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "summary_photo_image_large",
        "value": {
          "image_value": {
            "height": 419,
            "width": 800,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=800x419"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_small",
        "value": {
          "image_value": {
            "height": 78,
            "width": 144,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=144x144"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_x_large",
        "value": {
          "image_value": {
            "height": 648,
            "width": 1200,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=png&name=2048x2048_2_exp"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "photo_image_full_size_original",
        "value": {
          "image_value": {
            "height": 648,
            "width": 1200,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=orig"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "vanity_url",
        "value": {
          "scribe_key": "vanity_url",
          "string_value": "huggingface.co",
          "type": "STRING"
        }
      },
      {
        "key": "photo_image_full_size",
        "value": {
          "image_value": {
            "height": 314,
            "width": 600,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=600x314"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "thumbnail_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "rgb": {
                  "blue": 237,
                  "green": 95,
                  "red": 94
                },
                "percentage": 32.78
              },
              {
                "rgb": {
                  "blue": 177,
                  "green": 85,
                  "red": 48
                },
                "percentage": 16.93
              },
              {
                "rgb": {
                  "blue": 185,
                  "green": 73,
                  "red": 157
                },
                "percentage": 14.87
              },
              {
                "rgb": {
                  "blue": 165,
                  "green": 73,
                  "red": 194
                },
                "percentage": 9.99
              },
              {
                "rgb": {
                  "blue": 81,
                  "green": 33,
                  "red": 29
                },
                "percentage": 4.69
              }
            ]
          },
          "type": "IMAGE_COLOR"
        }
      },
      {
        "key": "title",
        "value": {
          "string_value": "openbmb/Ultra-FineWeb Β· Datasets at Hugging Face",
          "type": "STRING"
        }
      },
      {
        "key": "summary_photo_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "rgb": {
                  "blue": 237,
                  "green": 95,
                  "red": 94
                },
                "percentage": 32.78
              },
              {
                "rgb": {
                  "blue": 177,
                  "green": 85,
                  "red": 48
                },
                "percentage": 16.93
              },
              {
                "rgb": {
                  "blue": 185,
                  "green": 73,
                  "red": 157
                },
                "percentage": 14.87
              },
              {
                "rgb": {
                  "blue": 165,
                  "green": 73,
                  "red": 194
                },
                "percentage": 9.99
              },
              {
                "rgb": {
                  "blue": 81,
                  "green": 33,
                  "red": 29
                },
                "percentage": 4.69
              }
            ]
          },
          "type": "IMAGE_COLOR"
        }
      },
      {
        "key": "summary_photo_image_x_large",
        "value": {
          "image_value": {
            "height": 648,
            "width": 1200,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=png&name=2048x2048_2_exp"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "summary_photo_image",
        "value": {
          "image_value": {
            "height": 314,
            "width": 600,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=600x314"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "photo_image_full_size_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "rgb": {
                  "blue": 237,
                  "green": 95,
                  "red": 94
                },
                "percentage": 32.78
              },
              {
                "rgb": {
                  "blue": 177,
                  "green": 85,
                  "red": 48
                },
                "percentage": 16.93
              },
              {
                "rgb": {
                  "blue": 185,
                  "green": 73,
                  "red": 157
                },
                "percentage": 14.87
              },
              {
                "rgb": {
                  "blue": 165,
                  "green": 73,
                  "red": 194
                },
                "percentage": 9.99
              },
              {
                "rgb": {
                  "blue": 81,
                  "green": 33,
                  "red": 29
                },
                "percentage": 4.69
              }
            ]
          },
          "type": "IMAGE_COLOR"
        }
      },
      {
        "key": "photo_image_full_size_x_large",
        "value": {
          "image_value": {
            "height": 648,
            "width": 1200,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=png&name=2048x2048_2_exp"
          },
          "type": "IMAGE"
        }
      },
      {
        "key": "card_url",
        "value": {
          "scribe_key": "card_url",
          "string_value": "https://t.co/KluL5t2kUn",
          "type": "STRING"
        }
      },
      {
        "key": "summary_photo_image_original",
        "value": {
          "image_value": {
            "height": 648,
            "width": 1200,
            "url": "https://pbs.twimg.com/card_img/1998394993690771457/TrmaTsfr?format=jpg&name=orig"
          },
          "type": "IMAGE"
        }
      }
    ],
    "card_platform": {
      "platform": {
        "audience": {
          "name": "production"
        },
        "device": {
          "name": "Android",
          "version": "12"
        }
      }
    },
    "name": "summary_large_image",
    "url": "https://t.co/KluL5t2kUn",
    "user_refs_results": [
      {
        "result": {
          "__typename": "User",
          "id": "VXNlcjo3Nzg3NjQxNDI0MTI5ODQzMjA=",
          "rest_id": "778764142412984320",
          "affiliates_highlighted_label": {},
          "has_graduated_access": true,
          "is_blue_verified": true,
          "profile_image_shape": "Square",
          "legacy": {
            "can_dm": true,
            "can_media_tag": true,
            "created_at": "Thu Sep 22 01:13:35 +0000 2016",
            "default_profile": true,
            "default_profile_image": false,
            "description": "The AI community building the future. https://t.co/VkRPD0Vclr",
            "entities": {
              "description": {
                "urls": [
                  {
                    "display_url": "hf.co/jobs",
                    "expanded_url": "http://hf.co/jobs",
                    "url": "https://t.co/VkRPD0Vclr",
                    "indices": [
                      38,
                      61
                    ]
                  }
                ]
              },
              "url": {
                "urls": [
                  {
                    "display_url": "huggingface.co",
                    "expanded_url": "https://huggingface.co",
                    "url": "https://t.co/A337WqHDnG",
                    "indices": [
                      0,
                      23
                    ]
                  }
                ]
              }
            },
            "fast_followers_count": 0,
            "favourites_count": 8617,
            "followers_count": 593153,
            "friends_count": 214,
            "has_custom_timelines": false,
            "is_translator": false,
            "listed_count": 7290,
            "location": "NYC and Paris and 🌏",
            "media_count": 459,
            "name": "Hugging Face",
            "normal_followers_count": 593153,
            "pinned_tweet_ids_str": [],
            "possibly_sensitive": false,
            "profile_banner_url": "https://pbs.twimg.com/profile_banners/778764142412984320/1731533786",
            "profile_image_url_https": "https://pbs.twimg.com/profile_images/1991559933473497089/mbrRS49P_normal.jpg",
            "profile_interstitial_type": "",
            "screen_name": "huggingface",
            "statuses_count": 12572,
            "translator_type": "none",
            "url": "https://t.co/A337WqHDnG",
            "verified": false,
            "verified_type": "Business",
            "want_retweets": false,
            "withheld_in_countries": []
          },
          "professional": {
            "rest_id": "1977499064699699479",
            "professional_type": "Business",
            "category": [
              {
                "id": 713,
                "name": "Science & Technology",
                "icon_name": ""
              }
            ]
          },
          "tipjar_settings": {}
        }
      }
    ]
  },
  "place": {},
  "entities": {
    "hashtags": [
      {
        "indices": [
          997,
          1010
        ],
        "text": "UltraFineWeb"
      },
      {
        "indices": [
          1011,
          1020
        ],
        "text": "MiniCPM4"
      },
      {
        "indices": [
          1021,
          1024
        ],
        "text": "AI"
      },
      {
        "indices": [
          1025,
          1029
        ],
        "text": "LLM"
      },
      {
        "indices": [
          1030,
          1038
        ],
        "text": "OpenBMB"
      },
      {
        "indices": [
          1039,
          1049
        ],
        "text": "UltraData"
      }
    ],
    "symbols": [],
    "urls": [
      {
        "display_url": "huggingface.co/datasets/openb…",
        "expanded_url": "https://huggingface.co/datasets/openbmb/Ultra-FineWeb",
        "url": "https://t.co/KluL5t2kUn",
        "indices": [
          866,
          889
        ]
      },
      {
        "display_url": "arxiv.org/abs/2505.05427",
        "expanded_url": "http://arxiv.org/abs/2505.05427",
        "url": "https://t.co/Kg9LLUqZgB",
        "indices": [
          899,
          922
        ]
      },
      {
        "display_url": "huggingface.co/openbmb/Ultra-…",
        "expanded_url": "https://huggingface.co/openbmb/Ultra-FineWeb-classifier",
        "url": "https://t.co/oUfxrN6AmP",
        "indices": [
          937,
          960
        ]
      },
      {
        "display_url": "huggingface.co/collections/op…",
        "expanded_url": "https://huggingface.co/collections/openbmb/minicpm4",
        "url": "https://t.co/IQ82jD1PTi",
        "indices": [
          972,
          995
        ]
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "article": null
}
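
The raw response above pairs each t.co short link in `text` with its destination under `entities.urls`, and stores the timestamp in Twitter's legacy `createdAt` format. The following is a minimal Python sketch of how a consumer might expand those links, parse the timestamp, and derive an engagement figure. The helper names (`expand_links`, `parse_created_at`, `engagement_rate`) are hypothetical, the `sample` dict is a trimmed copy of the response rather than the full payload, and the engagement formula (interactions divided by views) is an assumption made for illustration, not a metric defined by the API.

```python
from datetime import datetime


def expand_links(text, url_entities):
    """Replace each short link in `text` with its `expanded_url`."""
    for entity in url_entities:
        text = text.replace(entity["url"], entity["expanded_url"])
    return text


def parse_created_at(value):
    """Parse a `createdAt` string such as 'Wed Dec 10 14:26:22 +0000 2025'."""
    # Day and month names are matched using the current locale (English by default).
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


def engagement_rate(tweet):
    """Assumed metric: (likes + retweets + replies + quotes) / views."""
    interactions = (
        tweet["likeCount"]
        + tweet["retweetCount"]
        + tweet["replyCount"]
        + tweet["quoteCount"]
    )
    views = tweet["viewCount"]
    return interactions / views if views else 0.0


# Trimmed copy of the response above; only the fields used here are kept.
sample = {
    "text": "Dataset: https://t.co/KluL5t2kUn\nPaper: https://t.co/Kg9LLUqZgB",
    "createdAt": "Wed Dec 10 14:26:22 +0000 2025",
    "retweetCount": 19,
    "replyCount": 3,
    "likeCount": 194,
    "quoteCount": 5,
    "viewCount": 17750,
    "entities": {
        "urls": [
            {
                "url": "https://t.co/KluL5t2kUn",
                "expanded_url": "https://huggingface.co/datasets/openbmb/Ultra-FineWeb",
            },
            {
                "url": "https://t.co/Kg9LLUqZgB",
                "expanded_url": "http://arxiv.org/abs/2505.05427",
            },
        ]
    },
}

print(expand_links(sample["text"], sample["entities"]["urls"]))
print(parse_created_at(sample["createdAt"]).isoformat())
print(f"{engagement_rate(sample):.4f}")
```

Plain string replacement is used instead of the `indices` offsets because it does not depend on how the API counts emoji and other multi-unit characters when computing positions.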