🐦 Twitter Post Details

@hanlin_hl

Glad to share our new preprint from my Meta @AIatMeta internship and @uncnlp collaboration:

VEDiT: Latent Prediction Architecture for Procedural Video Representation Learning 🌟

- A well designed DiT-based prediction model ➕a strong off-the-shelf frozen visual encoder ➡️ SoTA in procedural learning tasks without the need for pretraining the prediction model, nor requiring additional supervision from language or ASR.
- Compared with image/video generative models that learn representations from pixel space, we predict visual representations entirely in the embedding space of publicly available vision encoders.

See more details in paper 👉 https://t.co/ExSHWRiMRU

Thread below 🧵

🔧 Raw API Response

{
  "user": {
    "created_at": "2021-12-28T23:49:13.000Z",
    "default_profile_image": false,
    "description": "PhD @UNCCS @UNCNLP | MS @Columbia | Research Intern @AIatMeta| Working on generative models, multimodal learning, and LLMs",
    "fast_followers_count": 0,
    "favourites_count": 378,
    "followers_count": 463,
    "friends_count": 788,
    "has_custom_timelines": false,
    "is_translator": false,
    "listed_count": 9,
    "location": "Chapel Hill, NC",
    "media_count": 29,
    "name": "Han Lin",
    "normal_followers_count": 463,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1475977099903180802/1697937858",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1564028789012807684/z99so2X1_normal.jpg",
    "screen_name": "hanlin_hl",
    "statuses_count": 235,
    "translator_type": "none",
    "url": "https://t.co/Bs8GUlRCnb",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "1475977099903180802"
  },
  "id": "1843332768304120286",
  "conversation_id": "1843332768304120286",
  "full_text": "Glad to share our new preprint from my Meta @AIatMeta   internship and @uncnlp collaboration: \n\nVEDiT: Latent Prediction Architecture for Procedural Video Representation Learning 🌟\n\n- A well designed DiT-based prediction model ➕a strong off-the-shelf frozen visual encoder ➡️ SoTA in procedural learning tasks without the need for pretraining the prediction model, nor requiring additional supervision from language or ASR. \n- Compared with image/video generative models that learn representations from pixel space, we predict visual representations entirely in the embedding space of publicly available vision encoders.\n\nSee more details in paper 👉 https://t.co/ExSHWRiMRU\n\nThread below 🧵",
  "reply_count": 2,
  "retweet_count": 44,
  "favorite_count": 145,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [
    {
      "id_str": "1034844617261248512",
      "name": "AI at Meta",
      "screen_name": "AIatMeta",
      "profile": "https://twitter.com/AIatMeta"
    },
    {
      "id_str": "875914488020701188",
      "name": "UNC NLP",
      "screen_name": "uncnlp",
      "profile": "https://twitter.com/uncnlp"
    }
  ],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GZTOkjZb0AAq8jj.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/hanlin_hl/status/1843332768304120286",
  "created_at": "2024-10-07T16:49:14.000Z",
  "#sort_index": "1843332768304120286",
  "view_count": 29436,
  "quote_count": 3,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://x.com/hanlin_hl/status/1843332768304120286"
}