@pascalefung
Introducing VL-JEPA: Vision-Language Joint Embedding Predictive Architecture for streaming, live action recognition, retrieval, VQA, and classification tasks, with better performance and higher efficiency than large VLMs.
• VL-JEPA is the first non-generative model that can perform general-domain vision-language tasks in real time, built on a joint embedding predictive architecture.
• We demonstrate in controlled experiments that VL-JEPA, trained with latent-space embedding prediction, outperforms VLMs that rely on data-space token prediction.
• We show that VL-JEPA delivers significant efficiency gains over VLMs for online video-streaming applications, thanks to its non-autoregressive design and native support for selective decoding.
• We highlight that our VL-JEPA model, with a unified architecture, can effectively handle a wide range of classification, retrieval, and VQA tasks at the same time.
by @Delong0_0 @MustafaShukor1 @TheoMoutakanni @willyhcchung Jade Lei Yu Tejaswi Kasarla @AllenBolourchi @ylecun @pascalefung
https://t.co/oUnjCaMKVv
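To make the "latent-space embedding prediction" idea concrete, here is a minimal NumPy sketch of a JEPA-style objective: instead of autoregressively decoding text tokens (data space), a predictor maps the visual embedding to a predicted text embedding, and the loss is measured directly between embeddings. All names, dimensions, and the cosine-distance loss below are illustrative assumptions, not the paper's actual architecture or training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the encoders and predictor (names are ours,
# not VL-JEPA's); real versions would be learned neural networks.
D = 8                                # toy latent dimension

W_vis = rng.normal(size=(16, D))     # "vision encoder": frame feature -> latent
W_txt = rng.normal(size=(6, D))      # "text encoder": caption feature -> latent
W_pred = rng.normal(size=(D, D))     # predictor: visual latent -> predicted text latent

def l2_normalize(x):
    return x / np.linalg.norm(x)

frame = rng.normal(size=16)          # toy video-frame feature
caption = rng.normal(size=6)         # toy caption feature

z_vis = l2_normalize(frame @ W_vis)  # visual embedding
z_txt = l2_normalize(caption @ W_txt)  # target text embedding (no token decoding)
z_hat = l2_normalize(z_vis @ W_pred)   # predicted text embedding

# JEPA-style objective: match embeddings in latent space rather than
# predicting text tokens one at a time in data space.
loss = 1.0 - float(z_hat @ z_txt)    # cosine-distance loss in [0, 2]
print(f"latent prediction loss: {loss:.4f}")
```

Because the prediction happens in one shot in embedding space, there is no token-by-token decoding loop, which is the intuition behind the non-autoregressive efficiency claim above; text is only decoded when a task actually needs it (selective decoding).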