@reach_vb
Multimodal Ichigo Llama 3.1 - Real-Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained for 45 hours on 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech VQ for semantic tokens
> Llama 3.1 8B Instruct as the text backbone
> Early fusion (Chameleon-style)

I'm super bullish on @homebrewltd and early-fusion audio-and-text multimodal models!

(P.S. Play with the demo on Hugging Face)
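To make the "early fusion" idea concrete: WhisperVQ quantizes audio into discrete codes, and those codes are mapped into new token ids appended after the text vocabulary, so a single Llama-style backbone consumes one interleaved token stream. A minimal sketch (all sizes, special-token names, and the helper functions are hypothetical, not Homebrew's actual code):

```python
# Hypothetical early-fusion token mapping (illustrative sketch only).
TEXT_VOCAB_SIZE = 128_256     # Llama 3.1 text vocabulary size
AUDIO_CODEBOOK_SIZE = 512     # assumed WhisperVQ codebook size

# Assumed special tokens placed right after the audio codes.
SOUND_START = TEXT_VOCAB_SIZE + AUDIO_CODEBOOK_SIZE   # <|sound_start|>
SOUND_END = SOUND_START + 1                           # <|sound_end|>

def audio_code_to_token_id(code: int) -> int:
    """Offset a WhisperVQ code into the extended vocabulary."""
    assert 0 <= code < AUDIO_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def build_sequence(text_ids, audio_codes):
    """Early fusion: audio tokens and text tokens share one stream."""
    audio = [audio_code_to_token_id(c) for c in audio_codes]
    return [SOUND_START] + audio + [SOUND_END] + list(text_ids)

# Audio codes 7 and 500 wrapped in sound markers, then text ids 1, 2, 3.
seq = build_sequence([1, 2, 3], [7, 500])
```

Because the backbone sees only token ids, no architectural change is needed beyond resizing the embedding table to cover the extended vocabulary.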