๐Ÿฆ Twitter Post Details


@_philschmid

New Embedding Models for Code released by @awscloud!
Embedding Models are at the heart of every RAG application. Without good embeddings, retrieving relevant context to answer your user prompts is impossible. 🔍

Super exciting to see Amazon release CodeSage, a family of open code embedding models with an encoder architecture that supports a wide range of source code understanding tasks. 🤗

TL;DR;
Comes in 3 sizes: 130M, 356M, 1.3B
📚 Pre-trained on @BigCodeProject the Stack (237 million code files)
🇪🇺 Fine-tuned on 75 million bimodal (code and natural language) pairs
Using hard negatives & hard positive improve MAP > 10%
🔠 Using @BigCodeProject StarCoder Tokenizer
⚖️ Licensed under Apache 2.0
🥇 Outperforms @OpenAI and others on 0-shot Code Search
🚀 Sota Performance on NL2Code (Natural Language to Code)
🤗 Available on @huggingface and supported in Sentence Transformers

🔧 Raw API Response

{
  "user": {
    "created_at": "2019-06-18T18:39:49.000Z",
    "default_profile_image": false,
    "description": "Tech Lead and LLMs at @huggingface ๐Ÿ‘จ๐Ÿปโ€๐Ÿ’ป ๐Ÿค—  AWS ML Hero ๐Ÿฆธ๐Ÿป | Cloud & ML enthusiast | ๐Ÿ“Nuremberg | ๐Ÿ‡ฉ๐Ÿ‡ช https://t.co/l1ppq3q3hk",
    "fast_followers_count": 0,
    "favourites_count": 3770,
    "followers_count": 12805,
    "friends_count": 571,
    "has_custom_timelines": false,
    "is_translator": false,
    "listed_count": 345,
    "location": "Nรผrnberg",
    "media_count": 308,
    "name": "Philipp Schmid",
    "normal_followers_count": 12805,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1141052916570214400/1582380032",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1714444511860887552/8TzsCn3e_normal.jpg",
    "screen_name": "_philschmid",
    "statuses_count": 1394,
    "translator_type": "none",
    "url": "https://t.co/8BDXIK6omb",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "1141052916570214400"
  },
  "id": "1757426571634176389",
  "conversation_id": "1757426571634176389",
  "full_text": "New Embedding Models for Code released by @awscloud!\n Embedding Models are at the heart of every RAG application. Without good embeddings, retrieving relevant context to answer your user prompts is impossible. ๐Ÿ”\n\nSuper exciting to see Amazon release CodeSage, a family of open code embedding models with an encoder architecture that supports a wide range of source code understanding tasks. ๐Ÿค—\n\nTL;DR;\n๐Ÿ“ Comes in 3 sizes: 130M, 356M, 1.3B\n๐Ÿ“š Pre-trained on @BigCodeProject the Stack (237 million code files)\n๐Ÿ‡ช๐Ÿ‡บ Fine-tuned on 75 million bimodal (code and natural language) pairs\n๐Ÿ” Using hard negatives & hard positive improve MAP > 10%\n๐Ÿ”  Using @BigCodeProject StarCoder Tokenizer\nโš–๏ธ Licensed under Apache 2.0\n๐Ÿฅ‡ Outperforms @OpenAI and others on 0-shot Code Search\n๐Ÿš€ Sota Performance on NL2Code (Natural Language to Code)\n๐Ÿค—ย Available on @huggingface and supported in Sentence Transformers",
  "reply_count": 5,
  "retweet_count": 37,
  "favorite_count": 231,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [
    {
      "id_str": "66780587",
      "name": "Amazon Web Services",
      "screen_name": "awscloud",
      "profile": "https://twitter.com/awscloud"
    }
  ],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GGOiD0BWMAAFt_P.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/_philschmid/status/1757426571634176389",
  "created_at": "2024-02-13T15:28:40.000Z",
  "#sort_index": "1757426571634176389",
  "view_count": 27306,
  "quote_count": 3,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://twitter.com/_philschmid/status/1757426571634176389"
}
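
As a rough illustration of how the counters and ID in the response above can be read, the following Python sketch computes a simple interactions-per-view ratio and decodes the creation time embedded in the post ID. It rests on assumptions that are not part of the dump: the `tweet` dict is a hand-copied subset of the fields, `engagement_rate` and `snowflake_time` are hypothetical helper names, the ratio is an illustrative metric rather than an official one, and the ID is assumed to be a snowflake whose upper bits hold milliseconds since 2010-11-04T01:42:54.657Z.

```python
from datetime import datetime, timezone

# Hand-copied subset of the response above; field names match the dump.
tweet = {
    "id": "1757426571634176389",
    "created_at": "2024-02-13T15:28:40.000Z",
    "reply_count": 5,
    "retweet_count": 37,
    "favorite_count": 231,
    "quote_count": 3,
    "view_count": 27306,
}

# Assumption: snowflake IDs store milliseconds since this epoch
# (2010-11-04T01:42:54.657Z) in the bits above the lowest 22.
TWITTER_EPOCH_MS = 1288834974657


def engagement_rate(t: dict) -> float:
    """Interactions per view (illustrative metric, not an official one)."""
    interactions = (
        t["reply_count"]
        + t["retweet_count"]
        + t["favorite_count"]
        + t["quote_count"]
    )
    return interactions / t["view_count"]


def snowflake_time(tweet_id: str) -> datetime:
    """Decode the creation time in a snowflake ID, truncated to seconds."""
    ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc)


def parse_created_at(value: str) -> datetime:
    """Parse the ISO-style created_at string used in the dump as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(f"engagement rate: {engagement_rate(tweet):.2%}")
    print("time from id:   ", snowflake_time(tweet["id"]))
    print("created_at:     ", parse_created_at(tweet["created_at"]))
```

With the values in the dump, the 276 interactions over 27,306 views come to roughly 1.01%, and the time decoded from the ID agrees with the `created_at` field to the second.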