🐦 Twitter Post Details


@bindureddy

Closing the Gap to Closed-Source LLMs – Open-Sourcing 70B Abacus Giraffe! The best-performing open-source model on MT-Bench in key categories.

We are super excited to be open-sourcing our best model for Enterprise AI use cases: the 70B, 32K-context-length Abacus Giraffe model!

Abacus Giraffe, based on the 70B Llama 2, offers all the performance gains of Llama 2 with the additional advantage of a 32K context length.

Giraffe is a family of models fine-tuned from base Llama 2 that use context-length extension techniques to increase their effective context length from 4096 to approximately 32000 tokens. The longer context window improves performance on many downstream tasks and enables use cases that a short context length does not permit.

We conducted an evaluation of the 70B model on our set of benchmarks that probe LLM performance over long contexts.

At the longest context window (32K), the 70B model improves significantly over the 13B model on the document-QA task, scoring 61% accuracy vs. 18% on our AltQA dataset. It also outperforms the comparable LongChat-32k model at all context lengths, with the margin widening at the longest contexts (61% vs. 35% accuracy at 32K).

In addition, we ran 70B Giraffe on the MT-Bench evaluation set. MT-Bench examines the performance of LLMs in multi-turn settings (i.e., more than a single question and answer) across a variety of categories, such as Writing, Coding, and Math.

The results, along with a comparison to other LLMs, can be seen in the figure above. 70B Giraffe 32K shows the best performance of all the open-source models in Extraction, Coding, and Math, and maintains a high score in the other categories. There is still a performance gap in these categories to the best closed-source models, but we at Abacus are excited to try to close that gap further.
Our blog post, including links to our git repo and evals, is here: https://t.co/vLuAdQYV41
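The post says only that Giraffe uses "context length extension techniques" to go from 4096 to roughly 32000 tokens, without naming the method. One common approach in this family rescales the position indices fed to rotary position embeddings (RoPE) so that a 32K sequence maps back into the position range the model saw in pretraining. A minimal sketch of that idea (linear position interpolation, assumed here for illustration and not confirmed as Giraffe's exact technique):

```python
def rope_frequencies(dim, base=10000.0):
    """Inverse frequencies used by rotary position embeddings (RoPE)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def rope_angles(position, dim, scale=1.0):
    """Rotation angles for one token position.

    scale < 1 compresses position indices so a longer sequence fits
    inside the range seen during pretraining (linear interpolation).
    """
    return [position * scale * f for f in rope_frequencies(dim)]

# Stretch a 4096-token pretraining window to cover ~32000 positions:
scale = 4096 / 32000  # ≈ 0.128
native = rope_angles(4096, dim=128)              # last native position
stretched = rope_angles(32000, dim=128, scale=scale)

# With this scale, position 32000 rotates by the same angles that
# position 4096 did during pretraining — no out-of-range positions:
assert all(abs(a - b) < 1e-9 for a, b in zip(native, stretched))
```

The point of the rescaling is that the attention mechanism never sees position angles larger than those encountered in pretraining, which is why a short fine-tune can then adapt the model to the longer window.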

🔧 Raw API Response

{
  "user": {
    "created_at": "2007-05-29T00:21:54.000Z",
    "default_profile_image": false,
    "description": "CEO of @abacusai, using Gen AI to build Applied AI and LLM agents and systems at scale, ex-AWS / Google, passionate about human behavior",
    "fast_followers_count": 0,
    "favourites_count": 6110,
    "followers_count": 92825,
    "friends_count": 400,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 1274,
    "location": "San Francisco, CA",
    "media_count": 513,
    "name": "Bindu Reddy",
    "normal_followers_count": 92825,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/6398252/1689653510",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1443737943684763651/32WHA-kg_normal.jpg",
    "screen_name": "bindureddy",
    "statuses_count": 10668,
    "translator_type": "none",
    "url": "https://t.co/QA1YZ10l2F",
    "verified": false,
    "withheld_in_countries": [],
    "id_str": "6398252"
  },
  "id": "1706783306073989314",
  "conversation_id": "1706783306073989314",
  "full_text": "Closing the Gap to Closed Source LLMs – Open-Sourcing 70B Abacus Giraffe!\nThe best-performing open-source model on MT-bench in key categories\n\nWe are super excited to be open-sourcing our best model for Enterprise AI use cases -  the 70B 32K context length Abacus Giraffe model!  \n\nAbacus Giraffe, based on the 70B Llama-2 offers all the performance gains of Llama-2 with the additional advantage of having a 32K context length\n\nGiraffe is a family of models that are finetuned from base Llama 2 and use context length extension techniques to increase their effective context length capability from 4096 to approximately 32000. The longer context window improves performance on many downstream tasks and allows new use cases for the model that a short context length may not permit.\n\nWe conducted an evaluation of the 70B model on our set of benchmarks that probe LLM performance over long contexts. \n\nThe 70B model improves significantly at the longest context windows (32k) for the document QA task vs. the 13B model, scoring 61% accuracy vs. the 18% accuracy of 13B on our AltQA dataset. We also find that it outperforms the comparable LongChat-32k model at all context lengths, with an increasing performance at the longest context lengths (recording 61% vs. 35% accuracy at 32k context length).\n\nIn addition, we ran 70B Giraffe on the MT-Bench evaluation set. MT-Bench examines the performance of LLMs in multi-turn settings (i.e. more than a single question and answer) across a variety of categories, such as Writing, Coding, and Math. \n\nThe results of this and the comparison to some other LLMs can be seen in the figure above. 70B Giraffe 32k shows the best performance of all the open-source models in the categories of Extraction, Coding, and Math, and maintains a high score in the other categories. 
There is still a gap in performance in these categories to the best-closed source models – but we here at Abacus are excited to try to close that gap further. \n\nOur blog post including links to our git-repo and evals here - https://t.co/vLuAdQYV41",
  "reply_count": 11,
  "retweet_count": 99,
  "favorite_count": 503,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/F6-1u4BaEAABTm9.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/bindureddy/status/1706783306073989314",
  "created_at": "2023-09-26T21:30:25.000Z",
  "#sort_index": "1706783306073989314",
  "view_count": 128010,
  "quote_count": 10,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://twitter.com/bindureddy/status/1706783306073989314"
}
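The raw response above is plain JSON, so the engagement and author fields can be pulled out programmatically. A minimal sketch using only fields present in the response (the `summarize_tweet` helper is a hypothetical name, and the inline payload is a trimmed subset of the full object):

```python
import json

def summarize_tweet(payload: dict) -> str:
    """Build a one-line summary from the Twitter API fields above."""
    user = payload["user"]
    return (
        f'@{user["screen_name"]} ({user["followers_count"]:,} followers): '
        f'{payload["favorite_count"]} likes, {payload["retweet_count"]} RTs, '
        f'{payload["view_count"]:,} views'
    )

# Trimmed subset of the raw API response shown above:
payload = json.loads(
    '{"user": {"screen_name": "bindureddy", "followers_count": 92825}, '
    '"favorite_count": 503, "retweet_count": 99, "view_count": 128010}'
)
print(summarize_tweet(payload))
# → @bindureddy (92,825 followers): 503 likes, 99 RTs, 128,010 views
```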