@ActuallyIsaak
Introducing the MLX-Benchmark Suite! https://t.co/sp4ZMIBxov

The first comprehensive benchmark for evaluating LLMs on Apple's MLX framework.

What is this?
MLX Benchmark is a CLI tool and dataset that measures how well large language models understand, write, and debug code for Apple's MLX machine learning framework, covering everything from core array operations to LoRA fine-tuning with mlx-lm, mlx-vlm, and mlx-embeddings. (A tiny illustrative snippet follows below.)

Dataset: https://t.co/5b04a7PKAp
- 520 questions across 6 task types: knowledge QA, multiple choice, true/false, fill-in-the-blank, code generation, and debugging
- 11 categories spanning the full MLX ecosystem: mlx_core, mlx_nn, mlx_lm, mlx_lm_lora, mlx_vlm, mlx_embeddings, mlx_embeddings_lora, mlx_optimizers, coding, debugging, conceptual
- 4 difficulty levels: easy → medium → hard → very-hard
- 90+ subcategories, from array_creation to lora_finetuning

Features
- Multi-provider benchmarking: Ollama, Anthropic, OpenAI, Groq, OpenRouter
- LLM-as-judge evaluation: strict scoring with an independent judge model
- Fine-grained filtering by type, difficulty, and category
- LaTeX export: --latex generates publication-ready booktabs tables
- PNG chart export: --plot generates grouped bar charts comparing models

A detailed paper is coming as well!
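To give a flavor of what an mlx_core task asks a model to write, here's a tiny MLX snippet. This is my own illustration of the kind of code the benchmark covers, not an item from the dataset:

```python
# Illustrative only: the sort of core array operation the benchmark
# asks models to write. Not a dataset item.
import mlx.core as mx

a = mx.array([[1.0, 2.0], [3.0, 4.0]])
b = mx.transpose(a)
c = mx.matmul(a, b)  # lazy: nothing has been computed yet
mx.eval(c)           # force evaluation on the default (Metal) device
print(c)
```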
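The thread doesn't show the dataset schema, but given the task types, categories, subcategories, and difficulty levels listed above, a single entry plausibly looks something like this. Every field name here is my guess, not the published format:

```python
# Hypothetical sketch of one dataset entry; field names are assumptions,
# not the actual MLX-Benchmark schema.
example_question = {
    "type": "code_generation",        # one of the 6 task types
    "category": "mlx_core",           # one of the 11 categories
    "subcategory": "array_creation",  # one of the 90+ subcategories
    "difficulty": "easy",             # easy / medium / hard / very-hard
    "prompt": "Create a 3x3 identity matrix using mlx.core.",
    "reference_answer": "import mlx.core as mx\nmx.eye(3)",
}
```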
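On LLM-as-judge evaluation: the idea is that an independent model, not the one being tested, grades each answer against the reference. The suite's actual judging code isn't shown in the thread; here's a minimal sketch of the pattern, assuming Anthropic as the judge provider. The judge_answer helper, the prompt wording, the 0-10 scale, and the model choice are all my assumptions:

```python
# Minimal LLM-as-judge sketch. judge_answer, the prompt, the scale, and
# the model name are hypothetical, not the suite's implementation.
import anthropic

def judge_answer(question: str, reference: str, candidate: str) -> int:
    """Ask an independent judge model to score a candidate answer 0-10."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    response = client.messages.create(
        model="claude-sonnet-4-5",  # example judge; any strong model works
        max_tokens=16,
        messages=[{
            "role": "user",
            "content": (
                "Score the candidate answer from 0 (wrong) to 10 (perfect), "
                "judging strictly against the reference. "
                "Reply with the number only.\n"
                f"Question: {question}\n"
                f"Reference: {reference}\n"
                f"Candidate: {candidate}"
            ),
        }],
    )
    return int(response.content[0].text.strip())
```

Keeping the judge independent of the benchmarked model avoids self-grading bias, which is why strict scoring with a separate judge matters.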