🐦 Twitter Post Details

@iScienceLuvr

NovoMolGen: Rethinking Molecular Language Model Pretraining

"there remains limited understanding regarding how standard language modeling practices such as textual representations, tokenization strategies, model size, and dataset scale impact molecular generation performance. In this work, we systematically investigate these critical aspects by introducing NovoMolGen, a family of transformer-based foundation models pretrained on 1.5 billion molecules for de-novo molecule generation. Through extensive empirical analyses, we identify a weak correlation between performance metrics measured during pretraining and actual downstream performance, revealing important distinctions between molecular and general NLP training dynamics. NovoMolGen establishes new state-of-the-art results, substantially outperforming prior Mol-LLMs and specialized generative models in both unconstrained and goal-directed molecular generation tasks"

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1958094437936148626/media_0.jpg?",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-08-20T12:48:42.459510",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1958094437936148626",
  "url": "https://x.com/iScienceLuvr/status/1958094437936148626",
  "twitterUrl": "https://twitter.com/iScienceLuvr/status/1958094437936148626",
  "text": "NovoMolGen: Rethinking Molecular Language Model Pretraining\n\n\"there remains limited understanding regarding how standard language  modeling practices such as textual representations, tokenization  strategies, model size, and dataset scale impact molecular generation  performance. In this work, we systematically investigate these critical  aspects by introducing NovoMolGen, a family of transformer-based  foundation models pretrained on 1.5 billion molecules for de-novo  molecule generation. Through extensive empirical analyses, we identify a  weak correlation between performance metrics measured during  pretraining and actual downstream performance, revealing important  distinctions between molecular and general NLP training dynamics.  NovoMolGen establishes new state-of-the-art results, substantially  outperforming prior Mol-LLMs and specialized generative models in both  unconstrained and goal-directed molecular generation tasks\"",
  "source": "Twitter for iPhone",
  "retweetCount": 2,
  "replyCount": 2,
  "likeCount": 5,
  "quoteCount": 0,
  "viewCount": 1722,
  "createdAt": "Wed Aug 20 09:11:08 +0000 2025",
  "lang": "en",
  "bookmarkCount": 3,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1958094437936148626",
  "displayTextRange": [
    0,
    281
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "iScienceLuvr",
    "url": "https://x.com/iScienceLuvr",
    "twitterUrl": "https://twitter.com/iScienceLuvr",
    "id": "441465751",
    "name": "Tanishq Mathew Abraham, Ph.D.",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1913710019729821696/Qge4zx6u_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/441465751/1738204246",
    "description": "",
    "location": "",
    "followers": 80199,
    "following": 1243,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Tue Dec 20 03:45:50 +0000 2011",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 104894,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 2445,
    "statusesCount": 17835,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1952221233648718307"
    ],
    "profile_bio": {
      "description": "CEO @SophontAI |\nPhD at 19 (2023) |\nFounder, ex CEO @MedARC_AI |\nex Research Director Stability AI | \nBiomed. engineer @ 14 |\nTEDx talk➡https://t.co/xPxwKTq6Qb",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "bit.ly/3tpAuan",
              "expanded_url": "https://bit.ly/3tpAuan",
              "indices": [
                136,
                159
              ],
              "url": "https://t.co/xPxwKTq6Qb"
            }
          ],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                4,
                14
              ],
              "name": "",
              "screen_name": "SophontAI"
            },
            {
              "id_str": "0",
              "indices": [
                52,
                62
              ],
              "name": "",
              "screen_name": "MedARC_AI"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "sophontai.com",
              "expanded_url": "https://sophontai.com",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/uQ936JTZf1"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "allow_download_status": {
          "allow_download": true
        },
        "display_url": "pic.twitter.com/WRzP0R6jc8",
        "expanded_url": "https://twitter.com/iScienceLuvr/status/1958094437936148626/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "1958094040521662465",
        "indices": [
          282,
          305
        ],
        "media_key": "3_1958094040521662465",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARssjCfKGmABCgACGyyMhFHaMJIAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABGyyMJ8oaYAEKAAIbLIyEUdowkgAA",
            "media_key": "3_1958094040521662465"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/GyyMJ8oaYAE7nZv.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 786,
              "w": 1404,
              "x": 0,
              "y": 0
            },
            {
              "h": 1404,
              "w": 1404,
              "x": 0,
              "y": 0
            },
            {
              "h": 1601,
              "w": 1404,
              "x": 0,
              "y": 0
            },
            {
              "h": 1833,
              "w": 917,
              "x": 0,
              "y": 0
            },
            {
              "h": 1833,
              "w": 1404,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1833,
          "width": 1404
        },
        "sizes": {
          "large": {
            "h": 1833,
            "w": 1404
          }
        },
        "type": "photo",
        "url": "https://t.co/WRzP0R6jc8"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {},
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "article": null
}
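
For readers who want to work with the raw response above, here is a minimal Python sketch of pulling out a few headline fields. The field names (`text`, `createdAt`, `retweetCount`, and so on) are taken from the response itself. The helper names `parse_created_at` and `summarize` are hypothetical, and the sample dict is an abbreviated copy of the response with the quoted abstract cut short. The sketch assumes the title is separated from the quote by a blank line, as it is in this post, and that `createdAt` always uses the format shown here.

```python
from datetime import datetime

# Format of "createdAt" as it appears in the response above,
# e.g. "Wed Aug 20 09:11:08 +0000 2025". Note that %a and %b are
# locale-dependent, so this assumes an English (or C) locale.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Counters summed into a single interaction total (an illustrative choice).
INTERACTION_FIELDS = (
    "retweetCount",
    "replyCount",
    "likeCount",
    "quoteCount",
    "bookmarkCount",
)


def parse_created_at(value: str) -> datetime:
    """Parse a createdAt string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


def summarize(tweet: dict) -> dict:
    """Collect a few headline fields from a tweet-shaped dict."""
    # The title is everything before the first blank line in the text.
    title, _, _ = tweet["text"].partition("\n\n")
    return {
        "author": tweet["author"]["userName"],
        "title": title,
        "created_at": parse_created_at(tweet["createdAt"]),
        "interactions": sum(tweet[field] for field in INTERACTION_FIELDS),
        "views": tweet["viewCount"],
    }


# Abbreviated copy of the response above (quoted abstract cut short).
sample = {
    "text": 'NovoMolGen: Rethinking Molecular Language Model Pretraining'
            '\n\n"there remains limited understanding"',
    "retweetCount": 2,
    "replyCount": 2,
    "likeCount": 5,
    "quoteCount": 0,
    "bookmarkCount": 3,
    "viewCount": 1722,
    "createdAt": "Wed Aug 20 09:11:08 +0000 2025",
    "author": {"userName": "iScienceLuvr"},
}

summary = summarize(sample)
```

Summing the five counters is only one way to express engagement; a caller could just as well report them separately or divide by `viewCount`.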