🐦 Twitter Post Details

Viewing enriched Twitter post

@dair_ai

Top ML Papers of the Week (Oct 2 - Oct 8): - StreamingLLM - Analogical Prompting - The Dawn of LMMs - Neural Developmental Programs - LLMs Represent Space and Time - Retrieval meets Long Context LLMs ... ---- 1/ LLMs Represent Space and Time - discovers that LLMs learn linear representations of space and time across multiple scales; the representations are robust to prompt variations and unified across different entity types; demonstrate that LLMs acquire fundamental structured knowledge such as space and time, claiming that language models learn beyond superficial statistics, but literal world models. https://t.co/pX76NvZPLa 2/ Retrieval meets Long Context LLMs - compares retrieval augmentation and long-context windows for downstream tasks to investigate if the methods can be combined to get the best of both worlds; an LLM with a 4K context window using simple RAG can achieve comparable performance to a fine-tuned LLM with 16K context; retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes; a retrieval-augmented LLaMA2-70B with a 32K context window outperforms GPT-3.5-turbo-16k on seven long context tasks including question answering and query-based summarization. https://t.co/WYi90n0ULH 3/ StreamingLLM - a framework that enables efficient streaming LLMs with attention sinks, a phenomenon where the KV states of initial tokens will largely recover the performance of window attention; the emergence of the attention sink is due to strong attention scores towards the initial tokens; this approach enables LLMs trained with finite length attention windows to generalize to infinite sequence length without any additional fine-tuning. https://t.co/Lima0M4Ctc 4/ Neural Developmental Programs - proposes to use neural networks that self-assemble through a developmental process that mirrors properties of embryonic development in biological organisms (referred to as neural developmental programs); shows the feasibility of the approach in continuous control problems and growing topologies. https://t.co/jr6gwRv0N3 5/ The Dawn of LMMs - a comprehensive analysis of GPT-4V to deepen the understanding of large multimodal models (LMMs); it focuses on probing GPT-4V across various application scenarios; provides examples ranging from code capabilities with vision to retrieval-augmented LMMs. https://t.co/57QsPVoGJe 6/ Training LLMs with Pause Tokens - performs training and inference on LLMs with a learnable <pause> token which helps to delay the model's answer generation and attain performance gains on general understanding tasks of Commonsense QA and math word problem-solving; experiments show that this is only beneficial provided that the delay is introduced in both pertaining and downstream fine-tuning. https://t.co/0fJVAGXIMw 7/ Recursively Self-Improving Code Generation - proposes the use of a language model-infused scaffolding program to recursively improve itself; a seed improver first improves an input program that returns the best solution which is then further tasked to improve itself; shows that the GPT-4 models can write code that can call itself to improve itself. https://t.co/Vzy2Db2VuL 8/ Retrieval-Augmented Dual Instruction Tuning - proposes a lightweight fine-tuning method to retrofit LLMs with retrieval capabilities; it involves a 2-step approach: 1) updates a pretrained LM to better use the retrieved information 2) updates the retriever to return more relevant results, as preferred by the LM Results show that fine-tuning over tasks that require both knowledge utilization and contextual awareness, each stage leads to additional gains; a 65B model achieves state-of-the-art results on a range of knowledge-intensive zero- and few-shot learning benchmarks; it outperforms existing retrieval-augmented language approaches by up to +8.9% in zero-shot and +1.4% in 5-shot. https://t.co/iz7LogfqVK 9/ KOSMOG-G - a model that performs high-fidelity zero-shot image generation from generalized vision-language input that spans multiple images; extends zero-shot subject-driven image generation to multi-entity scenarios; allows the replacement of CLIP, unlocking new applications with other U-Net techniques such as ControlNet and LoRA. https://t.co/uoaSKN8yti 10/ Analogical Prompting - a new prompting approach to automatically guide the reasoning process of LLMs; the approach is different from chain-of-thought in that it doesn’t require labeled exemplars of the reasoning process; the approach is inspired by analogical reasoning and prompts LMs to self-generate relevant exemplars or knowledge in the context. https://t.co/T88jFFUBDo

View on Twitter

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1711004647081562158/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1711004647081562158/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    }
  ],
  "nlp": {
    "processed_at": "2025-08-06T12:43:18.276475",
    "sentiment": "positive",
    "topics": [
      "LLMs",
      "Retrieval Augmentation",
      "Prompt Engineering"
    ],
    "ner": {
      "entities": [
        {
          "entity": "StreamingLLM",
          "type": "paper"
        },
        {
          "entity": "Analogical Prompting",
          "type": "paper"
        },
        {
          "entity": "The Dawn of LMMs",
          "type": "paper"
        },
        {
          "entity": "Neural Developmental Programs",
          "type": "paper"
        },
        {
          "entity": "LLMs Represent Space and Time",
          "type": "paper"
        }
      ]
    }
  },
  "score": 1.0,
  "scored_at": "2025-08-09T13:46:07.542196",
  "import_source": "manual_curation_2023",
  "score_components": {
    "author": 0.09,
    "engagement": 0.13835779330410902,
    "quality": 0.2,
    "source": 0.15,
    "nlp": 0.1,
    "recency": 0.010000000000000002
  },
  "source_tagged_at": "2025-08-09T13:42:52.557722",
  "enriched": true,
  "enriched_at": "2025-08-09T13:42:52.557724",
  "links_checked": true,
  "checked_at": "2025-08-10T10:32:29.204334",
  "original_structure": "had_media_only",
  "enhanced_from_raw_response": true,
  "enhanced_at": "2025-08-14T03:18:39.151191"
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2017-07-23T09:12:45.000Z",
    "default_profile_image": false,
    "description": "Democratizing AI research, education, and technologies",
    "fast_followers_count": 0,
    "favourites_count": 1379,
    "followers_count": 42027,
    "friends_count": 1,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 830,
    "location": "",
    "media_count": 45,
    "name": "DAIR.AI",
    "normal_followers_count": 42027,
    "possibly_sensitive": false,
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1643277398522187778/31dedbLo_normal.jpg",
    "screen_name": "dair_ai",
    "statuses_count": 713,
    "translator_type": "none",
    "url": "https://t.co/W6313m1okg",
    "verified": false,
    "withheld_in_countries": [],
    "id_str": "889050642903293953"
  },
  "id": "1711004647081562158",
  "conversation_id": "1711004647081562158",
  "full_text": "Top ML Papers of the Week (Oct 2 - Oct 8):\n\n- StreamingLLM \n- Analogical Prompting\n- The Dawn of LMMs\n- Neural Developmental Programs\n- LLMs Represent Space and Time\n- Retrieval meets Long Context LLMs\n...\n\n----\n\n1/ LLMs Represent Space and Time - discovers that LLMs learn linear representations of space and time across multiple scales; the representations are robust to prompt variations and unified across different entity types; demonstrate that LLMs acquire fundamental structured knowledge such as space and time, claiming that language models learn beyond superficial statistics, but literal world models. \n\nhttps://t.co/pX76NvZPLa\n\n2/ Retrieval meets Long Context LLMs - compares retrieval augmentation and long-context windows for downstream tasks to investigate if the methods can be combined to get the best of both worlds; an LLM with a 4K context window using simple RAG can achieve comparable performance to a fine-tuned LLM with 16K context; retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes; a retrieval-augmented LLaMA2-70B with a 32K context window outperforms GPT-3.5-turbo-16k on seven long context tasks including question answering and query-based summarization. \n\nhttps://t.co/WYi90n0ULH\n\n3/ StreamingLLM - a framework that enables efficient streaming LLMs with attention sinks, a phenomenon where the KV states of initial tokens will largely recover the performance of window attention; the emergence of the attention sink is due to strong attention scores towards the initial tokens; this approach enables LLMs trained with finite length attention windows to generalize to infinite sequence length without any additional fine-tuning. \n\nhttps://t.co/Lima0M4Ctc\n\n4/ Neural Developmental Programs - proposes to use neural networks that self-assemble through a developmental process that mirrors properties of embryonic development in biological organisms (referred to as neural developmental programs); shows the feasibility of the approach in continuous control problems and growing topologies.\n\nhttps://t.co/jr6gwRv0N3\n\n5/ The Dawn of LMMs - a comprehensive analysis of GPT-4V to deepen the understanding of large multimodal models (LMMs); it focuses on probing GPT-4V across various application scenarios; provides examples ranging from code capabilities with vision to retrieval-augmented LMMs. \n\nhttps://t.co/57QsPVoGJe\n\n6/ Training LLMs with Pause Tokens - performs training and inference on LLMs with a learnable <pause> token which helps to delay the model's answer generation and attain performance gains on general understanding tasks of Commonsense QA and math word problem-solving; experiments show that this is only beneficial provided that the delay is introduced in both pertaining and downstream fine-tuning.\n\nhttps://t.co/0fJVAGXIMw\n\n7/ Recursively Self-Improving Code Generation - proposes the use of a language model-infused scaffolding program to recursively improve itself; a seed improver first improves an input program that returns the best solution which is then further tasked to improve itself; shows that the GPT-4 models can write code that can call itself to improve itself.\n\nhttps://t.co/Vzy2Db2VuL\n\n8/ Retrieval-Augmented Dual Instruction Tuning - proposes a lightweight fine-tuning method to retrofit LLMs with retrieval capabilities; it involves a 2-step approach: 1) updates a pretrained LM to better use the retrieved information 2) updates the retriever to return more relevant results, as preferred by the LM Results show that fine-tuning over tasks that require both knowledge utilization and contextual awareness, each stage leads to additional gains; a 65B model achieves state-of-the-art results on a range of knowledge-intensive zero- and few-shot learning benchmarks; it outperforms existing retrieval-augmented language approaches by up to +8.9% in zero-shot and +1.4% in 5-shot.\n\nhttps://t.co/iz7LogfqVK\n\n9/ KOSMOG-G - a model that performs high-fidelity zero-shot image generation from generalized vision-language input that spans multiple images; extends zero-shot subject-driven image generation to multi-entity scenarios; allows the replacement of CLIP, unlocking new applications with other U-Net techniques such as ControlNet and LoRA. \n\nhttps://t.co/uoaSKN8yti\n\n10/ Analogical Prompting - a new prompting approach to automatically guide the reasoning process of LLMs; the approach is different from chain-of-thought in that it doesn’t require labeled exemplars of the reasoning process; the approach is inspired by analogical reasoning and prompts LMs to self-generate relevant exemplars or knowledge in the context. \n\nhttps://t.co/T88jFFUBDo",
  "reply_count": 1,
  "retweet_count": 91,
  "favorite_count": 402,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/F761gUvXwAAqKp1.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/dair_ai/status/1711004647081562158",
  "created_at": "2023-10-08T13:04:31.000Z",
  "#sort_index": "1711004647081562158",
  "view_count": 84963,
  "quote_count": 5,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://twitter.com/dair_ai/status/1711004647081562158"
}