Your curated collection of saved posts and media


Showing 32 posts · last 14 days · by score


Dimitris Papailiopoulos

@DimitrisPapail

📅

Oct 10, 2024

573d ago

🆔42950229

LLMs Can In-context Learn Multiple Tasks in Superposition We explore a bizarre LLM superpower that allows them to solve multiple ICL tasks in parallel. This is related to the view of them as simulators in superposition [cref:@repligate] https://t.co/mFIUIANPF6 1/n

❤️653

likes

🔁124

retweets


Andrew Carr (e/🤸)

@andrew_n_carr

📅

Oct 09, 2024

574d ago

🆔56552137

one slide tells you all you need to know about Blackwell series https://t.co/6F8yXgVGCv

❤️1,450

likes

🔁89

retweets


Aran Komatsuzaki

@arankomatsuzaki

📅

Oct 11, 2024

573d ago

🆔84872544

Scaling Laws For Diffusion Transformers https://t.co/Uw3E7wzN9q https://t.co/voumdosUSO

❤️511

likes

🔁86

retweets


Ethan Mollick

@emollick

📅

Oct 10, 2024

573d ago

🆔28494508

"Claude, make a version of the midwit meme on a canvas, in the middle the guy says "Use finetuning, RAG," [fill in the rest] and on both ends is "shove it in the context window" (the meme is not copyright, this is fine to do)" https://t.co/X8nPSKAx6q

@maksym_andr •

🚨New🚨 We are asking a fundamental question: how far can we push in-context learning for instruction following and how does it compare to fine-tuning? TL;DR: you should, of course, fine-tune, but the scaling laws are similar, at least in the small-sample regime: Key findings (IC

❤️296

likes

🔁27

retweets


DAIR.AI

@dair_ai

📅

Oct 11, 2024

573d ago

🆔92489181

⭐0.86

🎓 Advanced Prompt Engineering Our new course teaches advanced prompting techniques to effectively build with LLMs. We first go over the best practices for techniques like prompt chaining and ReAct prompting. Then we show you how to build complex LLM workflows (e.g., agentic chatbots) with those techniques. Enroll here: https://t.co/nfzX12S8wC

❤️126

likes

🔁24

retweets


Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

📅

Oct 11, 2024

572d ago

🆔52835663

Awesome to see two @MedARC_AI projects highlighted in the @stateofaireport! 1. MindEye2 2. RoentGen https://t.co/2r5B1WmzwC

@nathanbenaich •

🪩The @stateofaireport 2024 has landed! 🪩 Our seventh installment is our biggest and most comprehensive yet, covering everything you *need* to know about research, industry, safety and politics. As ever, here's my director’s cut (+ video tutorial!) 🧵 https://t.co/Ww9bb0UpwK

❤️42

likes

🔁4

retweets


elvis

@omarsar0

📅

Oct 10, 2024

573d ago

🆔19544565

Astute RAG Proposes a novel RAG approach to deal with the imperfect retrieval augmentation and knowledge conflicts of LLMs. Astute RAG adaptively elicits essential information from LLMs' internal knowledge. Then it iteratively consolidates internal and external knowledge with source-awareness. Astute RAG is designed to better combine internal and external information through an interactive consolidation mechanism (i.e., identifying consistent passages, detecting conflicting information in them, and filtering out irrelevant information). (Prompts for this step provided in the paper) The explicit consolidation step addresses knowledge conflicts which is probably one of the most challenging parts of building reliable RAG systems. It really does help to know how to leverage the internal and external information of RAG systems.

❤️335

likes

🔁64

retweets


NIK

@ns123abc

📅

Oct 10, 2024

574d ago

🆔62524828

⭐0.96

“Sir, Luke Metz who works on o1 reasoning models is leaving OpenAI..” https://t.co/FEJHs0pNBI

@Luke_Metz •

I'm leaving OpenAI after over 2 years of wild ride. Alongside @barret_zoph , @LiamFedus , @johnschulman2 , and many others I got to build a “low key research preview” product that became ChatGPT. While we were all excited to work on it, none of us expected it to be where it is t

❤️3,057

likes

🔁140

retweets


elvis

@omarsar0

📅

Oct 09, 2024

574d ago

🆔36109896

Very impressed how quickly video generation is improving. I've been recently trying Video Ocean which makes it easy to do character-to-video, image-to-video, and text-to-video. Try it for free here: https://t.co/Ucvwe72FAt https://t.co/Sw6Fy59O0L

❤️37

likes

🔁3

retweets


LlamaIndex 🦙

@llama_index

📅

Oct 09, 2024

574d ago

🆔08043839

Watch @LoganMarkewich chat with an AI agent using his voice and LlamaIndex! This demo app shows how to use the OpenAI realtime API client to chat interactively, including using tools to answer. It's open source, so you can build your own voice agents! https://t.co/ppbS5Fougg https://t.co/iex3fTuUs3

❤️126

likes

🔁29

retweets


Mehrdad Farajtabar

@MFarajtabar

📅

Oct 10, 2024

573d ago

🆔71858028

⭐1.00

1/ Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source models like Llama, Phi, Gemma, and Mistral and leading closed models, including the recent OpenAI GPT-4o and o1-series. https://t.co/2tv8Pp9MSz Work done with @i_mirzadeh, @KeivanAlizadeh2, Hooman Shahrokhi, Samy Bengio, @OncelTuzel. #LLM #Reasoning #Mathematics #AGI #Research #Apple

❤️5,595

likes

🔁1,306

retweets


Ethan Mollick

@emollick

📅

Oct 09, 2024

574d ago

🆔95045904

"We hope that such tools may help us to gain novel insight into the psychology of an understudied pool of humans—namely, the dead" Overview of work on HLLMs - language models trained on historical texts to simulate historical attitudes and perspectives. https://t.co/joGjm7brgs https://t.co/lg59eRS1So

❤️382

likes

🔁56

retweets


Sebastian Raschka

@rasbt

📅

Oct 09, 2024

574d ago

🆔98054044

⭐0.96

Btw upon popular request, we added new GPU types like the L40S over the last few days. They are actually great for LLM applications: - 48 GB of VRAM per GPU - bfloat16 support - and overall great bang for the buck. https://t.co/H3y7GurP6B

❤️161

likes

🔁15

retweets


Michael Browning

@mbrowning

📅

Oct 09, 2024

574d ago

🆔83084945

Research results from Perplexity vs. Promethia to the following query: "I want to do some research on the developments within the bourbon industry over the last 10 years. The end product should be a 5-10 page paper outlining the recent trends, major acquisitions, and any changes in the dominant players within the industry." Promethia took 6 minutes to compile the report vs 10 seconds or so for Perplexity Pro. But, sometimes a few minutes are well worth the wait. Learn more about Promethia here: https://t.co/J7vyg6u19l

❤️1

likes


virat

@virattt

📅

Oct 09, 2024

574d ago

🆔95768579

I finetuned another LLM on financial Q&A. From scratch. Implementation details: • 355M param LLM • 6K training samples • 0.67 training loss • 0.90 validation loss I used a single A100 and it took ~7 minutes. Really cool to see the before and after results. Before: LLM generates random text. After: LLM generates an answer attempt. Shout out to @rasbt for the code.

❤️741

likes

🔁81

retweets


Lewis Tunstall

@_lewtun

📅

Oct 09, 2024

574d ago

🆔64738142

Introducing dynamic speculative decoding to 🤗 Transformers: a clever trick by @intel to accelerate text generation by 2-3x 🔥 How does it work? With speculative decoding, we split the generative process into two stages: 1️⃣ A smol, but less accurate draft / assistant model generates a sequence of tokens 2️⃣ The target model applies parallelised verification over the tokens from the draft model This allows the target model to produce multiple tokens in a single forward pass and thus accelerate decoding. As shown in the diagram below, the whole method hinges on something called the *speculation lookahead* (SL) which is simply the number of tokens produced by the draft model on each iteration: Now, SL is usually a static value or determined via heuristics - in both cases this leaves a lot of performance on the table 😿 The trick behind dynamic speculative decoding is to dynamically adjust the number of draft tokens generated *per iteration*. By doing so, the total number of tokens generated by the draft model can be significantly reduced and thus the number of forward passes from the target model too: It turns out that the speed-up depends on the task and model architecture, but in some cases one can get up to ~3x improvements 🚀


❤️217

likes

🔁29

retweets
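The two-stage loop described in the post above can be sketched with toy stand-ins for the models. This is a minimal illustration, not the 🤗 Transformers or Intel implementation: `speculative_decode`, `toy_target`, `toy_draft`, `max_lookahead`, and `confidence_threshold` are all hypothetical names, and the rule "stop drafting once the draft model's confidence drops below a threshold" is an assumed heuristic for adjusting the lookahead per iteration.

```python
from typing import Callable, List, Tuple

Token = int
DraftModel = Callable[[List[Token]], Tuple[Token, float]]  # next token + confidence
TargetModel = Callable[[List[Token]], Token]               # greedy next token


def speculative_decode(
    target: TargetModel,
    draft: DraftModel,
    prompt: List[Token],
    max_new_tokens: int,
    max_lookahead: int = 5,
    confidence_threshold: float = 0.5,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding with a per-iteration (dynamic) lookahead."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # Stage 1: the draft model proposes tokens and stops early when unsure,
        # so the number of draft tokens varies from iteration to iteration.
        proposal: List[Token] = []
        while len(proposal) < max_lookahead:
            tok, conf = draft(tokens + proposal)
            proposal.append(tok)
            if conf < confidence_threshold:
                break

        # Stage 2: the target verifies the proposal. A real model scores every
        # position in one forward pass; this toy counts one pass and loops.
        target_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's correction, drop the rest
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], target_passes


def toy_target(context: List[Token]) -> Token:
    """Hypothetical target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_draft(context: List[Token]) -> Tuple[Token, float]:
    """Hypothetical draft model: agrees with the target except after token 4."""
    if context[-1] == 4:
        return 0, 0.2  # wrong guess, reported with low confidence
    return (context[-1] + 1) % 10, 0.9


output, passes = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=12)
```

Because verification always falls back to the target's own greedy choice, the output matches plain greedy decoding with the target model; only the number of target passes changes.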


Ethan Mollick

@emollick

📅

Oct 10, 2024

574d ago

🆔56738858

⭐1.00

After seeing so many odd viral videos of weird fruits that I only partially believe exist, I decided to just go right to the fictional and generate one with AI. This grows in low desert scrublands and tastes like apple and banana mixed, probably. https://t.co/7V37zo77r5

@Rainmaker1973 •

The fruits of Melocactus are pink and resemble the shape of pepper fruits. The fruits of this genus are edible, and in the wild they are frequently dispersed by lizards and birds [📹 cultofsun] https://t.co/qJkVClSs34

❤️83

likes

🔁7

retweets


Zaid Khan

@codezakh

📅

Oct 09, 2024

574d ago

🆔99531151

Can we automate the process of generating data to improve a model on diverse, open-ended tasks, based on automatically-discovered model weaknesses? Introducing DataEnvGym - a testbed for data-generation agents + teaching environments. Environment trains/evaluates student model ➡️ Environment discovers skills/errors and gives feedback to agent ➡️ Agent generates updated training data to address weaknesses ➡️ Iterate Key Idea -- Frame data generation + model improvement as an RL-style sequential decision-making task: states encode student errors, policy decides actions encoding which data to generate, and reward is the performance of the student model. We provide several modular environments + teaching agents that can improve models on VQA/math/programming, and provide a leaderboard benchmarking these agents. We welcome more entries to our leaderboard! Thread 🧵👇 (1/9)

❤️275

likes

🔁89

retweets


Rohan Pandey

@khoomeik

📅

Aug 30, 2024

615d ago

🆔78639075

Is science constrained by intelligence or experimentation? Math is certainly intelligence-constrained, but would adding n more PhD students really accelerate e.g. materials science progress? Or are we constrained by time/resources it takes to run e.g. photovoltaics experiments? https://t.co/nZEZZ7F07B

❤️93

likes

🔁9

retweets


elvis

@omarsar0

📅

Oct 09, 2024

574d ago

🆔66072742

Addition is All You Need for Energy-efficient Language Models Proposes an algorithm that approximates floating point multiplication with a single integer addition operation. It is less computationally intensive than 8-bit floating point but achieves higher precision. "Since multiplying floating point numbers requires substantially higher energy compared to integer addition operations, applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by elementwise floating point tensor multiplications and 80% energy cost of dot products." Refreshing to see more research around efficient ML algorithms. It's one of my favorite research areas, so I just wanted to highlight this recent paper. Lots of interesting insights and results in the paper.

❤️250

likes

🔁50

retweets
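To give a feel for how a floating point multiplication can be approximated by one integer addition, here is a rough sketch of the classic bit-pattern trick (a Mitchell-style logarithmic approximation). It is not the paper's L-Mul algorithm, which the post does not spell out; `approx_mul` and its helpers are hypothetical names, and the sketch assumes positive, normal float32 inputs whose product does not overflow.

```python
import struct

ONE_BITS = 0x3F800000  # bit pattern of float32 1.0 (the exponent bias, shifted)


def float_to_bits(x: float) -> int:
    """Reinterpret a float32 value as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def approx_mul(x: float, y: float) -> float:
    """Approximate x * y by adding the integer bit patterns.

    Adding the bit patterns adds the exponents exactly and the mantissas
    linearly, so (1 + a)(1 + b) is replaced by roughly 1 + a + b. Assumes
    positive, normal float32 inputs with no overflow.
    """
    return bits_to_float(float_to_bits(x) + float_to_bits(y) - ONE_BITS)


exact_power_of_two = approx_mul(2.0, 4.0)  # zero mantissas, so no error is introduced
rough_product = approx_mul(3.0, 5.0)       # underestimates the true product 15.0
```

Dropping the mantissa cross term means the result never exceeds the true product, and the relative error stays within 1/9 (about 11%). That gap is the kind of error a correction term would aim to shrink.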


AI at Meta

@AIatMeta

📅

Oct 10, 2024

574d ago

🆔90150248

⭐1.00

Training Llama 3.1 on clinician-created synthetic data, using prompt engineering techniques and RAG; Neuromnia developed Nia: a human-centric AI co-pilot to support work on some of the most pressing challenges for autism care ➡️ https://t.co/mXqooP0dsV https://t.co/AP1gO6lCt3

❤️516

likes

🔁97

retweets


jason liu

@jxnlco

📅

Oct 10, 2024

574d ago

🆔15890260

I gave notebook lm my diary https://t.co/SquFcQZqzS

❤️7

likes


Andrew Ng

@AndrewYNg

📅

Oct 09, 2024

574d ago

🆔87177409

⭐1.00

"Introducing Multimodal Llama 3.2": As promised two weeks ago, here's the short course on Meta's latest open model! This short course is created with @Meta and taught by @asangani7, Director of AI Partner Engineering at Meta. Meta’s Llama family of models is leading the way in open models, allowing anyone to download, customize, fine-tune, or build new applications on top of them. Learn about the vision capabilities of Llama 3.2, and use it for image classification, prompting, tokenization, and tool-calling. You'll also learn about the open-source Llama stack, which gives building blocks for many different stages of the LLM application life cycle. In detail, you’ll: - Learn the features of Meta's four newest models, and when to use which Llama model. - Learn best practices for multimodal prompting, with applications to advanced image reasoning, illustrated by many examples: Understanding errors on a car dashboard, adding up the total of photographed restaurant receipts, grading written math homework. - Use different roles—system, user, assistant, ipython—in the Llama 3.1 and 3.2 models and the prompt format that identifies those roles. - Understand how Llama uses the tiktoken tokenizer, and how it has expanded to a 128k vocabulary size that improves encoding efficiency and multilingual support. - Learn how to prompt Llama to call built-in and custom tools (functions) with examples for web search and solving math equations. - Learn about Llama Stack, a standardized interface for common toolchain components like fine-tuning or synthetic data generation, useful for building agentic applications. By the end of this course, you’ll be equipped to build out new applications with the new Llama 3.2. Thank you to @Ahmad_Al_Dahle, Amit Sangani, and the whole AI at Meta team @AIatMeta for all the hard work on Llama 3.2 — we’re excited to make these open models even more accessible to more developers with this new course! Please sign up here! https://t.co/Flp5Ae9apy

❤️1,657

likes

🔁265

retweets


Ethan Mollick

@emollick

📅

Oct 10, 2024

573d ago

🆔27705682

Thought-provoking (literally?) work using AI to probe the origins of intelligence By comparing LLMs trained on simple or complex systems they argue: “intelligence arises from the ability to predict complexity & that creating intelligence may require only exposure to complexity” https://t.co/MUmKDaBD5K

❤️573

likes

🔁81

retweets


Gabriel Peyré

@gabrielpeyre

📅

Jul 11, 2023

1031d ago

🆔69771520

Hopfield networks are recurrent networks minimizing an Ising-type energy parameterized by its weights. Learning weights means encoding patterns of +1/-1 as local minimizers. https://t.co/mWM4f5HK7R https://t.co/0eCcqrRYV1

❤️886

likes

🔁195

retweets
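The description in the post above can be made concrete with a small pure-Python sketch: weights are learned from +1/-1 patterns with the Hebbian rule, and asynchronous sign updates descend the Ising-type energy until a stored pattern is reached. The function names (`train_hebbian`, `energy`, `recall`) and the two example patterns are illustrative assumptions, not taken from the linked material.

```python
from typing import List

Pattern = List[int]  # entries are +1 or -1


def train_hebbian(patterns: List[Pattern]) -> List[List[float]]:
    """Hebbian rule: w_ij = (1/n) * sum over patterns of p_i * p_j, zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def energy(weights: List[List[float]], state: Pattern) -> float:
    """Ising-type energy E(s) = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n)
    )


def recall(weights: List[List[float]], state: Pattern, max_sweeps: int = 10) -> Pattern:
    """Asynchronous updates s_i <- sign(sum_j w_ij * s_j) until nothing changes.

    With symmetric weights and a zero diagonal, a flip never raises the energy.
    """
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two orthogonal patterns stored as local minimizers of the energy.
stored = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
weights = train_hebbian(stored)

# Corrupt one entry of the first pattern, then let the dynamics repair it.
noisy = list(stored[0])
noisy[0] = -noisy[0]
recovered = recall(weights, noisy)
```

Starting from the corrupted state, the updates settle back on the first stored pattern, which has lower energy than the noisy starting point.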


Tom Dörr

@tom_doerr

📅

Oct 07, 2024

576d ago

🆔67576261

Very nice. Looks like it even displays the embedding space https://t.co/7Cg2G5bOuq

❤️943

likes

🔁125

retweets


elvis

@omarsar0

📅

Oct 08, 2024

575d ago

🆔20150216

Differential Transformer Proposes a differential attention mechanism that amplifies attention to the relevant context while canceling noise. Differential Transformer outperforms Transformer when scaling up model size and training tokens. The authors claim that since this architecture gets less "distracted" by irrelevant context, it can do well in applications such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers.

❤️573

likes

🔁129

retweets


François Charton

@f_charton

📅

Oct 10, 2024

573d ago

🆔52954678

Math transformers learn better when trained from repeated examples. New paper with @KempeLab https://t.co/aTIBfmqAtJ On 3 problems, modular multiplication, GCD and eigenvalues, for the same training budget, models trained from smaller datasets achieve better performances. 1/5 https://t.co/SLZd458wcq

❤️126

likes

🔁22

retweets


LlamaIndex 🦙

@llama_index

📅

Oct 08, 2024

575d ago

🆔95110510

⭐1.00

A big arrival from @Oracle, the granddaddy of databases! Not one but 4 new integrations: An Oracle data loader: https://t.co/MtYCzgBevK An Oracle text splitter: https://t.co/MtYCzgBevK Oracle embeddings: https://t.co/rAF2sNJAW1 And Oracle vector search: https://t.co/IrebHnWa8J

❤️21

likes

🔁5

retweets

🖼️ Media

View Details View on X ↗

Y

yuwen lu

@yuwen_lu_

📅

Oct 08, 2024

575d ago

🆔93809392

AI seems to confirm the vision of ubiquitous computing: the useful, reliable, effective AI disappears into the background. Until nobody realizes it's AI. From @random_walker's AI email newsletter, on their new book AI Snake Oil: https://t.co/FGGGpohuo8

❤️27

likes

🔁3

retweets


Ethan Mollick

@emollick

📅

Oct 08, 2024

575d ago

🆔46181886

Hacking the Gibson through gentle mockery. https://t.co/JX2kbKozfk

❤️71

likes

🔁3

retweets


Bindu Reddy

@bindureddy

📅

Oct 08, 2024

575d ago

🆔11128049

Video Generation On ChatLLM Got A Massive Upgrade Kling AI is the best video generator in the market and creates fantastic videos! We just integrated with them, so you can go from text to image from FLUX 1.1 to video by KLING in minutes. https://t.co/ChIyhgjIcu

❤️161

likes

🔁26

retweets
