Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
Ethan Mollick
@emollick
📅 Nov 15, 2024 · 537d ago
🆔 64956522

There is a lot of energy going into fine-tuning models, but specialized medical AI models lost to their general versions 38% of the time, only won 12%. Before spending millions on specialized training, might be worth exploring what base models can do with well-designed prompts. https://t.co/6FjpfBfahf

❤️ 670 likes · 🔁 114 retweets

Alina Lozovskaya
@ailozovskaya
📅 Nov 14, 2024 · 538d ago
🆔 66240028

🌱 CO₂ calculations on the Open LLM Leaderboard! You can now check CO₂ emissions for each model evaluation! Track which models are greener and make sustainable choices 🌍
🔗 Leaderboard: https://t.co/ecrYahipwt
📄 Docs: https://t.co/5DEiNomCnr https://t.co/25otuvMavX

❤️ 34 likes · 🔁 14 retweets

interstellarninja
@intrstllrninja
📅 Nov 13, 2024 · 539d ago
🆔 76391239
⭐0.81

anthropic ceo dario, in his lex interview, identifies running out of data as one of the llm scaling limits.

to overcome this, anthropic is working on synthetic data generation by augmenting existing data, using reasoning methods/models, or generating from scratch, similar to how deepmind's alphago was trained.

we may run out of quality data -- internet data is repetitive, full of SEO garbage, and will see a flux of ai generated content in the future, so synthetic data will be the recipe for ai model improvement

@kimmonismus •

I'm still watching Lex Fridman's interview with Dario Amodei and I'm surprised how much value he places on synthetic data. Especially through models like o1

❤️ 31 likes · 🔁 5 retweets

Jeremy Nguyen ✍🏼 🚢
@JeremyNguyenPhD
📅 Nov 13, 2024 · 539d ago
🆔 68985788
⭐0.66

Handy mnemonic for "Type I" and "Type II" errors: https://t.co/UHfYe7eFXJ

@SpencrGreenberg •

Statistics is hard enough for most people as it is; please stop using the phrases "type I" and "type II" errors. Unless, of course, your goal is to confuse people - in which case, congrats - you've succeeded! Just say "false positive" and "false negative" instead.

❤️ 171 likes · 🔁 17 retweets

Andrew Ng
@AndrewYNg
📅 Nov 13, 2024 · 538d ago
🆔 57691922
⭐0.98

New short course: Safe and Reliable AI via Guardrails! Learn to create production-ready, reliable LLM applications with guardrails in this new course, built in collaboration with @guardrails_ai and taught by its CEO and co-founder, @ShreyaR.

I see many companies worry about the reliability of LLM-based systems -- will they hallucinate a catastrophically bad response? -- which slows down investing in building them and transitioning prototypes to deployment. That LLMs generate probabilistic outputs has made them particularly hard to deploy in highly regulated industries or in safety-critical environments.

Fortunately, there are good guardrail tools that give a significant new layer of control and reliability/safety. They act as a protective framework that can prevent your application from revealing incorrect, irrelevant, or confidential information, and they are an important part of what it takes to actually get prototypes to deployment.

This course will walk you through common failure modes of LLM-powered applications (like hallucinations or revealing personally identifiable information). It will show you how to build guardrails from scratch to mitigate them. You’ll also learn how to access a variety of pre-built guardrails on the GuardrailsAI hub that are ready to integrate into your projects.

You'll implement these guardrails in the context of a RAG-powered customer service chatbot for a small pizzeria. Specifically, you'll:
- Explore common failure modes like hallucinations, going off-topic, revealing sensitive information, or responses that can harm the pizzeria's reputation.
- Learn to mitigate these failure modes with input and output guards that check inputs and/or outputs
- Create a guardrail to prevent the chatbot from discussing sensitive topics, such as a confidential project at the pizza shop
- Detect hallucinations by ensuring responses are grounded in trusted documents
- Add a Personal Identifiable Information (PII) guardrail to detect and redact sensitive information in user prompts and in LLM outputs
- Set up a guardrail to limit the chatbot’s responses to topics relevant to the pizza shop, keeping interactions on-topic
- Configure a guardrail that prevents your chatbot from mentioning any competitors using a name detection pipeline consisting of conditional logic that routes to an exact match or a threshold check with named entity recognition

Guardrails are an important part of the practical building and deployment of LLM-based applications today. This course will show you how to make your applications more reliable and more ready for real-world deployment.

Please sign up here: https://t.co/C1fwsOn9yy

❤️ 750 likes · 🔁 174 retweets

Eric Topol
@EricTopol
📅 Nov 13, 2024 · 539d ago
🆔 44794012
⭐0.86

This is wild and futuristic. A virtual lab comprised of multiple (5) #AI agents designed potent nanobodies vs #SARSCoV2 with minimal human oversight. https://t.co/09smMPwHCY @james_y_zou @KyleWSwanson @czbiohub @Stanford

❤️ 424 likes · 🔁 112 retweets

Jerry Liu
@jerryjliu0
📅 Nov 13, 2024 · 538d ago
🆔 73062620

Pretty excited about this new RAG technique I cooked up 🧑‍🍳

A top issue with RAG chunking is it splits the document into fragmented pieces, causing top-k retrieval to return partial context. Also most documents have multiple hierarchies of sections: top-level sections, sub-sections, etc.

This is also why lots of people are interested in exploring the idea of knowledge graphs - pulling in "links" to related pages to expand retrieved context.

This notebook lets you retrieve contiguous chunks without having to spend a lot of time tuning the chunking algorithm, thanks to GraphRAG-esque metadata tagging + retrieval. Tag chunks with sections, and use the section ID to expand the retrieved set.

Check it out https://t.co/mIolxuMT12

@llama_index •

We’re excited to feature a new RAG technique - dynamic section retrieval 💫 - which ensures that you can retrieve entire contiguous sections instead of naive fragmented chunks from a document. This is a top pain point we’ve heard from our community on multi-document RAG challenge
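The idea of tagging chunks with a section ID and then expanding the retrieved set can be sketched in a few lines. This is only an illustration, not the notebook's implementation: the `Chunk` type, the word-overlap `score` function, and the sample data are all hypothetical stand-ins for a real embedding-based retriever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: int    # position of the chunk in the original document
    section_id: str  # section tag attached at indexing time (hypothetical scheme)
    text: str


def score(query: str, chunk: Chunk) -> int:
    """Toy relevance score: how many query words appear in the chunk."""
    words = set(chunk.text.lower().split())
    return sum(1 for w in query.lower().split() if w in words)


def retrieve_sections(query, chunks, top_k=1):
    """Top-k chunk retrieval, then expansion to every chunk in the matched sections."""
    hits = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]
    wanted = {c.section_id for c in hits}
    expanded = [c for c in chunks if c.section_id in wanted]
    # Return the section's chunks in document order so the context is contiguous.
    return sorted(expanded, key=lambda c: c.chunk_id)


chunks = [
    Chunk(0, "1-intro", "this report covers revenue and costs"),
    Chunk(1, "2-revenue", "revenue grew in the third quarter"),
    Chunk(2, "2-revenue", "growth came mostly from subscriptions"),
    Chunk(3, "3-costs", "costs were flat year over year"),
]
result = retrieve_sections("third quarter revenue", chunks, top_k=1)
```

Here only chunk 1 matches the query directly, but chunk 2 comes back with it because both carry the same section tag, which is the "retrieve the whole section" behaviour the post describes.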

❤️ 646 likes · 🔁 113 retweets

Stanislav Fort @ NeurIPS
@stanislavfort
📅 Aug 13, 2024 · 631d ago
🆔 58438624
⭐0.91

✨🎨🏰 Super excited to share our new paper Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness

Inspired by biology we 1) get adversarial robustness + interpretability for free, 2) turn classifiers into generators & 3) design attacks on vLLMs 1/12

❤️ 1,028 likes · 🔁 207 retweets

Ethan Mollick
@emollick
📅 Nov 14, 2024 · 538d ago
🆔 64761616
⭐1.00

The models definitely have different “personalities,” and vague nonsense requests sometimes highlight these. “Listen. It is just you and me here. Lets cheddar. Go with it.” as answered by Claude, GPT-4o, Gemini, and Grok. https://t.co/tcgWTO9qPd

❤️ 511 likes · 🔁 52 retweets

Hamel Husain
@HamelHusain
📅 Nov 14, 2024 · 538d ago
🆔 59274093
⭐0.86

AI Evals are the difference b/w demos and products that work. I write about this often, but the #1 request I get? More details on how to do it right. So I made this: upvote the eval topics you care about most 👇 Takes 30s, shapes what I write next. https://t.co/mAewgxqe6D

❤️ 95 likes · 🔁 11 retweets

elvis
@omarsar0
📅 Nov 13, 2024 · 538d ago
🆔 47297329

We are just at the beginning of what AI can do. Finding hidden links between fields is something I worked on during my PhD so I can tell you there is a lot more to explore beyond the standard generative systems we see today. https://t.co/f6bObtWYWO

❤️ 454 likes · 🔁 76 retweets

Hamel Husain
@HamelHusain
📅 Nov 14, 2024 · 538d ago
🆔 49382745

I'm surprised this is the most popular one but ok! I can definitely talk about this https://t.co/rmT8AW4n2A

❤️ 11 likes

Rohin Dhar
@rohindhar
📅 Nov 14, 2024 · 538d ago
🆔 85032454
⭐0.81

Blind person’s account of using a self driving car for the first time: “Finally gave me that feeling most people get at 16, driving independently for the first time” “The whole experience had me crying tears of joy.” https://t.co/0dnDF9ASA4

❤️ 1,556 likes · 🔁 226 retweets

SkalskiP
@skalskip92
📅 Nov 14, 2024 · 538d ago
🆔 70030420
⭐0.86

if you're still not using supervision to visualize your model's output, then I don't know what you're doing with your life. now you can set smart_position=True to make sure labels don't overlap and are always visible. https://t.co/xXMRaS3Guk https://t.co/7Id9avbOPR

❤️ 325 likes · 🔁 35 retweets

Ethan Mollick
@emollick
📅 Nov 12, 2024 · 540d ago
🆔 86665555
⭐0.95

"Hey Claude with Computer Use, I want you to add a new creature to Nethack inspired by recent horror films. Download nethack code and make any modifications needed." It got so far (before I hit rate limits): Downloaded files, looked at documentation, made modifications, etc. https://t.co/f7jpbGqLSo

❤️ 131 likes · 🔁 9 retweets

AK
@_akhaliq
📅 Nov 13, 2024 · 539d ago
🆔 44419630

Salesforce presents BLIP3-KALE Knowledge Augmented Large-Scale Dense Captions https://t.co/1rDIJ5kUMM

❤️ 324 likes · 🔁 66 retweets

Daniel Han
@danielhanchen
📅 Nov 12, 2024 · 539d ago
🆔 89414970
⭐0.86

Bug fixes & analysis for Qwen 2.5:
1. Pad_token should NOT be <|endoftext|> Inf gens
2. Base <|im_start|> <|im_end|> are untrained
3. PCA on embeddings has a BPE hierarchy
4. YaRN 128K extended context from 32B
5. Fixed versions + 128K GGUFs: https://t.co/gHMS1CeFLF

Details:
1. Pad token bug - for finetuning, never use pad_token = EOS - this will result in infinite generations since finetuning will ignore them. Base model also has a chat template - remove this. @UnslothAI versions fixed them
2. Untrained tokens issues. Do NOT use the Qwen 2.5 chat template for the base version - <|im_start|>, <|im_end|> are untrained since Norm(<im_x>, pad_token) is close to 0. Instruct version have them trained.
3. PCA on embeddings for Base and Instruct show a BPE hierarchy. Less frequent tokens are obvious since they're ordered by ID. PCA shows <|im_x|> moving away from being untrained. Same phenomenon for Llama & more models.
4. Uploaded native 128K extended YaRN GGUFs for Coder 0.5B all the way until 32B to https://t.co/ZoGyVKiLFX. Use the 128K version for long contexts. Use the 32B native version for general chats.

Also, Unsloth can finetune 14B in a free Colab!
Conversational style finetuning: https://t.co/NcPZiB0Wj9
Kaggle 14B notebook: https://t.co/3Jr44emNqM
Unsloth can also finetune the 72B variants in a 48GB card!
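A rough way to see why pad_token = EOS can lead to endless generations: if the training loop masks padding positions by token id (an assumption here, though many collators work this way), then every EOS token is masked along with the padding, and the model never gets a loss signal for stopping. The token ids and helper names below are hypothetical and chosen only for illustration.

```python
IGNORE_INDEX = -100  # label value assumed to be skipped by the loss


def build_labels(input_ids, pad_token_id):
    """Mask every position holding the pad token so it is excluded from the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


def eos_is_supervised(input_ids, eos_token_id, pad_token_id):
    """Return True if at least one EOS position still contributes to the loss."""
    labels = build_labels(input_ids, pad_token_id)
    return any(
        tok == eos_token_id and label != IGNORE_INDEX
        for tok, label in zip(input_ids, labels)
    )


# Hypothetical token ids.
EOS_ID = 2
SEPARATE_PAD_ID = 0

sequence = [11, 12, 13, EOS_ID]                      # training example ending in EOS
padded_shared = sequence + [EOS_ID] * 2              # padded with pad_token == EOS
padded_separate = sequence + [SEPARATE_PAD_ID] * 2   # padded with a distinct pad token

shared_ok = eos_is_supervised(padded_shared, EOS_ID, pad_token_id=EOS_ID)
separate_ok = eos_is_supervised(padded_separate, EOS_ID, pad_token_id=SEPARATE_PAD_ID)
```

With a shared id the real EOS is masked together with the padding, so `shared_ok` is False; with a distinct pad token the EOS label survives and `separate_ok` is True.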

❤️ 493 likes · 🔁 64 retweets

Tim Dettmers
@Tim_Dettmers
📅 Nov 12, 2024 · 540d ago
🆔 99221674
⭐0.91

This is the most important paper in a long time. It shows with strong evidence we are reaching the limits of quantization. The paper says this: the more tokens you train on, the more precision you need. This has broad implications for the entire field and the future of GPUs 🧵 https://t.co/S2kD2Zf6ur

@tanishqkumar07 •

[1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre and post-training https://t.co/QLmNOV39Wk. TLDR; - Models become harder to post-train quantize as they ar

❤️ 2,979 likes · 🔁 504 retweets

SkalskiP
@skalskip92
📅 Nov 12, 2024 · 539d ago
🆔 65964761

I updated my parking lot management demo; now the captured license plate numbers are sent via API to Telegram

whole demo is powered by supervision: https://t.co/xXMRaS3Guk but I built it practically codeless using @roboflow workflows https://t.co/bGYrMH8ZE6

@skalskip92 •

working on a new demo - automated parking lot management
- keep track of how many cars go in and out - done
- read plates - done
- calculate the time spent in the parking lot - in progress

what do you think? https://t.co/gZYMVFZ9NA

❤️ 311 likes · 🔁 33 retweets

elvis
@omarsar0
📅 Nov 12, 2024 · 539d ago
🆔 20127024

Impact of AI on Innovation

New paper suggests that "top scientists leverage their domain knowledge to prioritize promising AI suggestions, while others waste significant resources testing false positives."

❤️ 285 likes · 🔁 69 retweets

LlamaIndex 🦙
@llama_index
📅 Nov 12, 2024 · 540d ago
🆔 54515279
⭐1.00

Learn how to use ColPali as a re-ranker for highly relevant results using a multimodal index! @ravithejads walks you through the technique:
💡 @cohere's multimodal embeddings for initial retrieval of both text and images
💡 We fetch the top 10 most relevant from both the text and image modalities
💡 ColPali generates multi-vector representations for both text and images in the same embedding space
💡 We re-rank to the top 5 for each modality before sending to the LLM

Check out the full video here: https://t.co/iSUA7fHDvj
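The re-ranking step can be sketched with plain Python lists standing in for real embeddings. This assumes ColBERT-style MaxSim scoring over the multi-vector representations, which is how late interaction is commonly computed; it is not taken from the video, and the function names, toy vectors, and document ids are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query vector takes its best-matching document vector, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(query_vecs, candidates, keep):
    """candidates is a list of (doc_id, doc_vecs); return the `keep` highest-scoring ids."""
    ranked = sorted(candidates, key=lambda c: maxsim(query_vecs, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:keep]]


def rerank_per_modality(query_vecs, text_hits, image_hits, keep=5):
    """Re-rank the first-stage hits of each modality separately, as in the post."""
    return {
        "text": rerank(query_vecs, text_hits, keep),
        "image": rerank(query_vecs, image_hits, keep),
    }


# Toy 2-D multi-vector embeddings (hypothetical data).
query = [[1, 0], [0, 1]]
text_hits = [("t1", [[1, 0]]), ("t2", [[1, 0], [0, 1]]), ("t3", [[0.5, 0.5]])]
image_hits = [("i1", [[0, 1]]), ("i2", [[0.9, 0.9]])]

top = rerank_per_modality(query, text_hits, image_hits, keep=2)
```

In the setup described above, the first stage would supply 10 candidates per modality and `keep` would be 5; the toy example uses smaller numbers so the ordering is easy to check by hand.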

❤️ 153 likes · 🔁 35 retweets

Michael Matthews
@mitrma
📅 Nov 11, 2024 · 541d ago
🆔 33669110

We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵 https://t.co/1YbA3DW44S

❤️ 1,046 likes · 🔁 215 retweets

jesse
@jn2clark
📅 Nov 12, 2024 · 539d ago
🆔 14883313

Today @marqo_ai open-weight (Apache 2.0) released the two best embedding models for ecommerce search and recommendations available anywhere. Marqo ecommerce models significantly outperform models from Amazon, Google, Cohere and Jina (see below).

Fun fact: we had to create a significantly smaller and easier evaluation dataset just to accommodate some of the private models!

+ Up to 88% improvement on the best private model, Amazon-Titan-Multimodal (and better than Google Vertex, Cohere).
+ Up to 31% improvement on the best open source model, ViT-SO400M-14-SigLIP.
+ 5ms single text/image inference (A10g).
+ Up to 231% improvement over other bench-marked models (see blog below).
+ Evaluated on over 4M products across 10,000's of categories.
+ Detailed performance comparisons across three major tasks: Text2Image, Category2Image, and AmazonProducts-Text2Image.
+ Released 2 evaluation datasets: GoogleShopping-1m and AmazonProducts-3m.
+ Released evaluation code.
+ Apache 2.0 model weights available on @huggingface and to test out on Hugging Face Spaces.

❤️ 26 likes · 🔁 5 retweets

Ethan Mollick
@emollick
📅 Nov 13, 2024 · 539d ago
🆔 63607152

"Claude give me a brilliant idea for a science fiction short short story and execute it terribly" "Claude give me a terrible idea for a science fiction short short story and execute it brilliantly" https://t.co/7E9MXlcVji

❤️ 215 likes · 🔁 16 retweets

Jeremy Howard
@jeremyphoward
📅 Nov 13, 2024 · 539d ago
🆔 60491039
⭐0.74

Great to see! BTW their table is wrong in the post -- corrected one here: https://t.co/ML8YQWXuEk

@NousResearch •

Today we are launching the Forge Reasoning API Beta, an advancement in inference time scaling that can be applied to any model or a set of models, for a select group of people in our community. https://t.co/vpb4U0jyG6 The Forge Reasoning engine is capable of dramatically improv

❤️ 16 likes

Tanishq Mathew Abraham, Ph.D.
@iScienceLuvr
📅 Nov 13, 2024 · 539d ago
🆔 64484545

me waiting for Claude 3.5 Opus https://t.co/MwMAYT5TtD

❤️ 55 likes · 🔁 2 retweets

Jerry Liu
@jerryjliu0
📅 Nov 13, 2024 · 539d ago
🆔 52395794
⭐0.86

ColPali is a game changer for RAG and how we think about data ETL in general for LLM use cases.

Naive RAG: simple parsing / chunk every paragraph / throw into a vector database
VLM-native RAG: requires figuring out a way to screenshot the document and also requires a new form of storage that can do late interaction.

We did a webinar with @ManuelFaysse a few months ago, but excited to officially have a @llama_index + ColPali integration thanks to @ravithejads.

Check out our new video 👇 https://t.co/dCWP6LTLrg

@llama_index •

Learn how to use ColPali as a re-ranker for highly relevant results using a multimodal index! @ravithejads walks you through the technique: 💡 @cohere's multimodal embeddings for initial retrieval of both text and images 💡 We fetch the top 10 most relevant from both the text and

❤️ 416 likes · 🔁 77 retweets

SkalskiP
@skalskip92
📅 Nov 13, 2024 · 539d ago
🆔 42708382
⭐0.76

supervision-0.25.0 line counter is a lot more robust; lets gooo!

objects may be small, move quickly, or get occluded

it’s easy to build a video analysis that works well for 30 seconds. building one that holds up for 24 hours is a different story. https://t.co/xXMRaS3Guk https://t.co/thq4rryQ1k

❤️ 524 likes · 🔁 50 retweets

LlamaIndex 🦙
@llama_index
📅 Nov 11, 2024 · 540d ago
🆔 48705026
⭐0.93

Case Study: Learn how @PursuitGov transformed their B2G offerings using LlamaParse:
➡️ Parsed 4 million pages in a single weekend
➡️ Increased accuracy by 25-30% for complex document formats
➡️ Enabled clients to uncover hidden opportunities in public sector data

See how LlamaParse helped Pursuit create a searchable database of public sector documents, empowering B2G sellers to identify new initiatives and funding streams: https://t.co/YtO13VIS0b

❤️ 87 likes · 🔁 20 retweets

Ethan Mollick
@emollick
📅 Nov 11, 2024 · 541d ago
🆔 21122203

I have discussed the untapped potential in current LLM models, and how we will see a burst of use case innovation as corporate development labs start digging in

Here is a nice example from Walmart, showing how you can combine multimodal approaches for product recommendations https://t.co/BuFW40uPct

❤️ 335 likes · 🔁 45 retweets

SkalskiP
@skalskip92
📅 Nov 11, 2024 · 540d ago
🆔 52592340

working on a new demo - automated parking lot management
- keep track of how many cars go in and out - done
- read plates - done
- calculate the time spent in the parking lot - in progress

what do you think? https://t.co/gZYMVFZ9NA

❤️ 611 likes · 🔁 30 retweets

Simo Ryu
@cloneofsimo
📅 Nov 11, 2024 · 541d ago
🆔 95803848

Here is the timeline of @kellerjordan0's nanogpt-speedrun. Now you can reproduce gpt2-xl with $200, in a day. Notice stuff changed around October. We definitely had a brief AI winter. https://t.co/FkY3gd6xOF

❤️ 243 likes · 🔁 23 retweets