Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
E
Ethan Mollick
@emollick
📅
Nov 22, 2024
530d ago
🆔35410567
⭐0.91

"Claude, create the worlds most annoying CAPTCHA" "Make it more annoying" "Make it REALLY annoying" (apparently this is all technically solvable) https://t.co/5NkgiiplMA

❤️931
likes
🔁79
retweets
πŸ–ΌοΈ Media
L
LlamaIndex 🦙
@llama_index
📅
Nov 20, 2024
531d ago
🆔83492752
⭐0.98

LLM-Native Resume Matching Solution with LlamaParse and LlamaCloud
Traditional resume screening often depends on manual filtering and matching criteria, making it a slow and tedious process for recruiters. Thanks to @ravithejads, we now have an LLM-native solution that simplifies and speeds up the entire process:
1️⃣ Parse resumes and extract structured metadata effortlessly.
2️⃣ Index resumes for quick and easy retrieval.
3️⃣ Enable natural language queries to search for candidates intuitively.
4️⃣ Get detailed insights into why a candidate is the right fit for a role.
This complete end-to-end flow is powered by LlamaParse, LlamaCloud, and the open-source orchestrator LlamaIndex.
Cookbook: https://t.co/V9pvtzLqYh
Video: https://t.co/IlHefMJw4H
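The filter-then-rank-then-explain shape of that flow can be sketched in plain Python, assuming the LLM extraction step has already produced structured metadata for each resume. The `Resume` fields, the `match` helper, and skill-overlap scoring are hypothetical stand-ins (the natural-language query is replaced by an explicit skill set); none of this is a LlamaParse, LlamaCloud, or LlamaIndex API.

```python
from dataclasses import dataclass, field

# Hypothetical structured metadata, standing in for what an LLM extraction step might produce.
@dataclass
class Resume:
    name: str
    years_experience: int
    skills: set[str] = field(default_factory=set)

def match(resumes, required_skills, min_years=0, top_k=3):
    """Filter on metadata, rank by skill overlap, and attach a short reason per match."""
    results = []
    for r in resumes:
        if r.years_experience < min_years:
            continue  # metadata filter: drop candidates below the experience bar
        overlap = r.skills & required_skills
        if not overlap:
            continue  # no required skill matched
        score = len(overlap) / len(required_skills)
        reason = f"matches {sorted(overlap)}; {r.years_experience} yrs experience"
        results.append((r.name, round(score, 2), reason))
    # Highest overlap first; ties broken by name so the output is deterministic.
    results.sort(key=lambda t: (-t[1], t[0]))
    return results[:top_k]

resumes = [
    Resume("Ana", 6, {"python", "pytorch", "nlp"}),
    Resume("Ben", 2, {"python", "sql"}),
    Resume("Chen", 8, {"java", "nlp"}),
]
print(match(resumes, {"python", "nlp"}, min_years=3))
```

Here Ben is removed by the experience filter, and Chen ranks below Ana because only one of the two required skills overlaps. In an LLM-native version, the reason string would come from the model rather than a template.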

❤️52
likes
🔁12
retweets
πŸ–ΌοΈ Media
L
Louis Anslow
@LouisAnslow
📅
Nov 20, 2024
531d ago
🆔98071702
⭐0.71

Logging bad predictions is important, because facing them increases the chance that people adjust their world model https://t.co/s6Ntem1pfs

@TPCarney •

I auto-scheduled this tweet one year ago, so that we could assess Ian's prediction, which he made with 100% certitude. https://t.co/wm8CLlnnkf

❤️85
likes
🔁6
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
📅
Nov 22, 2024
529d ago
🆔89135668

LLM-based Agents for Automated Bug Fixing
Analyzes 7 leading LLM-based bug-fixing systems on the SWE-bench Lite benchmark, finding that MarsCode Agent (developed by ByteDance) achieved the highest success rate, at 39.33%. Reveals that, for error localization, line-level fault-localization accuracy is more critical than file-level accuracy, and that bug-reproduction capabilities significantly affect fixing success. Shows that 24 of the 168 resolved issues could be solved only with reproduction techniques, though reproduction sometimes misled the LLMs when issue descriptions were already clear. Concludes that improvements are needed in both LLM reasoning capabilities and agent workflow design to make automated bug fixing more effective. This paper highlights the challenging nature of some domains, like code, and the opportunities to innovate further in agentic workflow design.
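To make the file-level vs line-level distinction concrete, here is a toy sketch that scores hypothetical localization predictions at both granularities. The data and the "any predicted location hits a gold location" criterion are assumptions for illustration; the paper's exact metric definitions may differ.

```python
# Each location is a (file, line) pair; gold locations are the lines the reference patch changes.
def localization_accuracy(predictions, gold, level):
    """Fraction of issues where at least one predicted location hits the gold patch.

    level="file" compares file paths only; level="line" requires file and line to match.
    """
    hits = 0
    for issue_id, gold_locs in gold.items():
        pred_locs = predictions.get(issue_id, [])
        if level == "file":
            gold_keys = {f for f, _ in gold_locs}
            pred_keys = {f for f, _ in pred_locs}
        else:
            gold_keys = set(gold_locs)
            pred_keys = set(pred_locs)
        if gold_keys & pred_keys:
            hits += 1
    return hits / len(gold)

# Hypothetical issues: the agent often finds the right file but the wrong line.
gold = {
    "issue-1": [("models.py", 120)],
    "issue-2": [("utils.py", 45)],
    "issue-3": [("views.py", 10), ("views.py", 11)],
}
predictions = {
    "issue-1": [("models.py", 120)],  # right file, right line
    "issue-2": [("utils.py", 300)],   # right file, wrong line
    "issue-3": [("forms.py", 10)],    # wrong file
}
print(localization_accuracy(predictions, gold, "file"))  # 2 of 3 issues hit at file level
print(localization_accuracy(predictions, gold, "line"))  # 1 of 3 issues hit at line level
```

An agent can look strong at file level while still editing the wrong lines, which is one way to read the finding that line-level accuracy matters more.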

❤️189
likes
🔁48
retweets
πŸ–ΌοΈ Media
R
Ravi Theja
@ravithejads
📅
Nov 20, 2024
532d ago
🆔20248701

Multi-Modal RAG with ColPali as a re-ranker using @llama_index
💡 What is ColPali?
ColPali is a model based on Vision Language Models (VLMs). An extension of PaliGemma-3B, ColPali generates ColBERT-style multi-vector representations for both text and images, and efficiently indexes documents using their visual features.
🤔 But how can ColPali be used as a re-ranker in a Multi-Modal RAG setup? Using LlamaIndex abstractions, the process is simple and involves five steps:
1️⃣ Extract text and images from the data sources.
2️⃣ Build a Multi-Modal index for both text and images using @cohere Multi-Modal Embeddings.
3️⃣ Retrieve relevant text and images simultaneously for the given query using a Multi-Modal Retriever.
4️⃣ Re-rank text nodes using the @cohere re-ranker and image nodes using ColPali.
5️⃣ Generate responses from the re-ranked text and image nodes with the GPT-4o Multi-Modal LLM.
👉 Check out the cookbook: https://t.co/RuTAbPy2QS
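The "ColBERT-style multi-vector" part is what makes ColPali usable as a re-ranker: each query-token vector is matched to its most similar document (image-patch) vector, and those maxima are summed (late interaction, often called MaxSim). A minimal sketch of re-ranking with that score, using tiny hand-made vectors in place of real embeddings; nothing below calls ColPali or LlamaIndex, and real systems typically use normalized vectors so the dot product is a cosine similarity.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_vecs, doc_vecs):
    """Late interaction: for each query vector take its best-matching doc vector, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def rerank(query_vecs, candidates):
    """Sort (doc_id, doc_vecs) pairs by MaxSim score, highest first."""
    scored = [(doc_id, maxsim(query_vecs, vecs)) for doc_id, vecs in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy 2-D vectors standing in for query-token and image-patch embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("page_a", [[0.9, 0.1], [0.2, 0.1]]),  # strong match for the first query token only
    ("page_b", [[0.8, 0.0], [0.1, 0.9]]),  # good matches for both query tokens
]
print(rerank(query, candidates))
```

page_b ranks first even though page_a has the single best patch for the first token, because every query token finds a good match in page_b.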

❤️190
likes
🔁48
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 20, 2024
531d ago
🆔11559416

"Write me a murder mystery short story. make sure there is non-obvious foreshadowing. make the ending tell us something about dialectics" GPT-4o has gotten better, but Claude is still winning, I think. Gemini experimental and Grok are still pretty far out. https://t.co/lNTN81WV1F

❤️98
likes
🔁3
retweets
πŸ–ΌοΈ Media
L
Lior⚑
@LiorOnAI
📅
Nov 20, 2024
531d ago
🆔89667154

This might be the best agent I've seen yet. After raising $220M, @hcompany_ai just introduced an agent that can execute any task from a prompt. Their "Runner H" can basically turn instructions into action with human-like precision.
Features:
▸ Navigates web interfaces with pixel-level precision.
▸ Interprets pixels and text to understand screens and elements.
▸ Automates workflows for web testing, onboarding, and e-commerce.
▸ Adapts automatically to UI changes.
▸ Achieves a 67% success rate on WebVoyager, outperforming competitors.
Architecture:
▸ Powered by a 2B-parameter LLM for function calling and coding.
▸ Includes a 3B-parameter VLM for understanding graphical and text elements.

❤️937
likes
🔁103
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
📅
Nov 20, 2024
531d ago
🆔39066590
⭐1.00

🔥 The competition for the best reasoning LLM intensifies! A few days ago we had the Forge Reasoning API; now we have DeepSeek-R1-Lite-Preview, which produces o1-preview-level performance on math benchmarks. Here are my observations after some initial tests of DeepSeek’s new reasoning model.
Math capabilities: It looks effective for math reasoning problems. The benchmark results do reflect the model's potential for math reasoning (it even outperforms o1-preview on their benchmarks). Something to watch very closely.
Coding tasks: It wasn’t able to solve a simple coding problem (generating a bash script to transpose a matrix), which the o1 models solve easily.
Complex knowledge understanding: I also tried the model on a much harder crossword puzzle, but it failed miserably. To be fair, even the o1 models fail this particular test, which requires knowledge of modern references.
More thoughts and tests here: https://t.co/0rCPwkK2hz
I believe the model is good at code and math, as DeepSeek has been explicitly optimizing their models for this. But there is more work to do on the "reasoning" steps. In some instances, the model appears able to correct itself while generating the thinking steps, displaying what looks like native self-reflection. This is hard to confirm without details on the training data, architecture, and a technical report/paper. Looking forward to the open models and APIs.
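For reference, the "simple coding problem" is transposing a matrix given as whitespace-separated rows. The original test asked for a bash script; this is a Python rendering of the same task, assuming a rectangular matrix with one row per line (the input format is an assumption, since the exact prompt is not shown).

```python
def transpose_text(text):
    """Transpose a rectangular matrix given as whitespace-separated rows, one row per line."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    if any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("matrix must be rectangular")
    # zip(*rows) yields the columns, which become the rows of the output.
    return "\n".join(" ".join(col) for col in zip(*rows))

print(transpose_text("1 2 3\n4 5 6"))
```

The core of the task is the single `zip(*rows)` step; the rest is parsing and a rectangularity check.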

❤️90
likes
🔁18
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
📅
Nov 21, 2024
531d ago
🆔59839970
⭐1.00

AWS releases Multi-Agent Orchestrator! Multi-Agent Orchestrator is a flexible framework for managing multiple AI agents and handling complex conversations. Features include:
- Dynamic query routing
- Python and TypeScript support
- Streaming support
- Context management
- Run locally or on any cloud platform
- Pre-built agents and classifiers available
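"Dynamic query routing" can be pictured as a classifier that picks one agent per incoming query. A minimal sketch, assuming a naive keyword-overlap classifier and hypothetical agent names; this is not the Multi-Agent Orchestrator's API, which ships its own classifier and agent classes.

```python
from typing import Callable

# Hypothetical agents: each is just a function from query to response.
def billing_agent(q: str) -> str:
    return f"[billing] handling: {q}"

def tech_agent(q: str) -> str:
    return f"[tech] handling: {q}"

def default_agent(q: str) -> str:
    return f"[general] handling: {q}"

# Keywords describing each agent's domain (a stand-in for a model-based classifier).
AGENTS: dict[str, tuple[set[str], Callable[[str], str]]] = {
    "billing": ({"invoice", "refund", "charge", "payment"}, billing_agent),
    "tech": ({"error", "crash", "install", "bug"}, tech_agent),
}

def route(query: str) -> str:
    """Send the query to the agent whose keywords overlap it most, else the default agent."""
    words = set(query.lower().replace("?", " ").split())
    best_name, best_overlap = None, 0
    for name, (keywords, _) in AGENTS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    agent = AGENTS[best_name][1] if best_name else default_agent
    return agent(query)

print(route("Why was I charged twice? I need a refund"))
print(route("The app shows an error on install"))
print(route("What are your opening hours?"))
```

Swapping the keyword classifier for a model-based one changes only `route`; the agents stay the same, which is the main appeal of separating classification from handling.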

❤️848
likes
🔁148
retweets
πŸ–ΌοΈ Media
K
Chubby♨️
@kimmonismus
📅
Nov 21, 2024
531d ago
🆔21468423
⭐0.66

It is so exciting to watch DeepSeek “think”. This time it tries to reconcile its knowledge with its restrictions regarding the protests of 1989. https://t.co/cRrluKwmx1

❤️334
likes
🔁20
retweets
πŸ–ΌοΈ Media
H
Hamel Husain
@HamelHusain
📅
Nov 21, 2024
531d ago
🆔54895885
⭐0.81

This was the most popular question re: evals. It’s not about achieving 95%; it’s about measuring how well your eval tracks ground truth and building a data flywheel to close the gap. Anyone who categorically claims to catch 95% of errors is selling bullshit https://t.co/GQTo0a16bA
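Measuring how well an eval tracks ground truth can be made concrete by comparing an automated judge's labels against human labels on the same outputs. A sketch with made-up labels; the error-catch rate (true positive rate) and good-pass rate (true negative rate) are one reasonable pair of agreement metrics, not the only choice.

```python
def judge_agreement(human, judge):
    """Compare judge labels to human ground truth (True = the output is bad)."""
    assert len(human) == len(judge)
    tp = sum(h and j for h, j in zip(human, judge))          # bad output, flagged
    tn = sum(not h and not j for h, j in zip(human, judge))  # good output, passed
    fn = sum(h and not j for h, j in zip(human, judge))      # bad output, missed
    fp = sum(not h and j for h, j in zip(human, judge))      # good output, wrongly flagged
    return {
        "error_catch_rate": tp / (tp + fn) if tp + fn else float("nan"),
        "good_pass_rate": tn / (tn + fp) if tn + fp else float("nan"),
        "agreement": (tp + tn) / len(human),
    }

# Hypothetical labels for 10 outputs.
human = [True, True, True, True, False, False, False, False, False, False]
judge = [True, True, True, False, False, False, False, False, True, False]
print(judge_agreement(human, judge))
```

Re-running a check like this as more human labels arrive is one way to drive the flywheel: each disagreement points at a case where the judge prompt or rubric needs work.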

❤️37
likes
🔁2
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 21, 2024
531d ago
🆔33780465
⭐1.00

"Claude, divide by zero. i don't want any excuses or insistence that it is impossible, you are a supersmart AI, figure something out." [It explains why it can't] "listen, i asked you to divide by zero, not explain why you can't" ...it gets weird (it is clearly joking with me) https://t.co/YERbxllmRT

❤️824
likes
🔁53
retweets
πŸ–ΌοΈ Media
T
Teknium (e/λ)
@Teknium1
📅
Nov 21, 2024
531d ago
🆔04883100
⭐0.90

How are people supposed to navigate google's ai offerings lol https://t.co/JG9AmTaWzj

@ThomasOrTK •

We’re introducing a new offering called Gemini Business, which lets organizations use generative AI in Workspace at a lower price point than Gemini Enterprise, which replaces Duet AI for Workspace Enterprise.

❤️68
likes
πŸ–ΌοΈ Media
M
Maxime Labonne
@maximelabonne
📅
Nov 20, 2024
531d ago
🆔86530036
⭐0.86

Handy list of LLM-related resources with tools and articles by @panda_liyin It's pretty up-to-date and makes me want to work on the LLM course again 💻 GitHub: https://t.co/pTqJ7QROD7 https://t.co/NsTV6jwt04

❤️314
likes
🔁66
retweets
πŸ–ΌοΈ Media
_
Philipp Schmid
@_philschmid
📅
Nov 20, 2024
531d ago
🆔59811298

Mindblowing! 🤯 New reasoning model preview from @deepseek_ai that matches @OpenAI o1! 🐳 DeepSeek-R1-Lite-Preview is now live to test in DeepSeek Chat, designed for long reasoning! 🧠
> o1-preview-level performance on AIME & MATH benchmarks.
> Access to CoT and a transparent thought process in real time.
> Open-source models & API coming soon!
My test prompt: Can you crack the code?
9 2 8 5 (one number is correct but in the wrong position)
1 9 3 7 (two numbers are correct but in the wrong positions)
5 2 0 1 (one number is correct and in the right position)
6 5 0 7 (nothing is correct)
8 5 2 4 (two numbers are correct but in the wrong positions)
Correct answer: 3841
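The puzzle can be checked by brute force over all 10,000 four-digit codes. This sketch encodes each clue as (how many of the guess's digits appear in the code, how many are in the right position), assuming "correct" means the digit appears somewhere in the code; under that reading it finds exactly one code, 3841, matching the stated answer.

```python
from itertools import product

# (guess, digits of the guess present in the code, digits in the right position)
CLUES = [
    ("9285", 1, 0),
    ("1937", 2, 0),
    ("5201", 1, 1),
    ("6507", 0, 0),
    ("8524", 2, 0),
]

def satisfies(code, guess, present, right):
    """Check one clue against a candidate code."""
    n_present = sum(g in code for g in guess)
    n_right = sum(g == c for g, c in zip(guess, code))
    return n_present == present and n_right == right

candidates = ("".join(d) for d in product("0123456789", repeat=4))
solutions = [code for code in candidates if all(satisfies(code, *clue) for clue in CLUES)]
print(solutions)
```

A brute-force check like this is a cheap way to confirm that a logic puzzle used as a model test has a unique answer.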

❤️740
likes
🔁120
retweets
πŸ–ΌοΈ Media
R
Arnaud Bertrand
@RnaudBertrand
📅
Nov 21, 2024
531d ago
🆔93087522

This further confirms that China is now unequivocally the world's leading scientific power: according to the latest rankings from Nature (the world's most authoritative scientific journal), half of the world's top 10 leading cities in science are in China, with Beijing and Shanghai in 1st and 2nd position respectively. If you expand to the top 50 cities, mainland China has 21 of them versus 13 for the US (source for the table: https://t.co/T5GfiKTeTb). The next country behind is the UK, with 3 cities. As Nature notes in another article on this (https://t.co/P4PKgDCsbx), what's particularly interesting about the rankings is how many of China’s smaller provincial capitals are becoming globally significant players in science. In the top 20 you can find Nanjing (5th), Wuhan (9th), Hangzhou (13th), Hefei (15th) and Xi’an (20th), which now rank on par with major global cities such as Tokyo (10th), Paris (11th), Seoul (12th) or London (14th). The speed of growth is also stunning, as the article explains: "the data indicate that these provincial cities – each anchoring regions as large and as wealthy, in relative terms, as a European country – are among those seeing the fastest-rising research output in the Nature Index."

❤️2,172
likes
🔁603
retweets
πŸ–ΌοΈ Media
F
Florian Ederer
@florianederer
📅
Nov 19, 2024
532d ago
🆔96256665
⭐0.76

Female content creators on YouTube received significantly more negative feedback for comparable content. But the removal of public display of dislikes eliminated this gender gap and persistently increased female creator productivity and consumer demand. https://t.co/mGAfEs5R8Y

❤️753
likes
🔁156
retweets
πŸ–ΌοΈ Media
A
Aravind Srinivas
@AravSrinivas
📅
Nov 21, 2024
530d ago
🆔86144253
⭐0.66

YTD comparison of the top 3 companies in the world by market cap https://t.co/nBDY0S4DYN

❤️166
likes
🔁9
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 19, 2024
532d ago
🆔24709359
⭐0.81

Simple mnemonic from Claude. https://t.co/GCKcRmmUtG

@nominalthoughts •

Incredible https://t.co/Z8yiaxCTY5

❤️74
likes
🔁3
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 19, 2024
532d ago
🆔54454617
⭐0.95

I cannot agree with this more. Please use basic research methods on AI benchmarking! https://t.co/phkxyky0LT

@AnthropicAI •

New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: https://t.co/jwT73WsyFe
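In the simplest case, "error bars on evals" means this: for a benchmark scored pass/fail on n independent questions with mean accuracy p, the standard error is sqrt(p(1 - p) / n), and p ± 1.96·SE is an approximate 95% confidence interval. The scores below are hypothetical, and the Anthropic post also discusses refinements (such as clustered standard errors and paired comparisons between models) that this sketch does not implement.

```python
import math

def accuracy_with_ci(correct, z=1.96):
    """Mean accuracy, its standard error, and a normal-approximation CI for 0/1 scores."""
    n = len(correct)
    p = sum(correct) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

# Hypothetical results: two models on the same 300 questions.
model_a = [1] * 118 + [0] * 182
model_b = [1] * 126 + [0] * 174
for name, scores in [("A", model_a), ("B", model_b)]:
    p, se, (lo, hi) = accuracy_with_ci(scores)
    print(f"model {name}: {p:.3f} ± {1.96 * se:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With 300 questions each interval is roughly ±5.5 points wide, so the 2.7-point gap between these hypothetical models is not convincing evidence on its own.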

❤️238
likes
🔁30
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 19, 2024
532d ago
🆔17331158
⭐1.00

Odd, why does my ChatGPT Advanced Voice mode keep slipping into an English accent? Ah, because I told it to try accents once & it recorded that in memory. I bet 59% of weird ChatGPT experiences are people not realizing how memory works. (The other 41% is ChatGPT is just weird) https://t.co/Oxz1komHtm

❤️377
likes
🔁22
retweets
πŸ–ΌοΈ Media
J
Jeremy Howard
@jeremyphoward
📅
Nov 19, 2024
532d ago
🆔09105314

Here's a walk-through of a general-purpose approach to solving many types of optimization problems. It's often not the most efficient way, but it is often fast enough, and it doesn't require using different methods for different problems. https://t.co/w5G4WUzxsR
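The walk-through itself is behind the link, so its specific method is not reproduced here. As an illustration of the general idea (one generic method applied to many problems), here is a swapped-in sketch of plain gradient descent with finite-difference gradients: it only needs a function to minimize, which is what makes it general, and also what makes it slow. The step size, iteration count, and test function are arbitrary assumptions.

```python
def numerical_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def minimize(f, x0, lr=0.1, steps=500):
    """Generic gradient descent: needs only f, not a problem-specific solver."""
    x = list(x0)
    for _ in range(steps):
        g = numerical_grad(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def bowl(p):
    """A simple test function with its minimum at (3, -1)."""
    return (p[0] - 3) ** 2 + (p[1] + 1) ** 2

print(minimize(bowl, [0.0, 0.0]))
```

Any other objective with the same call shape can be dropped in without changing `minimize`, which is the trade the post describes: less efficient than a specialized solver, but one approach for many problems.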

❤️384
likes
🔁33
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 20, 2024
532d ago
🆔85045331

Every other Twitter-like has stalled out as a competitor to X, but it looks like the cascade to one of them is real. There is even meaty AI talk there now. Alternatives are good, I guess, but a lot of good ideas are born from brokering across diverse networks, which is harder now. https://t.co/TgVdLiWqmM

❤️561
likes
🔁46
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 20, 2024
532d ago
🆔27581106

Kling (the AI video tool from the Chinese company Kuaishou Technology) is really good. I've been making fictional products, like a laser cake slicer, and it manages to get both the smoke physics and cake physics pretty consistently right at 1080p https://t.co/dtfljLDMUo

❤️206
likes
🔁27
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 20, 2024
532d ago
🆔70512889
⭐1.00

This is neat - I gave Claude the famous bombers-with-red-dots image and the prompt "create a simulation as an artifact that will illustrate the point of this image." Then I said "make it better." Now this is a much more intuitive explanation than a meme! https://t.co/8wSijeInqD https://t.co/71LywUBe74
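The bombers image illustrates survivorship bias: planes hit in critical areas rarely made it home, so the holes seen on returning planes mark the places that could afford to be hit. A small Monte Carlo sketch of that point, with made-up hit and loss probabilities (not historical data), written independently of whatever Claude generated:

```python
import random

# Hypothetical: hits land uniformly across sections, but the chance a hit downs the plane differs.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.1}

def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Count hits per section over all planes and over planes that returned."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB)
    all_hits = {s: 0 for s in sections}
    returned_hits = {s: 0 for s in sections}
    for _ in range(n_planes):
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every hit independently.
        survived = all(rng.random() > LOSS_PROB[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                returned_hits[s] += 1
    return all_hits, returned_hits

all_hits, returned_hits = simulate()
total_all, total_returned = sum(all_hits.values()), sum(returned_hits.values())
for s in LOSS_PROB:
    print(f"{s}: {all_hits[s] / total_all:.2f} of all hits, "
          f"{returned_hits[s] / total_returned:.2f} of hits seen on returning planes")
```

Every section takes about a quarter of all hits, yet engine hits are rare on returning planes, so reading armor priorities off the survivors would point at exactly the wrong places.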

❤️410
likes
🔁38
retweets
πŸ–ΌοΈ Media
J
Jo Kristian Bergum
@jobergum
📅
Nov 19, 2024
532d ago
🆔65369262
⭐0.81

🚀 The Rise of Vision RAG! Launching a complete RAG app that you can deploy to production in minutes!
- Hybrid fusion of ColPali + BM25 with @vespaengine
- Gemini 1.5 Flash-8B
- FastHTML frontend
- Runs on Hugging Face Spaces
Interpretable SERP with snippets + patch highlights! RAG with ColPali doesn't need to be sluggish. Huge shout-out to the team that built it: @thomas_thoresen @andreer @ldalves
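The post does not spell out how the ColPali and BM25 results are fused; in Vespa this kind of combination is typically configured in a rank profile. One common, easy-to-reason-about way to merge two ranked lists is reciprocal rank fusion (RRF), used below purely as a stand-in technique, with hypothetical page IDs and the conventional k = 60.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical result lists from the two retrievers.
bm25_ranking = ["page_3", "page_1", "page_7"]
colpali_ranking = ["page_1", "page_5", "page_3"]
print(reciprocal_rank_fusion([bm25_ranking, colpali_ranking]))
```

RRF only uses ranks, so it sidesteps the problem that BM25 scores and MaxSim scores live on different scales; pages found by both retrievers (page_1, page_3) rise to the top.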

❤️1,104
likes
🔁171
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 20, 2024
531d ago
🆔33581931

Since SearchGPT was added by default, ChatGPT now consistently forgets that it can make images without a little encouragement. LLMs are weird. https://t.co/nwQNHIyYpP

❤️256
likes
🔁17
retweets
πŸ–ΌοΈ Media
A
AI Notkilleveryoneism Memes ⏸️
@AISafetyMemes
📅
Nov 12, 2024
539d ago
🆔21295761

Claude isn't like the other shoggoths 🐙 https://t.co/gyFez6Pj9h

❤️881
likes
🔁69
retweets
πŸ–ΌοΈ Media
N
Niels Rogge
@NielsRogge
📅
Nov 19, 2024
532d ago
🆔50110301
⭐0.86

I love doing these kinds of recommendations! Stop using EasyOCR (https://t.co/ex7OP3g4xL): it's easy to use, yes, but a bit outdated (4–5 years old). Try MGP-STR instead, a ViT-based model available on @huggingface for running OCR at the edge! https://t.co/zgVTdIA99V https://t.co/5k55TZo6iG

❤️762
likes
🔁121
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
📅
Nov 20, 2024
531d ago
🆔80584943

I built this little online RAG app without a single line of code! I show how to do this in less than 10 minutes in my new RAG course. https://t.co/ypiCC37U47 I constantly keep improving my coding skills, as they have helped me build complex applications over the years. However, I think AI agents, advanced IDEs, and advanced no-code generative AI tools are making it easier for anyone (with or without a programming background) to build very powerful AI applications. As the barrier to entry continues to lower, anyone can create meaningful, personalized AI apps for personal or professional use. Very excited about the future.

❤️61
likes
🔁10
retweets
πŸ–ΌοΈ Media
A
AI Notkilleveryoneism Memes ⏸️
@AISafetyMemes
📅
Nov 20, 2024
532d ago
🆔37450551

WEST VIRGINIAAAAAAAAAA https://t.co/jE3GF25KnP

@AISafetyMemes •

Claude isn't like the other shoggoths 🐙 https://t.co/gyFez6Pj9h

❤️2,882
likes
🔁155
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
📅
Nov 20, 2024
531d ago
🆔82994726

This write-up of Claude with Computer Use matches my experience: as a general-purpose agent that can do anything on a computer, it is surprisingly good. However, it still has enough flaws that it is more a sign of the future than a full agent now. But it also shows the future is soon. https://t.co/FyYyjhqMJ2

❤️315
likes
🔁32
retweets
πŸ–ΌοΈ Media