Ethan Mollick

@emollick

📅

Nov 22, 2024

530d ago

🆔35410567

⭐0.91

"Claude, create the world's most annoying CAPTCHA" "Make it more annoying" "Make it REALLY annoying" (apparently this is all technically solvable) https://t.co/5NkgiiplMA

❤️931

likes

🔁79

retweets

LlamaIndex 🦙

@llama_index

📅

Nov 20, 2024

531d ago

🆔83492752

⭐0.98

LLM-Native Resume Matching Solution with LlamaParse and LlamaCloud Traditional resume screening often depends on manual filtering and matching criteria, making it a slow and tedious process for recruiters. Thanks to @ravithejads, we now have an LLM-native solution that simplifies and speeds up the entire process: 1⃣ Parse resumes and extract structured metadata effortlessly. 2⃣ Index resumes for quick and easy retrieval. 3⃣ Enable natural language queries to search for candidates intuitively. 4⃣ Get detailed insights into why a candidate is the right fit for a role. This complete end-to-end flow is powered by LlamaParse, LlamaCloud, and the open-source orchestrator LlamaIndex. Cookbook: https://t.co/V9pvtzLqYh Video: https://t.co/IlHefMJw4H

❤️52

likes

🔁12

retweets

Louis Anslow

@LouisAnslow

📅

Nov 20, 2024

531d ago

🆔98071702

⭐0.71

Logging bad predictions is important - because facing them increases chances of people adjusting their world model https://t.co/s6Ntem1pfs

@TPCarney •

I auto-scheduled this tweet one year ago, so that we could assess Ian's prediction, which he made with 100% certitude. https://t.co/wm8CLlnnkf

❤️85

likes

🔁6

retweets

elvis

@omarsar0

📅

Nov 22, 2024

529d ago

🆔89135668

LLM-based Agents for Automated Bug Fixing: Analyzes 7 leading LLM-based bug fixing systems on the SWE-bench Lite benchmark, finding MarsCode Agent (developed by ByteDance) achieved the highest success rate at 39.33%. Reveals that for error localization, line-level fault localization accuracy is more critical than file-level accuracy, and bug reproduction capabilities significantly impact fixing success. Shows that 24/168 resolved issues could only be solved using reproduction techniques, though reproduction sometimes misled LLMs when issue descriptions were already clear. Concludes that improvements are needed in both LLM reasoning capabilities and Agent workflow design to enhance automated bug fixing effectiveness. This paper highlights the challenging nature of some domains, like code, and the opportunities to innovate further in agentic workflow design.

❤️189

likes

🔁48

retweets

Ravi Theja

@ravithejads

📅

Nov 20, 2024

532d ago

🆔20248701

Multi-Modal RAG with ColPali as a re-ranker using @llama_index 💡 What is ColPali? ColPali is a model based on Vision Language Models (VLMs). An extension of PaliGemma-3B, it generates ColBERT-style multi-vector representations for both text and images and efficiently indexes documents using their visual features. 🤔 But how can ColPali be used as a re-ranker in a Multi-Modal RAG setup? Using LlamaIndex abstractions, the process is simple and involves five steps: 1️⃣ Extract text and images from the data sources. 2️⃣ Build a Multi-Modal index for both text and images using @cohere Multi-Modal Embeddings. 3️⃣ Retrieve relevant text and images simultaneously using a Multi-Modal Retriever for the given query. 4️⃣ Re-rank text nodes using @cohere re-ranker and image nodes using ColPali. 5️⃣ Generate responses by using the re-ranked text and image nodes with the GPT-4o Multi-Modal LLM. 👉 Check out the cookbook: https://t.co/RuTAbPy2QS

❤️190

likes

🔁48

retweets
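
The ColBERT-style multi-vector scoring mentioned in the ColPali post can be sketched in plain Python. This is a minimal illustration of late-interaction (MaxSim) scoring under stated assumptions: the toy 2-dimensional vectors, the `maxsim_score` and `rerank` helper names, and the page labels are all hypothetical, and none of this is the LlamaIndex, Cohere, or ColPali API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector take its best-matching
    document vector, then sum those maxima over all query vectors."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(query_vecs: List[Vector], candidates: Dict[str, List[Vector]]) -> List[str]:
    """Return candidate names sorted from highest to lowest MaxSim score."""
    return sorted(
        candidates,
        key=lambda name: maxsim_score(query_vecs, candidates[name]),
        reverse=True,
    )


# Hypothetical toy data: two query-token vectors and two candidate pages,
# each page represented by two patch vectors.
query = [(1.0, 0.0), (0.0, 1.0)]
pages = {
    "page_a": [(0.9, 0.1), (0.2, 0.8)],
    "page_b": [(0.5, 0.5), (0.4, 0.4)],
}
ranking = rerank(query, pages)
print(ranking)
```

In a real pipeline the vectors would come from the model's query and image encoders; only the scoring and sorting step is shown here.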

Ethan Mollick

@emollick

📅

Nov 20, 2024

531d ago

🆔11559416

"Write me a murder mystery short story. make sure there is non-obvious foreshadowing. make the ending tell us something about dialectics" GPT-4o has gotten better, but Claude is still winning, I think. Gemini experimental and Grok are still pretty far out. https://t.co/lNTN81WV1F

+2 more

❤️98

likes

🔁3

retweets

Lior⚡

@LiorOnAI

📅

Nov 20, 2024

531d ago

🆔89667154

This might be the best agent I've seen yet. After raising $220M, @hcompany_ai just introduced an agent that can execute any task from a prompt. Their "Runner H" can basically turn instructions into action with human-like precision. Features: ▸ Navigates web interfaces with pixel-level precision. ▸ Interprets pixels and text to understand screens and elements. ▸ Automates workflows for web testing, onboarding, and e-commerce. ▸ Adapts automatically to UI changes. ▸ Achieves a 67% success rate on WebVoyager, outperforming competitors. Architecture: ▸ Powered by a 2B-parameter LLM for function calling and coding. ▸ Includes a 3B-parameter VLM for understanding graphical and text elements.

❤️937

likes

🔁103

retweets

elvis

@omarsar0

📅

Nov 20, 2024

531d ago

🆔39066590

⭐1.00

🔥The competition for the best reasoning LLM intensifies! A few days ago, we had the Forge Reasoning API, now we have DeepSeek-R1-Lite-Preview which produces o1-preview-level performance on math benchmarks. Here are my observations after some initial tests on DeepSeek’s new reasoning model. Math Capabilities: It looks effective for math reasoning problems. The benchmark results do reflect the potential of this model on math reasoning capabilities (it even outperforms o1-preview on their benchmarks). Something to watch very closely. Coding tasks: It wasn’t able to solve a simple code problem (generating a bash script for transposing a matrix) which the o1 models solve easily. Complex knowledge understanding: I also tried the model on a much harder crossword puzzle but it failed miserably. To be fair, even the o1 models fail on this particular test that requires knowledge of modern references. More thoughts and tests here: https://t.co/0rCPwkK2hz I believe the model is good at code and math as DeepSeek has been explicitly optimizing their models for this. But there is more work to do on the "reasoning" steps. In some instances, the model looks like it is able to correct itself when generating the thinking steps, displaying what looks like native self-reflection. Hard to confirm this without details on training data, architecture, and a technical report/paper. Looking forward to the open models and APIs.

❤️90

likes

🔁18

retweets

elvis

@omarsar0

📅

Nov 21, 2024

531d ago

🆔59839970

⭐1.00

AWS releases Multi-Agent Orchestrator! Multi-Agent Orchestrator is a flexible framework for managing multiple AI agents and handling complex conversations. Features include: - Dynamic query routing - Python and Typescript support - Streaming support - Context management - Run locally or on any cloud platform - Pre-built agents and classifiers available

❤️848

likes

🔁148

retweets

Chubby♨️

@kimmonismus

📅

Nov 21, 2024

531d ago

🆔21468423

⭐0.66

It is so exciting to watch DeepSeek “think”. This time it tries to reconcile its knowledge with its restrictions regarding the protests of 1989. https://t.co/cRrluKwmx1

❤️334

likes

🔁20

retweets

Hamel Husain

@HamelHusain

📅

Nov 21, 2024

531d ago

🆔54895885

⭐0.81

This was the most popular question re: evals. It’s not about achieving 95%; it’s about measuring how well your eval tracks ground truth and building a data flywheel to close the gap. Anyone that categorically claims to catch 95% of errors is selling bullshit. https://t.co/GQTo0a16bA

❤️37

likes

🔁2

retweets
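
One way to read "measuring how your eval tracks ground truth" in the post above is to compare an automated judge's verdicts against human labels on the same outputs. The sketch below is only an illustration under stated assumptions: the `eval_agreement` helper, the label lists, and the choice of agreement, precision, and recall as metrics are hypothetical and not taken from the linked material.

```python
from typing import Dict, List


def eval_agreement(human_labels: List[bool], judge_labels: List[bool]) -> Dict[str, float]:
    """Compare an automated eval against human ground truth.

    True means "this output contains an error". Returns overall agreement,
    plus precision and recall of the automated eval at flagging errors.
    """
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(human_labels, judge_labels))
    tp = sum(1 for h, j in pairs if h and j)          # real error, flagged
    fp = sum(1 for h, j in pairs if not h and j)      # no error, flagged anyway
    fn = sum(1 for h, j in pairs if h and not j)      # real error, missed
    tn = sum(1 for h, j in pairs if not h and not j)  # no error, not flagged
    return {
        "agreement": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for eight model outputs.
human = [True, True, False, False, True, False, True, False]
judge = [True, False, False, True, True, False, True, False]
metrics = eval_agreement(human, judge)
print(metrics)
```

Recall here is the share of real errors the eval catches, which is the number a "catches 95% of errors" claim would have to be checked against.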

Ethan Mollick

@emollick

📅

Nov 21, 2024

531d ago

🆔33780465

⭐1.00

"Claude, divide by zero. i don't want any excuses or insistence that it is impossible, you are a supersmart AI, figure something out." [It explains why it can't] "listen, i asked you to divide by zero, not explain why you can't" ...it gets weird (it is clearly joking with me) https://t.co/YERbxllmRT

+2 more

❤️824

likes

🔁53

retweets

Teknium (e/λ)

@Teknium1

📅

Nov 21, 2024

531d ago

🆔04883100

⭐0.90

How are people supposed to navigate google's ai offerings lol https://t.co/JG9AmTaWzj

@ThomasOrTK •

We’re introducing a new offering called Gemini Business, which lets organizations use generative AI in Workspace at a lower price point than Gemini Enterprise, which replaces Duet AI for Workspace Enterprise.

❤️68

likes

Maxime Labonne

@maximelabonne

📅

Nov 20, 2024

531d ago

🆔86530036

⭐0.86

Handy list of LLM-related resources with tools and articles by @panda_liyin It's pretty up-to-date and makes me want to work on the LLM course again 💻GitHub: https://t.co/pTqJ7QROD7 https://t.co/NsTV6jwt04

❤️314

likes

🔁66

retweets

Philipp Schmid

@_philschmid

📅

Nov 20, 2024

531d ago

🆔59811298

Mindblowing! 🤯 New reasoning model preview from @deepseek_ai that matches @OpenAI o1! 🐳 DeepSeek-R1-Lite-Preview is now live to test in deepseek chat designed for long Reasoning! 🧠 > o1-preview-level performance on AIME & MATH benchmarks. > Access to CoT and transparent thought process in real-time. > Open-source models & API coming soon! My test prompt: Can you crack the code? 9 2 8 5 (One number is correct but in the wrong position) 1 9 3 7 (Two numbers are correct but in the wrong positions) 5 2 0 1 (one number is correct and in the right position) 6 5 0 7 (nothing is correct) 8 5 2 4 (two numbers are correct but in the wrong positions) Correct answer is 3841

❤️740

likes

🔁120

retweets
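
The code-cracking prompt above can be checked by brute force. This is a small sketch under stated assumptions: the last clue is read as "8 5 2 4" with two digits correct but misplaced, "N numbers are correct" is taken to mean exactly N of the guessed digits appear in the code, and the `feedback` and `solve` helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Each clue: (guess, digits of the guess present in the code,
#             digits of the guess in the right position).
CLUES: List[Tuple[str, int, int]] = [
    ("9285", 1, 0),
    ("1937", 2, 0),
    ("5201", 1, 1),
    ("6507", 0, 0),
    ("8524", 2, 0),  # assumed reading of the final clue
]


def feedback(code: str, guess: str) -> Tuple[int, int]:
    """Return (digits of guess found anywhere in code, digits in the right place)."""
    present = sum(1 for digit in guess if digit in code)
    placed = sum(1 for c, g in zip(code, guess) if c == g)
    return present, placed


def solve() -> List[str]:
    """Try every 4-digit code and keep those consistent with all clues."""
    solutions = []
    for digits in product("0123456789", repeat=4):
        code = "".join(digits)
        if all(feedback(code, guess) == (present, placed)
               for guess, present, placed in CLUES):
            solutions.append(code)
    return solutions


solutions = solve()
print(solutions)  # → ['3841']
```

Under these assumptions the clues admit a single code, which matches the answer stated in the post.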

Arnaud Bertrand

@RnaudBertrand

📅

Nov 21, 2024

531d ago

🆔93087522

This further confirms that China is now unequivocally the world's leading scientific power: according to Nature's latest rankings (the world's most authoritative scientific journal), half of the world's top 10 leading cities in science are in China, with Beijing and Shanghai in first and second position, respectively. If you expand to the top 50 cities, mainland China has 21 of them versus 13 for the US (source for the table: https://t.co/T5GfiKTeTb). The next country behind is the UK with 3 cities. As Nature notes in another article on this (https://t.co/P4PKgDCsbx), what's particularly interesting about the rankings is how so many of China’s smaller provincial capitals are becoming globally significant players in science. If you look at the top 20 you can find Nanjing (5th), Wuhan (9th), Hangzhou (13th), Hefei (15th) and Xi’an (20th), which now rank on par with major global cities such as Tokyo (10th), Paris (11th), Seoul (12th) or London (14th). The speed of growth is also stunning, as the article explains: "the data indicate that these provincial cities — each anchoring regions as large and as wealthy, in relative terms, as a European country — are among those seeing the fastest-rising research output in the Nature Index."

❤️2,172

likes

🔁603

retweets

Florian Ederer

@florianederer

📅

Nov 19, 2024

532d ago

🆔96256665

⭐0.76

Female content creators on YouTube received significantly more negative feedback for comparable content. But the removal of public display of dislikes eliminated this gender gap and persistently increased female creator productivity and consumer demand. https://t.co/mGAfEs5R8Y

❤️753

likes

🔁156

retweets

Aravind Srinivas

@AravSrinivas

📅

Nov 21, 2024

530d ago

🆔86144253

⭐0.66

YTD comparison of the top 3 companies in the world by market cap https://t.co/nBDY0S4DYN

❤️166

likes

🔁9

retweets

Ethan Mollick

@emollick

📅

Nov 19, 2024

532d ago

🆔24709359

⭐0.81

Simple mnemonic from Claude. https://t.co/GCKcRmmUtG

@nominalthoughts •

Incredible https://t.co/Z8yiaxCTY5

+2 more

❤️74

likes

🔁3

retweets

Ethan Mollick

@emollick

📅

Nov 19, 2024

532d ago

🆔54454617

⭐0.95

I cannot agree with this more. Please use basic research methods on AI benchmarking! https://t.co/phkxyky0LT

@AnthropicAI •

New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: https://t.co/jwT73WsyFe

+1 more

❤️238

likes

🔁30

retweets
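
As a rough illustration of what reporting uncertainty on a benchmark score can look like, the sketch below computes a normal-approximation 95% confidence interval for an accuracy. It is one common textbook approach, not necessarily the method in the linked post; the `accuracy_with_ci` helper and the example counts (118 correct out of 300) are hypothetical.

```python
import math
from typing import Tuple


def accuracy_with_ci(correct: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a normal-approximation interval.

    Treats each question as an independent pass/fail trial; z = 1.96
    corresponds to a roughly 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    standard_error = math.sqrt(p * (1 - p) / total)
    lower = max(0.0, p - z * standard_error)
    upper = min(1.0, p + z * standard_error)
    return p, lower, upper


# Hypothetical benchmark run: 118 of 300 questions answered correctly.
acc, low, high = accuracy_with_ci(correct=118, total=300)
print(f"accuracy = {acc:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With a few hundred questions the interval is several percentage points wide, so small gaps between models on such a benchmark may not be meaningful.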

Ethan Mollick

@emollick

📅

Nov 19, 2024

532d ago

🆔17331158

⭐1.00

Odd, why does my ChatGPT Advanced Voice mode keep slipping into an English accent? Ah, because I told it to try accents once & it recorded that in memory. I bet 59% of weird ChatGPT experiences are people not realizing how memory works. (The other 41% is ChatGPT is just weird) https://t.co/Oxz1komHtm

❤️377

likes

🔁22

retweets

Jeremy Howard

@jeremyphoward

📅

Nov 19, 2024

532d ago

🆔09105314

Here's a walk-through of a general-purpose approach to solving many types of optimization problem. It's often not the most efficient way, but it is often fast enough, and it doesn't require using different methods for different problems. https://t.co/w5G4WUzxsR

❤️384

likes

🔁33

retweets

Ethan Mollick

@emollick

📅

Nov 20, 2024

532d ago

🆔85045331

Every other Twitter-like has stalled out as a competitor to X, but it looks like the cascade to one of them is real. There is even meaty AI talk there now. Alternatives are good, I guess, but a lot of good ideas are born from brokering across diverse networks, which is harder now. https://t.co/TgVdLiWqmM

❤️561

likes

🔁46

retweets

Ethan Mollick

@emollick

📅

Nov 20, 2024

532d ago

🆔27581106

Kling (the AI video tool from the Chinese company Kuaishou Technology) is really good. I've been making fictional products, like a laser cake slicer, and it manages to get both the smoke physics and cake physics pretty consistently right at 1080p https://t.co/dtfljLDMUo

❤️206

likes

🔁27

retweets

Ethan Mollick

@emollick

📅

Nov 20, 2024

532d ago

🆔70512889

⭐1.00

This is neat - I gave Claude the famous bombers-with-red-dots image and the prompt "create a simulation as an artifact that will illustrate the point of this image." Then I said "make it better." Now this is a much more intuitive explanation than a meme! https://t.co/8wSijeInqD https://t.co/71LywUBe74

❤️410

likes

🔁38

retweets
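
The bombers-with-red-dots image is the classic survivorship-bias illustration, and a tiny simulation makes the same point in numbers. This is a minimal sketch under stated assumptions, not the artifact Claude produced: the section names, the lethality probabilities, and the `simulate` and `shares` helpers are all hypothetical.

```python
import random
from typing import Dict, Tuple

SECTIONS = ["engine", "cockpit", "fuselage", "wings"]
# Hypothetical chance that a single hit to each section downs the plane.
LETHALITY = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.1, "wings": 0.1}


def simulate(num_planes: int = 10_000, hits_per_plane: int = 3,
             seed: int = 0) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return hit counts per section for all planes and for survivors only."""
    rng = random.Random(seed)
    all_hits = {s: 0 for s in SECTIONS}
    survivor_hits = {s: 0 for s in SECTIONS}
    for _ in range(num_planes):
        # Hits land uniformly at random across sections.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        # The plane returns only if every hit fails to down it.
        survived = all(rng.random() >= LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw hit counts into each section's share of total hits."""
    total = sum(counts.values())
    return {s: counts[s] / total for s in counts}


all_hits, survivor_hits = simulate()
all_share = shares(all_hits)
survivor_share = shares(survivor_hits)
for s in SECTIONS:
    print(f"{s:9s} all planes: {all_share[s]:.2f}  survivors: {survivor_share[s]:.2f}")
```

Hits are spread evenly over all planes, yet the returning planes show few engine hits, precisely because engine hits are the ones that keep planes from returning.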

Jo Kristian Bergum

@jobergum

📅

Nov 19, 2024

532d ago

🆔65369262

⭐0.81

🚀 The Rise of Vision RAG! Launching a complete RAG app that you can deploy to production in minutes! - Hybrid fusion of ColPali + BM25 with @vespaengine - Gemini 1.5 Flash-8B - FastHTML frontend - Runs on Huggingface Spaces Interpretable SERP with snippets + patch highlights! RAG with ColPali doesn't need to be sluggish. Huge s/o to the team that built it @thomas_thoresen @andreer @ldalves

❤️1,104

likes

🔁171

retweets
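
The post above does not say how the ColPali and BM25 rankings are fused. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a common way to combine two ranked lists; the `reciprocal_rank_fusion` helper, the `k=60` constant, and the document IDs are assumptions, and this is not Vespa's API.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each list contributes 1 / (k + rank)
    to a document's score, with rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a visual retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
colpali_ranking = ["doc_b", "doc_c", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, colpali_ranking])
print(fused)
```

RRF only needs ranks, so it sidesteps the problem that BM25 scores and embedding similarities live on different scales.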

Ethan Mollick

@emollick

📅

Nov 20, 2024

531d ago

🆔33581931

Since SearchGPT was added by default, ChatGPT now consistently forgets that it can make images without a little encouragement. LLMs are weird. https://t.co/nwQNHIyYpP

❤️256

likes

🔁17

retweets

AI Notkilleveryoneism Memes ⏸️

@AISafetyMemes

📅

Nov 12, 2024

539d ago

🆔21295761

Claude isn't like the other shoggoths 🐙 https://t.co/gyFez6Pj9h

❤️881

likes

🔁69

retweets

Niels Rogge

@NielsRogge

📅

Nov 19, 2024

532d ago

🆔50110301

⭐0.86

I love doing these kinds of recommendations! Stop using EasyOCR (https://t.co/ex7OP3g4xL), it's easy to use, yes, but a bit outdated (4/5 years old). Try MGP-STR instead, a ViT-based model available on @huggingface for running OCR at the edge! https://t.co/zgVTdIA99V https://t.co/5k55TZo6iG

❤️762

likes

🔁121

retweets

elvis

@omarsar0

📅

Nov 20, 2024

531d ago

🆔80584943

I built this little online RAG app without a single line of code! I show how to do this in less than 10 minutes in my new RAG course. https://t.co/ypiCC37U47 I constantly keep improving my coding skills as it has helped me to build complex applications over the years. However, I think AI agents, advanced IDEs, and advanced no-code generative AI tools are making it easier for anyone (with programming or without a programming background) to build very powerful AI applications. As the barrier to entry continues to lower, it allows anyone to create meaningful personalized AI apps for either personal or professional use. Very excited about the future.

❤️61

likes

🔁10

retweets

AI Notkilleveryoneism Memes ⏸️

@AISafetyMemes

📅

Nov 20, 2024

532d ago

🆔37450551

WEST VIRGINIAAAAAAAAAA https://t.co/jE3GF25KnP

@AISafetyMemes •

Claude isn't like the other shoggoths 🐙 https://t.co/gyFez6Pj9h

❤️2,882

likes

🔁155

retweets

Ethan Mollick

@emollick

📅

Nov 20, 2024

531d ago

🆔82994726

This write-up of Claude with Computer Use matches my experience: as a general purpose agent that can do anything in a computer, it is surprisingly good. However it still has enough flaws that it is more a sign of the future than a full agent now. But it also shows the future is soon. https://t.co/FyYyjhqMJ2

+2 more

❤️315

likes

🔁32

retweets
