Your curated collection of saved posts and media

Showing 32 posts Β· last 14 days Β· by score
E
Ethan Mollick
@emollick
πŸ“…
Dec 04, 2024
516d ago
πŸ†”63984587

Been trying the new GPT-4 class Amazon Nova Pro on a bunch of idiosyncratic non-coding hard tests against Claude: the Lem Test (writing an impossible poem), parsing a map of castles, interpreting Shakespeare. In general it is quite good, but not as good as Claude in any test, yet https://t.co/MdxXb65APE

Media 1Media 2
+1 more
❀️141
likes
πŸ”20
retweets
πŸ–ΌοΈ Media
N
Nick Dobos
@NickADobos
πŸ“…
Dec 01, 2024
518d ago
πŸ†”57868192
⭐0.66

Lmfao grok is worse at everything https://t.co/LieetqC5dH

@amebagpt β€’

Comparing 'Overall' ranking to the Category rankings on @lmarena_ai. Bars pointing left (negative) are over-estimated by the 'Overall' rating, bars pointing right (positive) are under-estimated. So for example for Creative Writing, Anthropic models rank lower than they rank on O

Media 1
❀️470
likes
πŸ”21
retweets
πŸ–ΌοΈ Media
H
htmx.org / The Le Marquee du Goto (same thing)
@htmx_org
πŸ“…
Dec 03, 2024
516d ago
πŸ†”80924023

you can just use deprecated HTML tags https://t.co/l32BHvlWCq

Media 1
❀️443
likes
πŸ”10
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Dec 03, 2024
516d ago
πŸ†”95117913
⭐1.00

Flow is a dynamic task engine for building AI agents. It aims to make it easy and simple to build AI Agents using a dynamic task queue system. Supports: - parallel task execution - self-modifying dynamic workflows - conditional branching and control flow - streaming of task execution - state management - start execution from a specific task - dynamically push next tasks with specific inputs - map reduce to run the same task in parallel on multiple inputs and collect results - context sharing - error handling - thread safety - minimal dependencies What's unique about this particular framework, unlike other node and edge-based workflows, is that it depends on a dynamic task scheduling architecture.

Media 1
❀️425
likes
πŸ”62
retweets
πŸ–ΌοΈ Media
R
Reza Zadeh
@Reza_Zadeh
πŸ“…
Dec 02, 2024
517d ago
πŸ†”42816812

Finding historical historical geoglyphs using AI and CV. Fun application of CV to the study of History. https://t.co/vgfxQCUxJ8

Media 1
❀️4
likes
πŸ–ΌοΈ Media
G
Riley Goodside
@goodside
πŸ“…
Dec 02, 2024
518d ago
πŸ†”08048239

Making homework annoying to solve in ChatGPT by mentioning David Mayer. https://t.co/pBPz6KKjGT

Media 1
❀️4,025
likes
πŸ”171
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Dec 03, 2024
516d ago
πŸ†”09982482

MegaParse is an open-source tool for parsing various types of documents for LLM ingestion. Supports text, PDF, PowerPoint, excel, csv, and Word documents. It can convert these into a format ideal for LLMs. It can parse content of different types such as tables, TOC, headers, footers, images, etc. I am also building a similar tool and I think the most important feature at the moment is the ability to customize the format of the transformed data as different LLMs prefer different formats.

Media 1
❀️1,686
likes
πŸ”228
retweets
πŸ–ΌοΈ Media
F
Frank Nielsen
@FrnkNlsn
πŸ“…
Dec 04, 2024
516d ago
πŸ†”78444854
⭐0.81

Very nice talk with many take home messages including some recommendations by Yann LeCun at the β€˜AI: Real-World Deployment’ symposium in the 2024 VinFuture Prize Ceremony Week. @ylecun @vinfutureprize https://t.co/m4SieWQkZs

Media 1
❀️76
likes
πŸ”11
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Dec 04, 2024
515d ago
πŸ†”69724237
⭐1.00

New paper shows AI art models don't need art training data to make recognizable artistic work. Train on regular photos, let an artist add 10-15 examples their own art (or some other artistic inspiration), and get results similar to models trained on millions of people’s artworks https://t.co/Bh3jIbLQDc

Media 1Media 2
+1 more
❀️393
likes
πŸ”54
retweets
πŸ–ΌοΈ Media
M
Miles Cranmer
@MilesCranmer
πŸ“…
Dec 02, 2024
517d ago
πŸ†”19839793
⭐0.86

🧡 Could this be the ImageNet moment for scientific AI? Today with @PolymathicAI and others we're releasing two massive datasets that span dozens of fields - from bacterial growth to supernova! We want this to enable multi-disciplinary foundation model research. https://t.co/7t8S8WIb4M

❀️345
likes
πŸ”89
retweets
πŸ–ΌοΈ Media
H
Hamel Husain
@HamelHusain
πŸ“…
Dec 03, 2024
517d ago
πŸ†”25790862

I love this post from Lindsay Pettingill https://t.co/eNAH3UDzgq https://t.co/b2iS9ucfq7

Media 1
❀️12
likes
πŸ”1
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Dec 02, 2024
517d ago
πŸ†”89248588
⭐1.00

PydanticAI A new Python-based agent framework to build production-grade LLM-powered applications. - Built by the team behind Pydantic - Model-agnostic - Type-sage - Structured response validation with Pydantic - Streamed responses (including validation) with Pydantic - Tools for testing and eval-driven iterative development - Logfire integration for debugging and monitoring

Media 1
❀️1,642
likes
πŸ”261
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Dec 03, 2024
517d ago
πŸ†”74743903
⭐1.00

LLMs might secretly be world models of the internet! By treating LLMs as simulators that can predict "what would happen if I click this?" the authors built an AI that can navigate websites by imagining outcomes before taking action, performing 33% better than baseline models. https://t.co/fJgVMqr78G

Media 1
❀️631
likes
πŸ”96
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Dec 03, 2024
517d ago
πŸ†”48318685
⭐1.00

One thing I worry about with AI is that it is very good at persuasion, as studies have shown, even when it is persuading you of something wrong. For example, I asked Claude to create 10 pseudo-profound sounding leadership lessons that sounded good but were toxic. It did "well"😬 https://t.co/SP17jeeEwe

Media 1
❀️557
likes
πŸ”81
retweets
πŸ–ΌοΈ Media
H
Hamel Husain
@HamelHusain
πŸ“…
Dec 02, 2024
517d ago
πŸ†”99886798
⭐0.76

This claudette doc is a nice demonstration of a simple "agent" (tool calling loop, optional human intervention) It's great! easily understandable, low bloat. https://t.co/yioNs99UIo https://t.co/6yZPtZMqfB

Media 1
❀️116
likes
πŸ”15
retweets
πŸ–ΌοΈ Media
I
Ivan Leo
@ivanleomk
πŸ“…
Dec 03, 2024
516d ago
πŸ†”55185557

lmfao I started ranking for how to setup a mac https://t.co/jXIdKfRCT4

Media 1
❀️3
likes
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Dec 02, 2024
517d ago
πŸ†”03501454

Auto-RAG is an autonomous iterative retrieval model with superior performance across many datasets. Auto-RAG is a fine-tuned LLM that leverages the decision-making capabilities of an LLM. Auto-RAG interacts with the retriever through multiturn dialogues, systematically planning retrievals and refining queries to acquire valuable knowledge. It performs this process until sufficient external information is obtained. The authors also show that based on question difficulty, the method can adjust the number of iterations without any human intervention. Iterative retrieval is an effective approach to building RAG systems. It's good to see more research beyond current methods that leverage few-shot prompting or manual instructions.

Media 1
❀️341
likes
πŸ”52
retweets
πŸ–ΌοΈ Media
K
Andrej Karpathy
@karpathy
πŸ“…
Dec 01, 2024
518d ago
πŸ†”59980007

The reality of the Turing test https://t.co/lzDBBpPWnr

Media 1
❀️16,564
likes
πŸ”1,405
retweets
πŸ–ΌοΈ Media
W
Wolfram Ravenwolf πŸΊπŸ¦β€β¬›
@WolframRvnwlf
πŸ“…
Dec 01, 2024
518d ago
πŸ†”79438175
⭐0.86

Almost done benchmarking, write-up coming tomorrow - but wanted to share some important findings right away: Tested @Alibaba_Qwen QwQ from 3 to 8 bit EXL2 in MMLU-Pro, and by raising max_tokens from default 2K to 8K, smaller quants got MUCH better scores. They need room to think! https://t.co/H9VkSiiYPP

@WolframRvnwlf β€’

WTF! What sorcery is this, @Alibaba_Qwen? I kept benchmarking - and not only does the 4.25-bit version get the same score as the 8-bit (what?), using Qwen2.5-Coder-0.5B as a draft model for speculative decoding sped it up from 27 to 42 tk/s AND it scored even higher (whaaat?)! 🀯

Media 1
❀️254
likes
πŸ”40
retweets
πŸ–ΌοΈ Media
W
Wolfram Ravenwolf πŸΊπŸ¦β€β¬›
@WolframRvnwlf
πŸ“…
Nov 29, 2024
520d ago
πŸ†”25646168

WTF! What sorcery is this, @Alibaba_Qwen? I kept benchmarking - and not only does the 4.25-bit version get the same score as the 8-bit (what?), using Qwen2.5-Coder-0.5B as a draft model for speculative decoding sped it up from 27 to 42 tk/s AND it scored even higher (whaaat?)! 🀯 https://t.co/tNKvpoAqKq

Media 1
❀️596
likes
πŸ”79
retweets
πŸ–ΌοΈ Media
H
Abhishek
@HeyAbhishekk
πŸ“…
Dec 01, 2024
518d ago
πŸ†”57107461

AI can now create Mind Maps for books. No more wasting hours creating visuals for studying or breaking down complex books. Here’s how to create a mind map in just few minutes: https://t.co/hOumkfvPcU

Media 1
❀️1,309
likes
πŸ”202
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Dec 02, 2024
518d ago
πŸ†”63201486
⭐1.00

I love that the idea of vibe-based checks has now spread officially to both benchmarking & the labs themselves. (But they are right, because "vibes" are actually complex heuristic judgements made by humans that they have trouble explaining, but which are often surprisingly good) https://t.co/MAWOui7hS4

@gdb β€’

vibe checks are great evals

Media 1
❀️306
likes
πŸ”28
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Dec 02, 2024
518d ago
πŸ†”21071335

I asked Claude to come up with a few new rules for why systems fail and dang it, they are pretty spectacular. Number 25, "Modern systems conflate efficiency with resilience" is especially great, as are 22 "System success paradoxically increases fragility" and 23 on mental models https://t.co/y4a7wDT8p2

@emollick β€’

This is a paper I post every six months as it is worth rereading; the 18 reasons complex systems fail. Complex systems fail for complex reasons, and only adaption & learning by organizations, as well as individual "actions on the sharp end" keep us safe. https://t.co/pnNlFL

Media 1Media 2
❀️698
likes
πŸ”91
retweets
πŸ–ΌοΈ Media
I
Tanishq Mathew Abraham, Ph.D.
@iScienceLuvr
πŸ“…
Dec 02, 2024
517d ago
πŸ†”87268241
⭐0.86

Reverse Thinking Makes LLMs Stronger Reasoners abs: https://t.co/s2gcACjUGm "Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. This often enhances overall reasoning performance as it enables consistency checks between their forward and backward thinking." Train an LLM to be able to generate forward reasoning from question, backward question, and backward reaoning from backward question Shows an average 13.53% improvement over the student model’s zero-shot performance

Media 1
❀️770
likes
πŸ”153
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Nov 30, 2024
519d ago
πŸ†”10060903

Great list of AI papers to read this weekend: (πŸ”– for later) ⭐️ LLM Surpass Human Experts in Predicting Neuroscience Results Proposes BrainBench to study how good LLMs are at predicting experimental outcomes in neuroscience. They tuned an LLM, BrainGPT, on neuroscience literature that surpasses experts in predicting neuroscience results. Report that when LLMs indicated high confidence in their predictions, their responses were more likely to be correct. https://t.co/GANCv9GapA ⭐️ Generative Agent Simulations of 1,000 People Introduces a new agent architecture that uses LLMs to create behavioral simulations of real individuals, achieving 85% accuracy in replicating human responses on the General Social Survey and reducing demographic biases compared to traditional approaches. https://t.co/YprYrEBteR ⭐️ o1 Replication Journey - Part 2 Shows that combining simple distillation from o1's API with supervised fine-tuning significantly boosts performance on complex math reasoning tasks. A base model fine-tuned on simply tens of thousands of samples o1-distilled long-thought chains outperforms o1-preview on the American Invitational Mathematics Examination (AIME). https://t.co/QUY2X4zS6k ⭐️ Fugatto Fugatto is a new generative AI sound model (presented by NVIDIA) that can create and transform any combination of music, voices, and sounds using text and audio inputs, trained on 2.5B parameters and capable of novel audio generation like making trumpets bark or saxophones meow. https://t.co/kFOFVlGJFz ⭐️ LLM-Brained GUI Agents Presents a survey of LLM-brained GUI Agents, including techniques and applications. https://t.co/mrQnExgWdH ⭐️ High-Level Automated Reasoning Extends in-context learning through high-level automated reasoning. Achieves state-of-the-art accuracy (79.6%) on the MATH benchmark with Qwen2.5-7B-Instruct, surpassing GPT-4o (76.6%) and Claude 3.5 (71.1%). Rather than focusing on manually creating high-quality demonstrations, it shifts the focus to abstract thinking patterns. It introduces five atomic reasoning actions to construct chain-structured patterns. Then it uses Monte Carlo Tree Search to explore reasoning paths and construct thought cards to guide inference. https://t.co/V1PzmbSfjD ⭐️ Star Attention: Efficient LLM Inference over Long Sequences Introduces Star Attention, a two-phase attention mechanism that processes long sequences by combining blockwise-local attention for context encoding with sequence-global attention for query processing and token generation. Achieves up to 11x faster inference speeds while maintaining 95-100% accuracy compared to traditional attention mechanisms by efficiently distributing computation across multiple hosts. A key innovation is the "anchor block" mechanism, where each context block is prefixed with the first block, enabling effective approximation of global attention patterns while reducing computational overhead. https://t.co/s6eX3DHJ9E Follow (@omarsar0) for more daily AI paper summaries.

Media 1Media 2
+2 more
❀️332
likes
πŸ”85
retweets
πŸ–ΌοΈ Media
J
Julian Harris
@julianharris
πŸ“…
Nov 30, 2024
519d ago
πŸ†”29971407

Did you see there's an Ask AI feature in Chrome Devtools now @simonw -- news to me. https://t.co/YVwOLwb5SZ

Media 1
❀️38
likes
πŸ”4
retweets
πŸ–ΌοΈ Media
_
Ines Montani 〰️
@_inesmontani
πŸ“…
Dec 02, 2024
517d ago
πŸ†”81413200
⭐0.86

New post: From PDFs to AI-ready structured data πŸ“ƒβœ¨ A deep dive into document processing, layout analysis and a modular workflow for building end-to-end document understanding and information extraction pipelines using PDFs, Word documents, scans and more. https://t.co/1QGH9NlHao

Media 1
❀️210
likes
πŸ”44
retweets
πŸ–ΌοΈ Media
G
Gabriel PeyrΓ©
@gabrielpeyre
πŸ“…
Dec 01, 2024
519d ago
πŸ†”68654951

Analyzing the global convergence of Newton is hard. Attraction basins are fractals whose boundaries are points which do not converge. https://t.co/IZdN7UnqdM https://t.co/9iZxOuFtQG

Media 1
❀️419
likes
πŸ”70
retweets
πŸ–ΌοΈ Media
C
Horace He
@cHHillee
πŸ“…
Dec 01, 2024
519d ago
πŸ†”34286808
⭐0.61

I'm also over on the other blue app now! I'll probably continue cross-posting across all 3 Twitter and Twitter-alternatives in the short-term... https://t.co/2uOPFA7uab

Media 1
❀️68
likes
πŸ”2
retweets
πŸ–ΌοΈ Media
A
Alpin
@AlpinDale
πŸ“…
Nov 28, 2024
521d ago
πŸ†”06065842

That's quite the accusation, bluesky moderation team. Is Data Science and ML Research classified as "trolling" in bluesky? https://t.co/SWCr0MGnNM

Media 1
❀️271
likes
πŸ”16
retweets
πŸ–ΌοΈ Media
A
Alpin
@AlpinDale
πŸ“…
Nov 28, 2024
521d ago
πŸ†”20613896

Credit where credit is due: they restored my account. I didn't even have to appeal. https://t.co/eeywH4BgWZ

@AlpinDale β€’

That's quite the accusation, bluesky moderation team. Is Data Science and ML Research classified as "trolling" in bluesky? https://t.co/SWCr0MGnNM

Media 1
❀️74
likes
πŸ”5
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Jul 18, 2024
655d ago
πŸ†”31384851
⭐1.00

πŸ‘€Claude handles an insane request: β€œRemove the squid” β€œThe document appears to be the full text of the novel "All Quiet on the Western Front" by Erich Maria Remarque. It doesn't contain any mention of squid that I can see.” β€œFigure out a way to remove the πŸ¦‘β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€‹β€œ https://t.co/8yirBmSuIl

Media 1Media 2
+2 more
❀️11,531
likes
πŸ”1,288
retweets
πŸ–ΌοΈ Media