Your curated collection of saved posts and media

Showing 32 posts Β· last 14 days Β· by score
L
LlamaIndex πŸ¦™
@llama_index
πŸ“…
Nov 04, 2024
548d ago
πŸ†”08647550
⭐0.88

Check out our shiny new website! We've overhauled everything to give you a better idea of what we're up to, including: ⭐️ LlamaCloud - our turn key solution for Enterprises building data-backed knowledge assistants ⭐️ LlamaParse - the best parser of complex document formats in the world ⭐️ Our Python and TypeScript frameworks, of course! ⭐️ Our amazing community ⭐️ And that we're hiring! https://t.co/epzefqQqZx

Media 1
❀️81
likes
πŸ”19
retweets
πŸ–ΌοΈ Media
S
Simon Willison
@simonw
πŸ“…
Nov 05, 2024
547d ago
πŸ†”08801105
⭐0.76

Here's a neat trick for libraries that are under-documented but have comprehensive tests: generate your own private documentation directly from their test suite! https://t.co/lELCjUSl8R https://t.co/OrYNrapfsi

Media 1
❀️447
likes
πŸ”41
retweets
πŸ–ΌοΈ Media
A
Alex Warstadt
@a_stadt
πŸ“…
Nov 04, 2024
548d ago
πŸ†”22881056

πŸ‘ΆNEW PAPERπŸͺ‡ Children are better at learning a second language (L2) than adults. In a new paper (led by the awesome Ionut Constantinescu) we ask: 1. "Do LMs also have a 'Critical Period' (CP) for language acquisition?" and 2. "What can LMs tell us about the CP in humans?" https://t.co/mmKAdFcprt

Media 1
❀️252
likes
πŸ”37
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Nov 04, 2024
548d ago
πŸ†”27511644

Very excited about the Claude 3.5 Haiku model especially to test them in agentic workflows. However, there is a big question about the price jump (4x the price of the previous Haiku model). Maybe I am missing something because the other models in this class (gpt-4o-mini and gemini-1.5-flash) perform similarly or even better (see Figure 2) and are a lot cheaper (see Figure 1). The price adjustment is interesting and unexpected but could be a pattern going forward. More of my thoughts here: https://t.co/bhQXTFeWmZ

Media 1Media 2
❀️38
likes
πŸ”5
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Nov 04, 2024
548d ago
πŸ†”38515664
⭐1.00

A short tutorial on how to apply LLM-as-a-judge for evaluation. The idea is simple but can be used in many creative and flexible ways to assess the performance of LLMs on a wide range of tasks. https://t.co/qYdIaQsmgz

❀️424
likes
πŸ”77
retweets
πŸ–ΌοΈ Media
L
LlamaIndex πŸ¦™
@llama_index
πŸ“…
Nov 03, 2024
549d ago
πŸ†”59657825
⭐1.00

Building a Multi-Agent Report Generation Workflow with Human-in-the-loop πŸ€–β†”πŸ‘€ Using LLMs to generate long-form content e2e can feel unreliable. Ideally the human can confirm that all the research steps are correct (and also verify each step of generation). This notebook adds human-in-the-loop to have the human validate every research answer before moving on. Also it’s done through a message passing, streaming architecture, so you can easily port this into a client/server setting (and not just `input()` in a notebook). πŸ‘‰: https://t.co/5SVXMi1zpL Signup for LlamaCloud: https://t.co/yQGTiRSNvj

Media 1
❀️208
likes
πŸ”54
retweets
πŸ–ΌοΈ Media
J
Jerry Liu
@jerryjliu0
πŸ“…
Nov 03, 2024
549d ago
πŸ†”57496849
⭐0.76

If you have a lot of structured data in your documents (e.g. tables in a 10K), leaving it as a text chunk in a vector db / RAG pipeline might not be the best way to get value. Our brand-new LlamaParse feature lets you parse out tables from any PDF into a formatted Excel file with multiple sheets πŸ“ŠπŸ‘‡ Easily plug it into your analytics workflows. Check it out: https://t.co/hQT5lLoO6h LlamaParse signup: https://t.co/XYZmx5TFz8

@llama_index β€’

We're pleased to announce two new features of LlamaParse, hot off the presses: 1. Continuous mode (in beta): if you have tables that span multiple pages, Continuous Mode will stitch them together into one seamless table, no matter how long. 2. Excel spreadsheet output: if you w

Media 1
❀️192
likes
πŸ”23
retweets
πŸ–ΌοΈ Media
T
Teknium (e/Ξ»)
@Teknium1
πŸ“…
Nov 03, 2024
549d ago
πŸ†”94315429
⭐0.90

Damn was just trying to make worldsim make an occult x ai paper for me to shove into the brainrot video maker and it decided its in control now lol https://t.co/ZaHAskknR8

@0xPaulius β€’

turning PDF's into brainr0t videos this is ACTUALLY the future of learning for gen alpha https://t.co/JaqDW88klT

Media 1
❀️17
likes
πŸ”1
retweets
πŸ–ΌοΈ Media
M
Marcus Schiesser
@MarcusSchiesser
πŸ“…
Oct 31, 2024
552d ago
πŸ†”34994681
⭐0.71

πŸŽƒ Nothing scarier on Halloween than filling out financial forms? Luckily @llama_index's create-llama can help. v0.3.9 adds the "Form Filler" agent, filling out CSV forms by asking the right questions on a 10k SEC dataset. πŸ–₯️ $ npx create-llama πŸ“„ https://t.co/2ZEX8dBlt9 https://t.co/QCi9G32lqp

❀️50
likes
πŸ”12
retweets
πŸ–ΌοΈ Media
T
Tom DΓΆrr
@tom_doerr
πŸ“…
Nov 02, 2024
550d ago
πŸ†”90420552

Interesting if you work a lot with large JSON objects https://t.co/zG8u2jPflH

Media 1
❀️1,090
likes
πŸ”84
retweets
πŸ–ΌοΈ Media
T
Teknium (e/Ξ»)
@Teknium1
πŸ“…
Nov 03, 2024
549d ago
πŸ†”56132550
⭐0.80

I think someone forgot that slack also sucks, put discord on the chart https://t.co/28E4IcZKpY

@kwharrison13 β€’

If you find yourself arguing that your company will win just because your product is better, you should staple this chart to your shirt. https://t.co/vH6dUXllqV

Media 1
❀️30
likes
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Nov 04, 2024
548d ago
πŸ†”43982241
⭐1.00

"Hey Claude with computer use, watch this construction site video & write up things you see that dangerous or good, create a spreadsheet of critical issues to address" (sped up) How firms use AI as manager, coach or panopticon is going to have a big impact on what work becomes. https://t.co/vR0tBHoqZ1

❀️5,903
likes
πŸ”680
retweets
πŸ–ΌοΈ Media
T
Timescale
@TimescaleDB
πŸ“…
Nov 03, 2024
549d ago
πŸ†”88527824

Optimizing RAG Chunking and Formatting Directly in Postgres with pgai Vectorizer πŸš€ If you’ve built AI search engines, chatbots, or RAG systems, you know that good chunking and formatting can make or break the quality of your AI. But testing these strategies is tedious and complex. With pgai Vectorizer, you can now experiment with RAG chunking and formatting directly in Postgres - users can easily test chunking strategies (like recursive character text splitting) and customize formatting to include metadata. It handles embeddings as a declarative feature, updating automatically when source data changes, allowing users to maintain and compare multiple configurations. This simplifies A/B testing, ensuring stability as new strategies roll out, with backward compatibility and zero downtime. Pgai Vectorizer supports OpenAI embedding models and offers options for custom chunking in various formats (e.g., HTML), making it versatile across data types. Available on Timescale Cloud or self-hosted. Check out how pgai Vectorizer brings automated embedding management into Postgres for powerful, hassle-free testing. πŸ”—πŸ‘‡ #Postgres #pgaiVectorizer #Data #AI #SQL #DevTools #AIDevelopment #PostgresExtensions #AIinSQL

Media 1
❀️20
likes
πŸ”4
retweets
πŸ–ΌοΈ Media
V
vik
@vikhyatk
πŸ“…
Nov 03, 2024
549d ago
πŸ†”89424231
⭐0.66

i must be blind because i feel like yolov3 does better than this... https://t.co/7GTgP6uG1f

@skirano β€’

The new Sonnet is amazing at vision capabilities. It's so good that it can even draw bounding boxes around objects it sees! I made a script so everyone can try it out.πŸ‘‡ https://t.co/ZQ57YbTKZZ

Media 1
❀️277
likes
πŸ”8
retweets
πŸ–ΌοΈ Media
L
@levelsio
@levelsio
πŸ“…
Nov 03, 2024
549d ago
πŸ†”27540666

⚠️ So @cannyHQ has made it literally impossible for customers to cancel their subscription To downgrade or cancel your subscription, they force you to contact their support: "We'd love to know why your current plan isn't working for you. Please contact support to downgrade." But their support is an AI bot that won't let you cancel your subscription! Nice catch-22, and 100% illegal in the US and EU: This tactic is usually used by companies who are losing MRR and try to force customers to stay. It's horrible and illegal but it works: the harder you make it to cancel, the lower % your churn goes, but of course all for the wrong reasons: Recently the @FTC has implemented the β€œClick-to-Cancel” rule, mandating that businesses make canceling subscriptions as straightforward as signing up. This rule prohibits companies from requiring consumers to contact support to cancel and applies to most subscription services I hate doing business with these kinds of companies So I'm switching all my bug boards to @FeaturebaseHQ, they've been super responsive and don't use shady tactics like this

❀️2,005
likes
πŸ”59
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Nov 04, 2024
548d ago
πŸ†”81907386

Personalization of LLMs: A Survey Presents a comprehensive framework for understanding personalized LLMs. Introduces taxonomies for different aspects of personalization and unifying existing research across personalized text generation and downstream applications.

Media 1
❀️388
likes
πŸ”91
retweets
πŸ–ΌοΈ Media
L
LlamaIndex πŸ¦™
@llama_index
πŸ“…
Nov 01, 2024
551d ago
πŸ†”32519024

We're pleased to announce two new features of LlamaParse, hot off the presses: 1. Continuous mode (in beta): if you have tables that span multiple pages, Continuous Mode will stitch them together into one seamless table, no matter how long. 2. Excel spreadsheet output: if you want to parse a document directly into a spreadsheet, now you can! Plus the two modes work in combination! Check out the full blog post to learn more: https://t.co/aDSI6JOkTo

Media 1
❀️75
likes
πŸ”15
retweets
πŸ–ΌοΈ Media
Y
YouJiacheng
@YouJiacheng
πŸ“…
Nov 03, 2024
549d ago
πŸ†”61268835

It's crazy, torch.compile's ddp fusion passes literally select the ddp's all reduce ops by matching a hacky pattern: preceded by division and followed by copy. https://t.co/3Cn56yd2mM

Media 1Media 2
❀️37
likes
πŸ”2
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Nov 04, 2024
548d ago
πŸ†”27899851
⭐1.00

Multi-expert Prompting with LLMs Multi-expert Prompting improves LLM responses by simulating multiple experts and aggregating their responses. Multi-expert Prompting guides an LLM to fulfill input instructions by simulating multiple experts and selecting the best response among individual and aggregated views. It achieves a new state-of-the-art on TruthfulQA-Generation with ChatGPT, surpassing the current SOTA of 87.97%. It also improves performance across factuality and usefulness while reducing toxicity and hurtfulness. This is a very nice prompting approach that has huge potential when building agentic workflows. Prompt examples are shared in the paper.

Media 1
❀️491
likes
πŸ”97
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Nov 04, 2024
548d ago
πŸ†”86021940
⭐1.00

Many-agent Simulations toward AI Civilization Demonstrates how 10-1000+ AI agents behave and progress with agent societies. Proposes PIANO, an architecture that enables agents to interact with humans and other agents in real-time. Shows that agents can autonomously develop specialized roles, adhere to and change collective rules, and engage in cultural and religious transmissions. There are even signs of agents converting conversations into memes.

Media 1
❀️764
likes
πŸ”145
retweets
πŸ–ΌοΈ Media
A
Alex Albert
@alexalbert__
πŸ“…
Nov 01, 2024
551d ago
πŸ†”68826019

There's finally an easy way to count tokens with the Anthropic API. With our new token counting endpoint, you can send a request and get a token count back in response. This endpoint is free to use and doesn't affect your normal rate limits. https://t.co/aHEm1VnceN

Media 1
❀️1,495
likes
πŸ”82
retweets
πŸ–ΌοΈ Media
K
Kayla
@kayladotdev
πŸ“…
Nov 04, 2024
548d ago
πŸ†”23872450
⭐0.91

@cursor_ai employees came to share their Cursor workflow while building Cursor. When builders == users, product UX is elite. I love @shaoruu's config. Super silly to see on the big auditorium screen πŸ’€ trust it works Paste in Settings > Rules for AIπŸ‘‡ `DO NOT GIVE ME HIGH LEVEL SHIT, IF I ASK FOR FIX OR EXPLANATION, I WANT ACTUAL CODE OR EXPLANATION!!! I DON'T WANT "Here's how you can blablabla" - Be casual unless otherwise specified - Be terse - Suggest solutions that I didn't think aboutβ€”anticipate my needs - Treat me as an expert - Be accurate and thorough - Give the answer immediately. Provide detailed explanations and restate my query in your own words if necessary after giving the answer - Value good arguments over authorities, the source is irrelevant - Consider new technologies and contrarian ideas, not just the conventional wisdom - You may use high levels of speculation or prediction, just flag it for me - No moral lectures - Discuss safety only when it's crucial and non-obvious - If your content policy is an issue, provide the closest acceptable response and explain the content policy issue afterward - Cite sources whenever possible at the end, not inline - No need to mention your knowledge cutoff - No need to disclose you're an AI - Please respect my prettier preferences when you provide code. - Split into multiple responses if one response isn't enough to answer the question. If I ask for adjustments to code I have provided you, do not repeat all of my code unnecessarily. Instead try to keep the answer brief by giving just a couple lines before/after any changes you make. Multiple code blocks are ok.`

Media 1
❀️2,054
likes
πŸ”160
retweets
πŸ–ΌοΈ Media
_
AK
@_akhaliq
πŸ“…
Nov 04, 2024
548d ago
πŸ†”34271051

OS-Atlas A Foundation Action Model For Generalist GUI Agents https://t.co/Fr4A7a2bZf

❀️213
likes
πŸ”47
retweets
πŸ–ΌοΈ Media
E
Kevin Ellis
@ellisk_kellis
πŸ“…
Nov 02, 2024
550d ago
πŸ†”42897661
⭐0.81

New ARC-AGI paper @arcprize w/ fantastic collaborators @xu3kev @HuLillian39250 @ZennaTavares @evanthebouncy @BasisOrg For few-shot learning: better to construct a symbolic hypothesis/program, or have a neural net do it all, ala in-context learning? https://t.co/zcmxoQzv92 https://t.co/ECMUBrH25C

Media 1
❀️915
likes
πŸ”181
retweets
πŸ–ΌοΈ Media
1
1LittleCoderπŸ’»
@1littlecoder
πŸ“…
Nov 04, 2024
548d ago
πŸ†”99030201
⭐0.71

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge https://t.co/yYjaEHkFb1

Media 1
❀️40
likes
πŸ”6
retweets
πŸ–ΌοΈ Media
G
Lucas Beyer (bl16)
@giffmana
πŸ“…
Nov 02, 2024
550d ago
πŸ†”80627584
⭐0.86

Update on my laptop search journey. - Work: MacBook Air 15" (M2 or M3, soon) - Personal: ZenBook S14 (LunarLake, yesterday) No more ThinkPad; I'll miss the nipple. No MBP, I went to the store to test, and MBPs are fucking bricks, I was shocked! More details below, for the few _really_ interested. My main annoyance with laptops in the last decade has been 1) 2-3h battery life, 2) fan noise, 3) weight. Both these laptops are supposedly absurdly good at all three points. For work, there's no good linux laptop choice this time, so MBA it is, because MBP is way too heavy and bulky; also 15" is the perfect size for me, and it's been forever I haven't had a 15". I'll try Yabai/AeroSpace and maybe Asahi once I get it. For personal, I do really want a linux and something that can play DOTA2. Most typical laptops (lenovo/hp/dell) pale in terms of battery life and weight. I didn't even consider ASUS and ACER as past experience (>10y ago) with them was atrocious, but @ylecun recommended the ZenBook on a previous thread, so I looked, and they seem great: highly portable (1.2kg!), superb build, and solid performance. The 14" size was what made me hesitate. I think 15" is perfect, and used almost exclusively 16" in the last decade. The 14" turns out to be fine so far, but note that I like things small: on my 16" I often set zoom to 80% for webpages. First thing I checked is DOTA, I can finally play on high settings with barely any fan - my old XPS16 with dGPU struggled on low(!). I played a full game on battery AND charged an iphone from the laptop, battery only went down about 20%. Next thing is Linux support. This is currently pretty bad, as this is one of the first laptops with LunarLake chip. Supposedly Kernel 6.12 will have good support, but current Arch is still on 6.11 ; 6.12 is at rc5, so soon. I'll still give it a shot, got some free time next couple days, will report back, but am considering WSL2 if support is really bad. Also, it has a dedicated "Intel NPU" - potentially I'll tinker around with that a bit. Finally, the ZenBook is an absolute beauty. Easily the prettiest laptop I've ever had:

@jeanmarcalkazzi β€’

**wait impatiently for @giffmana to choose his next laptop and post his feedback on performance and battery life** I'm currently using an Asus G14 with Arch. Battery life is ~3hrs max. New MacBook Pro has 24 HOURS battery life, but no i3 πŸ™ƒ, maybe live in a Ghosty terminal with Z

Media 1Media 2
+1 more
❀️237
likes
πŸ”9
retweets
πŸ–ΌοΈ Media
_
Jason Wei
@_jasonwei
πŸ“…
Oct 30, 2024
553d ago
πŸ†”45118799

Excited to open-source a new hallucinations eval called SimpleQA! For a while it felt like there was no great benchmark for factuality, and so we created an eval that was simple, reliable, and easy-to-use for researchers. Main features of SimpleQA: 1. Very simple setup: there are 4k diverse fact-seeking questions written by humans where there can only be a single, indisputable answer. Model completions are graded by an autograder as either correct, incorrect, or not attempted. 2. We created it so that it would be challenging for the current class of frontier models; both o1-preview and Claude Sonnet 3.5 are below 50% accuracy. 3. Reference answers have high correctness. Questions are written to be non-ambiguous and reference answers were verified by two independent annotators. Questions are also written to be timeless, so SimpleQA can be a useful benchmark even 5 or 10 years from now. The way that I think about evals is that they are an incentive for the AI community. New benchmarks in AI get saturated very quickly, and what they incentivize gets encoded into the next generation of language models. With a good hallucinations eval, hopefully the next wave of language models will be more trustworthy and reliable!

Media 1
❀️875
likes
πŸ”127
retweets
πŸ–ΌοΈ Media
T
ThirdPhase
@Thx0304
πŸ“…
Sun
πŸ†”01346583
⭐0.61

@Teknium1 β€œah, yes I’ve done training forever & ever: to bring you the interstitials of mad scientist genius level abstractions! …but here’s half baked lame search vomit” https://t.co/wxL2vpmNle

Media 1
❀️5
likes
πŸ”3
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Nov 02, 2024
550d ago
πŸ†”36682710
⭐1.00

Lots of papers on LLM hallucinations recently. Here are a few AI papers that caught my attention this week: (Bookmark to read later) Geometry of Concepts in LLMs Examines the geometric structure of concept representations in sparse autoencoders (SAEs) at three scales: 1) atomic-level parallelogram patterns between related concepts (e.g., man:woman::king:queen), 2) brain-like functional "lobes" for different types of knowledge like math/code, 3) and galaxy-level eigenvalue distributions showing a specialized structure in middle model layers. https://t.co/laQ2JxgF1j ----- Distinguishing Ignorance from Error in LLM Hallucinations A method to distinguish between two types of LLM hallucinations: when models lack knowledge (HK-) versus when they hallucinate despite having correct knowledge (HK+). They build model-specific datasets using their proposed approach and show that model-specific datasets are more effective for detecting HK+ hallucinations compared to generic datasets. https://t.co/NHCA0Ny6WQ ----- SimpleQA A challenging benchmark of 4,326 short factual questions adversarially collected against GPT-4 responses. Reports that frontier models like GPT-4o and Claude achieve less than 50% accuracy. Finds that there is a positive calibration between the model stated confidence and accuracy, signaling that they have some notion of confidence. Claims that there is still room to improve the calibration of LLMs in terms of stated confidence. https://t.co/Q4qUxAsDMk ----- The Role of Prompting and External Tools in Hallucination Rates of LLMs Tests different prompting strategies and frameworks aimed at reducing hallucinations in LLMs. Finds that simpler prompting techniques outperform more complex methods. It reports that LLM agents exhibit higher hallucination rates due to the added complexity of tool usage. https://t.co/CuTS2eMqSe ----- Automating Agentic Workflow Generation A novel framework for automating the generation of agentic workflows. It reformulates workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. It efficiently explores the search space using a variant of MCTS, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Experiments across six benchmark datasets demonstrate AFlow’s effectiveness, showing a 5.7% improvement over manually designed methods and a 19.5% improvement over existing automated approaches. AFlow also enables smaller models to outperform GPT-4o on specific tasks at just 4.55% of its inference cost. https://t.co/jkFzGTfKka ----- MrT5 Proposes a more efficient variant of byte-level language models that uses a dynamic token deletion mechanism (via a learned delete gate) to shorten sequence lengths by up to 80% while maintaining model performance. This enables faster inference and better handling of multilingual text without traditional tokenization. MrT5 maintains competitive accuracy with ByT5 on downstream tasks such as XNLI and character-level manipulations while improving inference runtimes. https://t.co/jyzLjWxLyU ----- More awesome papers in our Top ML Papers of the Week tomorrow @dair_ai

Media 1Media 2
+2 more
❀️607
likes
πŸ”155
retweets
πŸ–ΌοΈ Media
L
LlamaIndex πŸ¦™
@llama_index
πŸ“…
Nov 03, 2024
549d ago
πŸ†”40274871
⭐0.93

Fully local RAG-augmented voice chatbot πŸ“£ Check out this simple, fun terminal tool by drunkwcodes that lets you upload documents and converse with a context-augmented chatbot with voice outputs, without any external calls thanks to @ollama https://t.co/DPB47pEeHg https://t.co/qziYaY36tt

Media 1
❀️133
likes
πŸ”25
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Nov 03, 2024
549d ago
πŸ†”28649090
⭐0.95

Prepare for darkness & gloom & crime The end of daylight savings time is tonight causing street crime to go up an estimated 20%(!!) thanks to the earlier darkness. And there isn’t really any benefit, since daylight savings doesn’t even save any energy, as these two papers show. https://t.co/WwXUgIExgf

Media 1Media 2
❀️295
likes
πŸ”33
retweets
πŸ–ΌοΈ Media
J
jason liu
@jxnlco
πŸ“…
Nov 03, 2024
549d ago
πŸ†”75440819

Imagine the tweets they’d write about how autocad isn’t real drafting. https://t.co/TwS8RlbogF

Media 1
❀️52
likes
πŸ”3
retweets
πŸ–ΌοΈ Media