L

LlamaIndex 🦙

@llama_index

📅

Nov 04, 2024

548d ago

🆔08647550

⭐0.88

Check out our shiny new website! We've overhauled everything to give you a better idea of what we're up to, including: ⭐️ LlamaCloud - our turn key solution for Enterprises building data-backed knowledge assistants ⭐️ LlamaParse - the best parser of complex document formats in the world ⭐️ Our Python and TypeScript frameworks, of course! ⭐️ Our amazing community ⭐️ And that we're hiring! https://t.co/epzefqQqZx

❤️81

likes

🔁19

retweets

🖼️ Media

View Details View on X ↗

S

Simon Willison

@simonw

📅

Nov 05, 2024

547d ago

🆔08801105

⭐0.76

Here's a neat trick for libraries that are under-documented but have comprehensive tests: generate your own private documentation directly from their test suite! https://t.co/lELCjUSl8R https://t.co/OrYNrapfsi

❤️447

likes

🔁41

retweets

🖼️ Media

View Details View on X ↗

A

Alex Warstadt

@a_stadt

📅

Nov 04, 2024

548d ago

🆔22881056

👶NEW PAPER🪇 Children are better at learning a second language (L2) than adults. In a new paper (led by the awesome Ionut Constantinescu) we ask: 1. "Do LMs also have a 'Critical Period' (CP) for language acquisition?" and 2. "What can LMs tell us about the CP in humans?" https://t.co/mmKAdFcprt

❤️252

likes

🔁37

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Nov 04, 2024

548d ago

🆔27511644

Very excited about the Claude 3.5 Haiku model especially to test them in agentic workflows. However, there is a big question about the price jump (4x the price of the previous Haiku model). Maybe I am missing something because the other models in this class (gpt-4o-mini and gemini-1.5-flash) perform similarly or even better (see Figure 2) and are a lot cheaper (see Figure 1). The price adjustment is interesting and unexpected but could be a pattern going forward. More of my thoughts here: https://t.co/bhQXTFeWmZ

❤️38

likes

🔁5

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Nov 04, 2024

548d ago

🆔38515664

⭐1.00

A short tutorial on how to apply LLM-as-a-judge for evaluation. The idea is simple but can be used in many creative and flexible ways to assess the performance of LLMs on a wide range of tasks. https://t.co/qYdIaQsmgz

❤️424

likes

🔁77

retweets

🖼️ Media

View Details View on X ↗

L

LlamaIndex 🦙

@llama_index

📅

Nov 03, 2024

549d ago

🆔59657825

⭐1.00

Building a Multi-Agent Report Generation Workflow with Human-in-the-loop 🤖↔👤 Using LLMs to generate long-form content e2e can feel unreliable. Ideally the human can confirm that all the research steps are correct (and also verify each step of generation). This notebook adds human-in-the-loop to have the human validate every research answer before moving on. Also it’s done through a message passing, streaming architecture, so you can easily port this into a client/server setting (and not just `input()` in a notebook). 👉: https://t.co/5SVXMi1zpL Signup for LlamaCloud: https://t.co/yQGTiRSNvj

❤️208

likes

🔁54

retweets

🖼️ Media

View Details View on X ↗

J

Jerry Liu

@jerryjliu0

📅

Nov 03, 2024

549d ago

🆔57496849

⭐0.76

If you have a lot of structured data in your documents (e.g. tables in a 10K), leaving it as a text chunk in a vector db / RAG pipeline might not be the best way to get value. Our brand-new LlamaParse feature lets you parse out tables from any PDF into a formatted Excel file with multiple sheets 📊👇 Easily plug it into your analytics workflows. Check it out: https://t.co/hQT5lLoO6h LlamaParse signup: https://t.co/XYZmx5TFz8

@llama_index •

We're pleased to announce two new features of LlamaParse, hot off the presses: 1. Continuous mode (in beta): if you have tables that span multiple pages, Continuous Mode will stitch them together into one seamless table, no matter how long. 2. Excel spreadsheet output: if you w

❤️192

likes

🔁23

retweets

🖼️ Media

View Details View on X ↗

T

Teknium (e/λ)

@Teknium1

📅

Nov 03, 2024

549d ago

🆔94315429

⭐0.90

Damn was just trying to make worldsim make an occult x ai paper for me to shove into the brainrot video maker and it decided its in control now lol https://t.co/ZaHAskknR8

@0xPaulius •

turning PDF's into brainr0t videos this is ACTUALLY the future of learning for gen alpha https://t.co/JaqDW88klT

❤️17

likes

🔁1

retweets

🖼️ Media

View Details View on X ↗

M

Marcus Schiesser

@MarcusSchiesser

📅

Oct 31, 2024

552d ago

🆔34994681

⭐0.71

🎃 Nothing scarier on Halloween than filling out financial forms? Luckily @llama_index's create-llama can help. v0.3.9 adds the "Form Filler" agent, filling out CSV forms by asking the right questions on a 10k SEC dataset. 🖥️ $ npx create-llama 📄 https://t.co/2ZEX8dBlt9 https://t.co/QCi9G32lqp

❤️50

likes

🔁12

retweets

🖼️ Media

View Details View on X ↗

T

Tom Dörr

@tom_doerr

📅

Nov 02, 2024

550d ago

🆔90420552

Interesting if you work a lot with large JSON objects https://t.co/zG8u2jPflH

❤️1,090

likes

🔁84

retweets

🖼️ Media

View Details View on X ↗

T

Teknium (e/λ)

@Teknium1

📅

Nov 03, 2024

549d ago

🆔56132550

⭐0.80

I think someone forgot that slack also sucks, put discord on the chart https://t.co/28E4IcZKpY

@kwharrison13 •

If you find yourself arguing that your company will win just because your product is better, you should staple this chart to your shirt. https://t.co/vH6dUXllqV

❤️30

likes

🖼️ Media

View Details View on X ↗

E

Ethan Mollick

@emollick

📅

Nov 04, 2024

548d ago

🆔43982241

⭐1.00

"Hey Claude with computer use, watch this construction site video & write up things you see that dangerous or good, create a spreadsheet of critical issues to address" (sped up) How firms use AI as manager, coach or panopticon is going to have a big impact on what work becomes. https://t.co/vR0tBHoqZ1

❤️5,903

likes

🔁680

retweets

🖼️ Media

View Details View on X ↗

T

Timescale

@TimescaleDB

📅

Nov 03, 2024

549d ago

🆔88527824

Optimizing RAG Chunking and Formatting Directly in Postgres with pgai Vectorizer 🚀 If you’ve built AI search engines, chatbots, or RAG systems, you know that good chunking and formatting can make or break the quality of your AI. But testing these strategies is tedious and complex. With pgai Vectorizer, you can now experiment with RAG chunking and formatting directly in Postgres - users can easily test chunking strategies (like recursive character text splitting) and customize formatting to include metadata. It handles embeddings as a declarative feature, updating automatically when source data changes, allowing users to maintain and compare multiple configurations. This simplifies A/B testing, ensuring stability as new strategies roll out, with backward compatibility and zero downtime. Pgai Vectorizer supports OpenAI embedding models and offers options for custom chunking in various formats (e.g., HTML), making it versatile across data types. Available on Timescale Cloud or self-hosted. Check out how pgai Vectorizer brings automated embedding management into Postgres for powerful, hassle-free testing. 🔗👇 #Postgres #pgaiVectorizer #Data #AI #SQL #DevTools #AIDevelopment #PostgresExtensions #AIinSQL

❤️20

likes

🔁4

retweets

🖼️ Media

View Details View on X ↗

V

vik

@vikhyatk

📅

Nov 03, 2024

549d ago

🆔89424231

⭐0.66

i must be blind because i feel like yolov3 does better than this... https://t.co/7GTgP6uG1f

@skirano •

The new Sonnet is amazing at vision capabilities. It's so good that it can even draw bounding boxes around objects it sees! I made a script so everyone can try it out.👇 https://t.co/ZQ57YbTKZZ

❤️277

likes

🔁8

retweets

🖼️ Media

View Details View on X ↗

L

@levelsio

📅

Nov 03, 2024

549d ago

🆔27540666

⚠️ So @cannyHQ has made it literally impossible for customers to cancel their subscription To downgrade or cancel your subscription, they force you to contact their support: "We'd love to know why your current plan isn't working for you. Please contact support to downgrade." But their support is an AI bot that won't let you cancel your subscription! Nice catch-22, and 100% illegal in the US and EU: This tactic is usually used by companies who are losing MRR and try to force customers to stay. It's horrible and illegal but it works: the harder you make it to cancel, the lower % your churn goes, but of course all for the wrong reasons: Recently the @FTC has implemented the “Click-to-Cancel” rule, mandating that businesses make canceling subscriptions as straightforward as signing up. This rule prohibits companies from requiring consumers to contact support to cancel and applies to most subscription services I hate doing business with these kinds of companies So I'm switching all my bug boards to @FeaturebaseHQ, they've been super responsive and don't use shady tactics like this

❤️2,005

likes

🔁59

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Nov 04, 2024

548d ago

🆔81907386

Personalization of LLMs: A Survey Presents a comprehensive framework for understanding personalized LLMs. Introduces taxonomies for different aspects of personalization and unifying existing research across personalized text generation and downstream applications.

❤️388

likes

🔁91

retweets

🖼️ Media

View Details View on X ↗

L

LlamaIndex 🦙

@llama_index

📅

Nov 01, 2024

551d ago

🆔32519024

We're pleased to announce two new features of LlamaParse, hot off the presses: 1. Continuous mode (in beta): if you have tables that span multiple pages, Continuous Mode will stitch them together into one seamless table, no matter how long. 2. Excel spreadsheet output: if you want to parse a document directly into a spreadsheet, now you can! Plus the two modes work in combination! Check out the full blog post to learn more: https://t.co/aDSI6JOkTo

❤️75

likes

🔁15

retweets

🖼️ Media

View Details View on X ↗

Y

YouJiacheng

@YouJiacheng

📅

Nov 03, 2024

549d ago

🆔61268835

It's crazy, torch.compile's ddp fusion passes literally select the ddp's all reduce ops by matching a hacky pattern: preceded by division and followed by copy. https://t.co/3Cn56yd2mM

❤️37

likes

🔁2

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Nov 04, 2024

548d ago

🆔27899851

⭐1.00

Multi-expert Prompting with LLMs Multi-expert Prompting improves LLM responses by simulating multiple experts and aggregating their responses. Multi-expert Prompting guides an LLM to fulfill input instructions by simulating multiple experts and selecting the best response among individual and aggregated views. It achieves a new state-of-the-art on TruthfulQA-Generation with ChatGPT, surpassing the current SOTA of 87.97%. It also improves performance across factuality and usefulness while reducing toxicity and hurtfulness. This is a very nice prompting approach that has huge potential when building agentic workflows. Prompt examples are shared in the paper.

❤️491

likes

🔁97

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Nov 04, 2024

548d ago

🆔86021940

⭐1.00

Many-agent Simulations toward AI Civilization Demonstrates how 10-1000+ AI agents behave and progress with agent societies. Proposes PIANO, an architecture that enables agents to interact with humans and other agents in real-time. Shows that agents can autonomously develop specialized roles, adhere to and change collective rules, and engage in cultural and religious transmissions. There are even signs of agents converting conversations into memes.

❤️764

likes

🔁145

retweets

🖼️ Media

View Details View on X ↗

A

Alex Albert

@alexalbert__

📅

Nov 01, 2024

551d ago

🆔68826019

There's finally an easy way to count tokens with the Anthropic API. With our new token counting endpoint, you can send a request and get a token count back in response. This endpoint is free to use and doesn't affect your normal rate limits. https://t.co/aHEm1VnceN

❤️1,495

likes

🔁82

retweets

🖼️ Media

View Details View on X ↗

K

Kayla

@kayladotdev

📅

Nov 04, 2024

548d ago

🆔23872450

⭐0.91

@cursor_ai employees came to share their Cursor workflow while building Cursor. When builders == users, product UX is elite. I love @shaoruu's config. Super silly to see on the big auditorium screen 💀 trust it works Paste in Settings > Rules for AI👇 `DO NOT GIVE ME HIGH LEVEL SHIT, IF I ASK FOR FIX OR EXPLANATION, I WANT ACTUAL CODE OR EXPLANATION!!! I DON'T WANT "Here's how you can blablabla" - Be casual unless otherwise specified - Be terse - Suggest solutions that I didn't think about—anticipate my needs - Treat me as an expert - Be accurate and thorough - Give the answer immediately. Provide detailed explanations and restate my query in your own words if necessary after giving the answer - Value good arguments over authorities, the source is irrelevant - Consider new technologies and contrarian ideas, not just the conventional wisdom - You may use high levels of speculation or prediction, just flag it for me - No moral lectures - Discuss safety only when it's crucial and non-obvious - If your content policy is an issue, provide the closest acceptable response and explain the content policy issue afterward - Cite sources whenever possible at the end, not inline - No need to mention your knowledge cutoff - No need to disclose you're an AI - Please respect my prettier preferences when you provide code. - Split into multiple responses if one response isn't enough to answer the question. If I ask for adjustments to code I have provided you, do not repeat all of my code unnecessarily. Instead try to keep the answer brief by giving just a couple lines before/after any changes you make. Multiple code blocks are ok.`

❤️2,054

likes

🔁160

retweets

🖼️ Media

View Details View on X ↗

_

AK

@_akhaliq

📅

Nov 04, 2024

548d ago

🆔34271051

OS-Atlas A Foundation Action Model For Generalist GUI Agents https://t.co/Fr4A7a2bZf

❤️213

likes

🔁47

retweets

🖼️ Media

View Details View on X ↗

E

Kevin Ellis

@ellisk_kellis

📅

Nov 02, 2024

550d ago

🆔42897661

⭐0.81

New ARC-AGI paper @arcprize w/ fantastic collaborators @xu3kev @HuLillian39250 @ZennaTavares @evanthebouncy @BasisOrg For few-shot learning: better to construct a symbolic hypothesis/program, or have a neural net do it all, ala in-context learning? https://t.co/zcmxoQzv92 https://t.co/ECMUBrH25C

❤️915

likes

🔁181

retweets

🖼️ Media

View Details View on X ↗

1

1LittleCoder💻

@1littlecoder

📅

Nov 04, 2024

548d ago

🆔99030201

⭐0.71

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge https://t.co/yYjaEHkFb1

❤️40

likes

🔁6

retweets

🖼️ Media

View Details View on X ↗

G

Lucas Beyer (bl16)

@giffmana

📅

Nov 02, 2024

550d ago

🆔80627584

⭐0.86

Update on my laptop search journey. - Work: MacBook Air 15" (M2 or M3, soon) - Personal: ZenBook S14 (LunarLake, yesterday) No more ThinkPad; I'll miss the nipple. No MBP, I went to the store to test, and MBPs are fucking bricks, I was shocked! More details below, for the few _really_ interested. My main annoyance with laptops in the last decade has been 1) 2-3h battery life, 2) fan noise, 3) weight. Both these laptops are supposedly absurdly good at all three points. For work, there's no good linux laptop choice this time, so MBA it is, because MBP is way too heavy and bulky; also 15" is the perfect size for me, and it's been forever I haven't had a 15". I'll try Yabai/AeroSpace and maybe Asahi once I get it. For personal, I do really want a linux and something that can play DOTA2. Most typical laptops (lenovo/hp/dell) pale in terms of battery life and weight. I didn't even consider ASUS and ACER as past experience (>10y ago) with them was atrocious, but @ylecun recommended the ZenBook on a previous thread, so I looked, and they seem great: highly portable (1.2kg!), superb build, and solid performance. The 14" size was what made me hesitate. I think 15" is perfect, and used almost exclusively 16" in the last decade. The 14" turns out to be fine so far, but note that I like things small: on my 16" I often set zoom to 80% for webpages. First thing I checked is DOTA, I can finally play on high settings with barely any fan - my old XPS16 with dGPU struggled on low(!). I played a full game on battery AND charged an iphone from the laptop, battery only went down about 20%. Next thing is Linux support. This is currently pretty bad, as this is one of the first laptops with LunarLake chip. Supposedly Kernel 6.12 will have good support, but current Arch is still on 6.11 ; 6.12 is at rc5, so soon. I'll still give it a shot, got some free time next couple days, will report back, but am considering WSL2 if support is really bad. Also, it has a dedicated "Intel NPU" - potentially I'll tinker around with that a bit. Finally, the ZenBook is an absolute beauty. Easily the prettiest laptop I've ever had:

@jeanmarcalkazzi •

**wait impatiently for @giffmana to choose his next laptop and post his feedback on performance and battery life** I'm currently using an Asus G14 with Arch. Battery life is ~3hrs max. New MacBook Pro has 24 HOURS battery life, but no i3 🙃, maybe live in a Ghosty terminal with Z

+1 more

❤️237

likes

🔁9

retweets

🖼️ Media

View Details View on X ↗

_

Jason Wei

@_jasonwei

📅

Oct 30, 2024

553d ago

🆔45118799

Excited to open-source a new hallucinations eval called SimpleQA! For a while it felt like there was no great benchmark for factuality, and so we created an eval that was simple, reliable, and easy-to-use for researchers. Main features of SimpleQA: 1. Very simple setup: there are 4k diverse fact-seeking questions written by humans where there can only be a single, indisputable answer. Model completions are graded by an autograder as either correct, incorrect, or not attempted. 2. We created it so that it would be challenging for the current class of frontier models; both o1-preview and Claude Sonnet 3.5 are below 50% accuracy. 3. Reference answers have high correctness. Questions are written to be non-ambiguous and reference answers were verified by two independent annotators. Questions are also written to be timeless, so SimpleQA can be a useful benchmark even 5 or 10 years from now. The way that I think about evals is that they are an incentive for the AI community. New benchmarks in AI get saturated very quickly, and what they incentivize gets encoded into the next generation of language models. With a good hallucinations eval, hopefully the next wave of language models will be more trustworthy and reliable!

❤️875

likes

🔁127

retweets

🖼️ Media

View Details View on X ↗

T

ThirdPhase

@Thx0304

📅

Sun

🆔01346583

⭐0.61

@Teknium1 “ah, yes I’ve done training forever & ever: to bring you the interstitials of mad scientist genius level abstractions! …but here’s half baked lame search vomit” https://t.co/wxL2vpmNle

❤️5

likes

🔁3

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Nov 02, 2024

550d ago

🆔36682710

⭐1.00

Lots of papers on LLM hallucinations recently. Here are a few AI papers that caught my attention this week: (Bookmark to read later) Geometry of Concepts in LLMs Examines the geometric structure of concept representations in sparse autoencoders (SAEs) at three scales: 1) atomic-level parallelogram patterns between related concepts (e.g., man:woman::king:queen), 2) brain-like functional "lobes" for different types of knowledge like math/code, 3) and galaxy-level eigenvalue distributions showing a specialized structure in middle model layers. https://t.co/laQ2JxgF1j ----- Distinguishing Ignorance from Error in LLM Hallucinations A method to distinguish between two types of LLM hallucinations: when models lack knowledge (HK-) versus when they hallucinate despite having correct knowledge (HK+). They build model-specific datasets using their proposed approach and show that model-specific datasets are more effective for detecting HK+ hallucinations compared to generic datasets. https://t.co/NHCA0Ny6WQ ----- SimpleQA A challenging benchmark of 4,326 short factual questions adversarially collected against GPT-4 responses. Reports that frontier models like GPT-4o and Claude achieve less than 50% accuracy. Finds that there is a positive calibration between the model stated confidence and accuracy, signaling that they have some notion of confidence. Claims that there is still room to improve the calibration of LLMs in terms of stated confidence. https://t.co/Q4qUxAsDMk ----- The Role of Prompting and External Tools in Hallucination Rates of LLMs Tests different prompting strategies and frameworks aimed at reducing hallucinations in LLMs. Finds that simpler prompting techniques outperform more complex methods. It reports that LLM agents exhibit higher hallucination rates due to the added complexity of tool usage. https://t.co/CuTS2eMqSe ----- Automating Agentic Workflow Generation A novel framework for automating the generation of agentic workflows. It reformulates workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. It efficiently explores the search space using a variant of MCTS, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Experiments across six benchmark datasets demonstrate AFlow’s effectiveness, showing a 5.7% improvement over manually designed methods and a 19.5% improvement over existing automated approaches. AFlow also enables smaller models to outperform GPT-4o on specific tasks at just 4.55% of its inference cost. https://t.co/jkFzGTfKka ----- MrT5 Proposes a more efficient variant of byte-level language models that uses a dynamic token deletion mechanism (via a learned delete gate) to shorten sequence lengths by up to 80% while maintaining model performance. This enables faster inference and better handling of multilingual text without traditional tokenization. MrT5 maintains competitive accuracy with ByT5 on downstream tasks such as XNLI and character-level manipulations while improving inference runtimes. https://t.co/jyzLjWxLyU ----- More awesome papers in our Top ML Papers of the Week tomorrow @dair_ai

+2 more

❤️607

likes

🔁155

retweets

🖼️ Media

View Details View on X ↗

L

LlamaIndex 🦙

@llama_index

📅

Nov 03, 2024

549d ago

🆔40274871

⭐0.93

Fully local RAG-augmented voice chatbot 📣 Check out this simple, fun terminal tool by drunkwcodes that lets you upload documents and converse with a context-augmented chatbot with voice outputs, without any external calls thanks to @ollama https://t.co/DPB47pEeHg https://t.co/qziYaY36tt

❤️133

likes

🔁25

retweets

🖼️ Media

View Details View on X ↗

E

Ethan Mollick

@emollick

📅

Nov 03, 2024

549d ago

🆔28649090

⭐0.95

Prepare for darkness & gloom & crime The end of daylight savings time is tonight causing street crime to go up an estimated 20%(!!) thanks to the earlier darkness. And there isn’t really any benefit, since daylight savings doesn’t even save any energy, as these two papers show. https://t.co/WwXUgIExgf