Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
omarsar0
@omarsar0
📅 Sep 28, 2025
208d ago
🆔 51245683

Great work showing prompt synthesis as a new scaling axis for reasoning. Good training data is scarce. This work showcases a framework that might make it possible to construct high-quality training problems for reasoning-focused LLMs. Technical details below: https://t.co/TUlFsSRAqN

Yoshua_Bengio
@Yoshua_Bengio
📅 Sep 26, 2025
210d ago
🆔 59715155

It was an honour to speak at the @UN this week to address the UN Security Council on the impacts of AI on international peace & security, and to join the high-level multi-stakeholder informal meeting to launch the Global Dialogue on AI Governance. We need global cooperation to steer AI toward a safe and beneficial future, and I was very pleased to see so many people come together these past few days with that shared goal.

moonlake_ai
@moonlake_ai
📅 Oct 01, 2025
205d ago
🆔 19708457

We raised $28M seed from Threshold Ventures, AIX Ventures, and NVentures (Nvidia's venture capital arm), alongside 10+ unicorn founders and top AI researchers, to build reasoning models that generate real-time simulations and games. Models are bottlenecked by practical simulations that can act as Reinforcement Learning environments. Human self-expression is bounded by tools that let us create alternate realities. At Moonlake, we are building a future where anyone can create interactive worlds, bring their child-like wonder to life, learn within them, and most importantly, share experiences with people we care about. More in 🧵

DrJimFan
@DrJimFan
📅 Sep 19, 2025
218d ago
🆔 40428232

GTC is on again at DC! I will be hand picking one golden ticket winner for a complimentary pass, special seating for Jensen's keynote, NV swags, and other perks! Reply with your coolest open-source project on GR00T N1/N1.5/Dream models! https://t.co/XVfvqzTo7G

NVIDIARobotics
@NVIDIARobotics
📅 Sep 15, 2025
221d ago
🆔 56766781

The rise of humanoid platforms presents new opportunities and unique challenges. 🤖 Join @yukez at #CoRL2025 as he shares the latest research on robot foundation models and presents new updates with the #NVIDIAIsaac GR00T platform. Learn more 👉 https://t.co/LrzONs1Gzc https://t.co/v8GRGPUveA

random_walker
@random_walker
📅 Sep 21, 2025
215d ago
🆔 82102681

@sebkrier There is an LW post by Scott Alexander making a similar point https://t.co/3R8x3s2ENt

PrincetonSPIADC
@PrincetonSPIADC
📅 Sep 29, 2025
208d ago
🆔 03678592

Assembling a select group of federal AI specialists. Engage leading scholars and policymakers in the Princeton AI Policy Precepts! 📍 Washington DC 🗓️ Oct 21, Nov 14 🚨 Only 24 spots: Apply! https://t.co/B05PbGvmO0 @random_walker @sayashk @PeterHndrsn @PrincetonSPIA @PrincetonCITP https://t.co/IpAgyk8UnP

sayashk
@sayashk
📅 Oct 01, 2025
206d ago
🆔 24265462

On our evals for HAL, we found that agents figure out they're being evaluated even on capability evals. For example, here Claude 3.7 Sonnet *looks up the benchmark on HuggingFace* to find the answer to an AssistantBench question. There were many such cases across benchmarks and models. Of course, you can make it harder for the agent to cheat, such as by blocking HuggingFace or encrypting the dataset. But so long as the benchmark is available somewhere in public, an agent could theoretically follow the same steps a human would to access it (e.g., decrypting a password-protected benchmark on its sandbox). So agent log analysis will become necessary even for capability evals. HAL now has logs from 20,000+ rollouts across 9 benchmarks, and we are analyzing all of these logs using @TransluceAI's Docent.

@1a3orn • Wed Oct 01 14:59

To make a model that *doesn't* instantly learn to distinguish between "fake-ass alignment test" and "normal task," it seems like the first thing to do would be to "make all alignment evals very small variations on actual capability evals." Do people do this?

loudmouthjulia
@loudmouthjulia
📅 Oct 01, 2025
205d ago
🆔 58101690

Most videos in my Sora feed (context: scrolling for 25 min, haven't liked any vids, no follows) are clear copyright infringement, ranging from cute Pokémon videos to mediocre Family Guy spoofs and what seems like (but is not described as) Nazi/WWII German uniform inspired SpongeBob. https://t.co/DSAhGNof4C

hlntnr
@hlntnr
📅 Oct 01, 2025
206d ago
🆔 94280502

Many AI policy decisions are complicated. "Don't ban self-driving cars" is really not. Good new piece from @KelseyTuoc, with a lede that pulls no punches: https://t.co/OdjSizgALP

llama_index
@llama_index
📅 Sep 18, 2025
218d ago
🆔 59664820

SemTools has gotten some huge updates over the last few weeks: 1. A new `workspace` feature to speed up search calls over large datasets by caching embeddings with @lancedb. On a dataset of 1000 papers, search time goes from minutes to seconds 2. Now installable with npm! https://t.co/IjNRy990QJ
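
The caching idea in this post can be sketched in a few lines. This is a minimal illustration only, not SemTools' or LanceDB's actual API: `EmbeddingCache` and `toy_embed` are hypothetical names, an in-memory dict stands in for the on-disk store, and the toy embedding function stands in for a real model. It shows why repeated searches over the same files get faster: each text is embedded once and looked up by content hash afterwards.

```python
import hashlib


class EmbeddingCache:
    """Hypothetical cache: embeds each distinct text once, keyed by content hash."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # expensive call we want to avoid repeating
        self.store = {}           # stand-in for a persistent store
        self.misses = 0           # how many times embed_fn actually ran

    def get(self, text):
        # Hash the content so unchanged text maps to the same key.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]


def toy_embed(text):
    """Placeholder embedding (assumption): a real system would call a model."""
    return [len(text), sum(map(ord, text)) % 97]


cache = EmbeddingCache(toy_embed)
first = cache.get("attention is all you need")
second = cache.get("attention is all you need")  # served from the cache
```

After the two calls above, `cache.misses` is 1, because the second lookup never reaches `toy_embed`.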

llama_index
@llama_index
📅 Sep 19, 2025
217d ago
🆔 20312844

Don't miss $20,000 in prizes at the Fullstack Agents Hackathon, next Saturday! We are partnering with CopilotKit, Composio, Microsoft for Startups, B Capital & AI Tinkerers to put on an amazing hackathon. Participants will start with a boilerplate fullstack agent application connecting a LlamaIndex Agent to a frontend with AG-UI. The Agent will have access to thousands of tools via Composio. $20k+ in prizes on the line for the teams that can transform the template into a powerful fullstack agent for their use-case. Venue is the Microsoft SV Center, 🗓️ September 27th -- register today! https://t.co/zre2OXf1bw

llama_index
@llama_index
📅 Sep 22, 2025
214d ago
🆔 00784608

Coding agents like Claude Code are powerful, but they can't understand your business documents by default - here's how to fix that. We've identified three complementary approaches to give coding agents true document intelligence:
📡 Access docs through MCP - Connect @claude_code to your indexed document repositories via Model Context Protocol, giving it instant access to your business context like policies, reports, and specifications
⚡ Operate through enhanced CLI - Extend familiar command-line tools with document parsing capabilities, letting agents use grep, cat, and find operations on complex PDFs and structured documents
🛠️ Build agentic workflows - Teach @claude_code to generate AI-native applications that adapt to new document formats instead of breaking on hard-coded rules
🎯 Best approach? Use all three together for maximum impact
The reality is that 90% of enterprise data lives in documents, and coding agents that can't process this information will build generic, brittle applications. By combining MCP integration, enhanced CLI tools, and workflow generation capabilities, you can bridge the gap between natural language business requirements and real enterprise automation. Learn the complete strategy: https://t.co/wgBJhKEapx

llama_index
@llama_index
📅 Sep 23, 2025
213d ago
🆔 78640814

THIS SATURDAY: don't miss $20,000 in prizes at the Fullstack Agents Hackathon! We are partnering with CopilotKit, Composio, Microsoft for Startups, B Capital & AI Tinkerers to put on an amazing hackathon. Participants will start with a boilerplate fullstack agent application connecting a LlamaIndex Agent to a frontend with AG-UI. The Agent will have access to thousands of tools via Composio. $20k+ in prizes on the line for the teams that can transform the template into a powerful fullstack agent for their use-case. Venue is the Microsoft SV Center, 🗓️ September 27th -- register today! https://t.co/zre2OXf1bw

llama_index
@llama_index
📅 Sep 26, 2025
210d ago
🆔 34423148

Building a full-stack agent application? Our OSS engineer @itsclelia just dropped an example using LlamaIndex Workflows and @nextjs + @tailwindcss for the UI! How does it work?
🗄️ The workflow retrieves context from a @qdrant vector DB
🇬🇧 @OpenAI is used as a translation layer
📖 The result is rendered in the UI
Try it out 👉 https://t.co/ikR2pfhLSH
Find the repo on GitHub 👉 https://t.co/zlGRw1Tiog
Learn more about LlamaIndex TS workflows 👉 https://t.co/EfV0Sil0wv

itsclelia
@itsclelia
📅 Sep 26, 2025
210d ago
🆔 97828511

I was at Vector Space Day in Berlin today, and it was nothing short of incredible. So many people, an amazing energy, a wonderful location and awesome talks! I got the opportunity to be among the speakers and talk about how you can use @llama_index workflows in combination with @qdrant_engine to build context-rich AI systems. I also got to meet many amazing people from the AI space I am looking to keep in touch with and, if I didn't get a chance to talk to you, you can always reach out to me on any platform! Thanks @qdrant_engine again for this amazing event and looking forward to what's coming next!

jerryjliu0
@jerryjliu0
📅 Sep 27, 2025
210d ago
🆔 36932696

Give Claude Code a semantic filesystem 🗃️🛠️ Giving Claude Code access to the right CLI tools over your filesystem turns it into a general agent capable of automating far more knowledge work beyond code - it can do dynamic financial/legal/medical/technical/backoffice analysis over any subset of documents. With our latest release of semtools 💫, you can now manually or *agentically* create a persistent workspace over any subset of files. This gives Claude Code the ability to get blazing-fast, local semantic search over any data, while still allowing it to chain with commands like grep/cat/etc. so that it can load in dynamic context instead of naive top-k vector search. The coding agent can dynamically index data and use those indexes, instead of having to rebuild them every time. So you get the benefits of fast search along with agentic reasoning over the CLI tools mentioned above. Come check it out! https://t.co/xg1iqbghIr

llama_index
@llama_index
📅 Sep 29, 2025
207d ago
🆔 99522003

This open-source NotebookLM alternative demonstrates a complete architecture for document-powered AI apps:
🏗️ Event-driven workflows orchestrate complex multi-step processes like document parsing, summary generation, and podcast creation
☁️ LlamaCloud handles the heavy lifting with automated document ingestion pipelines and structured data extraction
🔄 State management allows workflows to save progress and resume later, perfect for long-running document processing tasks
📊 Built-in observability with @opentelemetry integration gives you insights into every step of your workflow execution
The project integrates LlamaExtract for transforming documents into an initial notebook with a mind map, FAQs, summaries, and services like @elevenlabsio for text-to-speech generation. Explore the complete NotebookLlaMa implementation: https://t.co/Mj4I4Isjkp

llama_index
@llama_index
📅 Sep 29, 2025
207d ago
🆔 70918248

Claude Sonnet 4.5 is here and we have day 0 support as usual! Sonnet 4.5 excels at coding, so we got it to generate its own fireworks celebration of its launch! Read Anthropic's announcement post here: https://t.co/QdRZ5qMyVn Check out our demo notebook (with the fireworks code) here: https://t.co/2WZrlMuTVQ Or get started with building right now! https://t.co/ABfVJyrWpW

llama_index
@llama_index
📅 Sep 30, 2025
206d ago
🆔 08579067

Build an Express Agent with LlamaIndex TypeScript Workflows ⚡ We have new docs walking through a production-ready agent system that combines tool-calling capabilities with real-world deployment patterns using our TypeScript workflows framework.
🤖 Create a workflow with a basic tool-calling agent loop that can execute tasks autonomously
📊 Add state management to track agent progress and maintain context across interactions
👤 Implement human-in-the-loop patterns for critical decisions that require human approval
🚀 Deploy your agent to an Express server for real-world usage
Build a complete agent system from scratch, starting with basic tool calling and progressing to sophisticated state management and human oversight. You'll work with @OpenAI APIs and see how our workflow system handles complex agentic patterns with clean, readable TypeScript code. Perfect for developers who want to move beyond simple chatbots and build agents that can handle multi-step processes with proper oversight and deployment infrastructure. Start building: https://t.co/LGRtJIcmb3

llama_index
@llama_index
📅 Oct 01, 2025
205d ago
🆔 45425919

Build and ship document agents 10× faster 🚀 We have some great news! LlamaAgents are here, and we're opening it up to early access for a small group of users to start trying! LlamaAgents help you deploy document agents with a single click! Automate document-centric tasks like invoice processing, contract review, and claims handling without the typical months-long development cycle:
🚀 90% ready-to-use templates powered by LlamaIndex, LlamaExtract, and LlamaCloud Index get you from idea to deployed agent in minutes
☁️ Deploy anywhere - use our managed cloud or bring your own infrastructure with headless, single-click deployments
🔧 Built for infinite extensibility unlike rigid SaaS agents - adapt workflows to your business logic, not the other way around
⚡ Free your team from routine document processing to focus on higher-value work
Join the early access waitlist and be among the first to build with LlamaAgents: https://t.co/a15jhO0xvP

llama_index
@llama_index
📅 Oct 02, 2025
204d ago
🆔 44307741

We tested whether coding agents with CLI tools are all you need for complex document search and analysis tasks 🧪 Our new SemTools benchmark used 1000 @arxiv papers to compare agents with and without semantic search capabilities:
🔍 Agents with semantic search provided more detailed, comprehensive answers across all question types
⚡ CLI-based approach proves incredibly powerful relative to the effort - Unix tooling gives agents grep, find, and file system navigation out of the box
📊 Complex cross-referencing and temporal analysis tasks showed the biggest improvement with semantic search tools
🛠️ SemTools adds parse (via LlamaParse) and semantic search capabilities directly to command-line agents like @claudeai Code and Gemini CLI
The combination of existing Unix tools plus semantic search capabilities can often replace more complex RAG setups while being faster to implement and more flexible to use. Read the full benchmark results and methodology: https://t.co/3LeaejfRWc

tjaffri
@tjaffri
📅 Oct 02, 2025
204d ago
🆔 88337889

Had a great conversation with @jerryjliu0 from @llama_index, @dvellante and @knightrm about agents, open source, and enterprise-grade infra. 🚀 Thank you to the team at @SiliconANGLE & @theCUBE for joining us at @UiPath FUSION! Check it out: https://t.co/vZJdRZW2OW

llama_index
@llama_index
📅 Oct 03, 2025
203d ago
🆔 24443960

Build a full-stack website powered by a LlamaIndex agent in minutes with AG-UI from @CopilotKit and @composiohq! This full-featured template application for building Agent apps gets you up and running in no time! Battle-tested by dozens of hackathon participants at our recent joint hackathon, it has everything you need to get started, including the right docs, MCP servers, and troubleshooting. Check out the repo here: https://t.co/RpODVIlgmC

HelloSurgeAI
@HelloSurgeAI
📅 Sep 18, 2025
218d ago
🆔 66485956

Surge is not a body shop. We work with humans who are skilled, creative, unique (+ paid well, imagine!). So we deliver the best data/AI training available. It helps that our CEO is the “Michael Jordan of AI data.” https://t.co/YGYLkBTkl7

HelloSurgeAI
@HelloSurgeAI
📅 Sep 25, 2025
212d ago
🆔 49172172

Thrilled to see @MetaAI launch Gaia2, built inside their new Agent RL Environment platform! 🚀 Proud that Surge AI helped contribute, just as we did with GAIA two years ago. A quick story 🧵 https://t.co/Mc4whQiRsY

HelloSurgeAI
@HelloSurgeAI
📅 Sep 26, 2025
210d ago
🆔 74487723

Infusing AI with humanity comes in many forms. Sometimes it involves drinking. https://t.co/xK3cmtNPHG

arankomatsuzaki
@arankomatsuzaki
📅 Sep 30, 2025
207d ago
🆔 63355677

ReasoningBank: memory for self-evolving LLM agents
• Distills strategies from both successes & failures
• Enables agents to learn, reuse, and improve over time
• Outperforms prior memory methods on web & SWE tasks (+34.2% eff., –16% steps)
https://t.co/yxvj9vsMmM

arankomatsuzaki
@arankomatsuzaki
📅 Sep 30, 2025
207d ago
🆔 33782160

RLHI: Reinforcement Learning from Human Interaction
• Moves beyond expert-annotated data → learns from real user conversations
• Two methods: 1. User-Guided Rewrites 2. User-Based Rewards
• Outperforms baselines in personalization, instruction-following & reasoning
https://t.co/JA6PR6it59

arankomatsuzaki
@arankomatsuzaki
📅 Sep 30, 2025
207d ago
🆔 33605540

SWAX: short windows, long memory
• Hybrid of sliding-window attn + xLSTM RNN
• Counter-intuitive: shorter windows → better long-term recall
• Fix: stochastic window sizes = strong short + long context performance
• Outperforms fixed window attention
https://t.co/Lg44gM7S8E
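
To make the "stochastic window sizes" bullet concrete, here is a rough sketch of a causal sliding-window attention mask whose window length is re-sampled per training step. It is not the paper's implementation: the function names, the candidate sizes, and uniform sampling are all assumptions made for illustration, and the xLSTM half of the hybrid is not modeled.

```python
import random


def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j.

    Causal: j <= i. Windowed: only the most recent `window` positions.
    """
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def sample_window(choices, rng):
    """Pick a window size for this step (assumed: uniform over `choices`)."""
    return rng.choice(choices)


rng = random.Random(0)                  # seeded for reproducibility
window = sample_window([2, 4, 8], rng)  # hypothetical candidate sizes
mask = sliding_window_mask(6, window)   # mask for a 6-token sequence
```

Each row `i` of the mask has `min(i + 1, window)` true entries, so a smaller sampled window forces anything older than the window to be carried by the recurrent part rather than by attention.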

CarinaLHong
@CarinaLHong
📅 Sep 30, 2025
207d ago
🆔 00903634

Today, I am launching @axiommathai At Axiom, we are building a self-improving superintelligent reasoner, starting with an AI mathematician. https://t.co/pDedXx7SJc

arankomatsuzaki
@arankomatsuzaki
📅 Oct 01, 2025
206d ago
🆔 35683151

@teortaxesTex @TheZvi As for context length, I think we will have more sophisticated context management, which should have greater impact than brute-force extension of the attention span to 100M tokens. No need to have everything stored as kv cache, but make sure to store all the info organized and reachable in some way, and we can possibly store the table of contents of the recipe in the kv cache. For example, we humans remember all the memorable GRPO papers without every fine detail, as we remember how to retrieve them (i.e. how to search for the paper and which pages to go to for the relevant info, etc). This and some Claude Code updates may be relevant: https://t.co/FKWvyoSRFs
