Your curated collection of saved posts and media

Showing 24 posts Β· last 30 days Β· by score
πŸ”drfeifei retweeted
B
Ben Mildenhall
@BenMildenhall
πŸ“…
Mar 03, 2026
6d ago
πŸ†”55852964
⭐0.36

We don't expect LLMs to multiply numbers or sort lists directly within their output token stream. Instead, we ask them to emit code and execute it in a separate runtime. Why predict the opposite outcome for simulating interactive worlds? https://t.co/b2QNOBTWjN

❀️ 251 likes Β· πŸ” 19 retweets
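The division of labor described above (the model emits code, a separate runtime executes it) is the standard code-interpreter loop. A minimal sketch of that loop, where `generate_code` is a hypothetical stand-in for an LLM call:

```python
import subprocess
import sys
import tempfile

def generate_code(task: str) -> str:
    """Placeholder for an LLM call that returns Python source for `task`.
    Hard-coded here so the sketch is self-contained."""
    return "print(sorted([5, 3, 8, 1]))"

def run_in_sandbox(source: str, timeout: float = 5.0) -> str:
    """Execute the generated code in a separate interpreter process,
    instead of asking the model to 'compute' the answer in its tokens."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout.strip()

answer = run_in_sandbox(generate_code("sort the list [5, 3, 8, 1]"))
print(answer)  # [1, 3, 5, 8]
```

The tweet's argument is that world simulation could follow the same shape: the model writes the simulator, a runtime executes it.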
πŸ”ai_fast_track retweeted
L
Liquid AI
@liquidai
πŸ“…
Mar 05, 2026
5d ago
πŸ†”89086198
⭐0.34

> 385ms average tool selection.
> 67 tools across 13 MCP servers.
> 14.5GB memory footprint.
> Zero network calls.

LocalCowork is an AI agent that runs on a MacBook. Open source. 🧡 https://t.co/bnXupspSXc

❀️ 1,241 likes Β· πŸ” 123 retweets
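Fast, zero-network tool selection usually means scoring the user request against locally stored tool descriptions instead of calling a hosted model. A toy sketch using plain token overlap; LocalCowork's actual routing method is not described in the thread, and every tool name below is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str

TOOLS = [
    Tool("calendar.create_event", "schedule a meeting or event on the calendar"),
    Tool("fs.search_files", "search local files and folders by name or content"),
    Tool("mail.draft", "draft or reply to an email message"),
]

def select_tool(query: str) -> Tool:
    """Rank tools by token overlap between the query and each description.
    A real system would likely use local embeddings, but the shape is the
    same: all data stays on-device, so selection takes milliseconds,
    with zero network calls."""
    q = set(query.lower().split())
    return max(TOOLS, key=lambda t: len(q & set(t.description.lower().split())))

print(select_tool("search my files for the budget spreadsheet").name)
# fs.search_files
```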
Prince_Canuma (@Prince_Canuma) Β· Mar 02, 2026 Β· πŸ†”69466787

already on mlx :) https://t.co/NXxd7hAWMh

πŸ–ΌοΈ Media
HuggingModels (@HuggingModels) Β· Mar 05, 2026 Β· πŸ†”31773642

Meet GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill: a distilled powerhouse that brings elite reasoning to local machines. This GGUF model delivers Claude-level intelligence in a compact package, perfect for developers wanting high-performance AI without cloud costs. https://t.co/Q0HCPTI2oe

πŸ–ΌοΈ Media
0xCVYH (@0xCVYH) Β· Mar 03, 2026 Β· πŸ†”47784783

Claude Code just launched Voice Mode. You speak. The AI agent codes. "/voice" to activate. Rolling out to 5% of users now, expanding over the coming weeks. Today: KREA AI Voice on the iPad. Claude Code Voice in the terminal. The era of voice programming has arrived. https://t.co/9adiksDX0r

πŸ–ΌοΈ Media
johnrobinsn (@johnrobinsn) Β· Mar 03, 2026 Β· πŸ†”66316497

Comprehensive Python API for Google NotebookLM. Full programmatic access to NotebookLM's featuresβ€”including capabilities the web UI doesn't exposeβ€”from Python or the command line. https://t.co/5YQhAKiGuD

πŸ–ΌοΈ Media
Alibaba_Qwen (@Alibaba_Qwen) Β· Mar 02, 2026 Β· πŸ†”10965160

πŸš€ Introducing the Qwen 3.5 Small Model Series
Qwen3.5-0.8B Β· Qwen3.5-2B Β· Qwen3.5-4B Β· Qwen3.5-9B

✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation β€” native multimodal, improved architecture, scaled RL:
β€’ 0.8B / 2B β†’ tiny, fast, great for edge devices
β€’ 4B β†’ a surprisingly strong multimodal base for lightweight agents
β€’ 9B β†’ compact, but already closing the gap with much larger models

And yes β€” we're also releasing the Base models as well. We hope this better supports research, experimentation, and real-world industrial innovation.

Hugging Face: https://t.co/wFMdX5pDjU
ModelScope: https://t.co/9NGXcIdCWI

πŸ–ΌοΈ Media Γ— 2
AlphaSignalAI (@AlphaSignalAI) Β· Mar 04, 2026 Β· πŸ†”21405344

A trillion-parameter model just made half its brain disappear. It got smarter.

Yuan3.0 Ultra is a new open-source multimodal MoE model from Yuan Lab. 1010B total parameters, only 68.8B active at inference. It beat GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on RAG benchmarks by wide margins. 67.4% on Docmatix vs GPT-4o's 56.8%.

Here's what it unlocks:
> Enterprise RAG with 68.2% avg accuracy across 10 retrieval tasks
> Complex table understanding at 62.3% on MMTab
> Text-to-SQL generation scoring 83.9% on Spider 1.0
> Multimodal doc analysis with a 64K context window

The key innovation: Layer-Adaptive Expert Pruning (LAEP). During pretraining, expert token loads become wildly imbalanced. Some experts get 500x more tokens than others. LAEP prunes the underused ones layer by layer, cutting 33% of parameters while boosting training efficiency by 49%.

They also refined "fast-thinking" RL. Correct answers with fewer reasoning steps get rewarded more. This cut output tokens by 14.38% while improving accuracy by 16.33%.

The bigger signal here: MoE models are learning to self-compress during training, not after. If pruning becomes part of pretraining, the cost curve for trillion-scale models shifts dramatically.

πŸ–ΌοΈ Media
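The LAEP criterion as described (track per-expert token load in each layer, drop the consistently underused experts) can be sketched in a few lines. This illustrates only the pruning rule, not Yuan Lab's implementation; the keep fraction and counters are assumptions:

```python
def prune_experts(token_counts, keep_fraction=0.67):
    """token_counts: per-layer list of tokens routed to each expert.
    Keep the busiest `keep_fraction` of experts in each layer, mirroring
    the reported ~33% parameter cut from dropping underused experts."""
    kept = []
    for layer in token_counts:
        k = max(1, round(len(layer) * keep_fraction))
        # indices of the k most-used experts, in original order
        top = sorted(sorted(range(len(layer)), key=lambda i: layer[i], reverse=True)[:k])
        kept.append(top)
    return kept

# Toy loads: expert 0 in layer 0 gets ~500x fewer tokens than expert 2.
loads = [[10, 3000, 5000], [4000, 20, 4500]]
print(prune_experts(loads))  # [[1, 2], [0, 2]]
```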
cursor_ai (@cursor_ai) Β· Mar 05, 2026 Β· πŸ†”86856663

We're introducing Cursor Automations to build always-on agents. https://t.co/uxgTbncJlM

πŸ–ΌοΈ Media
omarsar0 (@omarsar0) Β· Mar 04, 2026 Β· πŸ†”96519070

Pay close attention to proactive AI agents. This is one of the wildest applications of agent harnesses I've seen.

The MIT paper introduces NeuroSkill, a real-time agentic system that models human cognitive and emotional state by integrating Brain-Computer Interface signals with foundation models. "Human State of Mind" is provided via SKILL.md. The system runs fully offline on the edge.

Its NeuroLoop harness enables agentic workflows that engage users across cognitive and emotional levels, responding to both explicit and implicit requests through actionable tool calls.

Why does it matter? Most AI agents respond only to explicit user requests. NeuroSkill explores the frontier of proactive agents that sense and respond to implicit human states, opening new possibilities for adaptive human-AI interaction.

Paper: https://t.co/kO3Ie2Dbvz
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

πŸ–ΌοΈ Media
dair_ai (@dair_ai) Β· Mar 04, 2026 Β· πŸ†”56234562

Interesting new research on LLM agent memory. Agent engineers, pay attention to this one. (bookmark it)

It introduces a diagnostic framework that separates retrieval failures from utilization failures in agent memory systems.

The main findings:
- Retrieval method matters far more than how you write memories.
- Accuracy varies 20 percentage points across retrieval approaches but only 3-8 points across writing strategies.
- Simple raw chunking matches or outperforms expensive alternatives like Mem0-style fact extraction or MemGPT-style summarization.

Teams investing heavily in sophisticated memory writing pipelines may be optimizing the wrong thing. Improving retrieval quality yields larger gains than increasing write-time sophistication.

Paper: https://t.co/ZZvtsJXIJp
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

πŸ–ΌοΈ Media Γ— 2
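"Simple raw chunking" means storing memory as fixed-size text chunks at write time and doing all the work at retrieval. A minimal lexical sketch; the paper's exact setup isn't shown, so the chunk size and overlap scorer here are assumptions:

```python
def chunk(text: str, size: int = 8) -> list[str]:
    """Write path: split raw text into fixed-size word chunks, no extraction
    or summarization at write time."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(chunks: list[str], query: str, k: int = 1) -> list[str]:
    """Read path: rank chunks by word overlap with the query. Swapping this
    scorer (e.g. for embeddings) is where the large accuracy swings live,
    per the findings above."""
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]

memory = chunk(
    "The user prefers dark mode. The deploy script lives in scripts/deploy.sh. "
    "Standup is at 9am daily. The staging database password rotates weekly."
)
print(retrieve(memory, "where is the deploy script"))
# ['The user prefers dark mode. The deploy script']
```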
llama_index (@llama_index) Β· Mar 03, 2026 Β· πŸ†”79048472

Most agents don’t fail on models… they fail on context: those ugly, messy, complex documents that trip up even the latest LLMs (PDFs, tables, messy scans). Don't worry. We got you.

πŸš€ VC-backed (seed+) startup? Join the LlamaParse Startup Program:
βœ… free credits
βœ… dedicated Slack channel + priority support
βœ… alignment call with our founder Jerry Liu
βœ… community spotlight (millions of devs)
βœ… production-ready ingestion pipelines

Apply today; spots are limited β†’ https://t.co/61csPhQULp

πŸ–ΌοΈ Media
llama_index (@llama_index) Β· Mar 03, 2026 Β· πŸ†”69269691

LlamaIndex has evolved far beyond a RAG framework - we're now focused on agentic document processing that automates knowledge work.

πŸš€ Agent orchestration has fundamentally changed with sophisticated reasoning loops, tool discovery through Skills/MCP, and coding agents that write Python for you
πŸ“„ Document understanding remains a massive opportunity - frontier vision models still struggle with complex tables, charts, and long documents at scale
🏒 LlamaParse now serves 300k+ users across 50+ formats for enterprises like @OneCarlyle, @CEMEX, and @KPMG with multi-agent workflows combining OCR, computer vision, and LLM reasoning
βš™οΈ Real automation potential exists in workflows where humans manually process documents daily - financial analysis, contract review, insurance underwriting can all become end-to-end agentic processes

Our mission is now providing core infrastructure to automate knowledge work over documents, not just being connective tissue between LLMs and data. Read about our evolution and what's next: https://t.co/M0DbsIdGrF

πŸ–ΌοΈ Media Γ— 2
jerryjliu0 (@jerryjliu0) Β· Mar 05, 2026 Β· πŸ†”30425369

Adobe Acrobat has PDF splitting. We have agentic PDF splitting πŸ€–βœ‚οΈ Simply define the categories you want in natural language, and our split agent will automatically β€œchunk” the document into subsets of pages and tag them with the appropriate categories. This is super useful to break apart complicated document packets like resumes, tax forms, identification docs, expense reports, and more. Check out @itsclelia’s video below, and come sign up to LlamaParse if you’re interested! Docs: https://t.co/UdxT3sJfkF LlamaParse: https://t.co/TqP6OT5U5O

πŸ–ΌοΈ Media
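The workflow described above (natural-language categories in, tagged page subsets out) reduces to classifying each page and grouping consecutive pages that share a label. A toy sketch with a keyword stub where the real product would call an LLM per page; none of these names are the LlamaParse API:

```python
def classify_page(text: str, categories: list[str]) -> str:
    """Stub classifier: pick the first category whose name appears in the page.
    A real split agent would ask an LLM to choose among the categories."""
    lowered = text.lower()
    for cat in categories:
        if cat.lower() in lowered:
            return cat
    return "other"

def split_packet(pages: list[str], categories: list[str]):
    """Group consecutive same-label pages into (category, [page_numbers])."""
    groups = []
    for num, page in enumerate(pages, start=1):
        label = classify_page(page, categories)
        if groups and groups[-1][0] == label:
            groups[-1][1].append(num)
        else:
            groups.append((label, [num]))
    return groups

packet = [
    "Resume of Jane Doe, software engineer",
    "Resume continued: work experience",
    "Form 1040 tax form for 2025",
    "Expense report: March travel",
]
print(split_packet(packet, ["resume", "tax form", "expense report"]))
# [('resume', [1, 2]), ('tax form', [3]), ('expense report', [4])]
```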
jerryjliu0 (@jerryjliu0) Β· Mar 05, 2026 Β· πŸ†”65563933

I love the Big Arch Burger πŸ” I also love Big Harnessesβ„’ and Big Complex PDFsβ„’ with hundreds of pages of tables, images and forms. https://t.co/deD8sUcyj0

πŸ–ΌοΈ Media
πŸ”llama_index retweeted
J
Jerry Liu
@jerryjliu0
πŸ“…
Mar 05, 2026
4d ago
πŸ†”65563933
⭐0.32

I love the Big Arch Burger πŸ” I also love Big Harnessesβ„’ and Big Complex PDFsβ„’ with hundreds of pages of tables, images and forms. https://t.co/deD8sUcyj0

❀️67
likes
πŸ”4
retweets
Modular (@Modular) Β· Mar 03, 2026 Β· πŸ†”76941593

MAX is how Modular is rethinking the AI stack from first principles, bringing together modeling, performance, and portability in one open framework. Hear directly from our co-founder and CEO @clattner_llvm on why the stack needs to evolve and what that means for the future of AI infrastructure.

πŸ–ΌοΈ Media
Modular (@Modular) Β· Mar 05, 2026 Β· πŸ†”12130301

You shouldn't have to choose between peak GPU performance and code you can actually maintain. We built Structured Mojo πŸ”₯ Kernels to fix that. Performance, usability, and portability without the tradeoff. 14k to 7k lines. ~1.8k TFLOPS held. We wrote a 4-part series on how. Part 1 is up https://t.co/zMYWMfDOb2

πŸ–ΌοΈ Media
emollick (@emollick) Β· Mar 05, 2026 Β· πŸ†”82976280

Given the GDPval benchmark for GPT-5.4, I've updated this chart. The new model ties or beats humans, as judged by other experts, at professional tasks 82% of the time. If you give a 7-hour task to AI, even with failure rates and the need to check results, you'd save 4h 38m on average. https://t.co/U4PQSArQo2

πŸ–ΌοΈ Media
overworld_ai (@overworld_ai) Β· Mar 04, 2026 Β· πŸ†”95135229

This month, we’re in SF for @Official_GDC and in San Jose for @NVIDIAGTC with a new live demo of our real-time diffusion world model. If you want to see it running under real user input and tight latency constraints, meet us. https://t.co/QputPCxkyk

πŸ–ΌοΈ Media
GergelyOrosz (@GergelyOrosz) Β· Mar 02, 2026 Β· πŸ†”70884640

On one hand, the Anthropic team is a massive user of AI for writing code (80%+ of all code deployed is written by Claude Code). They ship amazingly fast. On the other hand, these beyond-terrible reliability numbers suggest there might be a downside to all this speed: https://t.co/9nYoH7KYOc

πŸ–ΌοΈ Media
ggerganov (@ggerganov) Β· Mar 02, 2026 Β· πŸ†”52531340

Looking for user feedback on the upcoming official ggml Debian and Ubuntu packages https://t.co/8lcGZzSgLK

πŸ–ΌοΈ Media
sukh_saroy (@sukh_saroy) Β· Mar 01, 2026 Β· πŸ†”28257218

New research just exposed the biggest lie in AI coding benchmarks.

LLMs score 84-89% on standard coding tests. On real production code? 25-34%. That's not a gap. That's a different reality.

Here's what happened: researchers built a benchmark from actual open-source repositories: real classes with real dependencies, real type systems, real integration complexity. Then they tested the same models that dominate HumanEval leaderboards. The results were brutal.

The models weren't failing because the code was "harder." They were failing because it was *real*. Synthetic benchmarks test whether a model can write a self-contained function with a clean docstring. Production code requires understanding inheritance hierarchies, framework integrations, and project-specific utilities. Different universe. Same leaderboard score.

But it gets worse. A separate study ran 600,000 debugging experiments across 9 LLMs. They found a bug in a program. The LLM found it too. Then they renamed a variable. Added a comment. Shuffled function order. Changed nothing about the bug itself. The LLM couldn't find the same bug anymore.

78% of the time, cosmetic changes that don't affect program behavior completely broke the model's ability to debug. Function shuffling alone reduced debugging accuracy by 83%. The models aren't reading code. They're pattern-matching against what code *looks like* in their training data.

A third study confirmed this from another angle: when researchers obfuscated real-world code, changing symbols, structure, and semantics while keeping functionality identical, LLM pass rates dropped by up to 62.5%. The researchers call this the "Specialist in Familiarity" problem. LLMs perform well on code they've memorized. The moment you show them something unfamiliar with the same logic, they collapse.

Three papers. Three different methodologies. Same conclusion: the benchmarks we use to evaluate AI coding tools are measuring memorization, not understanding.

If you're shipping code generated by LLMs into production without review, these numbers should concern you. If you're building developer tools, the question isn't "what's your HumanEval score." It's "what happens when the code doesn't look like the training data."

πŸ–ΌοΈ Media
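The robustness tests described are easy to reproduce in miniature: apply a semantics-preserving rename to a program, confirm behavior is unchanged, then ask whether your tool still finds the bug. A sketch of the mutation half using Python's `ast` module (not any of the papers' actual harnesses):

```python
import ast

class Renamer(ast.NodeTransformer):
    """Rename identifiers without touching program behavior -- the kind of
    cosmetic change that reportedly broke LLM debugging 78% of the time."""
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node

SOURCE = """
def total(prices):
    s = 0
    for p in prices:
        s = s + p
    return s
"""

tree = ast.parse(SOURCE)
mutated = ast.unparse(Renamer({"prices": "a", "s": "b", "p": "c"}).visit(tree))

# Same behavior before and after the rename:
env1, env2 = {}, {}
exec(SOURCE, env1)
exec(mutated, env2)
print(env1["total"]([1, 2, 3]) == env2["total"]([1, 2, 3]))  # True
```

If a model's answer about the bug changes between `SOURCE` and `mutated`, it was matching surface patterns, not reading the code.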
ollama (@ollama) Β· Mar 02, 2026 Β· πŸ†”65102237 Β· ⭐0.38

A big milestone from @MiniMax_AI! Open-weight models like M2.5 are beginning to handle agentic tasks people used to trust only to Opus or GPT.