Your curated collection of saved posts and media

Showing 24 posts Β· last 30 days Β· by score
πŸ”iScienceLuvr retweeted
O
Overworld
@overworld_ai
πŸ“…
Mar 04, 2026
5d ago
πŸ†”95135229
⭐0.34

This month, we’re in SF for @Official_GDC and in San Jose for @NVIDIAGTC with a new live demo of our real-time diffusion world model. If you want to see it running under real user input and tight latency constraints, meet us. https://t.co/QputPCxkyk

❀️61 likes Β· πŸ”6 retweets
GergelyOrosz @GergelyOrosz Β· πŸ“… Mar 02, 2026 (8d ago) Β· πŸ†”70884640

On one hand, the Anthropic team is a massive user of AI to write code (80%+ of all code deployed is written by Claude Code). They ship amazingly fast. On the other hand, seeing these beyond-terrible reliability numbers suggests there might be a downside to all this speed: https://t.co/9nYoH7KYOc

πŸ–ΌοΈ Media 1
ggerganov @ggerganov Β· πŸ“… Mar 02, 2026 (8d ago) Β· πŸ†”52531340

Looking for user feedback about the upcoming ggml official Debian and Ubuntu packages https://t.co/8lcGZzSgLK

πŸ–ΌοΈ Media 1
sukh_saroy @sukh_saroy Β· πŸ“… Mar 01, 2026 (9d ago) Β· πŸ†”28257218

New research just exposed the biggest lie in AI coding benchmarks. LLMs score 84-89% on standard coding tests. On real production code? 25-34%. That's not a gap. That's a different reality.

Here's what happened: researchers built a benchmark from actual open-source repositories: real classes with real dependencies, real type systems, real integration complexity. Then they tested the same models that dominate HumanEval leaderboards. The results were brutal. The models weren't failing because the code was "harder." They were failing because it was *real*. Synthetic benchmarks test whether a model can write a self-contained function with a clean docstring. Production code requires understanding inheritance hierarchies, framework integrations, and project-specific utilities. Different universe. Same leaderboard score.

But it gets worse. A separate study ran 600,000 debugging experiments across 9 LLMs. They found a bug in a program. The LLM found it too. Then they renamed a variable. Added a comment. Shuffled function order. Changed nothing about the bug itself. The LLM couldn't find the same bug anymore. 78% of the time, cosmetic changes that don't affect program behavior completely broke the model's ability to debug. Function shuffling alone reduced debugging accuracy by 83%. The models aren't reading code. They're pattern-matching against what code *looks like* in their training data.

A third study confirmed this from another angle: when researchers obfuscated real-world code, changing symbols, structure, and semantics while keeping functionality identical, LLM pass rates dropped by up to 62.5%. The researchers call this the "Specialist in Familiarity" problem. LLMs perform well on code they've memorized. The moment you show them something unfamiliar with the same logic, they collapse.

Three papers. Three different methodologies. Same conclusion: the benchmarks we use to evaluate AI coding tools are measuring memorization, not understanding.

If you're shipping code generated by LLMs into production without review, these numbers should concern you. If you're building developer tools, the question isn't "what's your HumanEval score?" It's "what happens when the code doesn't look like the training data?"

πŸ–ΌοΈ Media 1
ollama @ollama Β· πŸ“… Mar 02, 2026 (8d ago) Β· πŸ†”65102237 Β· ⭐0.38

A big milestone, @MiniMax_AI! Open-weight models like M2.5 are beginning to handle agentic tasks people used to trust only to Opus or GPT.

πŸ”HamelHusain retweeted
I
Eleanor Berger
@intellectronica
πŸ“…
Mar 01, 2026
8d ago
πŸ†”30289329
⭐0.36

@HamelHusain I love it. I have this in my global AGENTS.md to maximise the use of the questions tool (works in Claude, @opencode, @code, and @GitHubCopilot CLI). https://t.co/cPDwXHjwrP

❀️34 likes Β· πŸ”1 retweet
badlogicgames @badlogicgames Β· πŸ“… Mar 01, 2026 (8d ago) Β· πŸ†”33013581 Β· ⭐0.40

do you need MCP for dev workflows? no, for the most part (it allows out-of-context data transforms and conserves context window space). do enterprises need MCP? likely, specifically wrt auth, which is a bad idea via a fully LLM-exposed cli. do normies need MCP? yes, there's no other way to connect emails/etc. still a bad idea to let them use any old MCP, specifically stdio-based ones. it's like your grandma installing all those .exe email attachments.

htihle @htihle Β· πŸ“… Mar 02, 2026 (8d ago) Β· πŸ†”65955244

GPT 5.3 Codex (xhigh) scores 79.3% and takes the lead on WeirdML, just ahead of Opus 4.6 (77.9%) at less than half the price. It is very solid across the board, but I still feel the peak performance of Gemini 3.1 is stronger. https://t.co/WRYosAStGY

πŸ–ΌοΈ Media 1 Β· Media 2
karpathy @karpathy Β· πŸ“… Feb 25, 2026 (13d ago) Β· πŸ†”80672741 Β· ⭐0.38

@adrian_valentim Yeah, 95% of people misunderstand the tweet. I’m referring to gradient descent as a programmer (in the distributed representation space). In coding AI today, the LLM is the programmer, and in the regular β€œtext space”. Ah well :)

karpathy @karpathy Β· πŸ“… Feb 25, 2026 (12d ago) Β· πŸ†”77135652 Β· ⭐0.38

@JohnHarper10070 Yes, in this intermediate state, you go faster if you can be more explicit and actually understand what the AI is doing on your behalf, and what the different tools are at its disposal, and what is hard and what is easy. It's not magic, it's delegation.

karpathy @karpathy Β· πŸ“… Feb 04, 2026 (33d ago) Β· πŸ†”10836075 Β· ⭐0.42

A lot of people quote tweeted this as the 1 year anniversary of vibe coding. Some retrospective - I've had a Twitter account for 17 years now (omg) and I still can't predict my tweet engagement basically at all. This was a shower of thoughts throwaway tweet that I just fired off without thinking, but somehow it minted a fitting name at the right moment for something that a lot of people were feeling at the same time, so here we are: vibe coding is now mentioned on my Wikipedia as a major memetic "contribution" and even its article is longer. lol

The one thing I'd add is that at the time, LLM capability was low enough that you'd mostly use vibe coding for fun throwaway projects, demos and explorations. It was good fun and it almost worked. Today (1 year later), programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny. The goal is to claim the leverage from the use of agents but without any compromise on the quality of the software. Many people have tried to come up with a better name for this to differentiate it from vibe coding; personally my current favorite is "agentic engineering":

- "agentic" because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight.
- "engineering" to emphasize that there is an art & science and expertise to it. It's something you can learn and become better at, with its own depth of a different kind.

In 2026, we're likely to see continued improvements on both the model layer and the new agent layer. I feel excited about the product of the two and another year of progress.

πŸ”ivanleomk retweeted
V
varick
@vamonke
πŸ“…
Mar 01, 2026
9d ago
πŸ†”07361965
⭐0.34

won 1st place at the @OpenAI codex hackathon! πŸ₯‡ i built StoryWorld, a 3D movie studio in your pocket. made with iOS ARKit + RealityKit, @DeemosTech Rodin, and @fal https://t.co/fWIKy6sCZQ

❀️2,901 likes Β· πŸ”160 retweets
openclaw @openclaw Β· πŸ“… Mar 02, 2026 (8d ago) Β· πŸ†”59426953

OpenClaw 2026.3.1 🦞
⚑ OpenAI WebSocket streaming
🧠 Claude 4.6 adaptive thinking
🐳 Better Docker and native K8s support
🧡 Discord threads, TG DM topics, Feishu fixes
πŸ”§ Agent-powered visual diffs plugin
Reports of our death were greatly exaggerated. https://t.co/ISJH09of5U

πŸ–ΌοΈ Media 1
πŸ”Sanemavcil retweeted
O
OpenClaw🦞
@openclaw
πŸ“…
Mar 02, 2026
8d ago
πŸ†”59426953
⭐0.36

OpenClaw 2026.3.1 🦞 ⚑ OpenAI WebSocket streaming 🧠 Claude 4.6 adaptive thinking 🐳 Better Docker and Native K8s support 🧡 Discord threads, TG DM topics, Feishu fixes πŸ”§ Agent-powered visual diffs plugin Reports of our death were greatly exaggerated. https://t.co/ISJH09of5U

❀️2,655
likes
πŸ”242
retweets
memU @memU_ai Β· πŸ“… Mar 01, 2026 (9d ago) Β· πŸ†”23943731

We’ve officially open-sourced memU bot. πŸ€– It’s not a chatbot that waits for commands. It’s a proactive assistant that understands you, remembers you, and gradually becomes more aligned with how you work. Runs locally. Built on the memU memory framework. GitHub πŸ‘‰ https://t.co/pmOnl5czYs Feel free to explore it, try it out, share feedback, and help us improve it together.

πŸ–ΌοΈ Media 1
πŸ”ai_fast_track retweeted
M
memU
@memU_ai
πŸ“…
Mar 01, 2026
9d ago
πŸ†”23943731
⭐0.34

We’ve officially open-sourced memU bot. πŸ€– It’s not a chatbot that waits for commands. It’s a proactive assistant that understands you, remembers you, and gradually becomes more aligned with how you work. Runs locally. Built on the memU memory framework. GitHub πŸ‘‰ https://t.co/pmOnl5czYs Feel free to explore it, try it out, share feedback, and help us improve it together.

❀️207
likes
πŸ”30
retweets
nicopreme @nicopreme Β· πŸ“… Mar 01, 2026 (9d ago) Β· πŸ†”86054807

The "Visual Explainer" agent skill just crossed 3.5K stars on GitHub πŸŽ‰ Just updated with: /generate-visual-plan slash command for more structured plan specs, code block patterns, typography polish, mermaid fixes, anti slop guardrails https://t.co/qzde42tVEV

πŸ–ΌοΈ Media 1
jerryjliu0 @jerryjliu0 Β· πŸ“… Mar 02, 2026 (8d ago) Β· πŸ†”17356919 Β· ⭐0.38

The fundamental issue with PDF parsing is that PDFs are designed for display purposes. The internal representation of the data is a sequence of instructions for outputting shapes at specific coordinates on the page (e.g. "render this string at coordinate (84, 720) with this font"). The characters of a displayed word may not be contiguous at all, and there may be no font mapping back to Unicode, so you have no idea what a character even is. Any PDF parser needs to magically reconstruct this random sequence of display-coordinate data into semantically meaningful text, tables, and more.

VLMs do help (screenshot the page and read it), but besides collapsing the metadata they still struggle in terms of accuracy and cost. Note: Word/Pptx files store text representations, so they're typically a bit easier to read.

Our entire company at @llama_index is laser-focused on PDF parsing so we've been really trying to understand all the nuances of doc formats, especially PDFs πŸ™‚ more notes on this coming soon
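
To make the reconstruction problem concrete, here is a toy sketch (my own illustration, not LlamaIndex's parser; the glyph records and tolerance are invented, and real PDFs are far messier): given raw "draw glyph at (x, y)" records in arbitrary order, recover reading order by clustering on baseline y and sorting each line left to right:

```python
# Raw drawing records as a PDF might emit them: (x, y, glyph), unordered.
glyphs = [
    (84, 720, "H"), (90, 720, "i"),                  # line 1
    (84, 706, "P"), (90, 706, "D"), (96, 706, "F"),  # line 2
    (200, 720, "!"),                                 # line 1, drawn out of order
]

def reconstruct(records, line_tol=3):
    # Group glyphs whose baseline y is within line_tol of an existing line.
    lines = {}
    for x, y, ch in records:
        key = next((k for k in lines if abs(k - y) <= line_tol), y)
        lines.setdefault(key, []).append((x, ch))
    # PDF user space has y increasing upward, so higher y = earlier line;
    # within a line, sort left to right by x.
    return "\n".join(
        "".join(ch for _, ch in sorted(lines[y])) for y in sorted(lines, reverse=True)
    )

print(reconstruct(glyphs))  # prints "Hi!" then "PDF"
```

Even this toy version needs a line tolerance and an assumption about coordinate direction; add rotated text, ligatures, and fonts with no Unicode map, and the difficulty described above becomes clear.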

AskPerplexity @AskPerplexity Β· πŸ“… Mar 01, 2026 (8d ago) Β· πŸ†”37762382

"Build me Perplexity Finance but for Pokemon cards. Make no mistakes." Computer: β†’ researched Pokemon card APIs on its own β†’ wrote 5,000 lines of React + Python β†’ debugged itself using browser devtools β†’ deployed and pushed to GitHub (built by u/NoSquirrel4840 on Reddit) https://t.co/kLBQnyA2Vk

πŸ–ΌοΈ Media
zoomyzoomm @zoomyzoomm Β· πŸ“… Mar 01, 2026 (8d ago) Β· πŸ†”59333272

He’s not kidding. Took me HALF AN HOUR to vibe code Notion with Perplexity Computer. Software is legit a zero. https://t.co/eBbIDQsNRI

πŸ–ΌοΈ Media
alexgilev @alexgilev Β· πŸ“… Mar 01, 2026 (8d ago) Β· πŸ†”93673883

Ok, this is insane...🀯 I've just built the most comprehensive RAG system (UX knowledge base) for me to use in my projects with @perplexity_ai.
> Instant, research-backed best practices (548 items) for design
> 10X the output quality for Project Aristotle with a grounded knowledge layer: https://t.co/ko1oELOvaA
> Ability to present design decisions to stakeholders with cited rationale and data.
Data is the new oil. Already shared it with those who pre-purchased @AgenticUi in January as a token of appreciation for support.

πŸ–ΌοΈ Media 1
InfiniAILab @InfiniAILab Β· πŸ“… Feb 18, 2026 (20d ago) Β· πŸ†”93728105

Video generation models are improving fastβ€”real-time autoregressive models now deliver high quality at low latency, and they’re quickly being adopted for world models and robotics applications. So what’s the problem? They’re still too slow on consumer hardware.

πŸš€ What if we told you that we can get true real-time 16 FPS video generation on a single RTX 5090? (1.5-12x over FA 2/3/4 on 5090, H100, B200)

Today we release MonarchRT πŸ¦‹, an efficient video attention that parameterizes attention maps as (tiled) Monarch matrices and delivers real E2E gains.

πŸ“„ Paper: https://t.co/d1AAMIseow
🌐 Website: https://t.co/41mqriKekx
πŸ”— GitHub: https://t.co/hp5iJttviA
🧡1/n

πŸ–ΌοΈ Media 2
ggerganov @ggerganov Β· πŸ“… Jan 29, 2026 (40d ago) Β· πŸ†”44057045

Introducing LlamaBarn β€” a tiny macOS menu bar app for running local LLMs Open source, built on llama.cpp https://t.co/F1Z3DVl9Kg

πŸ–ΌοΈ Media 1
Tim_Dettmers @Tim_Dettmers Β· πŸ“… Jan 27, 2026 (42d ago) Β· πŸ†”82548999

This is very impactful: you can now distill frontier performance into small models that are specialized to private repositories. Companies can quickly and cheaply train on their data and have super-efficient deployments of 32B agents. https://t.co/03jsS6cWJ3

πŸ–ΌοΈ Media 1