Tom's ongoing peek at the personalities and personal stories behind machine learning's history is available wherever you find your podcasts. Watch on YouTube: https://t.co/czYb2iXB2l Listen on Apple: https://t.co/z07MFJaTr1

As @bradrcarson explains, the contract language released so far does not restrict the government from using AI to kill without human oversight. https://t.co/To1RKsQTGg
@ch402 @sebgehr Too many to count. NatSec in general agrees with you @ch402. Jack Shannan's background and placement in Operation Maven is noteworthy, so his understanding of how critical Claude is to American military effectiveness is not just hot air. https://t.co/0fjZrDKWLh https://t.co/I8ISpgEMve

On one hand, the Anthropic team is a massive user of AI to write code (80%+ of all deployed code is written by Claude Code), and they ship amazingly fast. On the other hand, these beyond-terrible reliability numbers suggest there might be a downside to all this speed: https://t.co/9nYoH7KYOc
we are about to hit one 9 of availability while coding is largely solved https://t.co/4NJB1YNsPk
It's very common for people to claim that open LLMs will be used to commit cyber attacks at massive scale. What public evidence is there for this claim? The best (and one of the only) accounts I've seen of a cyber LLM attack was done using Claude https://t.co/v63Lolv5iH
Looking for user feedback about the upcoming ggml official Debian and Ubuntu packages https://t.co/8lcGZzSgLK
Last week, Vinod Khosla (@vkhosla) of Khosla Ventures, Sakana AI's lead investor, visited Japan. Together with our Co-founder COO Ren Ito, they paid a courtesy visit to Finance Minister Satsuki Katayama to exchange views from a global perspective on AI strategies to boost Japan's industrial competitiveness and the fundamental integration of AI within the public sector. Following that, Vinod visited Sakana AI's new office. Joined by CEO David Ha (@hardmaru) and CTO Llion Jones (@YesThisIsLion), we discussed the potential of deploying AI across various domestic and global industrial sectors using our unique technology, something he has supported since our founding. This included conversations on utilizing AI in Japan's security and defense fields.

Can AI companies restrict government use of their technology? They do it all the time. Whether and how depends on the acquisition pathway, contract type, and terms. My explainer: https://t.co/QHSZrooFoH #Anthropic #openai #pentagon #DoD #govcon
@CharlieBul58993 @JTillipman @bridgewriter (former NSC counsel) - https://t.co/K8WEStCDhc
A deep dive in @lawfare on the many legal problems with the Pentagon's designation of Anthropic as a supply chain risk. https://t.co/6mlWhgwMge
New research just exposed the biggest lie in AI coding benchmarks. LLMs score 84-89% on standard coding tests. On real production code? 25-34%. That's not a gap. That's a different reality. Here's what happened: Researchers built a benchmark from actual open-source repositories: real classes with real dependencies, real type systems, real integration complexity. Then they tested the same models that dominate HumanEval leaderboards. The results were brutal. The models weren't failing because the code was "harder." They were failing because it was *real*. Synthetic benchmarks test whether a model can write a self-contained function with a clean docstring. Production code requires understanding inheritance hierarchies, framework integrations, and project-specific utilities. Different universe. Same leaderboard score. But it gets worse. A separate study ran 600,000 debugging experiments across 9 LLMs. They found a bug in a program. The LLM found it too. Then they renamed a variable. Added a comment. Shuffled function order. Changed nothing about the bug itself. The LLM couldn't find the same bug anymore. 78% of the time, cosmetic changes that don't affect program behavior completely broke the model's ability to debug. Function shuffling alone reduced debugging accuracy by 83%. The models aren't reading code. They're pattern-matching against what code *looks like* in their training data. A third study confirmed this from another angle: when researchers obfuscated real-world code (changing symbols, structure, and semantics while keeping functionality identical), LLM pass rates dropped by up to 62.5%. The researchers call this the "Specialist in Familiarity" problem. LLMs perform well on code they've memorized. The moment you show them something unfamiliar with the same logic, they collapse. Three papers. Three different methodologies. Same conclusion: The benchmarks we use to evaluate AI coding tools are measuring memorization, not understanding.
If you're shipping code generated by LLMs into production without review, these numbers should concern you. If you're building developer tools, the question isn't "what's your HumanEval score." It's "what happens when the code doesn't look like the training data."
Gift link: https://t.co/S1D5ZMpE3l
@bradlightcap has stopped following @GaryMarcus (any thoughts on this?) https://t.co/kI0mBNCoxY
In the last few days, OpenAI and its executives have claimed that its DoW deal prevents its models from being used for mass domestic surveillance. As I write in a lengthy explainer for @ReadTransformer today, that claim appears to be misleading at best. https://t.co/IdlpVUSY0p
Be like Sam Altman > runs YC > starts an open-source nonprofit to regulate AI & protect humanity > raises money for the nonprofit > uses that money to build a closed-source AI > creates a new for-profit company > raises money & kicks out existing investors > uses our data for ads in ChatGPT > goes on the news and stands up for Anthropic against the US gov > 24hrs later signs a deal with the US to do exactly the opposite
Folks, this is not normal. Four American soldiers have died, but let me tell you about the curtains. "I always liked gold." https://t.co/1Kt9tvNi8g
Shocked! https://t.co/8EX9ADZibS
Satya Nadella just said what the entire industry is too invested to admit. Every CEO signing $100 billion data center contracts right now is making a bet that history may not honor. Nadella: "We are one sort of innovation away from the entire regime changing." Right now, every major player is running the same play. More data. More GPUs. Bigger clusters. Same architecture. They've convinced themselves scale is destiny. They've convinced themselves the biggest balance sheet wins. They've convinced themselves this is a resource war. It's not. Nadella: "If you look at where we've gone, it was all about pre-training scale, then it was about post-training, then we came up with reasoning, then we said, 'oh, there's RL.'" The architecture isn't stable. It never was. It's been mutating the entire time. Each shift rewriting the rules. Each breakthrough making the previous moat irrelevant. And the companies that didn't see it coming didn't get a warning. They just woke up behind. Nadella: "A new model architecture that could even be more efficient in its performance." When that lands, the $100 billion clusters don't matter. The hoarded GPUs don't matter. The multi-decade infrastructure advantage doesn't matter. Every castle built for the current paradigm becomes a monument to the wrong bet. This is what makes the AI race unlike anything in history. In nuclear competition, more warheads meant more power. The advantage was permanent. Cumulative. Compounding. In this race, one person with the right insight at 2am in an apartment somewhere erases a trillion dollars of infrastructure before the market opens. No warning. No negotiation. No second place. The most dangerous competitor in this race doesn't have a data center. They just have the equation.
Anyone else having those weird dreams where future generations hate you? From @TheOnion https://t.co/y2OI4SqdpC
Markets have a history of overreacting to narratives long before underlying economics change. AI is no exception. Capital tends to swing between euphoria and panic when it struggles to price uncertainty. The work, as always, happens between those extremes. What's your take on this development? @pchamard @Khulood_Almani @antgrasso @GlenGilmore @Shi4Tech @CurieuxExplorer @FrRonconi @chidambara09 @theomitsa @Analytics_699 @Nicochan33 @nafisalam @pierrepinna @smaksked @Corix_JC @amalmerzouk @AdityaRPatro @quepasachico @IngridVasiliu @EstelaMandela @sonu_monika @RLDI_Lamy @SpirosMargaris @IanLJones98 @Timothy_Hughes @avrohomg @bimedotcom @HaroldSinnott @c4trends @mvollmer1 @DG_Collective @bamitav @rwang0 @ipfconline1 @sijlalhussain https://t.co/ZxZhQSG0py
"I probably spend a third, maybe 40%, of my time making sure the culture of Anthropic is good," Anthropic CEO Dario Amodei said. https://t.co/wATWuk7bZO https://t.co/Pbo7qTjhEm
