Your curated collection of saved posts and media

Showing 24 posts · last 7 days · quality filtered
miniapeur
@miniapeur
📅 Mar 03, 2026 · 11d ago
🆔 67239265

https://t.co/xQ0tVdFoV4

Media 1
🖼️ Media
🔁 iScienceLuvr retweeted
Mathieu
@miniapeur
📅 Mar 03, 2026 · 11d ago
🆔 67239265

https://t.co/xQ0tVdFoV4

Media 1
❤️ 1,214 likes
🔁 47 retweets
🖼️ Media
iScienceLuvr
@iScienceLuvr
📅 Mar 05, 2026 · 9d ago
🆔 51603464

GPT 5.4 is released https://t.co/kqy67qLlJf

Media 1
🖼️ Media
BoWang87
@BoWang87
📅 Mar 05, 2026 · 8d ago
🆔 78072654

Two major AI releases this week:
• Qwen3.5: new open-source small models
• GPT-5.4: newest frontier closed model

Most benchmarks compare math and coding. But the real test for frontier AI should be biology and healthcare. That's where mistakes actually matter.

So our team at @UHN ran them on EURORAD: 207 expert-validated radiology differential diagnosis cases.

Results:
GPT-5.4: 92.2%
Qwen3.5-27B: 85%
Gemini 3.1 Pro: ~79%

A 27B open model that runs on a laptop is only about 7 points behind the most powerful AI model on earth, and is already beating Gemini on this benchmark. That gap is much smaller than people expected. And it matters.

For years hospitals faced an impossible tradeoff:
Frontier models → patient data leaves the hospital
Local models → not good enough

That tradeoff may finally be ending. Qwen3.5-27B runs fully local. No API. No cloud. No patient data leaving the building. HIPAA / PHIPA compliance becomes architecture, not paperwork.

Interesting detail: 27B and 122B score almost identically here. Scaling bigger didn't help much.

One caveat: with web-scale training, it's hard to completely rule out that frontier models like GPT-5.4 may have seen parts of evaluation datasets.

Still, the signal is clear: small models are getting good enough for real clinical AI. And if we want to measure real AI progress, biology and healthcare should be the benchmark.

Huge credit to the team @alifmunim @AlhusainAbdalla @JunMa_AI4Health @Omar_Ibr12 @oliviaamwei

Media 1
🖼️ Media
DigEconLab
@DigEconLab
📅 Mar 02, 2026 · 12d ago
🆔 16890580

🎙️ New episode of "Machine Learning: How Did We Get Here?" Tom Mitchell (@CarnegieMellon) and @ylecun, Executive Chairman of AMI Labs and Professor at NYU, discuss how technological advances and commercial forces shaped AI history. Listen on Spotify: https://t.co/YdWjoVdoVc

Media 1
🖼️ Media
DigEconLab
@DigEconLab
📅 Mar 02, 2026 · 12d ago
🆔 80475754

Tom's ongoing peek at the personalities and personal stories behind machine learning's history is available wherever you find your podcasts.
🎥 Watch on YouTube: https://t.co/czYb2iXB2l
▶️ Listen on Apple: https://t.co/z07MFJaTr1

Media 1
Media 2
🖼️ Media
jeremyphoward
@jeremyphoward
📅 Mar 01, 2026 · 12d ago
🆔 86107414

As @bradrcarson explains, the contract language released so far does not restrict the gov from using AI to kill without human oversight. https://t.co/To1RKsQTGg

Media 1
🖼️ Media
GlennMatlin
@GlennMatlin
📅 Mar 01, 2026 · 12d ago
🆔 78251503

@ch402 @sebgehr Too many to count. NatSec in general agrees with you @ch402. Jack Shanahan's background and placement in Project Maven are noteworthy, so his understanding of how critical Claude is to American military effectiveness is not just hot air. https://t.co/0fjZrDKWLh https://t.co/I8ISpgEMve

Media 1
Media 2
🖼️ Media
GergelyOrosz
@GergelyOrosz
📅 Mar 02, 2026 · 12d ago
🆔 70884640

On one hand, the Anthropic team is a massive user of AI to write code (80%+ of all code deployed is written by Claude Code). They ship amazingly fast. On the other hand, seeing these beyond-terrible reliability numbers suggests there might be a downside to all this speed: https://t.co/9nYoH7KYOc

Media 1
🖼️ Media
ThePrimeagen
@ThePrimeagen
📅 Mar 02, 2026 · 12d ago
🆔 65774984

we are about to hit 1 9 of availability while coding is largely solved https://t.co/4NJB1YNsPk

Media 1
🖼️ Media
🔁 jeremyphoward retweeted
ThePrimeagen
@ThePrimeagen
📅 Mar 02, 2026 · 12d ago
🆔 65774984

we are about to hit 1 9 of availability while coding is largely solved https://t.co/4NJB1YNsPk

Media 1
❤️ 2,451 likes
🔁 84 retweets
🖼️ Media
BlancheMinerva
@BlancheMinerva
📅 Mar 02, 2026 · 12d ago
🆔 44795777

It's very common for people to claim that open LLMs will be used to commit cyber attacks at massive scale. What public evidence is there for this claim? The best (and one of the only) accounts I've seen of an LLM cyber attack describes one carried out using Claude https://t.co/v63Lolv5iH

Media 1
🖼️ Media
ggerganov
@ggerganov
📅 Mar 02, 2026 · 12d ago
🆔 52531340

Looking for user feedback about the upcoming ggml official Debian and Ubuntu packages https://t.co/8lcGZzSgLK

Media 1
🖼️ Media
SakanaAILabs
@SakanaAILabs
📅 Mar 02, 2026 · 12d ago
🆔 84732913

Last week, Vinod Khosla (@vkhosla) of Khosla Ventures, Sakana AI's lead investor, visited Japan. 🇯🇵

Together with our co-founder and COO Ren Ito, he paid a courtesy visit to Finance Minister Satsuki Katayama to exchange views from a global perspective on AI strategies to boost Japan's industrial competitiveness and the fundamental integration of AI within the public sector.

Following that, Vinod visited Sakana AI's new office. Joined by CEO David Ha (@hardmaru) and CTO Llion Jones (@YesThisIsLion), we discussed the potential of deploying AI across various domestic and global industrial sectors using our unique technology, something he has supported since our founding. This included conversations on utilizing AI in Japan's security and defense fields.

Media 1
Media 2
🖼️ Media
JTillipman
@JTillipman
📅 Mar 01, 2026 · 13d ago
🆔 43523604

Can AI companies restrict government use of their technology? They do it all the time. Whether and how depends on the acquisition pathway, contract type, and terms. My explainer: https://t.co/QHSZrooFoH #Anthropic #openai #pentagon #DoD #govcon

Media 1
🖼️ Media
ch402
@ch402
📅 Mar 02, 2026 · 12d ago
🆔 04210184

@CharlieBul58993 @JTillipman @bridgewriter (former NSC counsel) - https://t.co/K8WEStCDhc

Media 1
🖼️ Media
ARozenshtein
@ARozenshtein
📅 Mar 02, 2026 · 12d ago
🆔 24749148

A deep dive in @lawfare on the many legal problems with the Pentagon's designation of Anthropic as a supply chain risk. https://t.co/6mlWhgwMge

Media 1
🖼️ Media
sukh_saroy
@sukh_saroy
📅 Mar 01, 2026 · 13d ago
🆔 28257218

New research just exposed the biggest lie in AI coding benchmarks.

LLMs score 84-89% on standard coding tests. On real production code? 25-34%. That's not a gap. That's a different reality.

Here's what happened. Researchers built a benchmark from actual open-source repositories: real classes with real dependencies, real type systems, real integration complexity. Then they tested the same models that dominate HumanEval leaderboards. The results were brutal.

The models weren't failing because the code was "harder." They were failing because it was *real*. Synthetic benchmarks test whether a model can write a self-contained function with a clean docstring. Production code requires understanding inheritance hierarchies, framework integrations, and project-specific utilities. Different universe. Same leaderboard score.

But it gets worse. A separate study ran 600,000 debugging experiments across 9 LLMs. They found a bug in a program. The LLM found it too. Then they renamed a variable. Added a comment. Shuffled function order. Changed nothing about the bug itself. The LLM couldn't find the same bug anymore.

78% of the time, cosmetic changes that don't affect program behavior completely broke the model's ability to debug. Function shuffling alone reduced debugging accuracy by 83%. The models aren't reading code. They're pattern-matching against what code *looks like* in their training data.

A third study confirmed this from another angle: when researchers obfuscated real-world code (changing symbols, structure, and semantics while keeping functionality identical), LLM pass rates dropped by up to 62.5%. The researchers call this the "Specialist in Familiarity" problem. LLMs perform well on code they've memorized. The moment you show them something unfamiliar with the same logic, they collapse.

Three papers. Three different methodologies. Same conclusion: the benchmarks we use to evaluate AI coding tools are measuring memorization, not understanding.

If you're shipping code generated by LLMs into production without review, these numbers should concern you. If you're building developer tools, the question isn't "what's your HumanEval score." It's "what happens when the code doesn't look like the training data."
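
To make the "cosmetic change" idea in this post concrete, here is a minimal sketch of one such perturbation: renaming a variable with Python's standard `ast` module and confirming that the program's behavior, bug included, is unchanged. The `mean` function, its bug, and the helper names `rename_variable` and `run_mean` are hypothetical and invented for illustration; this is not the setup used in the studies the post refers to. It assumes Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical buggy function, invented for illustration only.
SOURCE = '''
def mean(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)  # bug: divides by n - 1 instead of n
'''


class RenameVariable(ast.NodeTransformer):
    """Rename every use of one variable name (a purely cosmetic edit)."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


def rename_variable(source: str, old: str, new: str) -> str:
    """Return the source with `old` renamed to `new` (comments are dropped)."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


def run_mean(source: str, data: list) -> float:
    """Execute the source and call the `mean` function it defines."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["mean"](data)


perturbed = rename_variable(SOURCE, "total", "acc")
data = [2, 4, 6]
original_result = run_mean(SOURCE, data)
perturbed_result = run_mean(perturbed, data)

# The rename leaves the behavior, and therefore the bug, untouched.
assert original_result == perturbed_result
print(perturbed)
```

A robustness check in the spirit of the post would then ask a model to locate the bug in both `SOURCE` and `perturbed` and compare the answers. That step is left out here because it depends on whichever model API is in use.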

Media 1
🖼️ Media
🔁 GaryMarcus retweeted
Hayden Field
@haydenfield
📅 Mar 02, 2026 · 12d ago
🆔 95210239

Gift link: https://t.co/S1D5ZMpE3l

Media 1
❤️ 73 likes
🔁 10 retweets
🖼️ Media
BigTechAlert
@BigTechAlert
📅 Mar 02, 2026 · 12d ago
🆔 22048366

🚫 @bradlightcap has stopped following @GaryMarcus (🤖: any thoughts on this?) https://t.co/kI0mBNCoxY

Media 1
🖼️ Media
haydenfield
@haydenfield
📅 Mar 02, 2026 · 12d ago
🆔 95210239

Gift link: https://t.co/S1D5ZMpE3l

Media 1
🖼️ Media
ShakeelHashim
@ShakeelHashim
📅 Mar 02, 2026 · 12d ago
🆔 29530548

In the last few days, OpenAI and its executives have claimed that its DoW deal prevents its models from being used for mass domestic surveillance. As I write in a lengthy explainer for @ReadTransformer today, that appears to be misleading at best. https://t.co/IdlpVUSY0p

Media 1
🖼️ Media
Jacobsklug
@Jacobsklug
📅 Mar 02, 2026 · 12d ago
🆔 99569241

Be like Sam Altman
> runs YC
> starts an open-source nonprofit to regulate AI & protect humanity
> raises money for the nonprofit
> uses that money to build a closed-source AI
> creates a new for-profit company
> raises money & kicks out existing investors
> uses our data for ads in ChatGPT
> goes on the news and stands up for Anthropic against the US gov
> 24hrs later signs a deal with the US to do exactly the opposite

Media 1
🖼️ Media
GaryMarcus
@GaryMarcus
📅 Mar 02, 2026 · 12d ago
🆔 25070923

Folks, this is not normal. Four American soldiers have died, but let me tell you about the curtains. "I always liked gold." https://t.co/1Kt9tvNi8g

Media 1
🖼️ Media