Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
math_rachel
@math_rachel
📅 Nov 18, 2025 · 158d ago · 🆔 78933538

(2024) The “Hygiene Hypothesis” is more accurately framed as the “Old friends hypothesis.” We co-evolved with friendly bacteria & some parasites. We did not co-evolve with the crowd infections of mega-cities & 100,000 daily global flights. 9/ https://t.co/7DwOVexIYU

math_rachel
@math_rachel
📅 Nov 18, 2025 · 158d ago · 🆔 21365597

These were just some of my most popular posts from the last 10 years. Also, I have moved many of my older posts from medium & fast ai over to my current site, so you can find them all in one place: 10/ https://t.co/WE7RqUYktD

math_rachel
@math_rachel
📅 Nov 19, 2025 · 156d ago · 🆔 76336096

(2025) A biologist discovered 100s of errors in a paper that used AI to classify enzymes. Publishing incentives reward flashy results, not diligent fact-checking, and it is very difficult to evaluate AI claims in areas where we are not experts. 11/ https://t.co/BJDSsDIAkK

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 91285640

🚨 New paper! 🌎 MENLO: From Preferences to Proficiency
We introduce a framework + dataset for evaluating and modeling native-like LLM response quality across 47 languages, inspired by audience design principles.
📄 Paper: https://t.co/n8Z2cDJm5a
🤗 Data: https://t.co/fzM6Um32nD
🧵 Details 👇

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 13506130

Multilingual LLMs ≠ Native speakers
Evaluating native-like generation across language varieties is hard, subjective, and inconsistent. MENLO provides:
– A structured evaluation protocol
– Human preference data
– Model-based reward modeling for 47 languages
https://t.co/zEBXazlUUG

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 53355599

What makes a native speaker? We go beyond fluency and consider a response’s factuality and tone with regard to the addressee and local context. We define 4 quality dimensions reflecting these attributes. https://t.co/nFqYRrA0nJ

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 04265954

MENLO framework includes:
📊 6,423 human-labeled prompt-response preference pairs
🌍 47 language varieties
🧭 4 structured quality dimensions (fluency, tone, etc.)
✅ High inter-annotator agreement
⚖️ Pairwise judgments → better signal
https://t.co/SW5nX5FwrX

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 89724599

We benchmark:
1. Zero-shot LLM judges
2. RL- & SFT-trained reward models
3. Human raters (gold)
Findings:
– Pairwise + rubric-based eval boosts zero-shot LLM judge performance
– But: gap with humans remains across languages
https://t.co/UfSuaq2xdZ

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 44990574

We explore:
🔁 Reinforcement learning
🏁 Reward shaping
🧠 Multi-task learning across languages/dimensions
→ These improve multilingual reward model quality and correlation with human judgments.
https://t.co/wDudu4goxv

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 26118200

Reward models trained with MENLO can also be used generatively:
– As scoring functions for multilingual generation
– To improve proficiency and audience alignment in LLM outputs
Still: some human-model judgment divergences persist, LLM evaluators are overconfident about the improvement.

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 08704259

Key takeaways:
– Fine-grained LLM judges benefit from pairwise evaluation and structured rubrics
– RL-trained cross-lingual reward modeling is feasible and helpful
– MENLO pushes toward scalable, preference-aligned multilingual generation
https://t.co/57kx5CaMs5

seb_ruder
@seb_ruder
📅 Oct 01, 2025 · 205d ago · 🆔 63449724

We release:
📂 MENLO dataset
⚙️ Evaluation framework + rubrics
📄 Judge/RM prompts
🔬 Benchmark for multilingual reward modeling
Paper: https://t.co/LKAy493nlU
Data: https://t.co/co8O5WOOKp
https://t.co/4TegeRmzy0

DuniaInnovation
@DuniaInnovation
📅 Sep 30, 2025 · 206d ago · 🆔 93762005

“Dunia” means Earth. Our Goal is simple: To build the engine that discovers the materials of the future for this planet. Because every leap in human history began with a material. https://t.co/rbT2ZpovJ3

seb_ruder
@seb_ruder
📅 Oct 08, 2025 · 198d ago · 🆔 99395035

We'll be organizing the Second Big Picture Workshop at #ACL2026. This is a meta-workshop, which explores research narratives and how they connect with each other. Our talks will feature multiple speakers that argue different positions of a topic. https://t.co/SXTqzuNVkB

AnthropicAI
@AnthropicAI
📅 Nov 04, 2025 · 171d ago · 🆔 89728939

Even when new AI models bring clear improvements in capabilities, deprecating the older generations comes with downsides. An update on how we’re thinking about these costs, and some of the early steps we’re taking to mitigate them: https://t.co/VCTMW0d2e8

TrentonBricken
@TrentonBricken
📅 Nov 24, 2025 · 151d ago · 🆔 09285429

Always more to do but I'm proud of how safe Opus 4.5 is! (System Card section 6.2) https://t.co/ncvy5rIblk https://t.co/4fXIgBHcI9

peterwildeford
@peterwildeford
📅 Nov 25, 2025 · 150d ago · 🆔 65180116

I'm grading all my friends on this graph and then confronting them about it https://t.co/OtrIAMdTzV

@Miles_Brundage • Tue Nov 25 04:10

Some of y'all need to make progress on this benchmark https://t.co/6qC4wONhWR

🔁 sleepinyourhat retweeted
Peter Wildeford🇺🇸🚀
@peterwildeford
📅 Nov 25, 2025 · 150d ago · 🆔 65180116

I'm grading all my friends on this graph and then confronting them about it https://t.co/OtrIAMdTzV

❤️ 184 likes · 🔁 6 retweets
saprmarks
@saprmarks
📅 Nov 25, 2025 · 150d ago · 🆔 87181807

@Miles_Brundage See page 69 of the system card for more possible metrics along which some of y'all need to make progress! https://t.co/BfVqyGguzP

AISecurityInst
@AISecurityInst
📅 Nov 26, 2025 · 149d ago · 🆔 33499159

We’re sharing a case study on alignment evaluations with @AnthropicAI on Claude Opus 4.5, Opus 4.1 and Sonnet 4.5. We ask: would an AI assistant used inside a frontier lab quietly sabotage AI safety research? Overall results are encouraging, but with important caveats. 🧵 https://t.co/tcpFrolCn6

WhiteHouse
@WhiteHouse
📅 Nov 25, 2025 · 150d ago · 🆔 79748076

Today, @amazon announced a MAJOR plan to build AI and high-performance computing for the U.S. Government: "We're giving agencies expanded access to advanced AI capabilities that will enable them to accelerate critical missions from cybersecurity to drug discovery." 🔥 https://t.co/ht4Wn7TPtE

Defence_Index
@Defence_Index
📅 Nov 25, 2025 · 150d ago · 🆔 08273588

🚨🇺🇸🇻🇪 The US Has Assembled A Strike Force Around Venezuela That Resembles The Opening Phase Of A Full Scale Intervention
The US has quietly assembled one of its most powerful regional force groupings in years around the Caribbean basin, all positioned within striking distance of Venezuela. From Puerto Rico to the Caribbean Sea, every category of American firepower is now in place.
🔹 Long range bombers
B-52H, B-1B, and B-2A aircraft are positioned for strike missions from CONUS. Their role includes potential deep strike and JASSM launches with ranges beyond a thousand kilometers.
🔹 Carrier strike capability
The Gerald R Ford carrier strike group is present with a full carrier air wing. F-35C, F/A-18E/F Super Hornets, EA-18G Growlers, E-2D Advanced Hawkeyes, and MH-60 helicopters all sit inside strike range. Destroyers in the group carry Tomahawk missiles.
🔹 Tomahawk land attack missile concentration
More than two hundred Tomahawk missiles are available in the Caribbean through multiple destroyers and cruisers. Their loadouts give the US the ability to hit fixed targets across Venezuela within minutes.
🔹 JASSM strike potential
B-2A and B-52H bombers can launch AGM-158 JASSM standoff missiles. These weapons allow strikes without ever entering Venezuelan airspace.
🔹 US Marine Corps expeditionary forces
The Iwo Jima amphibious ready group and the 22nd Marine Expeditionary Unit are in the Caribbean Sea. Assault ships carry Osprey aircraft, attack helicopters, landing craft, and infantry capable of rapid beach entry or inland seizure.
🔹 Forward positioned aircraft in Puerto Rico
MQ-9 Reaper drones, F-15 fighters, KC-135 tankers, and C-130 transports are forward deployed to Roosevelt Roads. These assets allow persistent ISR, refueling operations, and fast deployment of strike aircraft.
🔹 US special forces presence
Marine Raiders and other special operations elements are in theatre. Their missions include recon, target design, and advance preparation.
🔹 Support and logistics power
C-17 and C-130 aircraft, KC-10 and KC-135 tankers, and all rotary wing assets provide sustained operational tempo for any strike or landing operation.
The map shows a posture that is not normal. It is not accidental. When long range bombers, Tomahawk carriers, a full carrier strike group, and a Marine amphibious group appear at the same time in the same region, the message is unmistakable. Washington is positioning itself for the ability to strike Venezuela across air, sea, and land at any moment.

byHeatherLong
@byHeatherLong
📅 Nov 25, 2025 · 150d ago · 🆔 90388589

Americans are frustrated with the economy -- and the outlook for 2026. Every consumer sentiment gauge is saying the same thing: Sentiment is down to the worst levels since April (or since inflation summer of 2022). Why? Because the middle class is feeling squeezed. (And lower-income households are basically in a recession)
1) It's hard to get a job (unless you work in healthcare)
2) The cost of living is up, esp. the basics of food, utilities, healthcare, insurance and auto repair.
3) Real incomes are trending down as inflation rises and pay gains are getting stingier.
In November, "expectations for increased household incomes shrunk dramatically," the Conference Board said today. Expect more of that in 2026.

Osint613
@Osint613
📅 Nov 25, 2025 · 150d ago · 🆔 94123741

30% of the U.S. Navy’s deployed warships are currently in the Caribbean. Something tells me this is not only about Venezuela. https://t.co/NrPlTbHIH6

TechInnovationz
@TechInnovationz
📅 Nov 25, 2025 · 150d ago · 🆔 48639500

$LAES 🚀 Another major strength behind the SEALSQ x IC’Alps integration.
IC’Alps isn’t just any ASIC design house it’s the first independent European ASIC company with a Quality Management System certified to the highest international standards:
🔹 EN 9100:2018 (Aerospace & Defense)
🔹 ISO 13485:2016 (Medical Devices)
🔹 ISO 9001:2015 (Global Quality Standard)
🔹 Common Criteria Site Certification audited under ANSSI supervision
🔹 Actively progressing toward IATF-16949 for automotive
This positions IC’Alps as one of Europe’s most trusted secure silicon partners, capable of delivering:
• First-time-right ASIC development
• End-to-end traceability for medical & aerospace chips
• Secure environments for PQC, eSIM, secure elements & critical systems
• A sovereign ecosystem across France & Switzerland
With IC’Alps now fully part of SEALSQ, the group gains:
✅ Aerospace-grade manufacturing discipline
✅ MedTech-certified development flows
✅ Security-audited infrastructure
✅ Europe’s strongest foundation for post-quantum secure semiconductors
This is the kind of quality backbone that differentiates SEALSQ globally especially as quantum-secure hardware demand accelerates.
Link: https://t.co/fa6HOwIame
@CreusMoreira 👏👏
#SEALSQ #LAES #ICAlps #Semiconductors #PQC #Cybersecurity #Aerospace #MedicalDevices #Quality

RapidResponse47
@RapidResponse47
📅 Nov 25, 2025 · 151d ago · 🆔 03096042

.@mkratsios47: "It's a huge opportunity for the United States to continue to outpace the world in scientific discovery and innovation... the largest marshaling of the federal government scientific apparatus since the Apollo Project." https://t.co/BT5F4QzdJ3

@RapidResponse47 • Mon Nov 24 23:52

FACT SHEET: President Donald J. Trump Unveils the Genesis Mission to Accelerate AI for Scientific Discovery https://t.co/1fTfIMqLAy

ENERGY
@ENERGY
📅 Nov 25, 2025 · 150d ago · 🆔 65599102

President Trump is launching the most powerful scientific platform to ever be built, reminiscent of the Manhattan Project and Apollo programs: Genesis Mission. https://t.co/zmZES9V7PW

MarioNawfal
@MarioNawfal
📅 Nov 25, 2025 · 150d ago · 🆔 51254680

🚨🇺🇸 THE GENESIS MISSION: AMERICA JUST BUILT A SCIENCE CHEAT CODE
The Genesis Mission isn’t another government program with a glossy logo. It’s the first attempt to wire together the world’s most powerful supercomputers, the sharpest AI models, and locked-down datasets from every major scientific field - physics, bio, energy, climate, materials, medicine, all of it.
The goal? Double America’s research speed in 10 years. That’s not incremental progress. That’s a time-warp button.
Think Manhattan Project resources + Space Race urgency + modern AI horsepower. Discoveries that used to take a decade could get crunched in months. Drug design, fusion modeling, climate simulation, protein engineering - everything gets faster, cheaper, and way less guesswork.
Elon’s right about this part: when you fuse compute + data + talent at national scale, you don’t get “innovation.” You get a scientific industrial revolution.
Source: @WhiteHouse , Genesis .Energy .Gov

@MarioNawfal • Tue Nov 25 17:40

🚨🇺🇸 DOE DROPS HYPE TRAILER FOR TRUMP’S “GENESIS MISSION” - AND IT LOOKS LIKE AMERICA JUST GOT AN AI ORIGIN STORY The Department of Energy - the same agency that guards nukes and supercomputers - just released a promo video for Trump’s AI project, the Genesis Mission, and it’s cu

Osint613
@Osint613
📅 Nov 26, 2025 · 149d ago · 🆔 18136130

The Trump administration is in talks with Taiwan on a deal that would see Taiwanese companies, including TSMC, increase investment in U.S. semiconductor facilities and provide training for American workers. In return, Taiwan seeks a reduction of its 20% tariff on U.S. goods. Source: Reuters

WallStreetMav
@WallStreetMav
📅 Nov 25, 2025 · 150d ago · 🆔 01406078

HUGE NEWS President Trump has ordered a mass “re-interview” of every refugee admitted between Jan 2021–Feb 2025, plus a FREEZE on all pending green-card applications. The Biden Admin basically had a wide open door for 4 years and let just about anyone thru. https://t.co/59oFaUt0I4

CreusMoreira
@CreusMoreira
📅 Nov 25, 2025 · 150d ago · 🆔 88080609

Invitation Davos 2026 – “Trust and Convergence”
WISeKey, https://t.co/hlrfZKc1Lx, and SEALSQ are pleased to continue their 21-year tradition of presenting breakthrough technologies driving the Fourth Industrial Revolution. We are honored to invite you to our Davos 2026 Event, taking place in January 2026, dedicated to the theme: “Age of Convergence: Trust as the Foundation of the Next Technological Era”
As digital, physical, and biological systems converge, trust becomes the indispensable pillar enabling this new interconnected era. During this exclusive session, we will present our latest innovations in cybersecurity, secure space infrastructure, post-quantum semiconductors, and digital identity - technologies designed to ensure that convergence accelerates human progress. Join global leaders, innovators, and partners for an in-depth exploration of how trusted technologies will define the future.
🔗 Event details & registration: https://t.co/y0HLQ6NbOU
Event Highlights
• Keynotes by WISeKey, https://t.co/hlrfZKc1Lx, and SEALSQ leadership
• Strategic announcements spanning space, cybersecurity, and post-quantum innovation
• Insights from global industry experts
• Networking with international decision-makers
Date: January 2026
Location: Davos, Switzerland
(Full agenda and venue details to follow.)
We look forward to welcoming you in Davos to shape together the trusted foundations of the Age of Convergence.

sultanalnefaie
@sultanalnefaie
📅 Nov 27, 2025 · 148d ago · 🆔 28325500

The #الدرعية (Diriyah) project, birthplace of the House of Saud:
- 40 luxury hotels: Ritz-Carlton, Raffles, Four Seasons
- 8 parks and 9 museums
- The most famous restaurant brands
- 4 international hotels set to open
- Entertainment venues and an opera house
- Expected opening date: 2028
The product of the vision of the #ولي_العهد (Crown Prince), Prince #محمد_بن_سلمان (Mohammed bin Salman) 🇸🇦
🎥 Watch the authentic Najdi style: https://t.co/mIfvFr5P03
