(2024) The "Hygiene Hypothesis" is more accurately framed as the "Old Friends Hypothesis." We co-evolved with friendly bacteria & some parasites. We did not co-evolve with the crowd infections of mega-cities & 100,000 daily global flights. 9/ https://t.co/7DwOVexIYU
These were just some of my most popular posts from the last 10 years. Also, I have moved many of my older posts from Medium & fast.ai over to my current site, so you can find them all in one place: 10/ https://t.co/WE7RqUYktD
(2025) A biologist discovered 100s of errors in a paper that used AI to classify enzymes. Publishing incentives reward flashy results, not diligent fact-checking, and it is very difficult to evaluate AI claims in areas where we are not experts. 11/ https://t.co/BJDSsDIAkK
🚨 New paper! MENLO: From Preferences to Proficiency. We introduce a framework + dataset for evaluating and modeling native-like LLM response quality across 47 languages, inspired by audience design principles. Paper: https://t.co/n8Z2cDJm5a Data: https://t.co/fzM6Um32nD 🧵 Details below.

Multilingual LLMs ≠ native speakers. Evaluating native-like generation across language varieties is hard, subjective, and inconsistent. MENLO provides: • A structured evaluation protocol • Human preference data • Model-based reward modeling for 47 languages https://t.co/zEBXazlUUG
What makes a native speaker? We go beyond fluency and consider a response's factuality and tone with regard to the addressee and local context. We define 4 quality dimensions reflecting these attributes. https://t.co/nFqYRrA0nJ
MENLO framework includes: • 6,423 human-labeled prompt-response preference pairs • 47 language varieties • 4 structured quality dimensions (fluency, tone, etc.) • High inter-annotator agreement • Pairwise judgments → better signal https://t.co/SW5nX5FwrX
We benchmark: 1. Zero-shot LLM judges 2. RL- & SFT-trained reward models 3. Human raters (gold). Findings: • Pairwise + rubric-based eval boosts zero-shot LLM judge performance • But: a gap with humans remains across languages https://t.co/UfSuaq2xdZ
We explore: • Reinforcement learning • Reward shaping • Multi-task learning across languages/dimensions → These improve multilingual reward model quality and correlation with human judgments. https://t.co/wDudu4goxv
Reward models trained with MENLO can also be used generatively: • As scoring functions for multilingual generation • To improve proficiency and audience alignment in LLM outputs. Still: some human-model judgment divergences persist, and LLM evaluators are overconfident about the improvement.
Key takeaways: • Fine-grained LLM judges benefit from pairwise evaluation and structured rubrics • RL-trained cross-lingual reward modeling is feasible and helpful • MENLO pushes toward scalable, preference-aligned multilingual generation https://t.co/57kx5CaMs5
We release: • MENLO dataset • Evaluation framework + rubrics • Judge/RM prompts • Benchmark for multilingual reward modeling. Paper: https://t.co/LKAy493nlU Data: https://t.co/co8O5WOOKp https://t.co/4TegeRmzy0

"Dunia" means Earth. Our goal is simple: to build the engine that discovers the materials of the future for this planet. Because every leap in human history began with a material. https://t.co/rbT2ZpovJ3
We'll be organizing the Second Big Picture Workshop at #ACL2026. This is a meta-workshop, which explores research narratives and how they connect with each other. Our talks will feature multiple speakers who argue different positions on a topic. https://t.co/SXTqzuNVkB
Even when new AI models bring clear improvements in capabilities, deprecating the older generations comes with downsides. An update on how we're thinking about these costs, and some of the early steps we're taking to mitigate them: https://t.co/VCTMW0d2e8
Always more to do but I'm proud of how safe Opus 4.5 is! (System Card section 6.2) https://t.co/ncvy5rIblk https://t.co/4fXIgBHcI9
I'm grading all my friends on this graph and then confronting them about it https://t.co/OtrIAMdTzV
Some of y'all need to make progress on this benchmark https://t.co/6qC4wONhWR
@Miles_Brundage See page 69 of the system card for more possible metrics along which some of y'all need to make progress! https://t.co/BfVqyGguzP
We're sharing a case study on alignment evaluations with @AnthropicAI on Claude Opus 4.5, Opus 4.1 and Sonnet 4.5. We ask: would an AI assistant used inside a frontier lab quietly sabotage AI safety research? Overall results are encouraging, but with important caveats. 🧵 https://t.co/tcpFrolCn6
Today, @amazon announced a MAJOR plan to build AI and high-performance computing for the U.S. Government: "We're giving agencies expanded access to advanced AI capabilities that will enable them to accelerate critical missions from cybersecurity to drug discovery." https://t.co/ht4Wn7TPtE
🚨🇺🇸🇻🇪 The US Has Assembled A Strike Force Around Venezuela That Resembles The Opening Phase Of A Full Scale Intervention. The US has quietly assembled one of its most powerful regional force groupings in years around the Caribbean basin, all positioned within striking distance of Venezuela. From Puerto Rico to the Caribbean Sea, every category of American firepower is now in place. 🔹 Long range bombers: B-52H, B-1B, and B-2A aircraft are positioned for strike missions from CONUS. Their role includes potential deep strike and JASSM launches with ranges beyond a thousand kilometers. 🔹 Carrier strike capability: The Gerald R. Ford carrier strike group is present with a full carrier air wing. F-35C, F/A-18E/F Super Hornets, EA-18G Growlers, E-2D Advanced Hawkeyes, and MH-60 helicopters all sit inside strike range. Destroyers in the group carry Tomahawk missiles. 🔹 Tomahawk land attack missile concentration: More than two hundred Tomahawk missiles are available in the Caribbean through multiple destroyers and cruisers. Their loadouts give the US the ability to hit fixed targets across Venezuela within minutes. 🔹 JASSM strike potential: B-2A and B-52H bombers can launch AGM-158 JASSM standoff missiles. These weapons allow strikes without ever entering Venezuelan airspace. 🔹 US Marine Corps expeditionary forces: The Iwo Jima amphibious ready group and the 22nd Marine Expeditionary Unit are in the Caribbean Sea. Assault ships carry Osprey aircraft, attack helicopters, landing craft, and infantry capable of rapid beach entry or inland seizure. 🔹 Forward positioned aircraft in Puerto Rico: MQ-9 Reaper drones, F-15 fighters, KC-135 tankers, and C-130 transports are forward deployed to Roosevelt Roads. These assets allow persistent ISR, refueling operations, and fast deployment of strike aircraft. 🔹 US special forces presence: Marine Raiders and other special operations elements are in theatre. Their missions include recon, target design, and advance preparation.
🔹 Support and logistics power: C-17 and C-130 aircraft, KC-10 and KC-135 tankers, and all rotary wing assets provide sustained operational tempo for any strike or landing operation. The map shows a posture that is not normal. It is not accidental. When long range bombers, Tomahawk carriers, a full carrier strike group, and a Marine amphibious group appear at the same time in the same region, the message is unmistakable. Washington is positioning itself for the ability to strike Venezuela across air, sea, and land at any moment.
Americans are frustrated with the economy -- and the outlook for 2026. Every consumer sentiment gauge is saying the same thing: Sentiment is down to the worst levels since April (or since the inflation summer of 2022). Why? Because the middle class is feeling squeezed. (And lower-income households are basically in a recession.) 1) It's hard to get a job (unless you work in healthcare) 2) The cost of living is up, esp. the basics of food, utilities, healthcare, insurance and auto repair. 3) Real incomes are trending down as inflation rises and pay gains are getting stingier. In November, "expectations for increased household incomes shrunk dramatically," the Conference Board said today. Expect more of that in 2026.

30% of the U.S. Navy's deployed warships are currently in the Caribbean. Something tells me this is not only about Venezuela. https://t.co/NrPlTbHIH6
$LAES Another major strength behind the SEALSQ x IC'Alps integration. IC'Alps isn't just any ASIC design house; it's the first independent European ASIC company with a Quality Management System certified to the highest international standards: 🔹 EN 9100:2018 (Aerospace & Defense) 🔹 ISO 13485:2016 (Medical Devices) 🔹 ISO 9001:2015 (Global Quality Standard) 🔹 Common Criteria Site Certification audited under ANSSI supervision 🔹 Actively progressing toward IATF-16949 for automotive. This positions IC'Alps as one of Europe's most trusted secure silicon partners, capable of delivering: • First-time-right ASIC development • End-to-end traceability for medical & aerospace chips • Secure environments for PQC, eSIM, secure elements & critical systems • A sovereign ecosystem across France & Switzerland. With IC'Alps now fully part of SEALSQ, the group gains: • Aerospace-grade manufacturing discipline • MedTech-certified development flows • Security-audited infrastructure • Europe's strongest foundation for post-quantum secure semiconductors. This is the kind of quality backbone that differentiates SEALSQ globally, especially as quantum-secure hardware demand accelerates. Link: https://t.co/fa6HOwIame @CreusMoreira #SEALSQ #LAES #ICAlps #Semiconductors #PQC #Cybersecurity #Aerospace #MedicalDevices #Quality

.@mkratsios47: "It's a huge opportunity for the United States to continue to outpace the world in scientific discovery and innovation... the largest marshaling of the federal government scientific apparatus since the Apollo Project." https://t.co/BT5F4QzdJ3
FACT SHEET: President Donald J. Trump Unveils the Genesis Mission to Accelerate AI for Scientific Discovery https://t.co/1fTfIMqLAy
President Trump is launching the most powerful scientific platform to ever be built, reminiscent of the Manhattan Project and Apollo programs: Genesis Mission. https://t.co/zmZES9V7PW
🚨🇺🇸 THE GENESIS MISSION: AMERICA JUST BUILT A SCIENCE CHEAT CODE. The Genesis Mission isn't another government program with a glossy logo. It's the first attempt to wire together the world's most powerful supercomputers, the sharpest AI models, and locked-down datasets from every major scientific field - physics, bio, energy, climate, materials, medicine, all of it. The goal? Double America's research speed in 10 years. That's not incremental progress. That's a time-warp button. Think Manhattan Project resources + Space Race urgency + modern AI horsepower. Discoveries that used to take a decade could get crunched in months. Drug design, fusion modeling, climate simulation, protein engineering - everything gets faster, cheaper, and way less guesswork. Elon's right about this part: when you fuse compute + data + talent at national scale, you don't get "innovation." You get a scientific industrial revolution. Source: @WhiteHouse, Genesis.Energy.Gov
🚨🇺🇸 DOE DROPS HYPE TRAILER FOR TRUMP'S "GENESIS MISSION" - AND IT LOOKS LIKE AMERICA JUST GOT AN AI ORIGIN STORY. The Department of Energy - the same agency that guards nukes and supercomputers - just released a promo video for Trump's AI project, the Genesis Mission, and it's cu
The Trump administration is in talks with Taiwan on a deal that would see Taiwanese companies, including TSMC, increase investment in U.S. semiconductor facilities and provide training for American workers. In return, Taiwan seeks a reduction of the 20% U.S. tariff on its goods. Source: Reuters
HUGE NEWS: President Trump has ordered a mass "re-interview" of every refugee admitted between Jan 2021 and Feb 2025, plus a FREEZE on all pending green-card applications. The Biden Admin basically had a wide open door for 4 years and let just about anyone thru. https://t.co/59oFaUt0I4
Invitation Davos 2026: "Trust and Convergence". WISeKey, https://t.co/hlrfZKc1Lx, and SEALSQ are pleased to continue their 21-year tradition of presenting breakthrough technologies driving the Fourth Industrial Revolution. We are honored to invite you to our Davos 2026 Event, taking place in January 2026, dedicated to the theme: "Age of Convergence: Trust as the Foundation of the Next Technological Era". As digital, physical, and biological systems converge, trust becomes the indispensable pillar enabling this new interconnected era. During this exclusive session, we will present our latest innovations in cybersecurity, secure space infrastructure, post-quantum semiconductors, and digital identity: technologies designed to ensure that convergence accelerates human progress. Join global leaders, innovators, and partners for an in-depth exploration of how trusted technologies will define the future. Event details & registration: https://t.co/y0HLQ6NbOU Event Highlights: • Keynotes by WISeKey, https://t.co/hlrfZKc1Lx, and SEALSQ leadership • Strategic announcements spanning space, cybersecurity, and post-quantum innovation • Insights from global industry experts • Networking with international decision-makers. Date: January 2026. Location: Davos, Switzerland. (Full agenda and venue details to follow.) We look forward to welcoming you in Davos to shape together the trusted foundations of the Age of Convergence.
The #Diriyah project, birthplace of the House of Saud: - 40 luxury hotels, including Ritz-Carlton, Raffles, and Four Seasons - 8 parks and 9 museums - The best-known restaurant brands - 4 international hotels set to open - Entertainment venues and an opera house - Expected opening date: 2028. A product of the vision of the #CrownPrince, Prince #MohammedBinSalman 🇸🇦 Watch the authentic Najdi style: https://t.co/mIfvFr5P03