Your curated collection of saved posts and media

Showing 31 posts · last 14 days · by score
C
chrisalbon
@chrisalbon
📅
Jul 01, 2026
3d ago
🆔25059902
⭐0.34

Right now Codex is using Computer Use to organize the 1500 PDFs I have in GoodNotes while I watch the world cup. This is my "ai folds clothes while I make art" moment. Thanks @jxnlco and co

V
vanstriendaniel
@vanstriendaniel
📅
Jun 30, 2026
4d ago
🆔58712878

You can now use 100s of tools with @huggingface Buckets, thanks to the new S3 API! Usually just one or two lines to change. https://t.co/V6PPIoWlh7

Media 1
🖼️ Media
R
rachpradhan
@rachpradhan
📅
Jun 28, 2026
5d ago
🆔12950961

holy shit @ivanleomk i used @GoogleDeepMind's gemma4(with codegraff) on the flight to Japan to read through a few papers i was interested in and it cooked!!(i think it just needs a really good harness) https://t.co/IksPGvhY5p (pre-release) https://t.co/yWyH0VKlCr

Media 2
🖼️ Media
L
lhoestq
@lhoestq
📅
Jun 30, 2026
4d ago
🆔03015169

S3-clients + Hugging Face Buckets = 💥 You can now query and write directly to HF Storage Buckets via the S3-compatible API. Just one secret. Done. 🚀 https://t.co/id0DaVJkSS

Media 1
🖼️ Media
I
ivanleomk
@ivanleomk
📅
Jun 25, 2026
9d ago
🆔64290329
⭐0.40

@an_engineer_log GLM 5.2 is a fantastic model, great work by the https://t.co/xQ0FtPrDZv team! We designed gemma so that it's built for on device intelligence - the smallest quantized variant just needs slightly over 1GB of memory! Open Source makes all great progress possible :)

G
googlegemma
@googlegemma
📅
Jun 24, 2026
10d ago
🆔35609111

Want to host your own Gemma hackathon? We’re sponsoring 1-day hackathons on Kaggle to help developers dive into open models! From building lightweight tools to tackling your community's unique challenges, this is your chance to lead the charge with Gemma 4. 👇 https://t.co/3N0eOlXyID

Media 1
🖼️ Media
B
BlancheMinerva
@BlancheMinerva
📅
Jun 21, 2026
13d ago
🆔22530245

@bobbycxy We built an influence function library recently and have been using influence scores to explore model behaviors. It would be interesting to see if the documents that most influence game-play relevant behaviors are systematically different across models https://t.co/wmMRIQnsgf

Media 1
🖼️ Media
E
enriqdev
@enriqdev
📅
Jun 24, 2026
9d ago
🆔21915397

Google was very clever: they use the accelerometers of thousands of Android phones as a global seismic network. All that data is sent in, and Google found a way to detect those waves in time and send out the alerts. https://t.co/U7VFGxTCQ5

Media 1
🖼️ Media
S
SakanaAILabs
@SakanaAILabs
📅
Jul 03, 2026
1d ago
🆔03779928

We are pleased to present our latest research at #ICML2026, “Bridging Spherical Black-Box Optimizers” https://t.co/3FT6vn0dSn

When optimizing through simulators, external APIs, or in reinforcement learning, gradients are often unavailable. Black-Box Optimization (BBO) fills this gap, but the field has been historically split into two categories:

1. Parametric Methods: Algorithms like Evolution Strategies (ES) scale to high dimensions but only find a single solution.
2. Nonparametric Methods: Algorithms like Consensus-Based Optimization (CBO) find multiple solutions but fail in high dimensions.

Our team asked a simple question: what if they are all doing the same thing? In our paper, we showed that these distinct families are actually variations of a single update equation. By bridging this theoretical gap, we can now engineer custom hybrid optimizers for specific tasks.

A key application of this is merging foundation models. Building on our previous work in Evolutionary Model Merging, we faced a computational challenge. Evaluating large language models at every step is resource-intensive, but using a smaller evaluation dataset causes standard unimodal optimizers to overfit.

By treating LLM merging as a multimodal problem and deploying our newly developed hybrid optimizers, AdaPol and SchedPol, we successfully navigated this issue. The algorithms identified multiple distinct optima on the smaller dataset, allowing us to find generalized, high-quality merges at a fraction of the compute cost.
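
For readers unfamiliar with the "parametric" family mentioned in the post, here is a minimal sketch of a textbook-style evolution strategy. It is not the paper's unified update equation, and it is not AdaPol or SchedPol; the objective (`sphere`), the function name `simple_es`, and every hyperparameter are assumptions chosen for illustration. It shows the behavior the post describes for this family: one Gaussian search distribution, no gradients, and convergence toward a single solution.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): squared distance from the origin."""
    return sum(v * v for v in x)


def simple_es(objective, dim=5, population=20, elite=5, sigma=0.3, steps=200, seed=0):
    """Basic elite-averaging evolution strategy with one Gaussian search distribution."""
    rng = random.Random(seed)
    mean = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
    for _ in range(steps):
        # Sample candidates around the current mean; only objective values are used.
        samples = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(population)
        ]
        samples.sort(key=objective)
        best = samples[:elite]
        # Move the mean to the average of the best candidates.
        mean = [sum(s[i] for s in best) / elite for i in range(dim)]
        # Hypothetical fixed decay schedule for the step size.
        sigma *= 0.98
    return mean


solution = simple_es(sphere)
print(sphere(solution))
```

Because the search distribution has a single mean, this kind of optimizer settles on one optimum even when several exist, which is the limitation the post contrasts with multimodal methods.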

Media 1
🖼️ Media
Z
zanoga
@zanoga
📅
Jun 24, 2026
10d ago
🆔83531958

Finally finished building my AI datacenter! 🚀 32x3090s across 4 servers (8 GPUs each), all connected over InfiniBand. The whole setup is solar-powered with a massive battery bank and generator backup. More technical details and benchmarks coming soon. https://t.co/8GfedrSzNp

Media 1
🖼️ Media
W
wey_gu
@wey_gu
📅
Jun 24, 2026
10d ago
🆔56333929

Hermes introduced /learn to acquire reusable skills from any input 🫡 Nowledge Mem's Skills have the same capability. Besides quietly and proactively spotting opportunities in past context that could become skills and suggesting them to the user (once activated, a skill can be invoked from any agent, and it keeps self-optimizing and evolving as it is used), the Skill Creator in the GUI lets us create a Skill on demand; it automatically finds the relevant past context to build and self-optimize it. Also, following suggestions from our users, we closed the loop on this proactive flow by adding entry points for creating Skills in the cli and ai-now.

@Teknium • Tue Jun 23 21:07

Hermes can now LEARN from any source or set of sources, build a skill, test it live, and crystallize new learnings. Just run /learn and pass it sources, past sessions, URLs, docs, whatever you think will help it learn, and it'll go from 0 to 1 to create you a skill!

Media 1 · Media 2
+1 more
🖼️ Media
S
StasBekman
@StasBekman
📅
Jun 29, 2026
5d ago
🆔34087642

After many months of intense work the @Snowflake AI Research team is happy to present to you the new open source project: Arctic RL https://t.co/B5EgRoSOCb

- Arctic RL integrates with VeRL and SkyRL today; enable ZoRRo with one config flag, no code changes required
- ZoRRo delivers up to 6x actor-update acceleration and a 3.5x end-to-end training speedup, reducing Arctic-Text2SQL-R2 training from ~5 days to ~36 hours on 32 H200 GPUs
- Arctic-Text2SQL-R2 achieved higher accuracy scores (48.7) than Gemini 3.1 Pro (47.9) and Claude 4.7 (47.3) on Snowflake's evaluated enterprise SQL benchmark under the tested conditions
- Two open source recipes ship with this release: a text-to-SQL recipe that improved BIRD dev accuracy from 59.92% to 70.35%, and a multi-hop QA recipe that improved average accuracy from 69.6% to 72.3%
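
As a quick sanity check on the figures quoted above, the arithmetic can be sketched as follows. The inputs are the post's own rounded numbers ("~5 days", "~36 hours"), so the implied ratio is only approximate; the variable names are illustrative and not part of Arctic RL.

```python
# Figures as quoted in the post; the durations are approximate ("~").
baseline_hours = 5 * 24   # ~5 days of training before ZoRRo
zorro_hours = 36          # ~36 hours with ZoRRo

implied_speedup = baseline_hours / zorro_hours
bird_gain = 70.35 - 59.92   # BIRD dev accuracy, percentage points
qa_gain = 72.3 - 69.6       # multi-hop QA average accuracy, percentage points

print(f"implied end-to-end speedup: {implied_speedup:.2f}x")  # 3.33x
print(f"BIRD dev gain: {bird_gain:.2f} points")               # 10.43 points
print(f"multi-hop QA gain: {qa_gain:.1f} points")             # 2.7 points
```

The rounded durations imply roughly 3.3x, close to the 3.5x end-to-end figure the post states; the small gap may simply come from rounding the durations.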

Media 1
🖼️ Media
S
skcd42
@skcd42
📅
Jun 22, 2026
12d ago
🆔35131891
⭐0.34

/goal is live on Grok Build. We use a team of agents:
- implementors
- skeptics
- code reviewers
- planners

and a mix of grok build and composer in various roles. Would love to hear your feedback on how ambitious you can be with /goal and where the gaps are

K
Kappische
@Kappische
📅
Jun 22, 2026
12d ago
🆔08498384

I’m surprised the gaming community haven’t pushed harder to work on Neural Texture Compression considering the RAM squeeze we’re seeing. Unity, Unreal, Valve, Microsoft, Sony, Nintendo, Intel, AMD, Nvidia should help push this as a standard where possible. https://t.co/kwbLd7UAgg https://t.co/yIjTazeqv8

Media 1 · Media 2
🖼️ Media
🔁jxnlco retweeted
I
Iyan Moon
@iyanmoonyang
📅
Jun 29, 2026
5d ago
🆔19455554
⭐0.32

first ee project: I put a bitmap running horse on a raspberry pi pico using codex! I couldn’t get the snout right 😭 so it looks a bit stubby thank you @covacut for the ingredients https://t.co/gukxpMwPrr

❤️45
likes
🔁2
retweets
J
Jason
@Jason
📅
Jul 02, 2026
2d ago
🆔32040706
⭐0.40

I KEEP BLOWING THROUGH MY PERPLEXITY COMPUTER AND CLAUDE COWORK TOKENS I HAVE SOME RESEARCH JOBS THAT I WANT TO RUN CONSTANTLY / HOURLY INDEFINITELY NEED TO RUN LOCAL OPEN-SOURCE MODELS CONTINUOUSLY IN MY OWN PRIVATE CLOUD AT THIS POINT TELL ME WHAT I SHOULD DO... @NousResearch TIME?

🔁jeremyphoward retweeted
S
Hanchi Sun
@sun_hanchi
📅
Jun 29, 2026
5d ago
🆔12833516
⭐0.36

https://t.co/gPwut02Ilj People are missing out on how big a deal Longcat 2.0 by Meituan (aka "Chinese Doordash") is. Near frontier performance, trained on 50k Chinese domestic accelerators! The first ever to achieve this! https://t.co/SNLdPUfkZZ

❤️83
likes
🔁7
retweets
M
matrix_build
@matrix_build
📅
Jun 29, 2026
5d ago
🆔41195805
⭐0.34

what if you can run an entire 0-person company, without the grind of running a team? matrix is the runtime that makes it possible. in last week’s limited beta, our users created tens of thousands of new 0-person companies and started real businesses in matrix. today, matrix is open to everyone. launch yours ↓

H
hugothomel
@hugothomel
📅
Jun 22, 2026
12d ago
🆔73540432

we made an interactive movie in a day
- powered by a world model
- running in real time
- you can explore and make your own choices

this is Operation Pandora. play now 👇 https://t.co/gQ2NevjC52

🖼️ Media
🔁Tim_Dettmers retweeted
S
Shanda Li 黎善达
@Shanda_Li_2000
📅
Jun 22, 2026
12d ago
🆔81727101
⭐0.38

Can an AI agent surface why an ML paper might be hard to reproduce, just by reading it, without running any code? We build ReproRepo, a framework for auditing reproducibility with agents. Across 1,149 recent papers, the best agent surfaced a semantically related, human-reported reproducibility blocker for ~90% of them. 🧵👇

❤️32
likes
🔁8
retweets
S
simonw
@simonw
📅
Jul 03, 2026
1d ago
🆔20215566
⭐0.38

The most interesting Fable tip I've heard so far is to let the model use its own judgement as much as possible. I told it "For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent" and it seems to be saving a lot of tokens

Y
yishan
@yishan
📅
Jun 27, 2026
7d ago
🆔33780960

A big problem with research studies on AI models is that given how long the peer review process is, the results are always out-of-date by the time the paper is published. This time, we have something better!

The typical reaction to research results like this roughly goes "You're just testing on old models. Today's models are way better and surely can do it now!" But the best solution is for these papers to also open-source all of their testing framework so that upon publication, others can reproduce their results, as well as run it on the newest models of the day - and into the future. After all, "this is the worst they'll ever be" so what really matters is determining when they DO pass the threshold.

As it turns out, the authors of this paper DID open-source their evaluation framework! Here: https://t.co/iXLwmItKwu

So I figured... let's re-run the tests on the latest models! A summary of our results is here: https://t.co/1Dzj0UcJUQ

One drawback is that, unfortunately, the authors didn't (or weren't legally able to) open-source ALL the testing data, since apparently some of it is copyrighted by JAMA/NEJM etc. That's a separate problem with the medical research publishing industry for another time. However, we were able to reproduce the test on the public datasets they did include!

First, we re-ran the same tests (as closely as we could) on the old models the paper claimed to use, in order to establish a baseline and determine how much "drift" there would be. (Answer: not too much) Then we ran those tests on the newest frontier models we could find.

The results are: the most capable models today (GPT-5.5 Pro) did outperform the best models from before (79/100 vs 69/100), but did not improve enough to be considered sufficient for reliable medical use.

In fact, the paper's criterion for "fit for reliable medical use" is more stringent, requiring the models to be robust under perturbation and bad data, knowing when to say there's not enough information, give clinically valid reasoning rather than hallucinations, etc. Those sound pretty reasonable to me. I wasn't able to reproduce that kind of qualitative evaluation, but even on the basic pass/fail test using public datasets of interpreting radiology images, the newest models are better, but not yet quite good enough.

Nevertheless, I would like to praise the paper's authors for at least open-sourcing what they could, enabling me to (fairly quickly) attempt to reproduce their results. This is definitely a step in the right direction! While my reproduction wasn't able to be comprehensive, it certainly gave me useful directional info and - perhaps more importantly - allowed me (a random dude on the internet) to directly reproduce the results in their paper and validate them.

I would like to encourage ALL authors of research papers on AI models to do similar open-sourcing of their experimental frameworks!

Media 1 · Media 2
🖼️ Media
R
RisingSayak
@RisingSayak
📅
Jul 03, 2026
1d ago
🆔63683632

We just released a new version of Diffusers! This includes many new image and video pipelines (Ideogram4, MotifVideo, etc.). But it also includes the recently popular DiffusionGemma 🤌 Check out the notes for full details. https://t.co/49lDK8Vnnk

Media 1
🖼️ Media
J
JakeABoggs
@JakeABoggs
📅
Jul 02, 2026
2d ago
🆔84315817

Fable 5 is a large step for Anthropic's vision capabilities and effectively ties with GPT-5.5 on HieroglyphBench, my benchmark which tests how well VLMs can transcribe ancient Egyptian hieroglyphs. However, they're both still far behind the Gemini series, where 3.5 Flash has more than double the score.

Media 1
🖼️ Media
W
winglian
@winglian
📅
Jun 26, 2026
8d ago
🆔87497231
⭐0.32

@_Suresh2 @casper_hansen_ materializes them in smaller chunks so it has a lower peak VRAM requirement.

E
ethanlshen
@ethanlshen
📅
Jun 22, 2026
12d ago
🆔85642132
⭐0.38

I'll be presenting SERA, Ai2's first coding agents, at ICML on July 7th 🇰🇷 Excited to chat about unit-test free verification, code data curation, and specialized coding agents. Come by, say hi, and grab some stickers 🥳

@allen_ai • Tue Jan 27 16:12

Introducing Ai2 Open Coding Agents, starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵

M
majidmanzarpour
@majidmanzarpour
📅
Jun 23, 2026
11d ago
🆔12178585

Hey @claudeai Opus 4.8 let's build a fully procedural spider in @threejs 🕷️ …so we did. Feet-driven IK + a Cruse-rule gait = it walks any terrain. Then we built a 42-scenario test harness and drove the locomotion to 100%. https://t.co/5HU9BBvpdf

@majidmanzarpour • Tue Jun 23 00:39

Tip: if you're running into visual/physics issues in your @threejs game, prompt your agent to "build a visual test harness with test cases and results" for the problem Pair it with browser access & "/goal iterate on the visual test harness and logic until all test cases pass 10

🖼️ Media
D
daniel_mac8
@daniel_mac8
📅
Jun 20, 2026
14d ago
🆔77105538

This is one of the coolest open-source AI agent projects I've seen in a while: 'Understand Anything' It's a plugin for Claude Code, Codex, OpenCode etc. that analyzes your codebase and turns it into a knowledge base that you can interact with. It explains the codebase to you, rather than showing you the structure. It seems like it's designed for code but I opened my Obsidian vault of podcast highlights in Claude Code, then ran /understand. The result is a knowledge graph that I can search of highlights from 888 podcast episodes and 144K lines of markdown text.

🖼️ Media
D
derrickcchoi
@derrickcchoi
📅
Jul 01, 2026
3d ago
🆔12738602
⭐0.36

It’s been great working with @HP on using Codex for their engineering work. Looking forward to deepening the partnership. https://t.co/Y5t6lepQMw

L
levie
@levie
📅
Jun 20, 2026
14d ago
🆔48782515
⭐0.40

Pretty remarkable what’s happening with open weights AI right now. We’re seeing models achieve SOTA results on specific tasks, and getting close to frontier on some areas of coding and other domains.

The more that open weights is able to maintain only a marginal gap from the frontier, instead of a widening gap, the more value that can be created with AI. Incidentally, this is actually fine for the frontier labs as well; if we can lower the cost of an overall task then AI usage goes up in general. You’re still likely using frontier models for planning, orchestration, reviewing, and other parts of work.

But this is all very good for the applied layer of AI, which is now in a great position to cost optimize workloads with cheaper models or use tailored open models post-trained for specific tasks to improve performance.

@Designarena • Fri Jun 19 17:58

https://t.co/JSn0lDCNkB

W
whosamberella
@whosamberella
📅
Jul 04, 2026
4h ago
🆔90283195
⭐0.36

what codex did for me recently:
- find and book empty piano practice room in manhattan, and it found free Steinway for my practice
- search for restaurants and find available bookings on resy
- now looking for rooftops available for 7/4 fireworks