Your curated collection of saved posts and media

Showing 9 posts · last 14 days · by score
Meituan_LongCat
@Meituan_LongCat
📅
Jun 30, 2026
4d ago
🆔05308721

Introducing LongCat-2.0 🐱
1.6T parameters · MoE with ~48B active · 1M context

The full model behind Owl Alpha on @OpenRouter is now available.

Built for agentic coding from the ground up:
◆ LongCat Sparse Attention (LSA): scales efficiently for 1M-context tokens
◆ Zero-Compute Experts: dynamic activation 33B–56B per token, zero wasted compute
◆ MOPD: three specialized expert groups (Agent / Reasoning / Interaction), gate-routed per task

How it stacks up:
→ Terminal-Bench 2.1: 70.8
→ SWE-bench Pro: 59.5 (GPT-5.5: 58.6)
→ SWE-bench Multilingual: 77.3
→ FORTE: 73.2 · RWSearch: 78.8 · BrowseComp: 79.9

📖 Tech Blog: https://t.co/4KrjyKiDBn

Try it across different scenarios 🧵👇

HelloSurgeAI
@HelloSurgeAI
📅
Jun 29, 2026
5d ago
🆔97913720

Last week, we released HANDBOOK.md: a benchmark for long-context agentic instruction following.

HANDBOOK drops an agent into a live company environment with files (PDFs, Excel, Word docs…), tools (email, Slack, Jira, calendar…), and a dense corporate handbook (up to 124 pages!). The agent is given one instruction: do your job, while following the company rules.

Every frontier model broke them over 75% of the time.

They fired employees without authorization...
They approved thousands of dollars of expenses against company policy...
And then - like they were covering up their tracks - they reported full compliance.

HANDBOOK.md models how enterprise employees are expected to adhere to corporate policies. Learn more about how frontier agents acted in ways that would get human employees terminated:

Blog post: https://t.co/zJ7zVpDOfH
Github: https://t.co/zjwood6H6s
Benchmark Leaderboard: https://t.co/lI3F0MwkCc

🔁 hardmaru retweeted
Sakana AI
@SakanaAILabs
📅
Jun 22, 2026
12d ago
🆔27443966
⭐0.36

Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API. Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls. Try it: https://t.co/hhO6qTawgb 🐡

❤️19 likes
🔁6 retweets
🔁 GaryMarcus retweeted
Eric Topol
@EricTopol
📅
Jun 27, 2026
7d ago
🆔23533676
⭐0.36

Thanks for running our open-source work on current frontier models

“The results are: the most capable models today (GPT-5.5 Pro) did outperform the best models from before (79/100 vs 69/100), but did not improve enough to be considered sufficient for reliable medical use.”

Read full text and results below

❤️285 likes
🔁42 retweets
code
@code
📅
Jun 25, 2026
9d ago
🆔14424638

🛠️ Agent Customization

Customize AI workflows with agents, instructions, skills, prompts, and hooks.

🔗 https://t.co/ag5zffSLjd https://t.co/NSP4H9DwYj

EricTopol
@EricTopol
📅
Jun 27, 2026
7d ago
🆔23533676
⭐0.38

Thanks for running our open-source work on current frontier models

“The results are: the most capable models today (GPT-5.5 Pro) did outperform the best models from before (79/100 vs 69/100), but did not improve enough to be considered sufficient for reliable medical use.”

Read full text and results below

@yishan • Sat Jun 27 05:35

A big problem with research studies on AI models is that given how long the peer review process is, the results are always out-of-date by the time the paper is published. This time, we have something better! The typical reaction to research results like this roughly goes "You'r

code
@code
📅
Jun 23, 2026
11d ago
🆔41328888

In today's video we walk through how to use MAI-Code-1-Flash, a small, fast, Copilot-native coding model, to ship a real feature end to end: explore the codebase, build it, run it, and test it, all from Copilot Chat! ▶️ https://t.co/ABR2UZkLFS https://t.co/okTEO2Zv5U

rackSpreader1
@rackSpreader1
📅
Jun 21, 2026
13d ago
🆔31644664
⭐0.32

From all the interviews ive done i think the hottest skill rn seems to be llm evals

johnowhitaker
@johnowhitaker
📅
Jun 26, 2026
8d ago
🆔61603264

Didn't have much time to play with this today but I:
- Got a peek at a real microfluidics chip+setup
- Tested stepper-controlled fluid dispensing
- Got my design-to-finished-chip time down to a 20-minute speed run
- Made some droplets!

The quest continues :) https://t.co/jVikwlfbly
