Your curated collection of saved posts and media

Showing 32 posts ยท last 14 days ยท by score
A
arankomatsuzaki
@arankomatsuzaki
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”78345880

WebResearcher: Unbounded reasoning for long-horizon agents โ€ข IterResearch: Iterative deep-research paradigm (avoids context suffocation & noise) โ€ข WebFrontier: Tool-augmented data engine for complex research tasks โ€ข Parallel agents + synthesis โ†’ scalable, evidence-grounded reasoning โ€ข Beats proprietary systems: 36.7% on HLE, 51.7% on BrowseComp abs: https://t.co/AvnPicEGWM (3/N)

Media 1Media 2
๐Ÿ–ผ๏ธ Media
A
arankomatsuzaki
@arankomatsuzaki
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”83196253

AgentFounder: Scaling Agents via Continual Pre-training โ€ข First to propose Agentic CPT โ†’ builds agentic foundation models before fine-tuning โ€ข Solves post-training bottlenecks (capabilities + alignment conflict) โ€ข Data synthesis: First-order (planning/actions) + Higher-order (multi-step decision) โ€ข Two-stage training (32K โ†’ 128K context) โ€ข SOTA: 39.9% BrowseComp-en, 72.8% GAIA abs: https://t.co/LTCuW2LCo4 (5/N)

Media 1Media 2
๐Ÿ–ผ๏ธ Media
A
arankomatsuzaki
@arankomatsuzaki
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”27416197

WebWeaver: Structuring Web-Scale Evidence for Deep Research โ€ข Dual-agent framework (Planner + Writer) โ€ข Dynamic outlines: search โ†” refine โ†” search (human-like loop) โ€ข Memory-grounded, section-by-section synthesis โ†’ avoids long-context failures โ€ข SOTA across DeepResearch Bench, DeepConsult, DeepResearchGym โ€ข Produces reliable, well-cited, structured reports abs: https://t.co/WsTbHV7ECO (6/N)

Media 1Media 2
๐Ÿ–ผ๏ธ Media
A
arankomatsuzaki
@arankomatsuzaki
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”42279549

ReSum: Long-Horizon Web Agents Without Context Limits โ€ข Problem: ReAct hits context limits in long searches (32k tokens) โ€ข Solution: ReSum periodically compresses history โ†’ compact reasoning states โ€ข ReSumTool-30B: specialized summarizer extracts key evidence & gaps โ€ข ReSum-GRPO (RL): trains agents to adapt summaries into reasoning โ€ข +4.5% over ReAct baseline, +8.2% with RL across web search benchmarks abs: https://t.co/QRkfu2w6TN (7/7)

Media 1Media 2
๐Ÿ–ผ๏ธ Media
A
arankomatsuzaki
@arankomatsuzaki
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”84675495

Hunyuan3D Studio tech report was just released! โ€ข Modular pipeline: Part-level gen, PolyGen, SeamGPT UV, PBR textures, auto-rigging โ€ข Game-engine ready (Unity/Unreal), optimized + production quality โ€ข Huge speedup: lowers barrier for 3D content creation https://t.co/eP05VYDHue

Media 1
๐Ÿ–ผ๏ธ Media
A
arankomatsuzaki
@arankomatsuzaki
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”95133948

AgentScaler: Towards General Agentic Intelligence โ€ข Scales environments for diverse, realistic tool-calling โ€ข Fully simulated envs = verifiable + scalable interactions โ€ข SOTA on ฯ„-bench, ฯ„ยฒ-bench, ACEBench โ€ข AgentScaler-30B matches 1T-parameter models with far fewer params abs: https://t.co/lqUkV0GbKS (4/N)

Media 1
๐Ÿ–ผ๏ธ Media
D
dmsobol
@dmsobol
๐Ÿ“…
Sep 16, 2025
221d ago
๐Ÿ†”90326566

This might be the most information dense blog I've ever written. Added "show me the math" section into MoE 101 p4 episode. We believe it fully models MoE training perf on both gpu and cerebras wse devices. https://t.co/uW6H78ZE56 ๐Ÿงต1/n

@CerebrasSystems โ€ข Tue Sep 16 22:33

๐Ÿงฎ Calling all Mathletes, this one is for you. Weโ€™ve been asked to show the math behind our MoE claims. So we did. Our analysis confirms: On GPUs, expert parallelism creates severe communication overheads that dwarf computation and make MoE training painfully slow. At Cerebras,

Media 1
๐Ÿ–ผ๏ธ Media
A
Aniket_d98
@Aniket_d98
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”44482353

๐ŸšจReasoning LLMs are eฬตfฬตfฬตeฬตcฬตtฬตiฬตvฬตeฬต ฬตyฬตeฬตtฬต inefficient! Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. During the process, they often re-derive the same intermediate steps across problems, inflating token usage and latency. Metacognitive Reuse: turn recurring LLM reasoning into concise, reusable โ€œbehaviorsโ€. The model learns named skills from its own chains-of-thought and reuses them to think faster & cheaper. Arxiv ๐Ÿ”— - https://t.co/zA1gB4eYTG

๐Ÿ–ผ๏ธ Media
M
Modular
@Modular
๐Ÿ“…
Sep 08, 2025
230d ago
๐Ÿ†”73895241

Join us today at 10 AM PT for our September Community Meeting! Weโ€™ll talk Mojo Vision & Roadmap, GSplat Kernels, and HyperLogLog. Details โ†’ https://t.co/Vcmd9SCZoa #Mojo #Modular

Media 1
๐Ÿ–ผ๏ธ Media
M
Modular
@Modular
๐Ÿ“…
Sep 08, 2025
229d ago
๐Ÿ†”93393209

New for your podcast feed - @clattner_llvm sits down with @yminsky of Signals and Threads to discuss how @Modular is designing the Mojo language to be easy to use while providing the precise level of control required to write state of the art kernels. https://t.co/c2tKcujV6m

Media 1
๐Ÿ–ผ๏ธ Media
M
Modular
@Modular
๐Ÿ“…
Sep 10, 2025
227d ago
๐Ÿ†”85320455

The video from the latest Modular Community Meeting is live! In this edition: ๐Ÿ“ธ Porting GSplat Kernels to Mojo ๐Ÿ”ข Datastructures for DB Development ๐Ÿ”ฅ Update on Mojo Vision and Roadmap, with Q&A https://t.co/XrsXtKaqYN

Media 1
๐Ÿ–ผ๏ธ Media
A
AIatAMD
@AIatAMD
๐Ÿ“…
Sep 10, 2025
227d ago
๐Ÿ†”26947856

From Python to productionโ€”without rewriting your stack. @Modularโ€™s Mojo gives AI devs the power of C++ with the flexibility of Python. And it all runs on AMD Instinct GPUs, AMD EPYC CPUs, and AMD ROCm software. ๐ŸŽฅ Watch @clattner_llvm's Tech Talk at AMD Advancing AI 2025: https://t.co/CcwvkPj0Xe

Media 2
๐Ÿ–ผ๏ธ Media
M
Modular
@Modular
๐Ÿ“…
Sep 16, 2025
221d ago
๐Ÿ†”37613217

ssshh... ๐Ÿคซ @AMD Mi355X... now available in nightlies. https://t.co/sWUHne1c1L

Media 1
๐Ÿ–ผ๏ธ Media
M
Modular
@Modular
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”40666429

Honored to be highlighted in @Oracle and @OracleCloud's incredible Q1 earnings results๐Ÿš€ Excited to keep crushing it for our customers and their most important AI workloads ๐Ÿ”ฅhttps://t.co/VpL3NMHVF4

Media 1
๐Ÿ–ผ๏ธ Media
J
JeffDean
@JeffDean
๐Ÿ“…
Sep 17, 2025
220d ago
๐Ÿ†”37130472

Very excited to see our Gemini models getting better and better at coding! An advanced version ofย Gemini 2.5 Deep Think atย the 2025 International Collegiate Programming Contest (ICPC) World Finals achieved gold-medal level performance! ๐ŸŽ‰ https://t.co/yCQKOagnkm

Media 1
๐Ÿ–ผ๏ธ Media
Q
quocleix
@quocleix
๐Ÿ“…
Sep 17, 2025
220d ago
๐Ÿ†”87904855

(1/3) Thrilled to announce a new Gemini breakthrough! Building on our success at IMO this year, an advanced version of Gemini Deep Think achieved gold-medal level performance at the ICPC 2025 World Finals - one of the worldโ€™s leading competitive programming competitions. https://t.co/kDO4BkfqCP

Media 1
๐Ÿ–ผ๏ธ Media
A
AravSrinivas
@AravSrinivas
๐Ÿ“…
Sep 10, 2025
228d ago
๐Ÿ†”63387435

https://t.co/4TbCAYJ1fh

Media 1
๐Ÿ–ผ๏ธ Media
A
AravSrinivas
@AravSrinivas
๐Ÿ“…
Sep 10, 2025
228d ago
๐Ÿ†”42685970

https://t.co/iXvuIlHH16

Media 1
๐Ÿ–ผ๏ธ Media
P
peymankh
@peymankh
๐Ÿ“…
Sep 10, 2025
228d ago
๐Ÿ†”13428209

Perplexity iOS app now #11 on the overall apps in the US AppStore https://t.co/Jp2fkWOUJj

Media 1
๐Ÿ–ผ๏ธ Media
A
abcdabcd987
@abcdabcd987
๐Ÿ“…
Sep 10, 2025
227d ago
๐Ÿ†”94891677

1.5 seconds is long enough to transfer model weights from training nodes to RL rollout nodes (as opposed to 100s). Here's the full story of how I made it (not just presenting the solution): https://t.co/6zaFAeNICT https://t.co/PAUqY43epH

Media 1
๐Ÿ–ผ๏ธ Media
A
AravSrinivas
@AravSrinivas
๐Ÿ“…
Sep 15, 2025
223d ago
๐Ÿ†”75768704

Actually, fastest growing app on both App Store and Play Store https://t.co/bECVY7QiI5

@AravSrinivas โ€ข Mon Sep 15 05:59

Perplexity is the fastest growing GenAI app on Android ๐Ÿ“ˆ

Media 1
๐Ÿ–ผ๏ธ Media
P
perplexity_ai
@perplexity_ai
๐Ÿ“…
Sep 16, 2025
222d ago
๐Ÿ†”86291895

Perplexity Pro users can now connect their email, calendar, Notion, and GitHub to Perplexity. Enterprise Pro users can also connect Linear and Outlook. https://t.co/g15wDrueBU

๐Ÿ–ผ๏ธ Media
A
AravSrinivas
@AravSrinivas
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”23866285

1Password is available natively on Comet to enable secure browsing https://t.co/kE7FiHLVAK

Media 1
๐Ÿ–ผ๏ธ Media
J
Jordan_W_Taylor
@Jordan_W_Taylor
๐Ÿ“…
Sep 16, 2025
222d ago
๐Ÿ†”59529288

The original productivity drive. The modern world wasn't unlocked by the spinning jenny and the steam engine: It started much earlier than that. The foundation of all modernity is food production. https://t.co/xpbQLe32Te

Media 1
๐Ÿ–ผ๏ธ Media
E
emollick
@emollick
๐Ÿ“…
Sep 16, 2025
222d ago
๐Ÿ†”48051888

Has there been any public documentation or discussion of Claude's "skills" based approach to handling specialized tasks? Very "I know kung fu," but with the AI as Neo. https://t.co/MQmOi6ycjh

Media 1
๐Ÿ–ผ๏ธ Media
E
emollick
@emollick
๐Ÿ“…
Sep 15, 2025
222d ago
๐Ÿ†”39359061

Paper argues that diminishing returns to AI scale are an illusion Economic value comes from completing long projects, not single questions. And accuracy drives how long a project AI does: small gains compound exponentially! Reasoners are much more accurate, with big impacts. https://t.co/aLBAO6OJvc

Media 1Media 2
๐Ÿ–ผ๏ธ Media
E
emollick
@emollick
๐Ÿ“…
Sep 17, 2025
220d ago
๐Ÿ†”27697950

Reasoning models (apparently without tool use) scored #1 (OpenAI) & tied for #2 (Google) in the International Collegiate Programming Contest Its been one year since reasoners were first announced, it is genuinely surprising how good they have gotten at hard problems, so quickly https://t.co/mvMLyWdVC7

Media 1Media 2
๐Ÿ–ผ๏ธ Media
E
emollick
@emollick
๐Ÿ“…
Sep 17, 2025
220d ago
๐Ÿ†”23501452

A third of American adults use AI โ€œmany times a day to almost constantlyโ€ & another third several times a week. I canโ€™t usefully add much to discussions of valuation bubbles, but if โ€œbubbleโ€ means a disappointing technology that is overhyped & not useful, that doesnโ€™t match data https://t.co/1OMEyNoS8A

Media 1
๐Ÿ–ผ๏ธ Media
E
emollick
@emollick
๐Ÿ“…
Sep 17, 2025
220d ago
๐Ÿ†”25642978

Though attitudes towards AI are all over the mapโ€ฆ https://t.co/Q36SbLubfL

Media 1Media 2
+1 more
๐Ÿ–ผ๏ธ Media
S
stevesi
@stevesi
๐Ÿ“…
Sep 17, 2025
220d ago
๐Ÿ†”27944355

@emollick Pew Research 1999 https://t.co/BuShtqZGId https://t.co/K8IjJbP1Wy

Media 1Media 2
+1 more
๐Ÿ–ผ๏ธ Media
E
emollick
@emollick
๐Ÿ“…
Sep 17, 2025
220d ago
๐Ÿ†”69329718

The jaggedness of AI remains even as models have rapidly come to exceed human abilities in many of the hardest timed math & science contests. Yet there is much less progress on good puns. True AGI would be figure out our limits in more than calculus (sorry, but also seriously). https://t.co/tatB741YF2

Media 1Media 2
๐Ÿ–ผ๏ธ Media
R
rohanpaul_ai
@rohanpaul_ai
๐Ÿ“…
Sep 17, 2025
221d ago
๐Ÿ†”78743973

Hidden text inside PDFs can secretly change how LLMs write peer reviews, making the review scores artificially higher or lower In tests, some models gave 100% accept with a positive hidden prompt, and 0% with a negative one. The setup imitates a rushed reviewer, using 1000 ICLR 2024 papers and copy paste style LLM reviews. Each PDF is turned into Markdown, so white on white or tiny font text becomes part of what the model reads. The models fill a fixed review form with set score choices, and the prompt is either neutral, push high, or push low. A positive injection moves scores into accept bins, a negative one pushes them down, and even neutral prompts accept far more than the human baseline of 43%. Models that seemed resistant often ignored the rules and output illegal scores like 4, so their text was unusable for copy paste. This happens because conversion keeps invisible text in the input stream, and a partial fix is to parse the PDF as images first. ---- Paper โ€“ arxiv. org/abs/2509.10248v2 Paper Title: "Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications"

Media 1
๐Ÿ–ผ๏ธ Media