Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
runwayml
@runwayml
📅 Aug 21, 2025
251d ago
🆔 49997672

Today, we're launching the Runway Game Worlds Beta. Over the last few months, we have been working on research and products that are moving us closer toward a future where you will be able to explore any character, story or world in real time. While generating the pixels of these experiences is one aspect of this new frontier, another is the need for novel mechanics and interfaces. From how stories unfold to how your choices affect the worlds you’re simulating. Today’s beta release marks a first step in this direction, learn more below. (1/5)

deedydas
@deedydas
📅 Aug 21, 2025
251d ago
🆔 56637753

🚨 China's DeepSeek is at it again, with the best open source model drop, V3.1 scoring 66% on SWE-Bench and costing.. $0.56/M input (2x cheaper than GPT-5) $1.68/M output (6x cheaper) The whale is back. https://t.co/NTirIMpJOu
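As a rough check on the multipliers quoted in this post, the sketch below compares per-token prices and the cost of a single request. The DeepSeek V3.1 prices come from the post; the GPT-5 prices ($1.25/M input, $10.00/M output) are an assumption that the post does not state, and the function names are illustrative only.

```python
# Prices in USD per million tokens.
DEEPSEEK_V31 = {"input": 0.56, "output": 1.68}   # quoted in the post
GPT5_ASSUMED = {"input": 1.25, "output": 10.00}  # ASSUMPTION: not given in the post


def request_cost(prices, input_tokens, output_tokens):
    """Cost in USD of one request at the given per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


def price_ratio(baseline, candidate, kind):
    """How many times cheaper `candidate` is than `baseline` for 'input' or 'output' tokens."""
    return baseline[kind] / candidate[kind]


input_ratio = price_ratio(GPT5_ASSUMED, DEEPSEEK_V31, "input")
output_ratio = price_ratio(GPT5_ASSUMED, DEEPSEEK_V31, "output")
print(f"input: {input_ratio:.2f}x cheaper, output: {output_ratio:.2f}x cheaper")
```

Under the assumed GPT-5 prices the ratios come out near 2.2x and 6.0x, which is consistent with the rounded "2x" and "6x" in the post.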

_akhaliq
@_akhaliq
📅 Aug 21, 2025
251d ago
🆔 57569836

DeepSeek-V3.1 is now available in anycoder. Hybrid inference: Think & Non-Think, one model, two modes. Stronger agent skills: Post-training boosts tool use and multi-step agent tasks. https://t.co/gv4FEdJMuY

_akhaliq
@_akhaliq
📅 Aug 21, 2025
251d ago
🆔 00627863

app: https://t.co/esPDyHDu94

_akhaliq
@_akhaliq
📅 Aug 21, 2025
251d ago
🆔 24362966

@deepseek_ai Now Available and default model in anycoder: https://t.co/esPDyHDu94

_akhaliq
@_akhaliq
📅 Aug 21, 2025
251d ago
🆔 33009165

DeepSeek-V3.1 ball bouncing inside a spinning hexagon with @FireworksAI_HQ in anycoder, one shot https://t.co/67rVlpictJ

_akhaliq
@_akhaliq
📅 Aug 21, 2025
251d ago
🆔 50677692

app: https://t.co/esPDyHDu94

bryancsk
@bryancsk
📅 Aug 21, 2025
251d ago
🆔 92678033

I love them so much https://t.co/grkPvVpW1P

jeremyphoward
@jeremyphoward
📅 Aug 21, 2025
251d ago
🆔 51156623

@zcbenz Yeah that's how they used to be implemented, using im2col(). It's also how we started teaching implementing them in https://t.co/GEOZunWoXj.
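For readers unfamiliar with the im2col approach mentioned in this reply, here is a minimal pure-Python sketch: every kernel-sized patch of the input is unrolled into a row, so the convolution reduces to a dot product of each row with the flattened kernel. It assumes a single-channel 2D input, stride 1, and no padding; the function names are illustrative and not taken from any particular library or course.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj] for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_im2col(image, kernel):
    """Cross-correlation (the deep-learning 'convolution') as patch-matrix times flat kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in patches]
    # Fold the flat result back into a 2D output map.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_im2col(image, kernel)
print(result)
```

The appeal of this layout is that the inner loop becomes one large matrix multiplication, which is why it was a common implementation strategy, at the cost of duplicating overlapping input values in memory.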

🔁 huggingface retweeted
Vaibhav (VB) Srivastav
@reach_vb
📅 Aug 21, 2025
251d ago
🆔 02672396

DEEPSEEK V3.1 INSTRUCT IS OUTTT!! https://t.co/CpKlbSYn8O

❤️ 255 likes · 🔁 30 retweets

multimodalart
@multimodalart
📅 Aug 20, 2025
251d ago
🆔 29092568

Qwen Image Edit works too well with lightx2v LoRA to run with just 8 and 4 steps, wtf? in my experience, 8 steps keeps the quality of the edits at the same level as the original model, at a 12x speedup 💨 (ofc i built a demo for it) https://t.co/jrO8kdoJ48

🔁 HamelHusain retweeted
Hamel Husain
@HamelHusain
📅 Aug 13, 2025
258d ago
🆔 40829295

The beatings (free books) will continue until everyone looks at their data:
1. LLM Evals FAQ: https://t.co/BzEHwvobz5
2. Beyond Naive RAG: Practical Advanced Methods https://t.co/x2870kdHoZ

❤️ 247 likes · 🔁 43 retweets

rungalileo
@rungalileo
📅 Aug 20, 2025
251d ago
🆔 75084774

Generic evals and metrics don’t reflect real-world failure modes. You need customized, domain-specific evals explicitly tailored for your application or agents for true reliability. On this week’s Chain of Thought podcast, AI consultant and evaluation expert @HamelHusain breaks down why most teams experience “the illusion of monitoring” when using generic metrics that don’t account for real production failures. Instead of chasing dashboards, Hamel argues for:
– Manual error analysis grounded in real user logs
– Custom metrics aligned to product risks, not vanity
– Iterative feedback loops that surface failure modes over time
Learn more about creating customized evals tailored to your domain-specific risks in this week’s episode with Hamel, our COO and Co-founder @atinsanyal, and host @ConorBronsdon 👇

🔁 HamelHusain retweeted
Keiji Kanazawa
@gojira
📅 Aug 21, 2025
251d ago
🆔 72072538

Loved the "AI Evals for Engineers & PMs" course https://t.co/f5WVix8Qs0 by @HamelHusain and @sh_reya. It takes “look at your data” from slogan to method: inspect interaction traces closely, build error taxonomies, rigorously tune automated evals, then optimize your prompts & pipeline. Highly recommended!

❤️ 7 likes · 🔁 2 retweets

🔁 johnrobinsn retweeted
@levelsio
@levelsio
📅 Aug 19, 2025
252d ago
🆔 71108487

Oh yes! This is exactly what I wanted Audacity ported to web So cool: https://t.co/F5BfPTc2w9 https://t.co/p7SG1CIStb

❤️ 2,756 likes · 🔁 198 retweets

johnrobinsn
@johnrobinsn
📅 Aug 21, 2025
251d ago
🆔 10727397

Hunyuan-GameCraft builds on the Hunyuan-Video model, adding the ability to integrate egocentric and third-person perspective camera movements into generated videos. An open weight "Genie-like" model! My first experiments on a single NVIDIA 5090 (32 GB). Leverages a history-conditioned training strategy for long-term consistency.

LiorOnAI
@LiorOnAI
📅 Aug 20, 2025
252d ago
🆔 02361866

The website of Alec Radford. Inventor of GPT. https://t.co/HkYz6cnYiG

LiorOnAI
@LiorOnAI
📅 Aug 20, 2025
251d ago
🆔 36810733

Yann LeCun: "If you are interested in human-level ai, don't work on LLMs" https://t.co/lX0dXWTemh

DanielLurie
@DanielLurie
📅 Aug 21, 2025
251d ago
🆔 36998969

Small businesses, listen up! Just in time for the San Francisco “summer,” you no longer have to pay to put tables and chairs on the sidewalk. A quick, free registration is all it takes, and you’re good to go. Through our permit reform initiative, PermitSF, my administration is cutting red tape and making it easier for small businesses to thrive.

Zai_org
@Zai_org
📅 Aug 20, 2025
252d ago
🆔 06891613

Introducing ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. https://t.co/86XVYd9xCb ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to address the inherent mismatch between machine agents and human-centric desktop environments.

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 25829098

Overview: This work proposes training single models to natively behave like multi-agent systems, coordinating “role-playing” and tool agents end-to-end. They distill strong multi-agent frameworks into CoA trajectories, then optimize with agentic RL on verifiable tasks. https://t.co/hR4PTUEQpa

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 44077988

Paradigm shift: CoA generalizes ReAct/TIR by dynamically activating multiple roles and tools within one model, preserving a single coherent state while cutting inter-agent chatter. https://t.co/uYeDRhPqKl

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 60526667

Training recipe:
1) Multi-agent distillation turns successful OAgents runs into CoA-formatted traces with planning, tool calls, observations, and reflection, filtered for difficulty and quality;
2) Agentic RL targets hard queries where tools matter, with simple binary rewards via LLM-as-Judge for web tasks and executable or exact-match rewards for code/math.

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 14142412

Training framework:
Stage 1 (SFT) – uses reformatted ReAct-style data (both short and long reasoning chains) to give the model a solid “cold start.” Progressive filtering ensures only high-quality trajectories are used, emphasizing coherence, tool efficiency, and reflective reasoning.
Stage 2 (RL) – builds on the SFT base. The model performs tool-aware rollouts on unused QA pairs. Rewards are computed from task correctness (via LLM-as-Judge, exact match, or test cases), and policy updates improve tool coordination and reasoning robustness.

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 72958894

Main results: With Qwen-2.5-32B backbones, Agent Foundation Models (AFM) sets new pass@1 on GAIA 55.3, BrowseComp 11.1, HLE 18.0, and leads WebWalker 63.0; it also tops multi-hop QA suites across sizes. https://t.co/60HHo1wUj2

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 34712733

Code + math: AFM-RL-32B reaches AIME25 59.8, MATH500 94.6, OlympiadBench 72.1, and LiveCodeBench v5 47.9, beating prior TIR methods including ReTool and Reveal. https://t.co/Rg9RWIMSVW

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 10186234

Efficiency and robustness: Compared to traditional multi-agent systems, AFM cuts inference tokens and tool calls substantially. The paper reports an 84.6% token cost reduction while staying competitive. It also generalizes to unseen tools better when strict formatting is required.

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 52245839

Test-time scaling: Best-of-3 and pass@3 markedly boost AFM, e.g., GAIA 69.9 and HLE 33.2, closing the gap with larger proprietary agent stacks. Overall, Chain-of-Agents enables training single-agent foundation models that natively simulate multi-agent collaboration, combining multi-agent distillation with agentic RL to achieve state-of-the-art results. Project + Code + Models: https://t.co/yeI0JTO6ok Paper: https://t.co/vxzeM4x1dy
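To make the two test-time metrics concrete, the sketch below computes an empirical pass@k (a task counts as solved if any of its first k sampled attempts is correct) and a best-of-n selection (one answer is chosen by a scoring function before it is graded). The data, the scoring function, and the names are made up for illustration; the paper's exact evaluation protocol may differ.

```python
def pass_at_k(results, k):
    """Fraction of tasks where at least one of the first k attempts is correct.

    `results` holds one list of booleans per task, one boolean per sampled attempt.
    """
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


def best_of_n(candidates, score):
    """Return the single candidate with the highest score (ties go to the earliest)."""
    return max(candidates, key=score)


# Hypothetical correctness flags: 4 tasks, 3 sampled attempts each.
results = [
    [False, True, False],
    [False, False, False],
    [True, True, False],
    [False, False, True],
]
print(pass_at_k(results, 1))
print(pass_at_k(results, 3))

# Hypothetical selector: prefer the longest answer among three samples.
print(best_of_n(["a", "abc", "ab"], score=len))
```

The difference matters when reading the numbers: pass@3 is an upper bound that assumes a perfect selector, while best-of-3 depends on how good the selector actually is.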

omarsar0
@omarsar0
📅 Aug 20, 2025
252d ago
🆔 61853995

Chain-of-Agents: Interesting idea to train a single model with the capabilities of a multi-agent system. 84.6% reduction in inference cost! Distillation and Agentic RL are no joke! Here are my notes: https://t.co/cwHNROfoR2

llama_index
@llama_index
📅 Aug 20, 2025
251d ago
🆔 79062809

🚀 New case study: @StackAI_HQ + LlamaCloud
✔️ 1M+ docs processed with high-accuracy parsing
✔️ Faster, smarter enterprise document agents
✔️ Trusted by finance, insurance & more
Full story 👉 https://t.co/r6NFPZJVFs #GenAI #AIagents #EnterpriseAI
