Your curated collection of saved posts and media
I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.
New paper: Mamba-Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs. Only 25% of visual tokens. 3.8–4.2× faster prefilling (TTFT). Near-baseline accuracy (can exceed baseline with light finetuning) https://t.co/CJaCktyWCt
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/
Claude / Codex also have an easier time writing some components of FA4 thanks to the fast compile time. I got Claude to debug a deadlock when we first implemented 2CTA fwd. It ran autonomously overnight for 6 hours, figured out part of the fix, but then went down a rabbit hole convincing itself that the compiler is broken (so very human). After 6 hours, from Claude's partial fix, I was able to fix the hang in 10 mins. More details here: https://t.co/ipGhC9FzET I'm hoping FA5 will be written completely by AI
FA4 now available in lm-engine: https://t.co/n47TEinAfG 13.4% end-to-end speedup for Llama 8B training on 4x GB200s (1 node). 1005.55 TFLOPs for SDPA vs 1140.73 for FA4 (BF16 precision). @tedzadouri @ultraproduct @__tensorcore__ @tri_dao cooked. Thanks to @bharatrunwal2 for running the experiment!
the FA4 integration into @huggingface Transformers is here https://t.co/48XPxmKbMv you will need to apply my proposed changes at the end for it to work if the owner hasn't done it already by the time you try it out
Together Research has produced FlashAttention, ATLAS, ThunderKittens and more. This week at AI Native Conf: seven more releases, all coming to production soon. Thread: #ainativeconf #ainativecloud https://t.co/XXIXMRRiLe
@FPLGOAT7 I got lucky, sold Dango, sold Haaland. Tarkowski did it for me. https://t.co/BUmAWBP0W7
@yehiael22 @FPL_Harry Same here https://t.co/2gZdylxMf1
Introducing the Google Workspace CLI: https://t.co/8yWtbxiVPp - built for humans and agents. Google Drive, Gmail, Calendar, and every Workspace API. 40+ agent skills included.
⚠️ WARNING: THIS PRODUCT MAY CONTAIN SHELLFISH https://t.co/zJ6n2auo6B
Do you want to demo your project at the Meta booth during GTC? Join @Meta and @nvidia, in partnership with CV, for a full-day hackathon at @SHACK15sf, writing high-performance GPU kernels with Helion, PyTorch's new kernel authoring DSL that delivers higher performance in fewer lines of code with autotuning. March 14th, right before NVIDIA GTC. The perfect warm-up. Prizes & perks: > Nvidia GPUs and Nvidia DGX Spark > Demo your project at the Meta booth during GTC > GTC conference passes > Ray-Ban Meta glasses > Mentoring from Meta AI researchers & NVIDIA engineers. Fully in-person | Teams of up to 4 | Rolling review, limited spots. Register below.
How can we securely contain #AI? In this live discussion, experts will explore why traditional container isolation falls short for agent-based systems & what changes when agents have persistent memory, filesystem access, GPUs, or external execution authority https://t.co/qi4Mw97DPo
Want to accelerate the "real-world deployment" of cutting-edge AI at Sakana AI? We are actively hiring Applied Research Engineers to bridge basic research and product. It is an exciting position where you build next-generation solutions with your own hands while working with world-class technology. Details here: https://t.co/eQ7e0rIOmg https://t.co/7Mx8h3JScP

We are actively hiring Applied Research Engineers to work with us at Sakana AI! https://t.co/FuEoI2xrzS
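To further accelerate Sakana AI's "real-world deployment," we are now hiring a Recruiter! https://t.co/qiY3upbBAV As new product development moves forward, we are actively hiring Engineers and PMs who will create unprecedented solutions. This is an important role: engaging candidates directly, with direct sourcing at its core, and building out the core team. We look forward to hearing from someone who can draw on recruiting experience in the technology industry and take a central role in our growth engine!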

⏰ Clock's ticking! Registration for #PyTorchCon Europe goes up €100 after 20 March. Also less than a week left to RSVP for onsite child care. 7–8 April | Paris. Register: https://t.co/53JVfAmOap Child care info: https://t.co/OnRpL1AQKa https://t.co/okpcmT7qu3
Applied Research Engineer: https://t.co/FuEoI2xrzS
We are pleased to announce a strategic investment from Citi! https://t.co/SQp1HEGzEp This milestone marks Citi's first such investment in a Japanese company. The investment reflects their high regard for our advanced technical capabilities and our proven track record of implementing AI within the financial sector. We are focused on developing new enterprise-grade AI solutions using nature-inspired intelligence. Our goal has consistently been to bridge the gap between cutting-edge research and practical business applications. Building on our work developing highly specialized AI agents for financial domains, we are ready to take the next step. Through this partnership, we aim to accelerate our international expansion and drive innovation in global financial services, originating from Japan.
