Also, come on OpenAI. If you want an automated AI researcher, this needs to start going up, not down. https://t.co/0ZQ4UhdNyu
Google kicks out legit paying Antigravity customers for high usage [to solve their problem of not enough capacity]; does not tell them; does not offer refunds or any way to get one. This comic by @lmanul is so spot on with regard to Google (and Amazon!) https://t.co/x0LToYbOHX
I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.
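The thread with the SSD details isn't captured here. For context, a minimal sketch of the classic speculative decoding loop that such methods build on (greedy-acceptance variant, batch size 1; `target` and `draft` are stand-in callables returning logits, nothing from the tweet itself):

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, prompt, k=4, max_new=64):
    """Greedy speculative decoding, batch size 1: the cheap draft model
    proposes k tokens, the big target model verifies them all in one
    forward pass, and the longest agreeing prefix is kept."""
    seq = prompt.clone()                         # [1, T] token ids
    end = prompt.shape[-1] + max_new
    while seq.shape[-1] < end:
        # 1) Draft k tokens autoregressively with the small model.
        prop = seq
        for _ in range(k):
            nxt = draft(prop)[:, -1].argmax(-1, keepdim=True)
            prop = torch.cat([prop, nxt], dim=-1)
        # 2) One target pass scores every drafted position at once.
        logits = target(prop)                    # [1, T+k, V]
        tpred = logits[:, -k - 1:-1].argmax(-1)  # target's pick at each slot
        drafted = prop[:, -k:]
        # 3) Accept the longest prefix where draft and target agree.
        n_ok = int((tpred == drafted).int().cumprod(dim=-1).sum())
        if n_ok == k:                            # all accepted: bonus token free
            nxt = logits[:, -1].argmax(-1, keepdim=True)
        else:                                    # replace the first mismatch
            nxt = tpred[:, n_ok:n_ok + 1]
        seq = torch.cat([seq, drafted[:, :n_ok], nxt], dim=-1)
    return seq[:, :end]
```

The speedup comes from step 2: one large-model forward verifies k positions, so acceptance rate directly sets the gain.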
New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs.
- Only 25% of visual tokens
- 3.8–4.2× faster prefilling (TTFT)
- Near-baseline accuracy (can exceed baseline with light finetuning)
https://t.co/CJaCktyWCt
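The tweet doesn't spell out the method, so here is only an illustrative sketch of the general idea behind keeping 25% of visual tokens (score-based top-k pruning); the function and scoring scheme are my assumptions, not the paper's algorithm:

```python
import torch

def prune_visual_tokens(vis_tokens, scores, keep_ratio=0.25):
    """Illustrative token reduction: keep the top 25% of visual tokens
    by an importance score (e.g. attention received from the text
    query). Not the paper's method, just the general idea.
    vis_tokens: [B, N, D], scores: [B, N]."""
    B, N, D = vis_tokens.shape
    k = max(1, int(N * keep_ratio))
    idx = scores.topk(k, dim=-1).indices          # [B, k] highest-scoring tokens
    idx = idx.sort(dim=-1).values                 # preserve temporal order
    return vis_tokens.gather(1, idx.unsqueeze(-1).expand(B, k, D))
```

With 4× fewer visual tokens, prefill attention cost drops roughly quadratically in the pruned positions, which is consistent with the quoted 3.8–4.2× TTFT gain.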
Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
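A rough back-of-envelope (my arithmetic, not from the thread) for why exp throughput becomes the wall: attention spends about 4·d matmul FLOPs per score but only one exp in the softmax, so at matmul-limited speed the special function units must keep pace:

```python
# Back-of-envelope; all numbers are assumptions except the ~1600 TFLOPs
# reported in the tweet above.
d = 128                        # head dim (assumption)
attn_tflops = 1600e12          # attention throughput from the tweet
flops_per_score = 4 * d        # QK^T and PV each contribute 2*d FLOPs per score
exp_rate = attn_tflops / flops_per_score
print(f"exp/s needed to keep up: {exp_rate:.2e}")  # ~3.1e12, i.e. tera-exps/s
```

At that rate, every attention score costs ~512 tensor-core FLOPs but still one exp2, so any stall in the SFU or SMEM pipeline shows up directly in end-to-end speed.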
Claude / Codex also have an easier time writing some components of FA4 thanks to the fast compile time. I got Claude to debug a deadlock when we first implemented 2CTA fwd. It ran autonomously overnight for 6 hours, figured out part of the fix, but then went down a rabbit hole convincing itself that the compiler is broken (so very human). After 6 hours, from Claude's partial fix, I was able to fix the hang in 10 mins. More details here: https://t.co/ipGhC9FzET I'm hoping FA5 will be written completely by AI
FA4 now available in lm-engine: https://t.co/n47TEinAfG
13.4% end-to-end speedup for Llama 8B training on 4x GB200s (1 node)
1005.55 TFLOPs for SDPA vs 1140.73 for FA4 (BF16 precision)
@tedzadouri @ultraproduct @__tensorcore__ @tri_dao cooked
Thanks to @bharatrunwal2 for running the experiment!
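A quick sanity check (mine, not in the tweet): the two quoted TFLOPs figures reproduce the stated 13.4%:

```python
sdpa, fa4 = 1005.55, 1140.73        # TFLOPs from the tweet above
print(f"speedup: {fa4 / sdpa - 1:.1%}")   # -> speedup: 13.4%
```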
the FA4 integration into @huggingface Transformers is here: https://t.co/48XPxmKbMv. You will need to apply my proposed changes at the end for it to work, if the owner hasn't done it already by the time you try it out
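Once the PR's changes are in your install, usage would presumably follow the standard `attn_implementation` switch; note the "flash_attention_4" string here is my guess, so check the PR for the actual value:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical sketch: `attn_implementation` is a real from_pretrained
# kwarg, but "flash_attention_4" is an assumed value; the PR defines
# the actual string. The model id is just an example.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_4",  # assumption, see the PR
)
```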
Together Research has produced FlashAttention, ATLAS, ThunderKittens and more. This week at AI Native Conf: seven more releases, all coming to production soon. Thread 👇 #ainativeconf #ainativecloud https://t.co/XXIXMRRiLe
@FPLGOAT7 I got lucky, sold Dango, sold Haaland. Tarkowski did it for me. https://t.co/BUmAWBP0W7
@yehiael22 @FPL_Harry Same here https://t.co/2gZdylxMf1
Introducing the Google Workspace CLI: https://t.co/8yWtbxiVPp - built for humans and agents. Google Drive, Gmail, Calendar, and every Workspace API. 40+ agent skills included.
⚠️ WARNING: THIS PRODUCT MAY CONTAIN SHELLFISH https://t.co/zJ6n2auo6B
Do you want to demo your project at the Meta booth during GTC?
Join @Meta and @nvidia, in partnership with CV, for a full-day hackathon at @SHACK15sf, writing high-performance GPU kernels with Helion, PyTorch's new kernel authoring DSL that delivers higher performance in fewer lines of code with autotuning.
📅 March 14th, right before NVIDIA GTC. The perfect warm-up.
🏆 Prizes & perks:
> Nvidia GPUs and Nvidia DGX Spark
> Demo your project at the Meta booth during GTC
> GTC conference passes
> Ray-Ban Meta glasses
> Mentoring from Meta AI researchers & NVIDIA engineers
📍 Fully in-person | Teams of up to 4 | Rolling review, limited spots
Register below 👇
How can we securely contain #AI? In this live discussion, experts will explore why traditional container isolation falls short for agent-based systems & what changes when agents have persistent memory, filesystem access, GPUs, or external execution authority https://t.co/qi4Mw97DPo