Thanks, AK @_akhaliq !!! We release the Gradio Demo and Code here: Code: https://t.co/F5K6iWzN7m Demo: https://t.co/z5LoWYkWOL
RealWonder: Real-Time Physical Action-Conditioned Video Generation paper: https://t.co/U8RM31zcVD https://t.co/GEMCJ14Yda
Our full pipeline and real-time generation code are available here! https://t.co/oXJ9R2i9wA
Thanks again for sharing! @_akhaliq 🥰 The paper, code, @Gradio demo are all released! 🔥 Please have a try! Page: https://t.co/pW4CpKHKNj https://t.co/jNK3dUr1XJ
Introducing the Synthetic Data Playbook: We generated over 1T tokens across 90 experiments with 100k+ GPUh to figure out what makes good synthetic data and how to generate it at scale https://t.co/iaHuodWVAa https://t.co/48gBUYE6R2
(I still have the bigger cousin running on prod nanochat, working on a bigger model on 8XH100, which looks like this now. I'll just leave this running for a while...) https://t.co/aWya9hpUMl
📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - https://t.co/DcCG3zlN8p
While waiting for DeepSeek V4 we got two very strong open-weight LLMs from India yesterday. There are two size flavors, Sarvam 30B and Sarvam 105B (both reasoning models).

Interestingly, the smaller 30B model uses "classic" Grouped Query Attention (GQA), whereas the larger 105B variant switched to DeepSeek-style Multi-Head Latent Attention (MLA). As I wrote in my analyses before, both are popular attention variants to reduce KV cache size (the longer the context, the more you save compared to regular attention). MLA is more complicated to implement, but it can give you better modeling performance if we go by the ablation studies in the 2024 DeepSeek V2 paper (as far as I know, this is still the most recent apples-to-apples comparison).

Speaking of modeling performance, the 105B model is on par with LLMs of similar size: gpt-oss 120B and Qwen3-Next (80B). Sarvam is better on some tasks and worse on others, but roughly the same on average. It's not the strongest coder in SWE-Bench Verified terms, but it is surprisingly good at agentic reasoning and task completion (Tau2). It's even better than DeepSeek R1 0528 there.

Considering the smaller Sarvam 30B, perhaps the most comparable model is Nemotron 3 Nano 30B, which is slightly ahead in coding per SWE-Bench Verified and agentic reasoning (Tau2) but slightly worse in some other aspects (LiveCodeBench v6, BrowseComp). Unfortunately, Qwen3-30B-A3B is missing from the benchmarks, which is, as far as I know, the most popular model of that size class. Interestingly, though, the Sarvam team compared their 30B model to Qwen3-30B-A3B in a computational performance analysis, where they found that Sarvam gets 20-40% more tokens/sec throughput than Qwen3 due to code and kernel optimizations.

Anyways, one thing that is not captured by the benchmarks above is Sarvam's good performance on Indian languages. Using a judge model, the Sarvam team found that their model is preferred 90% of the time over others when it comes to Indian texts. (Since they built and trained the tokenizer from scratch as well, Sarvam also comes with a 4 times higher token efficiency on Indian languages.)
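Since the post's main technical point is how GQA and MLA shrink the KV cache relative to regular attention, here is a minimal back-of-the-envelope sketch in Python of the per-token cache cost under each variant. All dimensions (layer count, head counts, latent size) are hypothetical placeholders, not the actual Sarvam or DeepSeek hyperparameters, and the MLA formula is simplified (DeepSeek's real design also caches a small decoupled RoPE key alongside the latent).

```python
# Back-of-the-envelope KV-cache cost per token for the three attention
# variants discussed above. Every dimension below is a made-up example,
# not the real Sarvam or DeepSeek configuration.

def kv_bytes_mha(n_layers: int, n_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    # Standard multi-head attention: cache a full K and V vector per head, per layer.
    return n_layers * 2 * n_heads * head_dim * dtype_bytes

def kv_bytes_gqa(n_layers: int, n_kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    # Grouped Query Attention: many query heads share a small set of KV heads,
    # so only n_kv_heads K/V pairs are cached per layer.
    return n_layers * 2 * n_kv_heads * head_dim * dtype_bytes

def kv_bytes_mla(n_layers: int, latent_dim: int, dtype_bytes: int = 2) -> int:
    # Multi-Head Latent Attention (simplified): cache one compressed latent per
    # token, per layer; K and V are re-projected from it at attention time.
    return n_layers * latent_dim * dtype_bytes

# Hypothetical 30B-class config: 48 layers, 64 query heads, 8 KV heads,
# head_dim 128, MLA latent of 512, bf16 (2-byte) cache entries.
L, H, KVH, D, LAT = 48, 64, 8, 128, 512
print(f"MHA: {kv_bytes_mha(L, H, D) / 1024:.0f} KiB/token")    # 1536 KiB
print(f"GQA: {kv_bytes_gqa(L, KVH, D) / 1024:.0f} KiB/token")  # 192 KiB
print(f"MLA: {kv_bytes_mla(L, LAT) / 1024:.0f} KiB/token")     # 48 KiB
```

With these toy numbers, GQA cuts the cache by the 8x ratio of query heads to KV heads, and MLA's single shared latent shrinks it further; since the cache grows linearly with context length, the savings compound at long contexts, which is exactly the trade-off the post describes.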
@Shubham13596 I'd say agent contexts with longer-running reasoning tasks (see last row) https://t.co/MJMMYF0bmD
@Shubham13596 Regarding Google's models, they didn't compare to Gemini, but Gemma was actually the second best in multilingual performance https://t.co/kMTE80oksj

@HarveenChadha Ohhh, I checked the HTML source and found it! I had no idea that you have to horizontally scroll the table. Tbh this is a bit hidden and potentially confusing. (No need to change the name, it's more of a layout issue) https://t.co/9KymL1J1Ok
caught 6 awesome demos at the Gemini 3 Hackathon in singapore 🇸🇬 today and the energy was unreal. big shoutout to @65labslah @cerebral_valley folks and @vadiamit, @SaadGH for putting this together. the challenge? "bring something new to life." no basic RAG apps, no chatbots, no recycled ideas. build something nobody's ever built before. here's what the top 6 teams cooked up 🧵
The boys are back. https://t.co/M8xFFynhVE
As Harold Bloom has been having a moment on here (let it never be said that these things don't go in cycles), here's Kermode in March 1976--explaining to Bob Silvers why he won't, after all, review Bloom's Poetry and Repression for the NYRB. https://t.co/T0LwfsvjnA

The entire Middle East is at war... Syria Subplot: https://t.co/CucXa6lZlb
this story is WILD. anthropic ceo dario amodei says he cannot say for sure if claude is circumcised https://t.co/N8chxo0Fjb

@TheWapplehouse https://t.co/X3lrZ8cbxS
https://t.co/tWGVwmuNjT