Your curated collection of saved posts and media

Showing 24 posts · last 7 days · quality filtered
πŸ”_akhaliq retweeted
S
Shenghai Yuan
@shenghai_y55451
πŸ“…
Mar 06, 2026
5d ago
πŸ†”93620262

Thanks, AK @_akhaliq !!! We release the Gradio Demo and Code here: Code: https://t.co/F5K6iWzN7m Demo: https://t.co/z5LoWYkWOL

Media 1
❀️24
likes
πŸ”3
retweets
πŸ–ΌοΈ Media
_akhaliq @_akhaliq · Mar 07, 2026
RealWonder Real-Time Physical Action-Conditioned Video Generation paper: https://t.co/U8RM31zcVD https://t.co/GEMCJ14Yda
🖼️ media

πŸ”_akhaliq retweeted
Z
Ziyu Chen
@ziyuchen_
πŸ“…
Mar 07, 2026
3d ago
πŸ†”77630159

Our full pipeline and real-time generation code are available here! https://t.co/oXJ9R2i9wA

Media 1
❀️26
likes
πŸ”7
retweets
πŸ–ΌοΈ Media
haodongli00 @haodongli00 · Mar 08, 2026
Thanks again for sharing! @_akhaliq 🥰 The paper, code, @Gradio demo are all released! 🔥 Please have a try! 🚀 Page: https://t.co/pW4CpKHKNj https://t.co/jNK3dUr1XJ
🖼️ media

joelniklaus @joelniklaus · Mar 08, 2026
Introducing the Synthetic Data Playbook: We generated over 1T tokens in 90 experiments with 100k+ GPUh to figure out what makes good synthetic data and how to generate it at scale https://t.co/iaHuodWVAa https://t.co/48gBUYE6R2
🖼️ media

karpathy @karpathy · Mar 07, 2026
(I still have the bigger cousin running on prod nanochat, working a bigger model and on 8XH100, which looks like this now. I'll just leave this running for a while...) https://t.co/aWya9hpUMl
🖼️ media

pratykumar @pratykumar · Mar 06, 2026
📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - https://t.co/DcCG3zlN8p
🖼️ media

rasbt @rasbt · Mar 07, 2026
While waiting for DeepSeek V4, we got two very strong open-weight LLMs from India yesterday. There are two size flavors, Sarvam 30B and Sarvam 105B (both reasoning models). Interestingly, the smaller 30B model uses "classic" Grouped Query Attention (GQA), whereas the larger 105B variant switched to DeepSeek-style Multi-Head Latent Attention (MLA). As I wrote in my analyses before, both are popular attention variants for reducing KV cache size (the longer the context, the more you save compared to regular attention). MLA is more complicated to implement, but it can give you better modeling performance if we go by the ablation studies in the 2024 DeepSeek V2 paper (as far as I know, still the most recent apples-to-apples comparison).

Speaking of modeling performance, the 105B model is on par with LLMs of similar size: gpt-oss 120B and Qwen3-Next (80B). Sarvam is better on some tasks and worse on others, but roughly the same on average. It's not the strongest coder in SWE-Bench Verified terms, but it is surprisingly good at agentic reasoning and task completion (Tau2), even better than DeepSeek R1 0528.

For the smaller Sarvam 30B, perhaps the most comparable model is Nemotron 3 Nano 30B, which is slightly ahead in coding per SWE-Bench Verified and agentic reasoning (Tau2) but slightly worse in some other aspects (Live Code Bench v6, BrowseComp). Unfortunately, Qwen3-30B-A3B, which as far as I know is the most popular model of that size class, is missing from the benchmarks. The Sarvam team did, however, compare their 30B model to Qwen3-30B-A3B in a computational performance analysis, where they found that Sarvam gets 20-40% more tokens/sec throughput than Qwen3 due to code and kernel optimizations.

Anyway, one thing that is not captured by the benchmarks above is Sarvam's good performance on Indian languages. According to a judge model, the Sarvam team found that their model is preferred 90% of the time over others on Indian-language texts. (Since they also built and trained the tokenizer from scratch, Sarvam comes with roughly 4x higher token efficiency on Indian languages.)
🖼️ media

rasbt @rasbt · Mar 07, 2026
@Shubham13596 I'd say agent contexts with longer-running reasoning tasks (see last row) https://t.co/MJMMYF0bmD
🖼️ media

rasbt @rasbt · Mar 07, 2026
@Shubham13596 Regarding Google's models, they didn't compare to Gemini, but Gemma was actually the 2nd best in the multi-lingual performance https://t.co/kMTE80oksj
🖼️ 2 media

rasbt @rasbt · Mar 07, 2026
@HarveenChadha Ohhh, I checked the HTML source and found it! I had no idea that you have to horizontally scroll the table 😆. Tbh this is a bit hidden and potentially confusing. (No need to change the name, it's more of a layout issue) https://t.co/9KymL1J1Ok
🖼️ media

brianchew @brianchew · Mar 07, 2026
caught 6 awesome demos at the Gemini 3 Hackathon in singapore 🇸🇬 today and the energy was unreal. big shoutout to @65labslah @cerebral_valley folks and @vadiamit, @SaadGH for putting this together 🙏 the challenge? "bring something new to life." no basic RAG apps, no chatbots, no recycled ideas. build something nobody's ever built before. here's what the top 6 teams cooked up 🧵
🖼️ media

πŸ”youwouldntpost retweeted
D
DANIEL BALDWIN
@DanielBaldwin
πŸ“…
Aug 14, 2025
208d ago
πŸ†”12046788

The boys are back. https://t.co/M8xFFynhVE

Media 1
❀️483
likes
πŸ”47
retweets
πŸ–ΌοΈ Media
profrhodrilewis @profrhodrilewis · Mar 07, 2026
As Harold Bloom has been having a moment on here (let it never be said that these things don't go in cycles), here's Kermode in March 1976, explaining to Bob Silvers why he won't, after all, review Bloom's Poetry and Repression for the NYRB. https://t.co/T0LwfsvjnA
🖼️ 2 media

πŸ”youwouldntpost retweeted
C
Cigarillo27
@CigariIIo27
πŸ“…
Mar 07, 2026
4d ago
πŸ†”53830457

The entire Middle East is at war... Syria Subplot: https://t.co/CucXa6lZlb

Media 1
❀️56,633
likes
πŸ”4,344
retweets
πŸ–ΌοΈ Media
πŸ”youwouldntpost retweeted
C
caleb gamman
@calebgamman
πŸ“…
Mar 07, 2026
3d ago
πŸ†”68517013

this story is WILD. anthropic ceo dario amodei says he cannot say for sure if claude is circumcised https://t.co/N8chxo0Fjb

Media 1Media 2
❀️8,350
likes
πŸ”449
retweets
πŸ–ΌοΈ Media
πŸ”youwouldntpost retweeted
A
Admiral Benghazi πŸ›³πŸš€πŸššπŸ”₯πŸ’―πŸŽ―πŸ‡±πŸ‡·πŸ‡±πŸ‡·πŸ‡±πŸ‡·πŸ‘½
@AdmiralHalo
πŸ“…
Mar 07, 2026
3d ago
πŸ†”41660757

@TheWapplehouse https://t.co/X3lrZ8cbxS

Media 1
❀️829
likes
πŸ”33
retweets
πŸ–ΌοΈ Media
πŸ”youwouldntpost retweeted
E
dog
@edgecone
πŸ“…
Mar 07, 2026
3d ago
πŸ†”00046714

https://t.co/tWGVwmuNjT

Media 1
❀️415
likes
πŸ”32
retweets
πŸ–ΌοΈ Media