Your curated collection of saved posts and media

Showing 32 posts ยท last 14 days ยท by score
N
NielsRogge
@NielsRogge
๐Ÿ“…
Aug 29, 2025
240d ago
๐Ÿ†”95079313

GLM-4.5 is beating Claude-4 Opus on the Berkeley Function Calling benchmark while costing 70x less https://t.co/eG8QZ3l9sE

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”huggingface retweeted
N
Niels Rogge
@NielsRogge
๐Ÿ“…
Aug 29, 2025
240d ago
๐Ÿ†”95079313

GLM-4.5 is beating Claude-4 Opus on the Berkeley Function Calling benchmark while costing 70x less https://t.co/eG8QZ3l9sE

Media 1
โค๏ธ773
likes
๐Ÿ”64
retweets
๐Ÿ–ผ๏ธ Media
H
HuggingPapers
@HuggingPapers
๐Ÿ“…
Aug 31, 2025
239d ago
๐Ÿ†”52463563

ByteDance Seed and Stanford introduce Mixture of Contexts (MoC) for long video generation, tackling the memory bottleneck with a novel sparse attention routing module. It enables minute-long consistent videos with short-video cost. https://t.co/JHCSQ81FWJ

๐Ÿ–ผ๏ธ Media
E
eliebakouch
@eliebakouch
๐Ÿ“…
Aug 31, 2025
239d ago
๐Ÿ†”11204147

The technical report of @Meituan_LongCat LongCat-Flash is crazy good and full of novelty. The model is a 560B passive ~27B active MoE with adaptive number of active parameters depending on the context thanks to the Zero-Computational expert. 1) New architecture > Layers have 2 Attention blocks and both FFN and MoE, that way you can overlap the 2 all-to-all coms. (also it's only 28 layers but you have to take into account the 2 attention blocks). > They add the zero-computational expert that tokens can choose and do nothing, kinda like a "sink" for easy tokens. > For load balancing, they have a dsv3-like aux loss free to set the average real/fake expert per token. They apply a decay schedule to this bias update. They also do loss balance control. 2) Scaling > They made changes to MLA/MoE to have variance alignment at init. The gains are pretty impressive in Figure 5, but i don't know to what extent this has impact later on. > Model growth init is pretty cool, they first train a 2x smaller model and then "when it's trained enough" (a bit unclear here how many B tokens) they init the final model by just stacking the layers of the smaller model. > They used @_katieeverett @Locchiu and al. paper to have hyperparameter transfer with SP instead of muP for the 2x smaller model ig. 3) Stability > They track Gradient Norm Ratio and cosine similarity between experts to adjust the weight of the load balancing loss (they recommend Gradient Norm Ratio <0.1). > To avoid large activations, they apply a z-loss to the hidden state, with a pretty small coef (another alternative to qk-clip/norm). > They set Adam epsilon to 1e-16 and show that you want it to be lower than the gradient RMS range. 4) Others > They train on 20T tokens for phase 1, "multiple T of tokens" for mid training on STEM/code data (70% of the mixture), 100B for long context extension without yarn (80B for 32k, 20B for 128k). The long context documents represent 25% of the mixture (not sure if it's % of documents or tokens, which changes a lot here). > Pre-training data pipeline is context extraction, quality filtering, dedup. > Nice appendix where they show they compare top_k needed for different benchmarks (higher MMLU with 8.32, lower GSM8K with 7.46). They also compare token allocation in deep/shallow layers. > They release two new benchmarks Meeseeks (multi-turn IF) and VitaBench (real-world business scenario). > Lots of details in the infra/inference with info on speculative decoding acceptance, quantization, deployment, kernel optimization, coms overlapping, etc. > List of the different relevent paper in thread ๐Ÿงต

Media 1
๐Ÿ–ผ๏ธ Media
M
multimodalart
@multimodalart
๐Ÿ“…
Aug 31, 2025
238d ago
๐Ÿ†”30178059

a mysterious new button appeared on the @huggingface Spaces Nano Banana app ๐Ÿ‘€ https://t.co/cY0nLznV8t

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”huggingface retweeted
M
apolinario ๐ŸŒ
@multimodalart
๐Ÿ“…
Aug 31, 2025
238d ago
๐Ÿ†”30178059

a mysterious new button appeared on the @huggingface Spaces Nano Banana app ๐Ÿ‘€ https://t.co/cY0nLznV8t

Media 1
โค๏ธ205
likes
๐Ÿ”17
retweets
๐Ÿ–ผ๏ธ Media
A
AdinaYakup
@AdinaYakup
๐Ÿ“…
Sep 01, 2025
238d ago
๐Ÿ†”72042256

Hunyuan-MT-7B ๐Ÿ”ฅ open translation model released by @TencentHunyuan https://t.co/dTs5yBmkIi โœจ Supports 33 languages, including 5 ethnic minority languages in China ๐Ÿ‘€ โœจ Including a translation ensemble model: Chimera-7B โœจ Full pipeline: pretrain > CPT > SFT > enhancement > ensemble refinement > SOTA performance at similar scale

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”huggingface retweeted
A
Adina Yakup
@AdinaYakup
๐Ÿ“…
Sep 01, 2025
238d ago
๐Ÿ†”72042256

Hunyuan-MT-7B ๐Ÿ”ฅ open translation model released by @TencentHunyuan https://t.co/dTs5yBmkIi โœจ Supports 33 languages, including 5 ethnic minority languages in China ๐Ÿ‘€ โœจ Including a translation ensemble model: Chimera-7B โœจ Full pipeline: pretrain > CPT > SFT > enhancement > ensemble refinement > SOTA performance at similar scale

Media 1
โค๏ธ118
likes
๐Ÿ”20
retweets
๐Ÿ–ผ๏ธ Media
M
MaziyarPanahi
@MaziyarPanahi
๐Ÿ“…
Sep 01, 2025
237d ago
๐Ÿ†”46550374

1/ shipping two synthetic med qa sets from @OpenMed_AI community, made by @mkurman88 (core contributor): โ€ข med-synth qwen3-235b-a22b (2507) โ€ข med-synth gemma 3 (27b-it) datasets on @huggingface ๐Ÿ‘‡ https://t.co/1BP4hnABe2

Media 1
๐Ÿ–ผ๏ธ Media
M
MaziyarPanahi
@MaziyarPanahi
๐Ÿ“…
Aug 31, 2025
238d ago
๐Ÿ†”04635190

need your help! list your top 5 datasets on @huggingface for rl training with verified answers. - math - code - everyday stuff https://t.co/O54HdefTP4

Media 1
๐Ÿ–ผ๏ธ Media
D
daftengine
@daftengine
๐Ÿ“…
Aug 27, 2025
242d ago
๐Ÿ†”16084232

1/ It's now easier than ever to use @huggingface ๐Ÿค— Datasets at scale! Introducing Daft's native support for reading from and writing to Hugging Face. https://t.co/JNJbrgiWNI

Media 1
๐Ÿ–ผ๏ธ Media
N
NLPingu
@NLPingu
๐Ÿ“…
Sep 01, 2025
238d ago
๐Ÿ†”66436117

Hugging Faceใงใ‚‚ๅ…ฌ้–‹ใ—ใพใ—ใŸใ€‚ https://t.co/HEdMxsH2Hi

@NLPingu โ€ข Mon Sep 01 07:18

GPQAใƒ‡ใƒผใ‚ฟใ‚ปใƒƒใƒˆใฎๆ—ฅๆœฌ่ชž่จณใงใ‚ใ‚‹JGPQAใ‚’ๅ…ฌ้–‹ใ—ใพใ—ใŸใ€‚ๅ•้กŒใจ้ธๆŠž่‚ขใ‚’ๆฉŸๆขฐ็ฟป่จณใ—ใŸๅพŒใซไบบๆ‰‹ใงไฟฎๆญฃใ—ใŸใ‚‚ใฎใซใชใ‚Šใพใ™ใ€‚่ซธไบ‹ๆƒ…ใง็พๅœจใฏGithubใงใฎๅ…ฌ้–‹ใงใ™ใŒใ€ใ„ใšใ‚ŒHugging Faceใงใ‚‚ๅ…ฌ้–‹ไบˆๅฎšใงใ™ใ€‚ https://t.co/TbiKwv2y2b

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”huggingface retweeted
N
Kouta Nakayama
@NLPingu
๐Ÿ“…
Sep 01, 2025
238d ago
๐Ÿ†”66436117

Hugging Faceใงใ‚‚ๅ…ฌ้–‹ใ—ใพใ—ใŸใ€‚ https://t.co/HEdMxsH2Hi

Media 1
โค๏ธ7
likes
๐Ÿ”2
retweets
๐Ÿ–ผ๏ธ Media
T
thekaransinghal
@thekaransinghal
๐Ÿ“…
Aug 27, 2025
242d ago
๐Ÿ†”83761464

FYI HealthBench now conveniently available on HuggingFace. We hope it helps model developers and the healthcare community understand model performance and safety. https://t.co/BqFoLE6Sv5

Media 1
๐Ÿ–ผ๏ธ Media
R
reach_vb
@reach_vb
๐Ÿ“…
Aug 31, 2025
239d ago
๐Ÿ†”42980638

that's a Chinese food delivery company absolutely mogging the competition https://t.co/r7aMojpUo1

@reach_vb โ€ข Sat Aug 30 16:47

chat, this is getting out of hand: "We introduce LongCat-Flash, a powerful and efficient language model with 560 billion total parameters, the model incorporates a dynamic computation mechanism that activates 18.6Bโˆผ31.3B parameters (averagingโˆผ27B) based on contextual demands" S

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”huggingface retweeted
R
Vaibhav (VB) Srivastav
@reach_vb
๐Ÿ“…
Aug 31, 2025
239d ago
๐Ÿ†”42980638

that's a Chinese food delivery company absolutely mogging the competition https://t.co/r7aMojpUo1

Media 1
โค๏ธ456
likes
๐Ÿ”32
retweets
๐Ÿ–ผ๏ธ Media
H
huggingface
@huggingface
๐Ÿ“…
Sep 01, 2025
237d ago
๐Ÿ†”87564243

@poetengineer__ You should make it a demo on https://t.co/qkxl7AQLv7 for the community

Media 1
๐Ÿ–ผ๏ธ Media
F
fchollet
@fchollet
๐Ÿ“…
Aug 30, 2025
239d ago
๐Ÿ†”18645741

This was the comment. Had to delete it since I don't want my comments to serve as an outlet for this discourse. https://t.co/ydnzxPNvlA

Media 1
๐Ÿ–ผ๏ธ Media
F
fchollet
@fchollet
๐Ÿ“…
Aug 30, 2025
239d ago
๐Ÿ†”96317712

Do consider: some humans 40,000 years ago could paint better than you do (see Altamira bisons below). And the basics of civilization -- agriculture, domestication, writing, complex stone architecture, metallurgy (gold) were independently reinvented among populations that were completely isolated from each other for >20,000 years

Media 1
๐Ÿ–ผ๏ธ Media
F
fchollet
@fchollet
๐Ÿ“…
Aug 31, 2025
239d ago
๐Ÿ†”74529718

There's a specific world map image that's been been shared about 20 times in the comments and QT of my original tweet, that claims that the average IQ of entire continents is below 65 (a level at which you couldn't learn to read or even live independently). The data is from Lynn and Vanhanen. It has been widely debunked, and sharing it is just telling on yourself. Most country averages are collected from tiny sample sizes, often from children, and in many cases from children in mental institutions or refugees camps. The numbers were also arbitrarily massaged when they didn't agree with the message. It's not science, it's "political discourse" (you know the kind)

Media 1
๐Ÿ–ผ๏ธ Media
F
fchollet
@fchollet
๐Ÿ“…
Aug 31, 2025
239d ago
๐Ÿ†”64733526

This is what we're dealing with. If you're sharing this data as "proof" that intelligence is a recent and exclusively European development, you are showing yourself to be incapable of the most basic level of critical thinking https://t.co/QxBlXTyW1I

Media 1
๐Ÿ–ผ๏ธ Media
F
fchollet
@fchollet
๐Ÿ“…
Aug 31, 2025
239d ago
๐Ÿ†”58447091

For those who don't know who Richard Lynn is -- he was a self-avowed racist who argued for eugenics, and who said about brown/black people, "we need to think realistically in terms of the 'phasing out' of such peoples" So it's not like his "work" was unbiased and guided by a love of truth-seeking. He went and made up some total bunk data points to support his racist views. Didn't even try to make it believable or auditable. And now drooling morons on Twitter are sharing his world map as if it were unquestionable "science". Incredible stuff

Media 1
๐Ÿ–ผ๏ธ Media
M
MIT_CSAIL
@MIT_CSAIL
๐Ÿ“…
Aug 31, 2025
238d ago
๐Ÿ†”73968424

#Otd in 1955, the term โ€œartificial intelligenceโ€ was coined in a conference proposal: https://t.co/BbzXHKRdns https://t.co/fRNik3DsLz

Media 1
๐Ÿ–ผ๏ธ Media
R
rasbt
@rasbt
๐Ÿ“…
Aug 30, 2025
239d ago
๐Ÿ†”10929982

@Dhirajsoude Not a math book but my Machine Learning with PyTorch and Scikt-Learn book introduces/covers more of the math alongside the ML/DL topics. I also started writing about these on a more basic level but thatโ€™s years ago and havenโ€™t had time to finish: https://t.co/R3B1AH65o6

Media 1
๐Ÿ–ผ๏ธ Media
M
manorlaboratory
@manorlaboratory
๐Ÿ“…
Aug 31, 2025
238d ago
๐Ÿ†”70045717

Trump proposed 40% cut to NIH ๐Ÿคฌ๐Ÿ’”๐Ÿ˜ฑ Senate committee (16 GOP, 10 DEM) voted 23-3 to INCREASE NIH budget by $400M ๐ŸŽ‰ ๐ŸŽŠ ๐Ÿ™Œ Next step is House ๐Ÿ˜ถ๐Ÿ˜ถ๐Ÿ˜ถ https://t.co/QCJcVuYCo5

Media 1
๐Ÿ–ผ๏ธ Media
Z
ZeyuanAllenZhu
@ZeyuanAllenZhu
๐Ÿ“…
Aug 31, 2025
238d ago
๐Ÿ†”27706828

๐Ÿš€ NVIDIA continues to lead on open-sourcing pretraining data โ€” Nemotron-CC-v2 has dropped! ๐Ÿ‘ Congrats to @KarimiRabeeh @issanjeev @PavloMolchanov @KezhiKong @SimonXinDong @ctnzr @YejinChoinka + many others! ๐Ÿ™ A very loud thank you for citing our Physics of LMs, Part 3.1. Youโ€™re perhaps the first leading lab to publicly acknowledge its usefulness (knowledge augmentation: add QA at pretrain-level, add diversity + translation). When I ran this code 2 years ago, it was using V100s + 8 A100s so many didnโ€™t believe in it --- I wasnโ€™t approved to test on real-life data, couldnโ€™t secure GPUs for larger experiments. Thatโ€™s why this recognition really matters: it validates the value of foundational projects like ours, and helps me keep pushing to deliver more insights for the AI community. Truly grateful. https://t.co/c5g1VMUhCr

Media 1Media 2
+1 more
๐Ÿ–ผ๏ธ Media
C
casper_hansen_
@casper_hansen_
๐Ÿ“…
Aug 30, 2025
239d ago
๐Ÿ†”78031810

xAI may be one of the single biggest contributors to open-source inference just by serving everything with SGLang https://t.co/zPwZvfvjYP

Media 1
๐Ÿ–ผ๏ธ Media
I
ivanleomk
@ivanleomk
๐Ÿ“…
Aug 30, 2025
240d ago
๐Ÿ†”87562413

Is this rich for typescript with @EffectTS_ ? https://t.co/UYguHYH6hN

Media 1
๐Ÿ–ผ๏ธ Media
I
ivanleomk
@ivanleomk
๐Ÿ“…
Aug 30, 2025
239d ago
๐Ÿ†”30779290

Things at manus move fast! Plenty of problems to solve and great people to work with We're hiring aggressively at https://t.co/Q65afg9ebq

@hidecloud โ€ข Fri Aug 29 17:04

Shout out to our engineers! Amazing ๐Ÿคฉ

Media 1
๐Ÿ–ผ๏ธ Media
I
ivanleomk
@ivanleomk
๐Ÿ“…
Aug 30, 2025
239d ago
๐Ÿ†”16569611

@backpinelabs @AmpCode Man this OAuth got teeth :( Should be done with it soon fingers crossed. This should make for a much more portable package https://t.co/0JtAL2n8os

Media 1
๐Ÿ–ผ๏ธ Media
S
sethharpesq
@sethharpesq
๐Ÿ“…
Aug 29, 2025
240d ago
๐Ÿ†”50652645

The Fort Bragg Cartel would have never become a reality without @RollingStone. No other US magazine would have given me free rein to investigate the nexus between organized crime and special operations, or to come at the story from such an aggressively adversarial angle. https://t.co/7NyQCc3PaH

Media 1
๐Ÿ–ผ๏ธ Media
J
jasonbonner_
@jasonbonner_
๐Ÿ“…
Aug 30, 2025
240d ago
๐Ÿ†”79201066

https://t.co/h8sWy9qOTS

@depulsive โ€ข Fri Aug 29 14:52

me visiting the club chalamet wall in italy https://t.co/S2Dz6fedpM

Media 1
๐Ÿ–ผ๏ธ Media