@RisingSayak
Nothing special just sharing a "bag of tricks" on the OG DiT architecture, inspired by the work on ModernBERT. About 2x less params, 43% better throughput, with longer training, FID improves. Felt nice, won't delete later. https://t.co/62jixLmPzH