@togethercompute
What if you could get 1.3B Transformer quality from a 770M model? That's not a compression result. It's a different architecture. Parcae, from @realDanFu (Together AI's VP of Kernels) and his lab at UCSD, passes activations through the same layers multiple times, and does so stably for the first time.
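To make the idea concrete, here's a minimal sketch of the general layer-reuse pattern the tweet describes: one transformer block, applied repeatedly, so a small parameter count gets a larger effective depth. This is an illustration of weight-tied looping in PyTorch, not the actual Parcae implementation; the class name, dimensions, and loop count are all made up for the example.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """Hypothetical sketch: reuse one transformer block n_loops times.

    Parameters live in a single block (~1x cost), but activations pass
    through it repeatedly (n_loops x effective depth).
    """

    def __init__(self, d_model=64, n_heads=4, n_loops=3):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):  # same weights, multiple passes
            x = self.block(x)
        return x

x = torch.randn(2, 10, 64)          # (batch, seq_len, d_model)
y = LoopedBlock()(x)
print(y.shape)                      # torch.Size([2, 10, 64])
```

The naive version of this loop is known to be hard to train at scale; the tweet's claim is that Parcae is the first to make it stable.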