@xenovacom
Ternary Bonsai: state-of-the-art intelligence at 1.58 bits. The models are so small they can even run locally in your browser on WebGPU! ⚡️ Here's the 8B version (just ~2GB in size) running at 60 tokens per second on my M4 Max. Try the demo out yourself! 👇