@osanseviero
Some fun things people may have missed from Gemma 3 270M:

1. Out of 270M params, 170M are embedding params and 100M are transformer blocks. BERT from 2018 was larger 🤯 (quick arithmetic check below)

2. The vocabulary is quite large (262144 tokens). This makes Gemma 3 270M a very good model to hyper-specialize for a task or a specific language, since the model works very well even with less common tokens.

3. We released both a pre-trained and an instruct model, so you can fine-tune for your needs (see the loading sketch after this list).

4. We collaborated closely with the developer ecosystem to get this out, allowing you to use Hugging Face transformers and transformers.js, Ollama, Kaggle, LM Studio, Docker, LiteRT, Vertex, llama.cpp, Keras, MLX, Gemma.cpp, Unsloth, JAX, Cloud Run, and more (quick-start example at the end). https://t.co/CLciq44qOS
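A back-of-the-envelope check of the 170M/100M split in point 1. The hidden size of 640 is an assumption on my part (it is not stated in the post), so treat this as a sketch rather than the exact config:

```python
# Rough check of the embedding/transformer parameter split for Gemma 3 270M.
# Assumption: hidden size of 640 (not confirmed in the post); embedding
# params = vocab_size * hidden_size for a single (tied) embedding matrix.
vocab_size = 262_144
hidden_size = 640  # assumed

embedding_params = vocab_size * hidden_size
print(f"Embedding params:   {embedding_params / 1e6:.0f}M")        # ~168M, i.e. the "170M" in the post
print(f"Transformer params: {(270e6 - embedding_params) / 1e6:.0f}M")  # ~102M, i.e. the "100M"
```

The huge vocabulary is exactly why the embeddings dominate: at this scale the token table alone outweighs all the transformer blocks combined.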
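For point 3, a minimal loading sketch with Hugging Face transformers. The Hub repo ids `google/gemma-3-270m` (pre-trained) and `google/gemma-3-270m-it` (instruct) are assumed from earlier Gemma naming; check the Hub for the exact names:

```python
# Minimal sketch: load both released checkpoints and sanity-check the sizes.
# Repo ids are assumed, not confirmed in the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")
print(len(tokenizer))  # expected to be ~262144 per the post

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")      # pre-trained: fine-tune this
chat = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m-it")   # instruct: chat out of the box

print(sum(p.numel() for p in base.parameters()) / 1e6)  # ~270M total params
```

The pre-trained checkpoint is the natural starting point for the hyper-specialized fine-tunes mentioned in point 2; the instruct one is ready for prompting as-is.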
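And a hedged quick-start for point 4, using the transformers `pipeline` API (the same assumed `-it` repo id as above; the model also runs in Ollama, LM Studio, llama.cpp, etc. per the list):

```python
# Quick-start sketch: run the (assumed) instruct checkpoint via the
# high-level pipeline API. A 270M model is small enough for CPU inference.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m-it")
out = generator("Explain in one sentence what a tokenizer does.", max_new_tokens=64)
print(out[0]["generated_text"])
```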