@lixin4ever
Excited to share DiGIT, a new image tokenizer for autoregressive image modeling.

Super simple idea:
- SSL representations + K-Means clustering as discrete image tokenizer
- Autoregressive modeling over image tokens

Discriminative SSL model (e.g., Dinov2) is critical

Strong results (understanding & generation):
- 80.3% Top-1 Accuracy on ImageNet (Linear probe)
- Class-unconditional FID score of 4.59
- Class-conditional FID score of 3.39

Paper: https://t.co/y4JSBd7DnQ
Repo: https://t.co/WdvNkzVYUl
Model: https://t.co/poqMhrJcph
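
For readers who want to see the tokenizer idea in code, here is a minimal sketch, not the DiGIT implementation. It assumes random vectors as a stand-in for per-patch SSL features (a real pipeline would take them from a frozen encoder), and the names `fit_kmeans` and `tokenize`, the codebook size, and the feature dimensions are all hypothetical choices made for illustration.

```python
import numpy as np


def tokenize(features, centroids):
    """Map each feature vector to the index of its nearest centroid."""
    # Squared Euclidean distances, shape (num_vectors, num_centroids).
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def fit_kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns a (k, dim) codebook of centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct feature vectors.
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(iters):
        ids = tokenize(features, centroids)
        for j in range(k):
            members = features[ids == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


# Stand-in for SSL patch features: (images, patches per image, feature dim).
# ASSUMPTION: random data; a real setup would use encoder outputs instead.
rng = np.random.default_rng(0)
num_images, patches_per_image, dim = 8, 16, 32
patch_features = rng.normal(size=(num_images, patches_per_image, dim))

flat = patch_features.reshape(-1, dim)
codebook = fit_kmeans(flat, k=10)  # illustrative codebook size
tokens = tokenize(flat, codebook).reshape(num_images, patches_per_image)
```

Each row of `tokens` is a sequence of discrete ids for one image, which is the kind of input an autoregressive model can be trained on with next-token prediction.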