@dahou_yasser
We are releasing Falcon Perception, an open-vocabulary referring expression segmentation model, along with a 0.3B OCR model that is on par with competitors 3-10x larger. Current systems solve this task with complex pipelines (separate encoders, late fusion, matching algorithms). We developed a novel, simpler "bitter" approach: a single early-fusion Transformer (image + text fused from the first layer) with a shared parameter space, letting scale + training signal do the work. Please check out our work!
📄 Paper: https://t.co/dWvK5t7MIt
💻 Code: https://t.co/AJ65GbMrUY
🎮 Playground: https://t.co/BIgisZkeid
🤗 Blogpost: https://t.co/J2IjlBPywF