@_akhaliq
Augmenting CLIP with Improved Visio-Linguistic Reasoning

paper page: https://t.co/PHbgZCUuRi

Image-text contrastive models such as CLIP are useful for a variety of downstream applications including zero-shot classification, image-text retrieval and transfer learning. However,… https://t.co/Eu1TgBgmyb https://t.co/JP6iuPXSMV