@arankomatsuzaki
ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model

Collects and open-sources the largest (4M) GPT-4V dataset for VLM training, consisting of fine-grained captions, complex instructions, and detailed answers.
https://t.co/cO4re0WYxx https://t.co/IbappxMBPM