@omarsar0
MegaParse is an open-source tool for parsing various types of documents for LLM ingestion. Supports text, PDF, PowerPoint, excel, csv, and Word documents. It can convert these into a format ideal for LLMs. It can parse content of different types such as tables, TOC, headers, footers, images, etc. I am also building a similar tool and I think the most important feature at the moment is the ability to customize the format of the transformed data as different LLMs prefer different formats.