@EnricoShippole
Happy to release the Common Pile, an 8TB, 1 Trillion Token Dataset of Public Domain and Openly Licensed Text in collaboration with @AiEleuther, @VectorInst, @allen_ai, @huggingface, and DPI by @ShayneRedford. We provisioned a subset of the Common Pile, consisting only of public… https://t.co/K61ld9XqWP