@arankomatsuzaki
Some interesting points:
- They are now data-constrained, not compute-constrained. Future progress relies on algorithms with better sample-efficiency.
- Training GPT-4 now requires only 5-10 people.
- Expect 10M+ GPU training runs, potentially "semi-synchronous" or decentralized.
https://t.co/dV82TY0S7P