@omarsar0
Mixture of In-Context Learners (MoICL)

Partitions the demonstrations into subsets and treats each subset, fed in-context to the LLM, as one expert. Given a training set, a trainable weighting function combines the experts' next-token predictions. The approach works with black-box LLMs since no access to the model's internal parameters is required.

Good properties include the following:

- competitive with standard ICL while being significantly more data-, memory-, and compute-efficient
- resilient to noisy demonstrations and label imbalance

Overall, a simple and very cool approach to making better use of in-context demonstrations, which remains one of the more important ways to get the most out of LLMs today.
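A minimal sketch of the core idea, the weighted mixture of expert next-token distributions. This is not the authors' code: the expert distributions are faked with random logits (in MoICL each would come from the black-box LLM prompted with a different demonstration subset), and the weighting function is reduced to one trainable scalar per expert, updated with a hand-derived gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 5, 3      # vocab size, number of experts (hypothetical small values)
target = 2       # gold next-token id for one training example

# Stand-in expert predictions: in MoICL, expert k's distribution comes from
# prompting the LLM with demonstration subset k. Here we fabricate them,
# making expert 0 much more accurate on the target token.
logits = rng.normal(size=(K, V))
logits[0, target] += 6.0
experts = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # (K, V)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

theta = np.zeros(K)  # trainable weight per expert (only thing we optimize)
lr = 0.5
for step in range(200):
    a = softmax(theta)        # mixture weights over experts
    mix = a @ experts         # combined next-token distribution, shape (V,)
    m = mix[target]
    loss = -np.log(m)         # cross-entropy on the gold token
    # d(-log m)/d(theta_j) = -a_j * (p_j[target]/m - 1)
    grad = -a * (experts[:, target] / m - 1.0)
    theta -= lr * grad

print("learned weights:", np.round(softmax(theta), 3))
```

Because only `theta` (K scalars) is trained, the LLM itself is never updated or even inspected, which is what makes the scheme black-box friendly; the learned weights also naturally downweight experts whose demonstration subsets are noisy.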