@arankomatsuzaki
Think before you speak: Training Language Models With Pause Tokens - Performs training and inference on LMs with learnable pause tokens appended to the input prefix - Gains on 8 tasks, e.g., +18% on SQuAD https://t.co/snkfjFZhhZ https://t.co/wUhZspVtSj
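A minimal sketch of the input-side idea (a hypothetical helper, not the paper's code): copies of a special `<pause>` token are appended after the prompt, giving the model extra computation steps before it must emit an answer; the names `PAUSE` and `append_pauses` are assumptions for illustration.

```python
# Hypothetical illustration of pause-token input preparation.
# The <pause> token is a learnable embedding in the real model;
# here we only show the sequence-level manipulation.

PAUSE = "<pause>"

def append_pauses(prompt_tokens, num_pauses=10):
    """Return prompt tokens with num_pauses pause tokens appended.

    During training and inference, the model's outputs are ignored
    until the last pause token has been processed.
    """
    return list(prompt_tokens) + [PAUSE] * num_pauses

tokens = append_pauses(["What", "is", "2+2", "?"], num_pauses=3)
# tokens -> ['What', 'is', '2+2', '?', '<pause>', '<pause>', '<pause>']
```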