@arankomatsuzaki
Transformers Can Achieve Length Generalization But Not Robustly

Length generalization remains fragile, significantly influenced by factors like random weight initialization and training data order.

https://t.co/aVTXAMwOn0 https://t.co/1cJQxB5Cqn