@_lewtun
I've tested the "long is more" trick on @Teknium1's OpenHermes dataset and it works surprisingly well 🔥! - Select the 1k longest samples (0.1%) - SFT Mistral-7B for 15 epochs with NEFTune α=10 - MT Bench ~7 + decent perf on other benchmarks 💾Dataset: https://t.co/MWnWXgZ6QT https://t.co/0HpXvX1DAm