Hi there! Congrats on the great work, really appreciate seeing discussions like these in the open ✨
Just one question: in the long-context extension phase you mention using an extra 100B tokens - where do you source them from? Are they drawn from the same sources as pretraining, just with different upsampling weights?
In general, I would really appreciate any resources or pointers on what data to use for long-context extension!