I am currently trying to perform full fine-tuning of the ai-forever/mGPT model (1.3B parameters) on a single A100 GPU (40GB VRAM) in Google Colab. However, training is very slow: ~0.06 it/s.
Is this the expected training speed, or is there some issue with my code? And if it is an issue, what could a possible fix be?
> (1.3B parameters) using a single A100 GPU (40GB VRAM)
I’m not a training expert myself, but I think this is way too slow for those specs and that model size.
Likely causes would be the GPU not being used properly or a small batch size, but usually you don’t have to configure anything special to get normal GPU utilization…
For anyone interested in the answer: this is the expected training speed for the provided hardware and model size. What I ended up doing to improve the speed substantially:
1. Lower the context length from 2048 to 512.
2. Use mixed precision training.
3. Use a quantized optimizer.
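The three changes can be sketched roughly as follows, assuming the Hugging Face Trainer is in use; the batch size, `output_dir`, and the commented-out `model`/`tokenizer`/dataset names are placeholders, not the original code:

```python
from transformers import TrainingArguments

# Step 1: shorten the context window when tokenizing the data, e.g.:
# tokenized = tokenizer(texts, truncation=True, max_length=512)

args = TrainingArguments(
    output_dir="mgpt-finetune",      # placeholder path
    per_device_train_batch_size=8,   # tune to fit 40GB VRAM
    fp16=True,                       # Step 2: mixed precision (bf16=True also works on an A100)
    optim="adamw_bnb_8bit",          # Step 3: 8-bit quantized optimizer (requires bitsandbytes)
)

# trainer = Trainer(model=model, args=args, train_dataset=tokenized)
# trainer.train()
```

This is a configuration sketch under those assumptions, not the exact script that was used.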
Step 1 had the biggest impact on training speed. I trained with the lowered context window on ~95% of the data I had and then increased it back to 2048 for the remaining 5%.
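Some back-of-the-envelope arithmetic shows why step 1 helps so much: the self-attention cost grows quadratically with sequence length, while the MLP and projection layers grow only linearly, so the overall speedup lands somewhere between the two ratios. A tiny illustration:

```python
# Rough per-token cost ratios when shrinking the context from 2048 to 512.
full_ctx = 2048
short_ctx = 512

# Attention compares every position with every other position: quadratic.
attention_ratio = (full_ctx / short_ctx) ** 2

# MLP blocks and projections process each token independently: linear.
linear_ratio = full_ctx / short_ctx

print(attention_ratio)  # 16.0 — attention work shrinks ~16x
print(linear_ratio)     # 4.0  — per-batch linear-layer work shrinks ~4x
```

The realized wall-clock speedup depends on how the two kinds of work are balanced in the model, but both ratios move in the right direction.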