Hardware Requirements for CPU / GPU Inference

#58

by jurassicpark - opened Jul 20, 2022

Jul 20, 2022

I was looking and couldn't find any recommendations for the required hardware to run this model in inference on the CPU or GPU.

I'm going to test it out but some guidance would be pretty helpful.

Does anyone have this data? Particularly, how much RAM for CPU, and amount of GPU RAM (I've seen some threads saying ~352GB). Also, perhaps what kind of inference times can be expected with different setups.

jurassicpark

Jul 20, 2022

Copying some data I found from other threads here:

@IanBeaver

It needed around 400 GB [disk space] just to fit all the weight files. They list the sizes of the weights and checkpoints under the Training section.

@IanBeaver

I have successfully loaded it on a single x2iezn.6xlarge instance in AWS but using only CPUs the model is very slow. Text generation sampling for several sequences can take several minutes to return, but the full model is working and it is much cheaper for local evaluation than 9 GPUs!

x2iezn.6xlarge specs:

768 GB RAM
24 vCPUs
$5.004 / hour

@maveriq

As a first-order estimate, 176B parameters in half precision (16 bits = 2 bytes) would need 352 GB of RAM. But since some modules are 32-bit, it would be more. So you would need about nine GPUs with 40 GB of RAM each, and that doesn't take the input into account.
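For anyone who wants to redo this estimate with other numbers, here is a minimal sketch of the arithmetic. It counts the weights only and uses decimal GB (1 GB = 1e9 bytes); the helper names are made up for illustration, and activations, the input, and any 32-bit modules are not included, so treat the result as a lower bound.

```python
import math

def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

def min_gpus(total_gb: float, gb_per_gpu: float) -> int:
    """Smallest number of identical GPUs whose combined memory covers total_gb."""
    return math.ceil(total_gb / gb_per_gpu)

N_PARAMS = 176e9  # 176B parameters, as quoted in the thread

fp16_gb = weight_memory_gb(N_PARAMS, 2)  # half precision: 2 bytes per parameter
gpus_40gb = min_gpus(fp16_gb, 40)        # 40 GB cards, as in the estimate above

print(fp16_gb)    # 352.0
print(gpus_40gb)  # 9
```

Swapping `bytes_per_param` to 4 gives the full 32-bit figure, which is where the "it would be more" caveat comes from.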

pai4451

Jul 22, 2022

It requires more than 352 GB of GPU RAM (176B parameters in half precision). I can do the inference on 8 A6000 GPUs. However, there isn't much room left for input tokens.
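A rough sketch of why there is so little room left, assuming each A6000 has 48 GB (the per-card figure is not stated in the post) and that only the half-precision weights are counted:

```python
N_GPUS = 8
GB_PER_GPU = 48                # assumption: 48 GB per A6000
WEIGHTS_GB = 176e9 * 2 / 1e9   # 176B parameters at 2 bytes each

total_gb = N_GPUS * GB_PER_GPU        # combined GPU memory
headroom_gb = total_gb - WEIGHTS_GB   # what remains for inputs and activations
headroom_per_gpu = headroom_gb / N_GPUS

print(total_gb)          # 384
print(headroom_gb)       # 32.0
print(headroom_per_gpu)  # 4.0
```

So under these assumptions only about 4 GB per card is left for everything other than the weights, and less than that if some modules are kept in 32-bit.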

bwv988

Jul 22, 2022

Copying some data I found from other threads here:...

Thanks for this, very helpful, was looking for the same information. No wonder I am failing to run the full model on a 64GB VM. ;)

Have you come across any recommendations anywhere to reduce memory usage, say, for specific pipeline tasks?

snarik

Sep 11, 2022

@bwv988 Your best bet is to try out bitsandbytes. https://github.com/TimDettmers/bitsandbytes
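A minimal sketch of what that might look like through the transformers integration, not a tested recipe for this model. It assumes `transformers`, `accelerate`, and `bitsandbytes` are installed, that CUDA GPUs are available, and that the checkpoint id is `bigscience/bloom` (an assumption, it is not named in this thread). The small helper at the top is hypothetical and just shows the expected saving on weights.

```python
def int8_weight_gb(n_params: float) -> float:
    """Weights-only estimate at 1 byte per parameter, in decimal GB."""
    return n_params * 1 / 1e9

def load_model_8bit(model_id: str = "bigscience/bloom"):
    """Load a causal LM with 8-bit weights via bitsandbytes.

    The model id is an assumption; pass a smaller checkpoint to try this out.
    """
    # Imported here so the estimate above works without these packages installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate spread layers across available devices
    )
    return tokenizer, model

print(int8_weight_gb(176e9))  # 176.0
```

That roughly halves the weight memory compared with half precision, but 176 GB is still far more than a 64 GB VM can hold, so this helps most when you are already close to fitting on your GPUs.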

bwv988

Oct 1, 2022

Thanks @snarik , will give this a go!

needfulthing

Apr 24, 2023

This configuration claims to run on >16 GB RAM and a single CPU:

https://towardsdatascience.com/run-bloom-the-largest-open-access-ai-model-on-your-desktop-computer-f48e1e2a9a32
