I'm trying to fine-tune a 7-billion-parameter model in TF/Keras, but I can't distribute the model across multiple GPUs.
I've read a bit about the parameter server strategy, but I can't really get it to work the way I want.
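In case it helps: here's roughly the sketch I'm working from, adapted from the TF parameter server training tutorial. It spins up an in-process cluster just for illustration (the ports, layer sizes, and dummy data are placeholders, not my actual 7B setup; in a real run the workers and ps tasks would be separate processes on separate machines):

```python
import tensorflow as tf

# Placeholder in-process cluster: 1 worker + 1 ps (real setup uses separate hosts).
cluster_dict = {
    "worker": ["localhost:22000"],
    "ps": ["localhost:22001"],
}
cluster_spec = tf.train.ClusterSpec(cluster_dict)

# Start the worker and ps servers inside this process (illustration only).
for job in ("worker", "ps"):
    for i in range(len(cluster_dict[job])):
        tf.distribute.Server(
            cluster_spec, job_name=job, task_index=i,
            protocol="grpc", start=True)

resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    cluster_spec, rpc_layer="grpc")

# variable_partitioner shards large variables across the ps tasks,
# which is the part that matters for a model too big for one GPU.
strategy = tf.distribute.ParameterServerStrategy(
    resolver,
    variable_partitioner=tf.distribute.experimental.partitioners.MinSizePartitioner(
        min_shard_bytes=256 << 10, max_shards=len(cluster_dict["ps"])))

# Variables created under the scope are placed on (and sharded across) the ps tasks.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

def dataset_fn(input_context):
    # Dummy data standing in for my real fine-tuning dataset.
    x = tf.random.uniform((64, 8))
    y = tf.random.uniform((64, 1))
    return tf.data.Dataset.from_tensor_slices((x, y)).repeat().batch(8)

# With ParameterServerStrategy, model.fit takes a DatasetCreator
# and requires steps_per_epoch.
history = model.fit(
    tf.keras.utils.experimental.DatasetCreator(dataset_fn),
    epochs=1, steps_per_epoch=4)
```

This runs for me on toy sizes, but I'm not sure it's the right approach for sharding a 7B model's weights rather than just its optimizer state.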
If anyone has any insights, they would be greatly appreciated.