Cannot train on RTX 3090

I know there are a thousand posts about this sort of thing, and I've spent days and days trying to get this to work.
Somehow I managed to run the benchmark on the CIFAR-10 model about a week ago, but I have had no luck since: the GPU was detected, but the model could not be loaded into memory.
Originally I had CUDA 12.0 installed, and that's what worked. I have since tried unsuccessfully to downgrade to 11.2, and now the GPU is not detected at all in TensorFlow 2.11.

I have tried everything I can think of, and everything I could find online.
I am at my wits' end after all the hours poured into this.
Any help would be greatly appreciated.

@Notoooriou5,

Welcome to the TensorFlow Forum!

Could you please share the details of your operating system and the steps you took to install TensorFlow?

Thank you!

Hi Chanduriv,
My system is an i7-7700K with 24 GB of RAM and an RTX 3090, running tensorflow-gpu 2.10 in Anaconda.
I have reverted to CUDA 12.0 and my GPU is now detected.
However, when attempting to run a training example on the CIFAR-10 model, I run out of memory (on the hard drive, not RAM; my RAM seems to be untouched).
The GPU is loaded with approximately 18 GB of the 24 GB available.

I have approximately 7 GB free on my disk; it gets fully consumed, and training cannot proceed.
Is this simply a case of not having enough free disk space, or is there something else I should check?

@Notoooriou5,

You can try limiting GPU memory. Currently this can be handled in two ways:
a) Turn on memory growth by calling tf.config.experimental.set_memory_growth. This allocates more memory only as the process demands it, instead of reserving nearly all of the GPU up front.
b) Set a hard limit on the total memory with tf.config.set_logical_device_configuration, passing a tf.config.LogicalDeviceConfiguration with a memory_limit (in megabytes).
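The two options above can be sketched as follows. Note that the 4096 MB cap is just an illustrative value, and that both settings must be applied before the GPU is first used in the process; they are alternatives, so use one or the other:

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow (empty list if none are detected).
gpus = tf.config.list_physical_devices('GPU')

if gpus:
    try:
        # Option (a): grow the GPU memory allocation on demand instead
        # of grabbing nearly all of it up front.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)

        # Option (b), as an alternative to (a): hard-cap the first GPU
        # at 4096 MB (adjust to your model's needs). TensorFlow then
        # exposes it as a logical device with that limit.
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])
    except RuntimeError as e:
        # Memory settings cannot be changed once the GPU is initialized.
        print(e)

print(tf.config.list_logical_devices('GPU'))
```

Run this at the very top of your training script, before building or loading any model, so the configuration takes effect before the GPU is initialized.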

Thank you