I know there are a thousand posts about this sort of thing, and I've spent days and days trying to get this to work.
Somehow I got a benchmark running on the CIFAR-10 model about a week ago, but I have had no luck since: the GPU was detected, but the model could not be loaded into memory.
Originally I already had CUDA 12.0 installed, and that's what worked. I have since tried to downgrade to 11.2 without success, and now the GPU is not detected at all in TensorFlow 2.11.
I have tried everything I can think of and everything I could find online.
I am at my wits' end after all the hours I have poured into this.
Any help would be greatly appreciated.
@Notoooriou5,
Welcome to the TensorFlow Forum!
Could you please share details of your operating system and the steps you took to install TensorFlow?
Thank you!
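To collect the details asked for here, a sketch like the following may help. It assumes TensorFlow 2.x is importable in the active Anaconda environment and prints the OS, the TensorFlow version, the GPUs TensorFlow detects, and the CUDA/cuDNN versions the installed build was compiled against, so they can be compared with the toolkit actually installed on the machine:

```python
import platform

import tensorflow as tf

# Operating system and TensorFlow version, as requested above.
print("OS:", platform.platform())
print("TensorFlow version:", tf.__version__)

# Physical GPUs TensorFlow can see; an empty list means the GPU was not picked up.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs detected:", gpus)

# CUDA/cuDNN versions this TensorFlow build was compiled against.
# These keys are only present on CUDA-enabled builds, hence .get() (prints None otherwise).
build = tf.sysconfig.get_build_info()
print("Built with CUDA:", build.get("cuda_version"))
print("Built with cuDNN:", build.get("cudnn_version"))
```

An empty GPU list often points to a mismatch between these build versions and the CUDA/cuDNN libraries installed on the system.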
Hi Chanduriv,
My system: Intel i7-7700K, 24 GB RAM, RTX 3090, tensorflow-gpu 2.10, running in Anaconda.
I have reverted to CUDA 12.0, and my GPU is now detected.
However, when I attempt to run a training example on the CIFAR-10 model, I run out of memory (disk space, not RAM; my RAM seems untouched).
About 18 GB of the GPU's 24 GB is in use.
I have about 7 GB free on my disk; it gets fully consumed and training cannot continue.
Is this simply a case of not having enough empty disk space? Or is there something else I should check?
@Notoooriou5,
You can try limiting GPU memory usage. There are currently two ways to do this:
a) Turn on memory growth by calling tf.config.experimental.set_memory_growth (see the TensorFlow v2.16.1 API documentation).
Instead of reserving nearly all GPU memory up front, TensorFlow then starts with a small allocation and extends it only as the process demands more memory.
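A minimal sketch of option a), assuming TensorFlow 2.x. It has to run at the very start of the script, before any operation touches the GPU, because memory growth cannot be changed once the GPUs have been initialized:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Start with a small allocation and grow it on demand instead of
    # reserving nearly all GPU memory when the runtime initializes.
    tf.config.experimental.set_memory_growth(gpu, True)

print("Memory growth enabled on:", [g.name for g in gpus])
```

The same behavior can also be switched on without code changes by setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true before starting Python.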
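b) Set a hard limit on the total memory by giving the physical GPU a logical device configuration with a memory_limit (in MB), e.g.
tf.config.set_logical_device_configuration(gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
where gpus = tf.config.list_physical_devices('GPU').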
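A self-contained sketch of option b), assuming TensorFlow 2.x and that the first detected GPU is the one to cap. The 1024 MB figure is the value from the reply and is only an example; on a 24 GB RTX 3090 a training run will likely need a larger limit:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Cap the first GPU at 1024 MB (memory_limit is in megabytes).
    # Like memory growth, this must be set before the GPU is initialized.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)],
    )
    logical_gpus = tf.config.list_logical_devices("GPU")
    print(len(gpus), "physical GPU(s),", len(logical_gpus), "logical GPU(s)")
else:
    print("No GPU detected; nothing to limit.")
```

Pick one of the two options per GPU: TensorFlow rejects enabling memory growth on a GPU that already has a logical device configuration.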
Thank you