GPU is 4-5% slower than CPU

Hi,
I’m using this tutorial: Image classification | TensorFlow Core

My image size is 256x256. I don’t need bigger sizes, I just need it to be fast. To my surprise, I found that my CPU is only about 5% slower than my GPU, and sometimes it is actually faster.

Num GPUs Available: 1
Tensorflow: 2.10.0

build_info.build_info:
OrderedDict([('is_cuda_build', False), ('is_rocm_build', False), ('is_tensorrt_build', False), ('msvcp_dll_names', 'msvcp140.dll,msvcp140_1.dll')])

Video card: GeForce RTX 4060 Ti 8 GB (during training, usage doesn’t go above 1-2%)
RAM: 96 GB (usage never goes above 45 GB)
SSD: Kingston FURY Renegade 1TB PCI Express 4.0 x4 M.2 2280, 7300 MB/s (usage never reaches 1%)
CPU: Intel i5, 10th generation (usage no more than 53%)

I really don’t see the bottleneck. The bigger the batch I use, the slower it is. The best batch size is 16, but if there were no bottleneck, I would expect bigger batches to give better speeds…
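One way to compare batch sizes on equal terms is to measure images per second instead of time per step. The sketch below is only an illustration: `images_per_second` and `fake_step` are hypothetical names, and `fake_step` stands in for a real training step, which would need to return only after its work has finished (for example, after reading the loss value back).

```python
import time
from typing import Callable


def images_per_second(step_fn: Callable[[], None], batch_size: int,
                      steps: int = 20, warmup: int = 3) -> float:
    """Return the average images/second of `step_fn` over `steps` timed calls."""
    # Warm-up calls are not timed: the first steps often include one-off
    # costs such as graph tracing or memory allocation.
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return batch_size * steps / elapsed


if __name__ == "__main__":
    # Stand-in for a real training step (hypothetical): sleeps 10 ms per batch.
    def fake_step() -> None:
        time.sleep(0.01)

    for batch_size in (16, 32, 64):
        rate = images_per_second(fake_step, batch_size)
        print(f"batch {batch_size}: {rate:.0f} images/s")
```

With a real step function, running this once per device and per batch size gives numbers that can be compared directly, since a bigger batch naturally takes longer per step even when throughput improves.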

I have about 1000 classes.
Here is my model summary:

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 sequential (Sequential)     (None, 256, 256, 3)       0         
                                                                 
 rescaling_1 (Rescaling)     (None, 256, 256, 3)       0         
                                                                 
 conv2d (Conv2D)             (None, 256, 256, 8)       224       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 128, 128, 8)      0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 128, 128, 16)      1168      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 64, 64, 16)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 64, 64, 32)        4640      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 32, 32, 32)       0         
 2D)                                                             
                                                                 
 dropout (Dropout)           (None, 32, 32, 32)        0         
                                                                 
 flatten (Flatten)           (None, 32768)             0         
                                                                 
 dense (Dense)               (None, 128)               4194432   
                                                                 
 outputs (Dense)             (None, 863)               111327    
                                                                 
=================================================================
Total params: 4,311,791
Trainable params: 4,311,791
Non-trainable params: 0
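
The parameter counts in this summary can be reproduced by hand, which also shows where the weights sit. This is a small sketch that assumes 3x3 kernels, as in the tutorial (the summary itself does not print the kernel size); `conv2d_params` and `dense_params` are helper names made up for this example.

```python
def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    # One kernel x kernel x in_channels weight tensor plus one bias per filter.
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs: int, units: int) -> int:
    # One weight per input plus one bias, for every unit.
    return (inputs + 1) * units


KERNEL = 3  # assumption: 3x3 kernels, as in the tutorial

counts = {
    "conv2d": conv2d_params(KERNEL, 3, 8),
    "conv2d_1": conv2d_params(KERNEL, 8, 16),
    "conv2d_2": conv2d_params(KERNEL, 16, 32),
    "dense": dense_params(32 * 32 * 32, 128),  # flatten output is 32768
    "outputs": dense_params(128, 863),
}
total = sum(counts.values())

# Print each layer's count and its share of the total.
for name, n in counts.items():
    print(f"{name:10s} {n:>9,d}  ({100 * n / total:.1f}% of total)")
print(f"{'total':10s} {total:>9,d}")
```

The counts match the summary, and roughly 97% of the weights are in the first Dense layer; the three convolution layers together hold only a few thousand parameters.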

@fiulian,

Welcome to the TensorFlow Forum!

('is_cuda_build', False)

This suggests that the TensorFlow build doesn’t include CUDA (GPU) support.

Could you please share the output of the following snippet?

import tensorflow as tf
# List the GPU devices that TensorFlow can see
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs:", len(physical_devices))

Thank you!

The output is:
Num GPUs: 1
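
The two checks from this thread (the build info and the GPU list) can be read together. The sketch below is a rough illustration, not an official diagnostic: `summarize_gpu_support` and `report_from_tensorflow` are hypothetical helper names, and the wording of the messages is my own. It relies only on `tf.sysconfig.get_build_info()` and `tf.config.list_physical_devices("GPU")`.

```python
from typing import Mapping


def summarize_gpu_support(build_info: Mapping[str, object], num_gpus: int) -> str:
    """Describe what the build flags and the visible GPU count imply together."""
    cuda = bool(build_info.get("is_cuda_build", False))
    rocm = bool(build_info.get("is_rocm_build", False))
    if num_gpus == 0:
        return "No GPU device is visible; training runs on the CPU."
    if cuda or rocm:
        return f"{num_gpus} GPU device(s) visible on a CUDA/ROCm build."
    # A device is listed even though the build itself has no CUDA/ROCm support.
    return (f"{num_gpus} GPU device(s) visible, but the build is neither CUDA "
            "nor ROCm; check which backend provides the device.")


def report_from_tensorflow() -> str:
    """Collect both values from the installed TensorFlow (requires TF 2.x)."""
    import tensorflow as tf  # imported here so the helper above has no dependency

    build_info = tf.sysconfig.get_build_info()
    gpus = tf.config.list_physical_devices("GPU")
    return summarize_gpu_support(build_info, len(gpus))
```

Calling `print(report_from_tensorflow())` in the training environment prints one of the three messages. For the values posted here (`is_cuda_build` False, one GPU listed), it would be the third one, which is the combination worth investigating further.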