Hi,
I’m using this tutorial: Image classification | TensorFlow Core
My image size is 256x256. I don’t need bigger sizes, I just need it to be fast, so to my surprise I found that my CPU is only about 5% slower than my GPU, and sometimes the CPU is actually faster.
Num GPUs Available: 1
TensorFlow: 2.10.0
build_info.build_info:
OrderedDict([('is_cuda_build', False), ('is_rocm_build', False), ('is_tensorrt_build', False), ('msvcp_dll_names', 'msvcp140.dll,msvcp140_1.dll')])
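For reference, I printed those numbers with something like the snippet below (a sketch; the exact script I ran differs slightly):

```python
import tensorflow as tf
from tensorflow.python.platform import build_info

# How many GPUs TensorFlow can see as physical devices
print("Num GPUs Available:", len(tf.config.list_physical_devices("GPU")))
print("TensorFlow:", tf.__version__)

# Build flags of the installed wheel; 'is_cuda_build': False may indicate
# the wheel itself was built without CUDA support
print(build_info.build_info)
```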
Video card: GeForce RTX 4060 Ti 8 GB (while training, GPU utilization never goes above 1-2%)
RAM: 96 GB (usage never goes above 45 GB)
SSD: Kingston FURY Renegade 1TB PCI Express 4.0 x4 M.2 2280, 7,300 MB/s → utilization never reaches 1%
CPU: Intel i5, 10th generation → never above 53%
I really don’t see the bottleneck: the bigger the batch size I use, the slower training gets. The best batch size is 16. If there were no bottleneck, I’d expect larger batches to give better speeds…
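One check I can think of (a sketch, not something I have run on this exact setup) is to turn on device placement logging and run a single op, to see whether TensorFlow actually places work on the GPU at all:

```python
import tensorflow as tf

# Log which device each op runs on; if everything is placed on CPU:0,
# the GPU is not being used for the math at all
tf.debugging.set_log_device_placement(True)

a = tf.random.normal((2048, 2048))
b = tf.random.normal((2048, 2048))
c = tf.matmul(a, b)

# Should mention GPU:0 when the GPU is actually doing the work
print(c.device)
```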
I have about 1000 classes.
Here is my model summary:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential (Sequential) (None, 256, 256, 3) 0
rescaling_1 (Rescaling) (None, 256, 256, 3) 0
conv2d (Conv2D) (None, 256, 256, 8) 224
max_pooling2d (MaxPooling2D)  (None, 128, 128, 8)  0
conv2d_1 (Conv2D) (None, 128, 128, 16) 1168
max_pooling2d_1 (MaxPooling2D)  (None, 64, 64, 16)  0
conv2d_2 (Conv2D) (None, 64, 64, 32) 4640
max_pooling2d_2 (MaxPooling2D)  (None, 32, 32, 32)  0
dropout (Dropout) (None, 32, 32, 32) 0
flatten (Flatten) (None, 32768) 0
dense (Dense) (None, 128) 4194432
outputs (Dense) (None, 863) 111327
=================================================================
Total params: 4,311,791
Trainable params: 4,311,791
Non-trainable params: 0
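For what it’s worth, the parameter counts in the summary check out by hand if the convolutions use 3x3 kernels (an assumption, since the summary doesn’t show kernel sizes):

```python
def conv2d_params(kernel, in_ch, out_ch):
    # weights (k * k * in * out) plus one bias per output channel
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    # weights (in * out) plus one bias per output unit
    return in_units * out_units + out_units

total = (
    conv2d_params(3, 3, 8)             # conv2d: 224
    + conv2d_params(3, 8, 16)          # conv2d_1: 1,168
    + conv2d_params(3, 16, 32)         # conv2d_2: 4,640
    + dense_params(32 * 32 * 32, 128)  # dense: 4,194,432
    + dense_params(128, 863)           # outputs: 111,327
)
print(total)  # 4311791
```

Almost all of the parameters sit in the first Dense layer, so the convolutional part of this model is tiny; that may be part of why the GPU has so little to do per batch.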