I have an image classifier based on ResNet50. As long as I used transfer learning during training, everything was fine. When I decided to train the ResNet from scratch, I ran into memory problems, which makes sense given how many more parameters have to be trained. So I decided to use mixed precision.
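For context, the transfer-learning version that worked looked roughly like this (a minimal sketch, not my exact code; the input shape and the head are illustrative):

```python
import tensorflow as tf

# pretrained ResNet50 backbone, kept frozen (transfer learning)
base = tf.keras.applications.ResNet50(weights='imagenet',
                                      include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # only the new classification head is trained

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
```

With the backbone frozen, only the head's gradients and optimizer state have to fit in memory, which is presumably why this version never ran out of GPU memory.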
Here I report a toy version of my code that reproduces the issue:
```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist

def dummy_model(nClasses):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D((2, 2)))
    model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
    return model
```
Then the actual main. First, the settings for the GPU:
```python
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=1024*3)])
        tf.config.experimental.set_memory_growth(gpus[0], True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        print("\n\n")
    except RuntimeError as e:
        # memory growth must be set before the GPUs have been initialized
        print(e)
```

Use of mixed precision:
```python
tf.keras.mixed_precision.set_global_policy('mixed_float16')
```
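As far as I understand, when the global policy is mixed_float16 and I train with model.fit, Keras wraps the optimizer in a LossScaleOptimizer automatically, so I did not wrap it myself. For a custom training loop the wrapping would have to be explicit, roughly like this (sketch):

```python
# only needed for custom training loops; Model.fit applies loss scaling on its own
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
opt = tf.keras.mixed_precision.LossScaleOptimizer(opt)
```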
Loading the data:

```python
(trainX, trainy), (testX, testy) = mnist.load_data()
```
Normalizing the data, then building, compiling, and training the model:

```python
# add the channel dimension expected by Conv2D: (28, 28) -> (28, 28, 1)
trainX = trainX.reshape(-1, 28, 28, 1).astype('float32') / 255
testX = testX.reshape(-1, 28, 28, 1).astype('float32') / 255

base_model = dummy_model(10)
inputs = tf.keras.Input(shape=(28, 28, 1), name='digits')
x = base_model(inputs)
x = tf.keras.layers.GlobalMaxPooling2D()(x)
x = tf.keras.layers.Dense(10)(x)  # the Dense layer must be called on x and reassigned
# the final softmax is kept in float32, as recommended for mixed precision
outputs = tf.keras.layers.Activation(activation='softmax', dtype='float32')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(trainX, trainy, epochs=10)
model.evaluate(testX, testy)
```
If I comment out the mixed-precision line, this dummy model that classifies digits from the MNIST database works fine. But with mixed precision enabled, training is about half as fast and the accuracy stays close to zero.
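In case it helps, the policy does seem to be applied; I check it roughly like this (a small sketch, with the values I would expect as comments):

```python
# global policy: computations in float16, variables kept in float32
policy = tf.keras.mixed_precision.global_policy()
print('compute dtype:', policy.compute_dtype)    # expected: float16
print('variable dtype:', policy.variable_dtype)  # expected: float32

# per-layer policies; the final softmax Activation should report float32
for layer in model.layers:
    print(layer.name, layer.dtype_policy)
```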
I am using TensorFlow 2.8 on native Windows with GPU acceleration.
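One thing I am not sure about: as far as I know, float16 only gives a real speedup on GPUs with compute capability 7.0 or higher (Tensor Cores), so maybe the slowdown is hardware-related. A quick way to check (sketch):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # the details dict typically contains 'device_name' and 'compute_capability'
    details = tf.config.experimental.get_device_details(gpus[0])
    print(details.get('device_name'))
    print(details.get('compute_capability'))  # e.g. (7, 5); >= (7, 0) for Tensor Cores
```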