I am working on a regression model to extract some parameters from measured curves. The loss function I want to use is a combination of the MSE between the regressed and true parameters AND the MSE between the true input curve and the curve produced by the regressed parameters. Training runs for a few epochs but eventually produces loss = nan. Here's the model I use:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import backend as K

def create_model(input_shape, num_classes):
    input_layer = keras.Input(shape=input_shape)
    x = layers.Conv1D(
        filters=32, kernel_size=3, strides=2, activation="relu", padding="same"
    )(input_layer)
    x = layers.BatchNormalization()(x)
    x = layers.Conv1D(
        filters=64, kernel_size=3, strides=2, activation="relu", padding="same"
    )(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv1D(
        filters=128, kernel_size=5, strides=2, activation="relu", padding="same"
    )(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv1D(
        filters=256, kernel_size=5, strides=2, activation="relu", padding="same"
    )(x)
    x = layers.BatchNormalization()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(
        2048, activation="relu", kernel_regularizer=keras.regularizers.L2()
    )(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(
        1024, activation="relu", kernel_regularizer=keras.regularizers.L2()
    )(x)
    x = layers.Dropout(0.2)(x)
    # output_layer = layers.Dense(num_classes, activation="softmax")(x)
    output_layer = layers.Dense(num_classes, activation="linear")(x)
    return keras.Model(inputs=input_layer, outputs=output_layer)
The model is then created and compiled via:
model = create_model(input_shape=input_shape, num_classes=num_classes)  # takes input layer for now
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=max_learning_rate, clipnorm=1),
    loss=custom_loss(x_data, t_final, lam, batch_size),
    metrics=['mse'],
)
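For context, these are roughly the values and shapes that go into the calls above (not exact, since I've been varying them while debugging):

# rough context only -- not the exact values, since I've been varying them
input_shape = (600, 2)    # 600 time points; channel 0 is Cp, channel 1 is Ct
num_classes = 4           # the regressed parameters Fp, PS, ve, vp
batch_size = 32           # tried values down to 1
max_learning_rate = 1e-4  # tried values down to 1e-8
lam = 1.0                 # weight on the curve (physical) term; tried 0 as well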
The custom loss is:
def custom_coef(y_true, y_pred, x, t, lam, batch_size):
    Cp = x[:, :, 0]  # channel 0 of the input curves
    Ct = x[:, :, 1]  # channel 1 of the input curves (the measured curve Ct_hat is compared to)
    Ct_hat = func_2cfm_reformulated_keras(y_pred, t, Cp, batch_size, model='2cfm')
    # first term is the physical (curve) loss, second is the plain MSE on the parameters
    final_loss = K.mean(K.square(Ct_hat - Ct)) * lam + K.mean(K.square(y_true - y_pred))
    print(f'The final loss is {final_loss}')
    return final_loss

def custom_loss(x, t, lam, batch_size):
    def phys_loss(y_true, y_pred):
        return custom_coef(y_true, y_pred, x, t, lam, batch_size)
    return phys_loss
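Before training I sanity-check this loss eagerly on a single batch, roughly like this (a sketch, not exact code; x_data, y_data and t_final are my own arrays, and I just slice off the first batch):

# sketch: evaluate the loss eagerly on one batch and confirm everything stays finite
x_batch = tf.cast(tf.convert_to_tensor(x_data[:batch_size]), tf.float32)
y_batch = tf.cast(tf.convert_to_tensor(y_data[:batch_size]), tf.float32)
t32 = tf.cast(tf.convert_to_tensor(t_final), tf.float32)

y_hat = model(x_batch, training=False)
loss_fn = custom_loss(x_batch, t32, lam, batch_size)
loss_value = loss_fn(y_batch, y_hat)
print('loss finite:', bool(tf.math.is_finite(loss_value).numpy()), 'value:', float(loss_value))

This comes back finite, which is part of why I don't think the forward model itself is producing the nans.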
Finally, the forward model that maps the regressed parameters back to curves (used inside the loss) is:
def func_2cfm_reformulated_keras(x0, t, Cp, batch_size, model):
    output_list = []
    for b in range(batch_size):
        Fp = x0[b, 0]  # changed to Fp
        PS = x0[b, 1]
        ve = x0[b, 2]
        vp = x0[b, 3]
        Te = ve / PS  # 1
        # Fp = Ktrans*PS/(PS-Ktrans)  # 2
        T = (vp + ve) / Fp  # 3
        Tp = vp / Fp  # 4
        # now we convert based on 2CFM or 2CXM models
        # 2CFM
        if model == '2cfm':
            Tplus = Te
            Tminus = Tp
        elif model == '2cxm':
            Tplus = 0.5 * (T + Te + K.sqrt((T + Te) ** 2 - 4 * Tp * Te))
            Tminus = 0.5 * (T + Te - K.sqrt((T + Te) ** 2 - 4 * Tp * Te))
        f_Tminus = [K.constant(0)]
        f_Tplus = [K.constant(0)]
        for ii in range(0, len(t) - 1):
            xi = (t[ii + 1] - t[ii]) / Tplus
            a = Cp[b, :] * Fp * Tplus * (T - Tminus) / (Tplus - Tminus)
            aip = (a[ii + 1] - a[ii]) / (t[ii + 1] - t[ii])
            E0 = 1 - K.exp(-xi)
            E1 = xi - E0
            new_val_Tplus = K.exp(-xi) * f_Tplus[ii] + a[ii] * E0 + aip * Tplus * E1
            f_Tplus.append(new_val_Tplus)
            xi_2 = (t[ii + 1] - t[ii]) / Tminus
            a_2 = Cp[b, :] * Fp * Tminus * (Tplus - T) / (Tplus - Tminus)
            aip_2 = (a_2[ii + 1] - a_2[ii]) / (t[ii + 1] - t[ii])
            E0_2 = 1 - K.exp(-xi_2)
            E1_2 = xi_2 - E0_2
            new_val_Tminus = K.exp(-xi_2) * f_Tminus[ii] + a_2[ii] * E0_2 + aip_2 * Tminus * E1_2
            f_Tminus.append(new_val_Tminus)
        integral_tensor = tf.stack(f_Tminus) + tf.stack(f_Tplus)
        output_list.append(integral_tensor)
    output_stack = tf.stack(output_list)
    return output_stack
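Called on its own, the forward model is used like this (a sketch with random placeholder parameters and a made-up time axis, purely to show the shapes; during training it is only ever called from inside the loss):

# sketch: standalone call with placeholder inputs, just to show the shapes involved
demo_bs = 4
t_demo = tf.linspace(0.0, 5.0, 600)                                    # made-up 600-point time axis
Cp_demo = tf.random.uniform((demo_bs, 600))                            # placeholder input curves
params_demo = tf.random.uniform((demo_bs, 4), minval=0.1, maxval=1.0)  # placeholder Fp, PS, ve, vp
curves = func_2cfm_reformulated_keras(params_demo, t_demo, Cp_demo, demo_bs, model='2cfm')
print(curves.shape)  # -> (4, 600), i.e. (batch_size, Npoints)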
The output is a tensor of shape (batch_size, Npoints), where Npoints is the number of points in my curves (600). I achieve this by looping over the batch and over the individual time points, then stacking everything up with tf.stack. I'm not sure this is the right way to do it, but I was able to get this method working with a simpler forward model. That setup also had the loss = nan issue, which I solved by reducing the size of the model and the batch_size. No such luck with this model, however. I've tried batch_size = 1, lowering the learning rate to as low as 1e-8, using clipnorm, etc. I am certain that my input x and y values do not contain nan, and x is scaled to [0, 1] in all cases. Any other ideas?

If I remove the first term in final_loss it trains without a problem; however, if I leave that term in, even with lam = 0, it eventually spits out a nan. That makes it sound like the forward model is producing nans, but its output doesn't contain any. I assume the gradients are exploding, but I'm not sure how to check that, why it would happen, or how to fix it. I've also tried other optimizers (Adam was my first choice, no luck).
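On the "not sure how to check" part: would something like the following be a reasonable way to inspect the gradients for a single batch? (A sketch only; x_batch, y_batch and t32 are the same batch as in the eager check above, and loss_fn is the closure returned by custom_loss.)

# sketch: compute gradients for one batch outside model.fit and inspect their norms
loss_fn = custom_loss(x_batch, t32, lam, batch_size)
with tf.GradientTape() as tape:
    y_hat = model(x_batch, training=True)
    loss_value = loss_fn(y_batch, y_hat)
grads = tape.gradient(loss_value, model.trainable_variables)
valid_grads = [g for g in grads if g is not None]
print('loss:', float(loss_value))
print('global grad norm:', float(tf.linalg.global_norm(valid_grads)))
for var, g in zip(model.trainable_variables, grads):
    if g is not None:
        finite = bool(tf.reduce_all(tf.math.is_finite(g)).numpy())
        print(var.name, 'norm:', float(tf.norm(g)), 'finite:', finite)

I've also seen tf.debugging.enable_check_numerics() mentioned as a way to stop at the first op that produces an inf/nan; would that be a better tool for tracking this down?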