Hi everyone.
I’m trying to save a Keras model with the ModelCheckpoint callback every two epochs.
If I save the model every epoch by using save_freq="epoch", everything is fine and I can use val_mean_absolute_error to format the filename. However, if I set save_freq to 2 * int(ceil(train_size / batch_size)), which is the number of batches in two epochs, Keras raises an error:
KeyError: 'Failed to format this callback filepath: "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5". Reason: \'val_mean_absolute_error\''
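As far as I can tell, the message means the filename template asks for a key that is not in the logs at the moment of saving. Here is a minimal sketch without Keras, assuming the callback builds the filename with str.format over the logs dict. The helper name format_checkpoint_path and the metric values are made up for illustration; they are not Keras internals or real results.

```python
# Filename template from the post.
FILEPATH = "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5"


def format_checkpoint_path(filepath, epoch, logs):
    """Hypothetical helper: format the path from a 1-based epoch and a logs dict."""
    try:
        return filepath.format(epoch=epoch, **logs)
    except KeyError as e:
        # Mirror the shape of the error message quoted in the post.
        raise KeyError(
            f"Failed to format this callback filepath: {filepath!r}. Reason: {e}"
        )


# Illustrative logs at the end of an epoch: validation has already run.
epoch_end_logs = {
    "loss": 30.1,
    "mean_absolute_error": 4.4,
    "val_loss": 28.7,
    "val_mean_absolute_error": 4.25,
}

# Illustrative logs at the end of a training batch: no validation metrics yet.
batch_end_logs = {"loss": 30.1, "mean_absolute_error": 4.4}

print(format_checkpoint_path(FILEPATH, 2, epoch_end_logs))

try:
    format_checkpoint_path(FILEPATH, 2, batch_end_logs)
except KeyError as err:
    print("KeyError:", err)
```

The first call succeeds because the validation key is present; the second fails in the same way as my run, which suggests the save is being triggered at a point where only training metrics are in the logs.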
Below is the full script, adapted from an existing example:
import tensorflow as tf
from tensorflow import keras
def get_model():
    model = keras.Sequential()
    model.add(keras.layers.Dense(1, input_dim=784))
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=0.1),
        loss="mean_squared_error",
        metrics=["mean_absolute_error"],
    )
    return model
# Load example MNIST data and pre-process it
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
# Limit the data to 1000 samples
x_train = x_train[:1000]
y_train = y_train[:1000]
x_test = x_test[:1000]
y_test = y_test[:1000]
nSteps = int(tf.math.ceil(len(x_train)/128))
filepath = "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5"
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(filepath=filepath, monitor='val_mean_absolute_error', verbose=1,
                                       save_best_only=False, mode='min', save_freq=2*nSteps)
]
model = get_model()
history = model.fit(
    x_train,
    y_train,
    validation_data=(x_test, y_test),
    batch_size=128,
    epochs=4,
    verbose=1,
    callbacks=callbacks,
)
I’m not sure if it’s a bug, but something is not right!
Thank you.
==================
Edited:
After a bit of debugging, I found this code in callbacks.py:
# In ModelCheckpoint:
def _implements_train_batch_hooks(self):
    # Only call batch hooks when saving on batch
    return self.save_freq != 'epoch'

# In the base Callback class:
def _implements_train_batch_hooks(self):
    """Determines if this Callback should be called for each train batch."""
    return (not generic_utils.is_default(self.on_batch_begin) or
            not generic_utils.is_default(self.on_batch_end) or
            not generic_utils.is_default(self.on_train_batch_begin) or
            not generic_utils.is_default(self.on_train_batch_end))

def _implements_test_batch_hooks(self):
    """Determines if this Callback should be called for each test batch."""
    return (not generic_utils.is_default(self.on_test_batch_begin) or
            not generic_utils.is_default(self.on_test_batch_end))
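Accordingly, if I set save_freq='epoch', self.on_train_batch_end() is skipped and the filename is formatted at the end of the epoch (in on_epoch_end), when the validation metrics are already in the logs. With an integer save_freq, the save appears to be triggered from on_train_batch_end, before validation has run, so val_mean_absolute_error is missing. So I think this is a bug: the code should either detect that save_freq corresponds to a whole number of epochs, or provide another parameter to say whether save_freq counts epochs or steps.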
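In the meantime, a possible workaround is to do the "every N epochs" check at the end of the epoch myself. Below is a minimal sketch, not a fix to ModelCheckpoint. The class EveryNEpochsSaver and its parameters period and save_fn are hypothetical names of my own; the demo passes a list's append as a stand-in for model.save, and the validation values are invented so it runs without TensorFlow.

```python
class EveryNEpochsSaver:
    """Hypothetical helper: call save_fn(path) at the end of every `period`-th epoch."""

    def __init__(self, filepath, period, save_fn):
        self.filepath = filepath  # template, e.g. "model_{epoch:02d}_{val_loss:.2f}.h5"
        self.period = period      # save once every `period` epochs
        self.save_fn = save_fn    # e.g. model.save

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Keras passes a 0-based epoch index; use 1-based numbering in the filename.
        if (epoch + 1) % self.period != 0:
            return
        # Validation metrics are expected in `logs` here when validation data is given.
        path = self.filepath.format(epoch=epoch + 1, **logs)
        self.save_fn(path)


# Demo without Keras: record the paths instead of saving a model.
saved = []
saver = EveryNEpochsSaver(
    "saved-model_{epoch:02d}_{val_mean_absolute_error:.2f}.h5",
    period=2,
    save_fn=saved.append,
)
for epoch in range(4):
    # Invented validation values, only to exercise the formatting.
    saver.on_epoch_end(epoch, {"val_mean_absolute_error": 4.0 - 0.5 * epoch})

print(saved)
```

With the script above, I would expect this to plug in as saver = EveryNEpochsSaver(filepath, period=2, save_fn=model.save) and callbacks = [tf.keras.callbacks.LambdaCallback(on_epoch_end=saver.on_epoch_end)], though I have only run the Keras-free demo shown here.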