Despite the claim in the following link, I found that Keras does not save/load models with the optimizer's iteration count, optimizer type, ExponentialDecay learning-rate schedule, etc.
What I want: adaptive training
I want to save the model after each epoch with everything (weights, optimizer state, learning-rate schedule). Then, if the loss has not improved for N epochs, I want to reload the best-loss model and resume training from there.
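As a sketch, the loop I have in mind looks like this (framework-agnostic; `train_one_epoch`, `save_checkpoint`, and `load_checkpoint` are hypothetical stand-ins for the Keras calls whose save/load behavior this question is about):

```python
def adaptive_train(train_one_epoch, save_checkpoint, load_checkpoint,
                   max_epochs=50, patience=3):
    """Train, checkpointing on every improvement; on a plateau of
    `patience` epochs, roll back to the best checkpoint and continue."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        loss = train_one_epoch()
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
            # Must capture weights + optimizer state + LR-schedule state
            save_checkpoint()
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                load_checkpoint()  # resume from the best snapshot
                epochs_without_improvement = 0
    return best_loss
```

For the rollback to actually resume where the best epoch left off, the checkpoint has to round-trip the optimizer iteration count and the schedule, which is exactly what I cannot get Keras to do below.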
My example:
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
np.random.seed(42)

x = np.random.randn(1000, 10).astype(np.float32)
y = (np.sum(x, axis=1, keepdims=True) > 0).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

def build_model(lr_schedule):
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
    inputs = tf.keras.Input(shape=(10,))
    x = tf.keras.layers.Dense(32, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.MeanSquaredError(),
        metrics=[tf.keras.metrics.BinaryAccuracy()],
    )
    return model
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=100,
    decay_rate=0.9,
    staircase=True,
)
model = build_model(lr_schedule)

for i in range(10):
    model.fit(dataset, epochs=1, verbose=2)
    model.save(f"saved_model_test/iter_{i+1}.keras")
    model.save(f"saved_model_test/iter_{i+1}.h5")
I save in both .keras and .h5 formats to compare. Now:
loaded_model = tf.keras.models.load_model(
    "saved_model_test/iter_10.keras"
)
print("Iterations:", loaded_model.optimizer.iterations.numpy())
print(type(loaded_model.optimizer))
print(loaded_model.optimizer.__class__.__name__)
print("LR:", loaded_model.optimizer.learning_rate)
print("LR - value current:", loaded_model.optimizer._learning_rate)
print(type(loaded_model.optimizer._learning_rate))
gives:
Iterations: 0
<class 'keras.src.optimizers.rmsprop.RMSprop'>
RMSprop
LR: <Variable path=rmsprop/learning_rate, shape=(), dtype=float32, value=0.0010000000474974513>
LR - value current: <Variable path=rmsprop/learning_rate, shape=(), dtype=float32, value=0.0010000000474974513>
<class 'keras.src.backend.Variable'>
My versions:
TF = 2.20.0 and Keras = 3.13.2
More confusion: with
TF = 2.12.0 and Keras = 2.12.0 I get
Iterations: 320
<class 'keras.optimizers.adam.Adam'>
Adam
LR: <tf.Variable 'current_learning_rate:0' shape=() dtype=float32, numpy=0.001>
LR - value current: <keras.optimizers.schedules.learning_rate_schedule.ExponentialDecay object at 0x7f8870089f70>
<class 'keras.optimizers.schedules.learning_rate_schedule.ExponentialDecay'>
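To make the mismatch concrete, here is a small hypothetical helper (the names are mine, not a Keras API) that diffs two flat snapshots of the state I would expect a save/load round trip to preserve. Applied to the two outputs above, it flags the iteration count (320 → 0), the optimizer class (Adam → RMSprop), and the dropped ExponentialDecay schedule:

```python
def diff_snapshots(before, after):
    """Compare two flat dicts of optimizer state (name -> value).

    Returns (changed, missing): keys whose values differ between the
    snapshots, and keys present before but gone after the round trip.
    """
    changed = sorted(k for k in before if k in after and before[k] != after[k])
    missing = sorted(set(before) - set(after))
    return changed, missing

# Values transcribed from the outputs printed above
before = {"iterations": 320, "optimizer": "Adam",
          "lr_schedule": "ExponentialDecay"}
after = {"iterations": 0, "optimizer": "RMSprop"}
print(diff_snapshots(before, after))
# → (['iterations', 'optimizer'], ['lr_schedule'])
```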
I am wondering how bugs of this kind can persist for so long!