Keras is not saving models as intended

Despite the claim in the following link, I found that Keras is not saving/loading models with the iteration number, optimizer state, ExponentialDecay learning rate, etc.

What I want: adaptive training
I want to save the model after each epoch with everything (weights, optimizer state, learning rate schedule). Then, if the loss does not improve over N epochs, I want to reload the best-loss model and resume training from there.
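The bookkeeping part of what I want can be sketched without Keras at all. `PlateauReloader` below is a hypothetical helper of my own, not a Keras API; the actual save/load steps would use `model.save(...)` and `keras.models.load_model(...)`:

```python
# Hypothetical helper sketching the intended "reload best model on plateau"
# logic. It only tracks losses and tells the caller which saved epoch to
# reload; the real save/load would be model.save / keras.models.load_model.
class PlateauReloader:
    def __init__(self, patience):
        self.patience = patience        # N epochs without improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.stale = 0                  # epochs since last improvement

    def update(self, epoch, loss):
        """Record this epoch's loss; return the epoch whose saved model
        should be reloaded, or None to keep training as-is."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_epoch = epoch
            self.stale = 0
            return None
        self.stale += 1
        if self.stale >= self.patience:
            self.stale = 0              # reset after triggering a reload
            return self.best_epoch
        return None
```

For this to work, reloading `iter_{best_epoch}.keras` must restore the optimizer state and schedule position too, which is exactly what fails below.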
My example:

import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
np.random.seed(42)
x = np.random.randn(1000, 10).astype(np.float32)
y = (np.sum(x, axis=1, keepdims=True) > 0).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

def build_model(lr_schedule):

    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

    inputs = tf.keras.Input(shape=(10,))
    x = tf.keras.layers.Dense(32, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)

    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.MeanSquaredError(),
        metrics=[tf.keras.metrics.BinaryAccuracy()]
    )

    return model

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=0.001,
        decay_steps=100,
        decay_rate=0.9,
        staircase=True
    )
model = build_model(lr_schedule)

for i in range(10):
    model.fit(dataset, epochs=1, verbose=2)
    model.save(f"saved_model_test/iter_{i+1}.keras")
    model.save(f"saved_model_test/iter_{i+1}.h5")

I am saving in both .keras and .h5 formats to check. Now:

loaded_model = tf.keras.models.load_model(
    "saved_model_test/iter_10.keras"
)

print("Iterations:", loaded_model.optimizer.iterations.numpy())
print(type(loaded_model.optimizer))
print(loaded_model.optimizer.__class__.__name__)
print("LR:", loaded_model.optimizer.learning_rate)
print("LR - value current:", loaded_model.optimizer._learning_rate)
print(type(loaded_model.optimizer._learning_rate))

gives

Iterations: 0
<class 'keras.src.optimizers.rmsprop.RMSprop'>
RMSprop
LR: <Variable path=rmsprop/learning_rate, shape=(), dtype=float32, value=0.0010000000474974513>
LR - value current: <Variable path=rmsprop/learning_rate, shape=(), dtype=float32, value=0.0010000000474974513>
<class 'keras.src.backend.Variable'>

My versions:
TF = 2.20.0 and Keras = 3.13.2
More confusion: with
TF = 2.12.0 and Keras = 2.12.0 I am getting

Iterations: 320
<class 'keras.optimizers.adam.Adam'>
Adam
LR: <tf.Variable 'current_learning_rate:0' shape=() dtype=float32, numpy=0.001>
LR - value current: <keras.optimizers.schedules.learning_rate_schedule.ExponentialDecay object at 0x7f8870089f70>
<class 'keras.optimizers.schedules.learning_rate_schedule.ExponentialDecay'>

I am wondering how these types of bugs can persist for so long!

Hi @satadru, thank you for reporting this issue. This appears to be a serialization bug going from Keras 2 to Keras 3. Please try using the pure Keras API for your optimizers instead of tf.keras. Also, could you please confirm whether you've tested this behavior with the keras-nightly versions?

For me, the pure Keras API and tf.keras have always given the same results (for both versions 2 and 3)! Also, I was not using the nightly versions.
My main confusion right now:
Goal: suppose I want to resume training from the 10th iteration using the same learning-rate schedule and optimizer state (as if I were restarting from that epoch, apart from some stochastic differences, so that I do not exactly reproduce the same training history).
Now I am confused about model.save('file_name.keras') versus checkpoint saving. Can you advise on the official recommendations?

Hi @satadru, refer to these guides: Save, serialize, and export models, and Migrating Keras 2 code to multi-backend Keras 3, for a better understanding of model.save() and checkpoints.
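For resuming training at the fit() level specifically, one option is the BackupAndRestore callback, which snapshots the model, optimizer state, and epoch counter each epoch so that rerunning fit() after an interruption picks up where it left off. A minimal sketch (the backup_dir path is illustrative):

```python
# Sketch: BackupAndRestore snapshots model + optimizer state + epoch counter
# each epoch; rerunning the same fit() call after an interruption resumes
# from the last backup. The backup_dir path here is illustrative.
import numpy as np
import tensorflow as tf

x = np.random.randn(64, 10).astype("float32")
y = (x.sum(axis=1, keepdims=True) > 0).astype("float32")

inputs = tf.keras.Input(shape=(10,))
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(inputs)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

backup = tf.keras.callbacks.BackupAndRestore(backup_dir="backup_test")
history = model.fit(x, y, epochs=2, batch_size=32, verbose=0,
                    callbacks=[backup])
```

Note this covers fault-tolerant resumption of an in-progress fit(), not the "reload the best epoch on plateau" pattern, which still depends on full round-tripping of optimizer state through save/load.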