Keras custom model deteriorates after save and reload

Hi!
I noticed that other people have encountered the same problem as me.
(https://datascience.stackexchange.com/questions/94919/predictions-become-random-after-loading-a-custom-saved-keras-model)
Description: I created a Keras model with a custom layer and I noticed that the results deteriorate significantly if I save and reload it.
To clarify, in scenario 1 everything works fine every time: I build the model, then I compile and train it. But in scenario 2 (I build the model, save it untrained, reload it, and then compile and train it) the results deteriorate significantly.

I have been wondering whether some of the parameters of the custom layer (e.g. the learning rate) are not correctly saved. But this shouldn't be the case, because I always recompile the model after reloading it. So at this point I do not know what could cause the problem.

I would appreciate any advice.

Can you reproduce this with a dummy Colab example?

I have a dummy notebook, but the problem is not easily reproducible. It appears at random and not very often.

Have you tried with input_signature?

Not yet. I did some tests and I believe I understand where the problem is coming from: in the call function I also use three global variables, and I believe the problem comes from the way the Keras model uses these variables when it is loaded. To explain: in the call function, the model parameters are used together with the global variables in a formula to compute some results.
The learnt model parameters are almost the same whether I train the model after building it or after loading it. If I then complete the remaining computations 'by hand', the results in both scenarios are OK (almost the same, as expected). The only difference is that the loaded model reports wrong results (much different than expected), although the global variables as well as the learnt parameters are the same in both scenarios.
So applying the same computation to the same inputs should give the same result, but it does not.
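To illustrate the structure only, here is a simplified sketch, not my real layer: the formula and the parameter names are placeholders for what the layer actually computes, while pmean, pscale and smean are plain Python globals read inside call, as in my code.

import tensorflow as tf

# Globals set up before building the model; their values change per data set.
pmean, pscale, smean = 0.0, 1.0, 0.0

class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, units=1, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Trainable parameters learnt during fit().
        self.po = self.add_weight(name="po", shape=(self.units,), initializer="zeros")
        self.el = self.add_weight(name="el", shape=(self.units,), initializer="zeros")

    def call(self, inputs):
        # The learnt parameters are combined with the globals in a formula
        # (this one is made up). When call() is traced into a graph, the
        # current values of the Python globals are captured as constants.
        return (inputs - pmean) / pscale * self.po + smean * self.el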

The results using the same parameters and variables in the two scenarios are different, like this:

for item 123 : scenario is - load + train
po: 2.47, el: -14.24

for item 123 : scenario is - build + train
po: 1.92, el: -3.67

Yes, sorry, the comment was for your other retracing thread at:
https://tensorflow-prod.ospodiscourse.com/t/custom-model-trigerring-retracing-warning/3722?u=bhack

Can you reproduce the error in a small Colab?

Yes, how can I send it to you? (I have a local notebook, not a Colab.)

In the dummy notebook there are two sets of data, and the problem is as follows: starting from scratch with the first set of data, everything is OK (the results for build model + train are the same as the ones for load model + train). The problem appears when I continue with a subsequent set of data: the results for build model + train are different from the ones for load model + train.

If you can reproduce this with dummy data, you can share a minimal example as a GitHub gist notebook or a free Google Colab notebook.

Here it is. The same model also generates the warning below.

“WARNING:tensorflow:6 out of the last 8 calls to <function Model.make_predict_function…predict_function at 0x7f90184d5310> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to Better performance with tf.function | TensorFlow Core and tf.function | TensorFlow Core v2.6.0 for more details.”

Test with the second set of data = Not OK (results for build + train are different from those for load + train)

Where in the notebook is the build + train case?

An example of build + train is:
a) setting up the global variables: pmean, pscale, smean
b) build the model
inputs = tf.keras.Input(shape=(1,))
x = CustomLayer(units=1)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=x, name='model')
model.summary()
c) compile and train the model
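Step c) is just a standard compile and fit, roughly like this (optimizer, loss and epoch count are placeholders, not my exact settings; scaled_xs and ys are the features and labels described below):

# c) compile and train on the current features/labels.
model.compile(optimizer="adam", loss="mse")
model.fit(scaled_xs, ys, epochs=100, verbose=0)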

One set of data consists of global variables + features (scaled_xs) + labels (ys).
When I change to another set of data, thereby changing the global variables, features and labels, the results obtained with a reloaded and trained model differ from the ones obtained with a rebuilt and trained model. In other words: the results obtained when I build the model from scratch (redefined as above) are correct with any set of data, but not when I only load a saved (and untrained) model and train it after reloading. Note that the parameters saved by the model are the same in both scenarios (they are visible in the notebook), so the only difference I can think of is the global variables. Maybe in the second scenario (load + train) the model does not handle the new global variables properly. This is reproducible with the notebook. I hope that clarifies it somewhat.
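For reference, the load + train scenario I am describing is roughly the following (the path is a placeholder; custom_objects is passed so that the custom layer can be restored):

# Save the freshly built, untrained and uncompiled model, then reload it.
model.save("untrained_model")
reloaded = tf.keras.models.load_model(
    "untrained_model", custom_objects={"CustomLayer": CustomLayer}
)
# Compile and train only after reloading, with the new global variables set.
reloaded.compile(optimizer="adam", loss="mse")
reloaded.fit(scaled_xs, ys, epochs=100, verbose=0)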

Yes, I understand, but I am probably missing this step in your notebook for the 2nd set of data:

build + train + predict

Predict is not necessary for my task. The main objective of the custom layer is the computation of the 'po' and 'el' variables. So instead of using predict, I just read the computed variables. These two are what I am interested in finding.
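Concretely, after training I just read the values off the custom layer rather than calling predict (assuming po and el live on the layer, as in the sketch earlier in the thread):

layer = model.layers[1]   # the CustomLayer instance (index 0 is the Input layer)
po = layer.po.numpy()
el = layer.el.numpy()
print("po:", po, "el:", el)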

Ok, but it seems that you have only manually annotated the last expected result in the last cell, without the code.

Can you try to fix the seeds in your first import Cell?

Done. I also successfully reproduced the problem using only one set of features and labels and two sets of global variables. It is now clear that the problem is related to them (the global vars). I added an updated notebook here:

The point is that you are in graph mode, so you are not going to change pmean and smean in the graph.

You can check yourself, add this in your call:

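# In graph mode these print the values that were captured when call() was traced,
# so they will not change when the Python globals are reassigned afterwards.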
tf.print(smean)
tf.print(pmean)

You can see the difference when you reload the model and compile it to run in eager mode with run_eagerly=True:
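Something like this, as a sketch (same placeholder path as before; optimizer and loss are placeholders):

# Force eager execution so that call() re-reads the current Python globals on
# every step instead of using the values frozen into the traced graph.
reloaded = tf.keras.models.load_model(
    "untrained_model", custom_objects={"CustomLayer": CustomLayer}
)
reloaded.compile(optimizer="adam", loss="mse", run_eagerly=True)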

See also:

Thanks for the suggestion, but the solution is not working. I added an update of self.pmean, self.pscale, self.smean in call using tf.keras.backend.update, but they are not updated when I change the values of the global vars.

The suggestion was not to use that API but to just understand the flow of the calls.

When you load the model, it is already compiled, so you cannot impact it with the new globals (unless you run eagerly, as I mentioned in the last post):

Eager execution is enabled by default in TF 2.0. I have also been careful to save the model without compiling it.
I compile the model only after reloading it and before training it. Something is probably not working as it should?

That is not the case with the default behaviour of these APIs:

By default, we will attempt to compile your model to a static graph to deliver the best execution performance

If that were the case, you should see this warning:

A Keras model instance. If the original model was compiled, and saved with the optimizer, then the returned model will be compiled. Otherwise, the model will be left uncompiled. In the case that an uncompiled model is returned, a warning is displayed if the compile argument is set to True .
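i.e., as a sketch (same placeholder names as above), you can load the model explicitly uncompiled and compile it yourself afterwards, which avoids that warning:

# compile=False returns the model uncompiled without the warning;
# you then compile it explicitly (optionally with run_eagerly=True).
reloaded = tf.keras.models.load_model(
    "untrained_model", custom_objects={"CustomLayer": CustomLayer}, compile=False
)
reloaded.compile(optimizer="adam", loss="mse", run_eagerly=True)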