TensorFlow 2.10 vs 2.12, same training script, same data, significantly worse training for 2.12

Pierre_Daye · July 31, 2023, 5:46am

I use the code from Masked Autoencoder - Vision Transformer | Kaggle to train a transformer autoencoder. With TensorFlow 2.10, I obtain much better results than with 2.12. I don’t change the code, the data are the same, the pipeline is identical, and a large number of repeated training runs show consistent behavior under both 2.10 and 2.12.

This example image shows the training and validation curves for 2.10 (blue and red, respectively) and for 2.12 (blue and orange, at the top).
I don’t know what could produce such different results from the same code. I would appreciate it if someone could suggest a method to track down the issue.

I saw that one big difference is the change of optimizer between 2.10 and later versions. It is still possible to use the legacy version of Adam, but doing so did not change the results.
I tried 2.11, 2.12 and 2.13 using the Docker images provided by the TensorFlow team, all on the same computer, with the same architecture and the same GPU, and the results are still significantly worse with versions newer than 2.10.
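
For reference, here is a minimal sketch of how the legacy Adam mentioned above can be selected depending on the installed version. It assumes TensorFlow 2.11 to 2.15, where `tf.keras.optimizers.Adam` is the newer implementation and the previous one is exposed as `tf.keras.optimizers.legacy.Adam`. The helper names `uses_new_optimizer_default` and `make_legacy_adam` are hypothetical, and the learning rate is a placeholder, not the value used in the Kaggle notebook.

```python
def uses_new_optimizer_default(tf_version: str) -> bool:
    """Return True for TF 2.11 to 2.15 (assumed range where the default Adam is the new one)."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (2, 11) <= (major, minor) <= (2, 15)


def make_legacy_adam(learning_rate: float = 1e-3):
    """Build the pre-2.11 style Adam optimizer (learning_rate is a placeholder value)."""
    # Imported here so the version helper above can be used without TensorFlow.
    import tensorflow as tf

    if uses_new_optimizer_default(tf.__version__):
        # From 2.11 on, the previous implementation lives in the legacy namespace.
        return tf.keras.optimizers.legacy.Adam(learning_rate=learning_rate)
    # Assumed to be 2.10 or earlier here; later releases are outside this sketch.
    return tf.keras.optimizers.Adam(learning_rate=learning_rate)
```

Passing the returned object to `model.compile(optimizer=...)` keeps the optimizer implementation the same across the versions being compared, which helps rule it out as the cause.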

How could I track why the results are so different?

Thanks!

Renu_Patel · October 26, 2023, 2:00pm

Hi @Pierre_Daye

Welcome to the TensorFlow Forum!

I tried replicating the same code in Google Colab using TensorFlow 2.10, 2.13 and 2.14 and found slightly better metrics with the newer versions compared to TF 2.10. Please find the replicated gists for each TF version attached for your reference.

Could you please try again using the latest stable TensorFlow version, 2.14, and let us know if the issue still persists? Thank you.
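
When comparing runs across versions, one possible way to narrow down where they start to differ is to compare the per-epoch losses directly. The sketch below is only an illustration: `first_divergent_epoch` is a hypothetical helper, the 5% tolerance is an arbitrary choice, and the loss values are made-up placeholders (not numbers from this thread) standing in for lists such as `history.history["loss"]` returned by Keras `model.fit`.

```python
from typing import Optional, Sequence


def first_divergent_epoch(
    run_a: Sequence[float],
    run_b: Sequence[float],
    rel_tol: float = 0.05,
) -> Optional[int]:
    """Return the 0-based index of the first epoch whose losses differ
    by more than rel_tol (relative to the larger value), or None."""
    for epoch, (a, b) in enumerate(zip(run_a, run_b)):
        scale = max(abs(a), abs(b), 1e-12)  # avoid division by zero
        if abs(a - b) / scale > rel_tol:
            return epoch
    return None


# Placeholder per-epoch losses for two runs (illustrative values only).
loss_tf210 = [0.90, 0.60, 0.45, 0.38]
loss_tf212 = [0.91, 0.62, 0.58, 0.57]

print(first_divergent_epoch(loss_tf210, loss_tf212))  # prints 2
```

If the curves already differ at the first epoch, initialization or the input pipeline is a more likely suspect than the optimizer; if they drift apart later, the update rule or learning-rate schedule may be worth checking first.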
