Hi,
Can you explain the difference between calling Adam from tf.keras.optimizers and tf.keras.optimizers.legacy?
I’m using TensorFlow 2.14 with CUDA 11.2 on an RTX 3060 and 64 GB RAM. When training models like an autoencoder, my kernel crashes, even with small datasets (e.g., 100 images) and simple models. Monitoring system performance, I noticed a sudden spike in GPU usage just before the crash.
After trying many solutions, I found that using Adam from tf.keras.optimizers.legacy, instead of Adam from tf.keras.optimizers or passing the optimizer directly to compile(), solved the problem.
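For concreteness, here is a minimal sketch of the two variants I compared (the model and hyperparameters are just placeholders, not my actual setup):

```python
import tensorflow as tf

# Placeholder model, just to illustrate the compile() call.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# New-style optimizer -- this is the path that crashed my kernel:
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# Legacy optimizer -- the workaround that avoids the crash for me:
model.compile(optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=1e-3),
              loss="mse")
```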
I’m curious why using the legacy version resolves the issue, and why TensorFlow didn’t provide any clear error output for the crash.