Eager execution: Most of the ops are placed on host instead of device

Running a Keras model in eager execution causes most of the ops to be placed on the host instead of the device, which makes eager execution much slower. Is this a bug, or is that simply how eager execution works?

TensorFlow Profiler output:
Code: keras-io/examples/vision/mnist_convnet.py (master branch of keras-team/keras-io on GitHub)
[Image: tf_profile_graph]

Same code with run_eagerly=True passed to model.compile():
[Image: tf_profile_eager]
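For readers who want to reproduce the two configurations being compared, here is a minimal sketch. It assumes a tiny stand-in model and random data rather than the actual mnist_convnet.py script, and `build_model` is a hypothetical helper, not something from the original code. The only difference between the two runs is the `run_eagerly=True` argument.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def build_model():
    # Hypothetical stand-in for the MNIST convnet from keras-io; much smaller.
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(8, 3, activation="relu"),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),
    ])


# Random data with MNIST-like shapes, only so the sketch runs on its own.
x = np.random.rand(64, 28, 28, 1).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, size=64), 10)

# Default: Keras wraps the training step in a tf.function (graph execution).
graph_model = build_model()
graph_model.compile(loss="categorical_crossentropy", optimizer="adam")

# Same model, but every training step runs op by op in eager mode.
eager_model = build_model()
eager_model.compile(
    loss="categorical_crossentropy", optimizer="adam", run_eagerly=True
)

graph_history = graph_model.fit(x, y, batch_size=32, epochs=1, verbose=0)
eager_history = eager_model.fit(x, y, batch_size=32, epochs=1, verbose=0)
```

Profiling each `fit` call separately should give the two views shown in the screenshots above, although the exact numbers will depend on the model and hardware.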

system: 5.10.42-1-MANJARO
version: tensorflow 2.5 (Manjaro repository)

Hi @SmacznaKawusia,

Welcome to the TensorFlow Forum!

Yes. When you run a Keras model in eager execution, most of the ops and the model’s weights are placed in host memory by default. Eager execution is designed to be interactive, and placing ops on the host allows them to be executed immediately.

This can be slower when the model is large or complex, because data has to be transferred between host and device memory on each operation. Eager execution is still useful for debugging and experimentation.
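If you want to check where a particular op actually ran on your machine, a small sketch like the following may help. It uses only the `device` attribute of eager tensors and `tf.config.list_physical_devices`; the tensor shapes are arbitrary and chosen just for illustration.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means everything runs on CPU.
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# Run a small op eagerly and inspect which device produced the result.
a = tf.random.normal((4, 4))
b = tf.matmul(a, a)
print("matmul result lives on:", b.device)
```

For a more verbose view, `tf.debugging.set_log_device_placement(True)` can be called at the start of the program to log the device chosen for each op.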

To improve the performance of eager execution, you can use a GPU, a smaller batch size, or a less complex model. You can also try the tf.function decorator. Please refer to this link for more details on tf.function.
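As a rough illustration of the tf.function suggestion, the sketch below wraps a plain Python function so that TensorFlow traces it into a graph on the first call. The function `dense_step` and the tensor shapes are made up for this example and are not taken from the original code.

```python
import tensorflow as tf


def dense_step(x, w, b):
    # Plain Python function: each op is dispatched one at a time in eager mode.
    return tf.nn.relu(tf.matmul(x, w) + b)


# Wrapping with tf.function traces the Python code into a graph on first call;
# later calls with the same input signature reuse the traced graph.
graph_step = tf.function(dense_step)

x = tf.random.normal((32, 64))
w = tf.random.normal((64, 16))
b = tf.zeros((16,))

eager_out = dense_step(x, w, b)   # eager execution
graph_out = graph_step(x, w, b)   # graph execution via tf.function
```

Both calls compute the same result. The difference is in how the ops are dispatched, which is where the speedup tends to come from when a function contains many small ops.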