Please help: how do I profile with TensorFlow 2.17?

Target versions: TensorFlow 2.17+, Keras 3.4.1+

Most of the profiler’s functionality works, but some of it fails with a cryptic error: “No step marker observed and hence the step time is unknown.”

[screenshot of the error]

I’ve dug through the tf and tf/profiler issue threads and the docs, but the workarounds I found are for Docker or permissions problems, neither of which applies here (one here, for example, and another old one on SO…).
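One quick way to rule out the permissions case is to check that the profiler actually wrote trace files to the log directory. Below is a minimal sketch, assuming the usual layout `<logdir>/plugins/profile/<run>/*.xplane.pb`; `find_profile_runs` is a made-up helper name for illustration, not part of TensorFlow or TensorBoard.

```python
from pathlib import Path


def find_profile_runs(logdir):
    """Map each profile run directory name to the trace files found in it.

    Assumption: traces live under <logdir>/plugins/profile/<run>/ and are
    named *.xplane.pb. Adjust the pattern if your setup differs.
    """
    profile_root = Path(logdir) / "plugins" / "profile"
    if not profile_root.is_dir():
        # Nothing was written (or the layout is different from the assumed one).
        return {}

    runs = {}
    for run_dir in sorted(p for p in profile_root.iterdir() if p.is_dir()):
        # Keep only the trace files; ignore anything else in the run directory.
        runs[run_dir.name] = sorted(f.name for f in run_dir.glob("*.xplane.pb"))
    return runs
```

Calling `find_profile_runs(logs)` after training should return at least one run with a non-empty file list if the traces were written. An empty result would point at a write or path problem rather than at missing step markers.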

What does work is most of the “Tools” dropdown at the middle-left of the image (everything except the overview page):

[screenshot of the “Tools” dropdown]

I’ve created a simple Colab to reproduce the problem. This is the Python code for it:


# !pip uninstall tensorflow keras tensorboard-plugin-profile tensorboard tb-nightly tensorboardX -y
# !pip install -U tensorflow==2.17.0 keras==3.4.1 tensorboard-plugin-profile

import tensorflow as tf
import keras
from keras.api.layers import Dense, Flatten
from keras.api.callbacks import TensorBoard
from datetime import datetime
import tensorflow_datasets as tfds
tfds.disable_progress_bar()

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(normalize_img)
ds_train = ds_train.batch(128)
ds_train = ds_train.cache()
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

model = keras.models.Sequential([
  keras.layers.Input(shape=(28, 28, 1)),
  keras.layers.Flatten(),
  keras.layers.Dense(412, activation='relu'),
  keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)



logs = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")
options = tf.profiler.experimental.ProfilerOptions(
    host_tracer_level=3,
    python_tracer_level=1,
    device_tracer_level=1,
)

tboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logs, histogram_freq=1)  # cannot use profile_batch anymore (see docs)

# train
tf.profiler.experimental.start(logs, options=options)
model.fit(ds_train, epochs=9, validation_data=ds_test, callbacks=[tboard_callback], steps_per_epoch=35)
tf.profiler.experimental.stop()

%load_ext tensorboard

%tensorboard --logdir="{logs}"

Can anyone explain what’s going on here and how to fix it?

+1. I am surprised that not many people have reported it. There was a GitHub issue open for it as well.