Migrating AUC metric from 2.8 to 2.12: y_pred.rank is None

I’m upgrading my training code from TensorFlow 2.8 to 2.12. Fitting a simple multi-label classifier on 2.12.0 raises an exception, but only when I use the following metric:

    tf.keras.metrics.AUC(curve="ROC", multi_label=True)

No exception is raised when I don’t use this metric. It’s raised whether or not I use a validation dataset, so this is happening when computing metrics on the training data. The error is:


File ".tox/train/lib/python3.10/site-packages/keras/engine/training.py", line 1284, in train_function  *
    return step_function(self, iterator)
File ".tox/train/lib/python3.10/site-packages/keras/engine/training.py", line 1268, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File ".tox/train/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in run_step  **
    outputs = model.train_step(data)
File ".tox/train/lib/python3.10/site-packages/keras/engine/training.py", line 1055, in train_step
    return self.compute_metrics(x, y, y_pred, sample_weight)
File ".tox/train/lib/python3.10/site-packages/keras/engine/training.py", line 1149, in compute_metrics
    self.compiled_metrics.update_state(y, y_pred, sample_weight)
File ".tox/train/lib/python3.10/site-packages/keras/engine/compile_utils.py", line 605, in update_state
    metric_obj.update_state(y_t, y_p, sample_weight=mask)
File ".tox/train/lib/python3.10/site-packages/keras/utils/metrics_utils.py", line 77, in decorated
    update_op = update_state_fn(*args, **kwargs)
File ".tox/train/lib/python3.10/site-packages/keras/metrics/base_metric.py", line 140, in update_state_fn
    return ag_update_state(*args, **kwargs)
File ".tox/train/lib/python3.10/site-packages/keras/metrics/confusion_metrics.py", line 1453, in update_state  **
    self._build(tf.TensorShape(y_pred.shape))
File ".tox/train/lib/python3.10/site-packages/keras/metrics/confusion_metrics.py", line 1402, in _build
    raise ValueError(

ValueError: `y_true` must have rank 2 when `multi_label=True`. Found rank None. Full shape received for `y_true`: <unknown>

One thing I noticed from the stacktrace is that the ValueError message says it’s about y_true, but one frame up we can see the shape actually comes from y_pred. Maybe I’m being pedantic to notice the mismatch, but as an experiment I went ahead and directly edited confusion_metrics.py at line 1453 to call self._build on y_true’s shape, as _build’s message says it expects. With that hack, my model training finishes!
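
For reference, this is roughly the edit I made locally (paraphrasing around the call shown in the traceback; the surrounding code in keras/metrics/confusion_metrics.py may differ):

    # keras/metrics/confusion_metrics.py, inside AUC.update_state (around line 1453)
    # Original call, as in the traceback above:
    #     self._build(tf.TensorShape(y_pred.shape))
    # Experimental hack, passing the shape that _build's message says it expects:
    self._build(tf.TensorShape(y_true.shape))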

So now I have two questions:

  1. Is it a bug in Keras to call self._build(tf.TensorShape(y_pred.shape)) when AUC._build’s error message indicates it expects to receive y_true.shape? I’m shocked that y_true and y_pred would ever have different shapes, but clearly they sometimes do.
  2. What API change would cause this error to start appearing only when I upgrade TensorFlow? I browsed the changelogs and didn’t see anything that felt relevant.

I could probably produce a minimal example if it’s helpful.


Here’s a minimal reproducible example. The idea is that I’m given a list of filenames, each file containing a serialized vector embedding, and I have to load it (the example below just uses np.random to simulate the loading).

The error happens when passing model.fit a tf.data.Dataset that maps over this tf.numpy_function. If instead I pass a fully realized array of embeddings, the model trains successfully. Another workaround is to drop the AUC metric while still using the lazy tf.data.Dataset.

import numpy as np
import tensorflow as tf

EMBEDDING_SHAPE = (64,)


def load_embedding(x, y):
    """Convert x (a filename) to embedding, also returning y untouched."""
    return (
        # The error doesn't happen without the numpy_function. No error from:
        # return (x*2, y)
        tf.numpy_function(
            lambda fname: np.random.rand(*EMBEDDING_SHAPE).astype(np.float32),
            [x],
            Tout=tf.float32,
            stateful=False,
            name="EmbeddingReader",
        ),
        y,
    )


def mk_model(input_shape):
    inputs = tf.keras.layers.Input(shape=input_shape, name="embeddings")
    outputs = tf.keras.layers.Dense(2, activation="sigmoid")(inputs)
    model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

    # NO ERROR IF THE AUC() METRIC IS NOT USED
    metrics = [tf.keras.metrics.AUC(curve="ROC", multi_label=True)]
    model.compile(
        loss="binary_crossentropy",
        metrics=metrics,
    )

    return model


def main():
    filenames = np.array(["a", "b", "c", "d"])
    y = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
    training_ds = tf.data.Dataset.from_tensor_slices((filenames, y))
    training_ds = training_ds.map(load_embedding).batch(16)

    model = mk_model(EMBEDDING_SHAPE)
    history = model.fit(
        training_ds,  # SWITCHING THIS TO NOT USE A DATASET, AS IN THE NEXT LINE, FIXES IT
        # [X for X, _ in training_ds],  y,
        epochs=1,
    )


if __name__ == "__main__":
    main()

Answering my first question: the incorrect exception message has been fixed in Keras 2.13, specifically by the commit “State `y_pred` must be 2 dimensional for AUC” (keras-team/keras@05674c4).

Meanwhile, I was able to work around the exception by explicitly setting the shape on the X tensor. I found this workaround in the Stack Overflow question “keras custom layer unknown output shape”.

The code in my previous comment passes AUC.update_state the argument

    y_pred=<tf.Tensor 'model/dense/Sigmoid:0' shape=<unknown> dtype=float32>

After adding a set_shape() call to the result of the tf.numpy_function, AUC.update_state now sees the argument

    y_pred=<tf.Tensor 'model/dense/Sigmoid:0' shape=(None, 2) dtype=float32>

And knowing y_pred’s shape, it can proceed successfully.
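
Concretely, the modified load_embedding from my example above looks roughly like this (only the set_shape call is new):

    def load_embedding(x, y):
        """Convert x (a filename) to embedding, also returning y untouched."""
        embedding = tf.numpy_function(
            lambda fname: np.random.rand(*EMBEDDING_SHAPE).astype(np.float32),
            [x],
            Tout=tf.float32,
            stateful=False,
            name="EmbeddingReader",
        )
        # tf.numpy_function cannot infer the shape of its output, so declare
        # the shape we know the loader produces.
        embedding.set_shape(EMBEDDING_SHAPE)
        return embedding, y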

I am not clear why setting the shape on X is what ends up giving y_pred a known shape, or why older versions of Keras did not mind the unknown shape, but code is flowing now.
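
For what it’s worth, here is a tiny standalone snippet (independent of the training code above) showing at least the first half of the story: the output of tf.numpy_function has an unknown static shape until set_shape declares one, and per the traceback above AUC._build only gets to look at that static shape.

    import numpy as np
    import tensorflow as tf

    EMBEDDING_SHAPE = (64,)


    @tf.function
    def demo(x):
        emb = tf.numpy_function(
            lambda fname: np.random.rand(*EMBEDDING_SHAPE).astype(np.float32),
            [x],
            Tout=tf.float32,
        )
        # At trace time the static shape of a numpy_function result is unknown...
        print("before set_shape:", emb.shape)  # <unknown>
        emb.set_shape(EMBEDDING_SHAPE)
        # ...and set_shape is what makes it visible to downstream shape inference.
        print("after set_shape:", emb.shape)   # (64,)
        return emb


    demo(tf.constant("some-filename"))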