Hello,

Can someone help me understand why a simple Keras MLP binary classifier evaluates (predicts) significantly more slowly on single samples than the scikit-learn MLPClassifier, despite training much faster? I already figured out that I should NOT use the Keras model's `predict` method here, since it has a lot of overhead (converting the single sample into a tf dataset, and so on). But even when calling the model directly I see much worse performance, and I hope someone can help me speed this up.

My application requires single-sample evaluation, I’m afraid … I cannot do batch evaluation.

Here’s a reproducer:

```
import numpy as np
import sklearn as skl
import sklearn.neural_network
import tensorflow as tf
# random events, just to test training time
nEvents = 100000
nFeatures = 2
train_X = np.random.uniform(size=(nEvents, nFeatures))
train_y = np.concatenate( (np.ones(nEvents//2), np.zeros(nEvents//2)) )
```

Compare training times:

```
%%time
skl_model = skl.neural_network.MLPClassifier(random_state=1, hidden_layer_sizes=(128, 128, 128, 128), alpha=0, batch_size=512, verbose=True).fit(train_X, train_y)
```

Produces: `Wall time: 26.5 s`

and the printout says it ran for 13 epochs (iterations).

Compare to:

```
%%time
tf.keras.utils.set_random_seed(1)
tf_model = tf.keras.Sequential([
    tf.keras.Input(shape=(nFeatures,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
tf_model.compile(loss='binary_crossentropy', optimizer='adam')
history = tf_model.fit(train_X, train_y, epochs=200, batch_size=512, verbose=0, callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss',patience=10)])
print(f"Ran for {len(history.epoch)} epochs")
```

Produces: `Wall time: 17.9 s`

and it ran for 29 epochs … so Keras/TF is faster at training, even though it runs more epochs. Great!

Now the problem … evaluation:

```
%time for i in range(10000): skl_model.predict_proba(np.array([(0.5, 0.5)]))
%time for i in range(10000): tf_model(np.array([(0.5, 0.5)]), training=False)
```
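In case it's relevant to an answer: I also wondered whether wrapping the forward pass in a `tf.function` with a fixed `input_signature` would help (a sketch below, on a smaller stand-in model; my assumption is that the fixed signature avoids retracing on every call, but I'm not sure this is the right fix):

```python
import tensorflow as tf

tf.keras.utils.set_random_seed(1)
# Small stand-in model with the same structure as above
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Trace the forward pass once; the fixed signature means later calls
# should reuse the same compiled graph instead of re-tracing in Python.
@tf.function(input_signature=[tf.TensorSpec(shape=(1, 2), dtype=tf.float32)])
def fast_call(x):
    return model(x, training=False)

out = fast_call(tf.constant([[0.5, 0.5]], dtype=tf.float32))
print(out.numpy().shape)  # (1, 1)
```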

Produces:

```
CPU times: user 3.22 s, sys: 2.67 s, total: 5.89 s
Wall time: 3.18 s
CPU times: user 56 s, sys: 627 ms, total: 56.6 s
Wall time: 56.7 s
```

so Keras/TF is much, much slower here.
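One workaround I'm considering is pulling the weights out with `tf_model.get_weights()` and doing the forward pass in plain NumPy (a sketch, assuming Dense/ReLU hidden layers and a sigmoid output like the model above), but I'd prefer a TF-native answer:

```python
import numpy as np

def numpy_forward(weights, x):
    """Manual MLP forward pass.

    `weights` is the flat list from tf_model.get_weights():
    [W0, b0, W1, b1, ...]. Hidden layers use ReLU, output uses sigmoid.
    """
    h = x
    n_layers = len(weights) // 2
    for i in range(n_layers):
        W, b = weights[2 * i], weights[2 * i + 1]
        h = h @ W + b
        if i < n_layers - 1:
            h = np.maximum(h, 0.0)   # ReLU on hidden layers only
    return 1.0 / (1.0 + np.exp(-h))  # sigmoid on the output

# Hypothetical tiny example: one 2->1 layer, unit weights, zero bias
w = [np.array([[1.0], [1.0]]), np.array([0.0])]
print(numpy_forward(w, np.array([[0.5, 0.5]])))  # sigmoid(1.0) ≈ 0.731
```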

How can I close this gap?

Thanks!