Difference between "train_on_batch()" and "test_on_batch()" return values

Hi.

I’m using Keras.

version:
Python 3.6.2
keras 2.6.0
tensorflow 2.6.0

The loss returned by train_on_batch() differs from the loss returned by test_on_batch().
What is the reason for this?

train_on_batch(), test_on_batch(), evaluate()
--------------------------------------------------------------
:
0.5317689776420593, 0.5236611366271973, 0.5236611366271973
0.5239976644515991, 0.519664466381073, 0.519664466381073
0.5211538076400757, 0.515764057636261, 0.515764057636261
0.5445187091827393, 0.5118800401687622, 0.5118800401687622
0.5287842750549316, 0.5079948902130127, 0.5079948902130127
0.49349671602249146, 0.5042303800582886, 0.5042303800582886
:

test_on_batch() and evaluate() returned the same loss.

The gap between the train_on_batch() and test_on_batch() values can be extremely large, so it is confusing which one is the correct value.

Take a look at:


Hi, thanks for the info.

I understand the effect of layers like Dropout.

However, I ran into another, related mystery.
I will try to reproduce it.

BRs
take hamster

Hi.

When I call evaluate(), the loss reported by train_on_batch() jumps back up.
Does anyone know why this happens,
and how I can solve it?

The output is shown below.
I call train_on_batch() 10 times and then evaluate().
The last loss before evaluate() was “0.002397470874711871”,
but the first train_on_batch() loss after evaluate() jumps back to “1.6820437908172607”.

start training
[3.641401767730713, 0.2685714364051819]    <- evaluate()

D - train on batch set 1/50
D - train on batch 1/10 (1/50)
train: [1.8222469091415405, 0.6342856884002686]    <- train_on_batch()
43.84374737739563 [s]
D - train on batch 2/10 (1/50)
train: [0.0025117292534559965, 1.0]    <- train_on_batch()
43.25669574737549 [s]
D - train on batch 3/10 (1/50)
train: [0.0025033268611878157, 1.0]    <- train_on_batch()
46.86725831031799 [s]
D - train on batch 4/10 (1/50)
train: [0.0029449830763041973, 1.0]    <- train_on_batch()
47.83594560623169 [s]
D - train on batch 5/10 (1/50)
train: [0.0016231434419751167, 1.0]    <- train_on_batch()
44.506582498550415 [s]
D - train on batch 6/10 (1/50)
train: [0.0032135520596057177, 1.0]    <- train_on_batch()
44.22938537597656 [s]
D - train on batch 7/10 (1/50)
train: [0.0022874141577631235, 1.0]    <- train_on_batch()
44.848825216293335 [s]
D - train on batch 8/10 (1/50)
train: [0.0031975528690963984, 1.0]    <- train_on_batch()
46.838255405426025 [s]
D - train on batch 9/10 (1/50)
train: [0.00267593702301383, 1.0]    <- train_on_batch()
45.724446058273315 [s]
D - train on batch 10/10 (1/50)
train: [0.002397470874711871, 1.0]    <- train_on_batch()
45.83753442764282 [s]
evaluate: [3.362639904022217, 0.3028571307659149]    <- evaluate()
D - train on batch set 2/50
D - train on batch 1/10 (2/50)
train: [1.6820437908172607, 0.6514285802841187]    <- train_on_batch() ?
46.7341628074646 [s]
D - train on batch 2/10 (2/50)
train: [0.0026804585941135883, 1.0]
45.43524193763733 [s]
D - train on batch 3/10 (2/50)
train: [0.0029256173875182867, 1.0]
44.48356604576111 [s]
D - train on batch 4/10 (2/50)
train: [0.0018705641850829124, 1.0]
44.31945013999939 [s]
D - train on batch 5/10 (2/50)
train: [0.0022483128122985363, 1.0]
45.50228929519653 [s]
D - train on batch 6/10 (2/50)
train: [0.002211614977568388, 1.0]
45.58434748649597 [s]
D - train on batch 7/10 (2/50)
train: [0.0023816321045160294, 1.0]
45.84853410720825 [s]
D - train on batch 8/10 (2/50)
train: [0.001954358071088791, 1.0]
45.299145221710205 [s]
D - train on batch 9/10 (2/50)
train: [0.0018027760088443756, 1.0]
45.008939266204834 [s]
D - train on batch 10/10 (2/50)
train: [0.0030839790124446154, 1.0]
45.06165862083435 [s]
evaluate: [2.3175859451293945, 0.36571428179740906]
D - train on batch set 3/50
D - train on batch 1/10 (3/50)
train: [1.1597096920013428, 0.6828571557998657]
45.89957118034363 [s]
D - train on batch 2/10 (3/50)
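A quick arithmetic check on the numbers in the log above is consistent with a stale-metric explanation (this is a hypothesis, not a confirmed cause): if the running loss/accuracy trackers still hold evaluate()'s result when the next train_on_batch() runs, and both calls carry the same sample weight (assumed here; the accuracies appear to be multiples of 1/350, which suggests the same 350 samples each time), then the first value reported after evaluate() is the plain average (evaluate_value + batch_value) / 2. Solving for batch_value gives an accuracy of 1.0 and a loss of roughly 0.001 to 0.003, in line with the neighbouring training steps.

```python
# Values copied from the log: (evaluate() result, first train_on_batch() result after it).
# Each entry is [loss, accuracy].
pairs = [
    ([3.641401767730713, 0.2685714364051819], [1.8222469091415405, 0.6342856884002686]),    # set 1
    ([3.362639904022217, 0.3028571307659149], [1.6820437908172607, 0.6514285802841187]),    # set 2
    ([2.3175859451293945, 0.36571428179740906], [1.1597096920013428, 0.6828571557998657]),  # set 3
]

# Hypothesis: reported = (evaluate_value + batch_value) / 2,
# so the batch's own value would be 2 * reported - evaluate_value.
implied = []
for (eval_loss, eval_acc), (train_loss, train_acc) in pairs:
    batch_loss = 2 * train_loss - eval_loss
    batch_acc = 2 * train_acc - eval_acc
    implied.append((batch_loss, batch_acc))
    print(f"implied batch loss={batch_loss:.4f}, accuracy={batch_acc:.4f}")
```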

BRs
take hamster

Hi.

Calling evaluate() changes the loss.
Here is some sample code.

(network)

model = Sequential(name='sample_02')

model.add(Input(shape=(3,)))
model.add(BatchNormalization())
model.add(Dense(10))
model.add(Activation('relu'))
model.add(Dense(4))

model.compile(optimizer='adam', loss='mean_squared_error')

    |
    v

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
batch_normalization (BatchNo (None, 3)                 12        
_________________________________________________________________
dense (Dense)                (None, 10)                40        
_________________________________________________________________
activation (Activation)      (None, 10)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 44        
=================================================================
Total params: 96
Trainable params: 90
Non-trainable params: 6
_________________________________________________________________
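For this model, the 6 non-trainable parameters in the summary are BatchNormalization's moving mean and moving variance (3 features each), and gamma and beta account for 6 of the trainable ones. The simplified NumPy sketch below (not Keras's actual implementation; it ignores details such as how the variance estimate is corrected) shows why that layer alone can make the training-mode loss from train_on_batch() differ from the inference-mode loss from test_on_batch()/evaluate(): training mode normalizes with the current batch's statistics, inference mode with slowly updated moving averages. `momentum=0.99` and `eps=1e-3` are assumed to match the Keras defaults.

```python
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var, training,
               momentum=0.99, eps=1e-3):
    """Simplified BatchNormalization forward pass (illustrative only)."""
    if training:
        # Training mode: normalize with this batch's statistics...
        mean, var = x.mean(axis=0), x.var(axis=0)
        # ...and nudge the moving averages towards them.
        moving_mean = momentum * moving_mean + (1 - momentum) * mean
        moving_var = momentum * moving_var + (1 - momentum) * var
    else:
        # Inference mode: normalize with the moving averages instead.
        mean, var = moving_mean, moving_var
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return y, moving_mean, moving_var

# The same 8 x 3 input as x_data in this thread.
x = np.array([[a, b, c] for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)])
gamma, beta = np.ones(3), np.zeros(3)
mm, mv = np.zeros(3), np.ones(3)      # initial moving mean / variance

y_train, mm, mv = batch_norm(x, gamma, beta, mm, mv, training=True)
y_infer, _, _ = batch_norm(x, gamma, beta, mm, mv, training=False)

print(y_train[:2])   # values close to -1 / +1
print(y_infer[:2])   # still close to the raw 0 / 1 inputs
```

After one training step the moving statistics have barely moved from their initial values, so the same input is normalized very differently in the two modes.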

The base training code is:

(source code)

# starts training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
#        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
#        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))
print()


(Dump result)

0.584841251373291
0.5780714750289917
0.5713894367218018
0.5647997856140137
0.5583046674728394
0.5519051551818848
0.5456019043922424
:
:

The train_on_batch() loss stays the same when test_on_batch() is also called.

(source code)

# starts training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
#        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))
print()


(Dump result)

0.584841251373291
0.5780714750289917
0.5713894367218018
0.5647997856140137
0.5583046674728394
0.5519051551818848
0.5456019043922424
0.5394853949546814
:
:

However, the train_on_batch() loss is different with and without evaluate().
I would like to know the cause of this and how to fix it.

(source code)

# starts training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
#        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))
print()


(Dump result)

0.584841251373291        <- 1st same
0.5547470450401306      <- different! 
0.5488174557685852
0.5429757833480835
0.537223756313324
0.531562328338623
:
:
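A pure-Python simulation of that pattern, under the assumption that evaluate() resets the shared loss tracker at its start but leaves its own result behind, while train_on_batch() (with its default reset_metrics=True) resets the tracker only after reporting. `MeanTracker`, `train_on_batch_sim` and `evaluate_sim` are hypothetical stand-ins, not Keras classes, and the per-step losses are made-up inputs. Weights are not modelled at all: only the reported number changes, which would explain why the first value matches and later ones differ.

```python
class MeanTracker:
    """Hypothetical stand-in for a running-mean loss metric."""
    def __init__(self):
        self.total, self.count = 0.0, 0
    def update_state(self, value, weight=1):
        self.total += value * weight
        self.count += weight
    def result(self):
        return self.total / self.count
    def reset_state(self):
        self.total, self.count = 0.0, 0

def train_on_batch_sim(tracker, batch_loss):
    tracker.update_state(batch_loss)   # accumulates into whatever state is already there
    reported = tracker.result()
    tracker.reset_state()              # reset happens only AFTER reporting
    return reported

def evaluate_sim(tracker, eval_loss):
    tracker.reset_state()              # reset at the start...
    tracker.update_state(eval_loss)
    return tracker.result()            # ...but the state is left behind afterwards

def run(batch_losses, eval_losses, use_evaluate, reset_after_evaluate=False):
    tracker, reported = MeanTracker(), []
    for batch_loss, eval_loss in zip(batch_losses, eval_losses):
        reported.append(train_on_batch_sim(tracker, batch_loss))
        if use_evaluate:
            evaluate_sim(tracker, eval_loss)
            if reset_after_evaluate:
                tracker.reset_state()  # analogous to calling model.reset_metrics()
    return reported

# Made-up per-step losses: training-mode batch loss and inference-mode evaluate() loss.
batch_losses = [0.58, 0.57, 0.56, 0.55]
eval_losses = [0.52, 0.51, 0.50, 0.49]

plain = run(batch_losses, eval_losses, use_evaluate=False)
mixed = run(batch_losses, eval_losses, use_evaluate=True)
fixed = run(batch_losses, eval_losses, use_evaluate=True, reset_after_evaluate=True)
print(plain)   # the batch losses themselves
print(mixed)   # first value equal, later values averaged with evaluate()'s loss
print(fixed)   # matches plain again
```

If this assumption holds, calling `model.reset_metrics()` right after `evaluate()` would likely give clean train_on_batch() values again in tf.keras (untested here); using test_on_batch(), which also resets after reporting by default, avoids the leftover state too.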

BRs
take hamster

I can’t build a minimal runnable example because you haven’t shared dummy x and y data.

It would be better if you could share a Colab that doesn’t depend on external input data.

Hi.

Here is the smallest runnable source code I could make.
To build a common network first, enable the commented-out part and run it once.

import numpy as np
from keras.models import Model
from keras.models import Sequential
from keras.layers import Input, Dense, Activation, Dropout
from keras.layers import BatchNormalization
from keras.models import load_model



# Enable this on the first run and build the network.
'''
model = Sequential(name='sample_01')

model.add(Input(shape=(3,)))
model.add(BatchNormalization())
model.add(Dense(10))
model.add(Activation('relu'))
model.add(Dense(4))

model.compile(optimizer='adam', loss='mean_squared_error')

model.summary()

model.save("model_sample_01.h5")

exit()
'''



model = load_model("model_sample_01.h5")

x_data = [
[0.0, 0.0, 0.0],
[0.0, 0.0, 1.0],
[0.0, 1.0, 0.0],
[0.0, 1.0, 1.0],
[1.0, 0.0, 0.0],
[1.0, 0.0, 1.0],
[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
]

y_data = [
[0.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.0],
]

x_data_keras = np.array(x_data)
y_data_keras = np.array(y_data)

max_loop = 1
max_in_loop = 10





# training
for i in range(max_loop):
    for j in range(max_in_loop):
        result_1 = model.train_on_batch(x_data_keras, y_data_keras)
#        result_2 = model.test_on_batch(x_data_keras, y_data_keras)
        result_3 = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
        print(str(result_1))


print("end")

BRs
take hamster

The problem is not evaluate(); it is that you are not fixing the seeds for a reproducible run.

I’ve slightly modified your example:

import os

os.environ["PYTHONHASHSEED"]=str(1234)

import numpy as np
import unittest
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import load_model
import tensorflow as tf
import random as python_random

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
python_random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

# See https://github.com/tensorflow/tensorflow/issues/31149
initializer = tf.keras.initializers.GlorotUniform(seed=42)

def get_model():
  model = Sequential()
  model.add(Input(shape=(3,)))
  model.add(BatchNormalization())
  model.add(Dense(10,kernel_initializer=initializer))
  model.add(Activation('relu'))
  model.add(Dense(4,kernel_initializer=initializer))
  model.compile(optimizer='adam', loss='mean_squared_error')
  model.summary()
  return model

x_data = [
[0.0, 0.0, 0.0],
[0.0, 0.0, 1.0],
[0.0, 1.0, 0.0],
[0.0, 1.0, 1.0],
[1.0, 0.0, 0.0],
[1.0, 0.0, 1.0],
[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
]

y_data = [
[0.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.0],
]

x_data_keras = np.array(x_data)
y_data_keras = np.array(y_data)

max_loop = 1
max_in_loop = 10




model = get_model()
result_1={}
result_2={}
result_3={}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

model = get_model()
result_1_1 = {}
result_2_2 = {}
result_3_3 = {}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1_1)

case = unittest.TestCase()
case.assertDictEqual(result_1,result_1_1)

Hi.

I commented out part of the code in the first training loop:

# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

|
v

# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        #result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

The whole source code is below:

import os

os.environ["PYTHONHASHSEED"]=str(1234)

import numpy as np
import unittest
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import load_model
import tensorflow as tf
import random as python_random

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
python_random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

# See https://github.com/tensorflow/tensorflow/issues/31149
initializer = tf.keras.initializers.GlorotUniform(seed=42)

def get_model():
  model = Sequential()
  model.add(Input(shape=(3,)))
  model.add(BatchNormalization())
  model.add(Dense(10,kernel_initializer=initializer))
  model.add(Activation('relu'))
  model.add(Dense(4,kernel_initializer=initializer))
  model.compile(optimizer='adam', loss='mean_squared_error')
  model.summary()
  return model

x_data = [
[0.0, 0.0, 0.0],
[0.0, 0.0, 1.0],
[0.0, 1.0, 0.0],
[0.0, 1.0, 1.0],
[1.0, 0.0, 0.0],
[1.0, 0.0, 1.0],
[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
]

y_data = [
[0.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 0.0],
]

x_data_keras = np.array(x_data)
y_data_keras = np.array(y_data)

max_loop = 1
max_in_loop = 10




model = get_model()
result_1={}
result_2={}
result_3={}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        #result_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1)

model = get_model()
result_1_1 = {}
result_2_2 = {}
result_3_3 = {}
# training
for i in range(max_loop):
    for j in range(max_in_loop):
        idx=i+j
        result_1_1[idx] = model.train_on_batch(x_data_keras, y_data_keras)
        #result_2_2[idx] = model.test_on_batch(x_data_keras, y_data_keras)
        result_3_3[idx] = model.evaluate(x_data_keras, y_data_keras, verbose = 0)
print(result_1_1)

case = unittest.TestCase()
case.assertDictEqual(result_1,result_1_1)

Dump result

AssertionError: {0: 0[20 chars]: 0.730724573135376, 2: 0.7224830389022827, 3:[153 chars]1531} != {0: 0[20 chars]: 0.7113834619522095, 2: 0.704352080821991, 3:[151 chars]1235}
  {0: 0.7390658855438232,
-  1: 0.730724573135376,
-  2: 0.7224830389022827,
-  3: 0.7143421769142151,
-  4: 0.7063030004501343,
-  5: 0.6983664035797119,
-  6: 0.6905329823493958,
-  7: 0.6828033328056335,
-  8: 0.6751779317855835,
-  9: 0.6676561236381531}
+  1: 0.7113834619522095,
+  2: 0.704352080821991,
+  3: 0.6973998546600342,
+  4: 0.6905273199081421,
+  5: 0.683735191822052,
+  6: 0.6770237684249878,
+  7: 0.6703934669494629,
+  8: 0.663844645023346,
+  9: 0.6573768854141235}

The loss differs depending on whether evaluate() is inserted or not. :frowning:
Why does this happen?

BRs
take hamster

Yes, I think this is a known issue:


Hi.

I’ve read it.
For the time being, I’m going to use only test_on_batch(), not evaluate().
Thanks.

BRs
take hamster

Yes, please upvote and subscribe to the ticket.


I have submitted a candidate fix: “Fix reset_metrics” by bhack, Pull Request #15342 in keras-team/keras on GitHub.
