I am trying to understand why there is a difference between calculating a dense layer operation directly and using the Keras implementation. According to the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense), `tf.keras.layers.Dense()` should implement the operation `output = activation(dot(input, kernel) + bias)`, but `result` and `result1` below are not the same.
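For reference, this is how I read the documented formula: with an input of shape `(batch, input_dim)`, the kernel should have shape `(input_dim, units)` and the bias `(units,)` (these shapes are my reading of the docs, not taken from the Keras source). In raw TensorFlow that would be something like:

```python
import tensorflow as tf

# Sketch of what I understand Dense(units) to compute, assuming
# input: (batch, input_dim), kernel: (input_dim, units), bias: (units,).
def dense_by_hand(inputs, kernel, bias, activation=tf.nn.relu):
    return activation(tf.linalg.matmul(inputs, kernel) + bias)
```

The actual code I ran is below.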
```python
import tensorflow as tf

tf.random.set_seed(1)

# Manually created weights: bias of shape (5, 1) and kernel of shape (5, 10).
b = tf.Variable(tf.random.uniform(shape=(5, 1)), dtype=tf.float32)
kernel = tf.Variable(tf.random.uniform(shape=(5, 10)), dtype=tf.float32)
x = tf.constant(tf.random.uniform(shape=(10, 1), dtype=tf.float32))

# Direct computation: relu(kernel @ x + b), giving shape (5, 1).
result = tf.nn.relu(tf.linalg.matmul(a=kernel, b=x) + b)
tf.print(result)

# Keras layer initialized with the same kernel and bias values.
test = tf.keras.layers.Dense(units=5,
                             activation='relu',
                             use_bias=True,
                             kernel_initializer=tf.keras.initializers.Constant(value=kernel),
                             bias_initializer=tf.keras.initializers.Constant(value=b),
                             dtype=tf.float32)

# The layer expects a batch of row vectors, so feed x transposed: shape (1, 10).
result1 = test(tf.transpose(x))
print()
tf.print(result1)
```
Output:

```
[[2.87080455]
 [3.25458574]
 [3.28776264]
 [3.14319134]
 [2.04760242]]

[[2.38769 3.63470697 2.62423944 3.31286287 2.91121125]]
```
Using `test.get_weights()` I can see that the kernel and bias (`b`) are getting set to the correct values. I am using TF version 2.12.0.
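For completeness, this is the sort of check I mean (a minimal sketch; it assumes the layer has already been called once, as above, so that its weights are built):

```python
# get_weights() returns the layer's variables as NumPy arrays,
# ordered as [kernel, bias] for a Dense layer.
layer_kernel, layer_bias = test.get_weights()
print(layer_kernel.shape, layer_bias.shape)
print(layer_kernel)
print(layer_bias)
```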