Creating my own layer in Keras

I am new to deep learning and Keras, and I am trying to define a custom layer myself.

I looked at the Keras documentation for the base Layer class, which gives this example:

import tensorflow as tf
from tensorflow.keras.layers import Layer


class SimpleDense(Layer):

    def __init__(self, units=32):
        super(SimpleDense, self).__init__()
        self.units = units

    def build(self, input_shape):  # Create the state of the layer (weights)
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_shape[-1], self.units),
                                 dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(self.units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):  # Defines the computation from inputs to outputs
        return tf.matmul(inputs, self.w) + self.b


# Instantiates the layer.
linear_layer = SimpleDense(4)

I understand that the __init__ method is called when I create linear_layer, and that the call method is invoked when I pass inputs to linear_layer. What I don't understand is when the build method is called and, more specifically, how its input_shape argument is determined. What value does input_shape take here?
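To make the question concrete, here is a minimal sketch of how one could probe this empirically. It assumes TensorFlow 2.x with eager execution; ProbeLayer and build_log are hypothetical names I made up for illustration and are not part of the Keras documentation.

```python
import tensorflow as tf

# Module-level log of every shape that build() receives (hypothetical helper).
build_log = []


class ProbeLayer(tf.keras.layers.Layer):
    """Hypothetical layer that records the input_shape passed to build()."""

    def __init__(self, units=4):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Normalise to a plain list so the log reads the same whether Keras
        # passes a TensorShape or a tuple.
        build_log.append(tf.TensorShape(input_shape).as_list())
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w)


layer = ProbeLayer(4)
log_after_init = list(build_log)         # snapshot right after construction

out = layer(tf.ones((2, 3)))             # first call on a (2, 3) input
log_after_first_call = list(build_log)

layer(tf.ones((5, 3)))                   # second call, different batch size
log_after_second_call = list(build_log)

print(log_after_init, log_after_first_call, log_after_second_call)
```

Comparing the three snapshots shows whether build ran at construction time or on the first call, and which shape it was given.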

In addition, I want to define a parameter with a fixed shape, which is (1, 768) in my case. Should I still use input_shape in the build method for this?
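For illustration, this is roughly what I have in mind: a sketch in which the parameter shape is hard-coded and input_shape is simply ignored. FixedParamLayer and the attribute name p are hypothetical, and adding the parameter to the inputs is only a placeholder computation, not what my real layer does.

```python
import tensorflow as tf


class FixedParamLayer(tf.keras.layers.Layer):
    """Hypothetical layer holding a trainable parameter of shape (1, 768)."""

    def build(self, input_shape):
        # The shape is fixed, so input_shape is not consulted here.
        self.p = self.add_weight(
            shape=(1, 768),
            initializer="zeros",
            trainable=True,
        )

    def call(self, inputs):
        # Placeholder computation: the (1, 768) parameter broadcasts over
        # the leading batch axis of a (batch, 768) input.
        return inputs + self.p


layer = FixedParamLayer()
y = layer(tf.ones((4, 768)))
print(layer.p.shape, y.shape)
```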

My second question is whether I should account for the batch dimension in the call method. For example, suppose my input is a 3D array of shape (1000, 10, 5), where the first dimension is the batch size, so there are 1000 samples. If I want to transpose each sample from shape (10, 5) to (5, 10), should I use tf.transpose(x, perm=[1, 0]), which ignores the batch dimension, or tf.transpose(x, perm=[0, 2, 1]), which includes it? I also want to do a matrix multiplication in the call method (which is why I transpose the inputs). Should I account for the batch dimension in tf.matmul as well?
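A small shape check along these lines, run outside any layer, is how I would compare the two options. It is a sketch assuming TensorFlow 2.x in eager mode; the variable names are my own, and the product of each transposed sample with the original sample is just an example multiplication.

```python
import tensorflow as tf

x = tf.random.normal((1000, 10, 5))      # (batch, rows, cols)

# perm lists all three axes: keep axis 0 (batch), swap the last two.
x_t = tf.transpose(x, perm=[0, 2, 1])

# tf.matmul multiplies the last two axes and treats leading axes as batch.
prod = tf.matmul(x_t, x)

# Try the two-axis permutation on the same rank-3 tensor and record
# whether TensorFlow accepts it.
try:
    tf.transpose(x, perm=[1, 0])
    two_axis_perm_accepted = True
except (tf.errors.InvalidArgumentError, ValueError):
    two_axis_perm_accepted = False

print(x_t.shape, prod.shape, two_axis_perm_accepted)
```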