How to implement tf.keras.layers.MultiHeadAttention?

apzk · February 4, 2022, 12:06pm

Hello,

I am trying to analyse 1D vectors using the MultiHeadAttention layer, but when I add it to a Sequential model it throws: TypeError: call() missing 1 required positional argument: 'value'.

Is it possible to use this layer in a Sequential model, or should it be done another way?

8bitmp3 · February 10, 2022, 6:03pm

Hi @apzk, welcome to the TF Forum! Can you share a code snippet with the (preprocessed) input and the model, or a Colab, so we can take a look and try to debug it?

Meanwhile, here’s the Multi-Head Attention implementation from the Transformer model for language understanding | Text | TensorFlow tutorial:

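import tensorflow as tf

# Note: call() below relies on scaled_dot_product_attention(), which the
# tutorial defines separately; it is not included in this snippet.
class MultiHeadAttention(tf.keras.layers.Layer):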
  def __init__(self, d_model, num_heads):
    super(MultiHeadAttention, self).__init__()
    self.num_heads = num_heads
    self.d_model = d_model

    assert d_model % self.num_heads == 0

    self.depth = d_model // self.num_heads

    self.wq = tf.keras.layers.Dense(d_model)
    self.wk = tf.keras.layers.Dense(d_model)
    self.wv = tf.keras.layers.Dense(d_model)

    self.dense = tf.keras.layers.Dense(d_model)

  def split_heads(self, x, batch_size):
    """Split the last dimension into (num_heads, depth).
    Transpose the result such that the shape is (batch_size, num_heads, seq_len, depth)
    """
    x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
    return tf.transpose(x, perm=[0, 2, 1, 3])

  def call(self, v, k, q, mask):
    batch_size = tf.shape(q)[0]

    q = self.wq(q)  # (batch_size, seq_len, d_model)
    k = self.wk(k)  # (batch_size, seq_len, d_model)
    v = self.wv(v)  # (batch_size, seq_len, d_model)

    q = self.split_heads(q, batch_size)  # (batch_size, num_heads, seq_len_q, depth)
    k = self.split_heads(k, batch_size)  # (batch_size, num_heads, seq_len_k, depth)
    v = self.split_heads(v, batch_size)  # (batch_size, num_heads, seq_len_v, depth)

    # scaled_attention.shape == (batch_size, num_heads, seq_len_q, depth)
    # attention_weights.shape == (batch_size, num_heads, seq_len_q, seq_len_k)
    scaled_attention, attention_weights = scaled_dot_product_attention(
        q, k, v, mask)

    scaled_attention = tf.transpose(scaled_attention, perm=[0, 2, 1, 3])  # (batch_size, seq_len_q, num_heads, depth)

    concat_attention = tf.reshape(scaled_attention,
                                  (batch_size, -1, self.d_model))  # (batch_size, seq_len_q, d_model)

    output = self.dense(concat_attention)  # (batch_size, seq_len_q, d_model)

    return output, attention_weights
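
The class above calls scaled_dot_product_attention(), which is not shown in the snippet. Here is a minimal sketch of such a helper, following the standard formula softmax(QK^T / sqrt(d_k))V. The tutorial's own version may differ in details. The mask convention (1 marks a position to hide) and the default mask=None are assumptions made for this example.

```python
import tensorflow as tf


def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of scaled dot-product attention.

    q: (..., seq_len_q, depth)
    k: (..., seq_len_k, depth)
    v: (..., seq_len_k, depth_v)
    mask: optional float tensor broadcastable to (..., seq_len_q, seq_len_k);
          assumed convention: 1.0 marks positions to hide.
    """
    # Raw similarity scores between every query and every key.
    scores = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)

    # Scale by sqrt(depth) to keep the softmax inputs in a reasonable range.
    dk = tf.cast(tf.shape(k)[-1], scores.dtype)
    scores = scores / tf.math.sqrt(dk)

    # Push masked positions towards -inf so softmax gives them ~0 weight.
    if mask is not None:
        scores += mask * -1e9

    weights = tf.nn.softmax(scores, axis=-1)  # each row sums to 1 over keys
    output = tf.matmul(weights, v)            # (..., seq_len_q, depth_v)
    return output, weights


# Usage with made-up shapes: batch=2, heads=4, seq_len_q=5, seq_len_k=7, depth=8.
q = tf.random.normal((2, 4, 5, 8))
k = tf.random.normal((2, 4, 7, 8))
v = tf.random.normal((2, 4, 7, 8))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape, w.shape)
```

With this helper defined before the class, the custom layer's call(v, k, q, mask) can run; passing mask=None skips masking.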

yuedi_zhu · September 29, 2022, 8:44am

Hello OP, I have the same problem. Has your problem been solved?
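
For anyone hitting the same TypeError: the likely cause is that a Sequential model passes a single tensor to each layer, while the built-in tf.keras.layers.MultiHeadAttention expects at least a query and a value in its call(). Below is a minimal sketch of two ways around it, assuming self-attention is what is wanted. The shapes (16-step sequences with 1 feature), the layer sizes, and the SelfAttention wrapper name are all made up for illustration and are not part of Keras.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: each 1D vector of length 16 is treated as a sequence
# of 16 steps with 1 feature per step.
SEQ_LEN, FEATURES = 16, 1

# Option 1: Functional API, passing the same tensor as query and value.
inputs = tf.keras.Input(shape=(SEQ_LEN, FEATURES))
x = tf.keras.layers.Dense(8)(inputs)  # project each step to width 8
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = mha(x, x)  # query=x, value=x (key defaults to value): self-attention
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)


# Option 2: wrap the layer so it takes one input and fits in Sequential.
class SelfAttention(tf.keras.layers.Layer):
    """Hypothetical wrapper: feeds one tensor as both query and value."""

    def __init__(self, num_heads, key_dim, **kwargs):
        super().__init__(**kwargs)
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=key_dim)

    def call(self, x):
        return self.mha(x, x)


seq_model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, FEATURES)),
    tf.keras.layers.Dense(8),
    SelfAttention(num_heads=2, key_dim=4),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])

# Quick shape check on random data.
batch = np.random.rand(4, SEQ_LEN, FEATURES).astype("float32")
predictions = model(batch)
seq_predictions = seq_model(batch)
print(predictions.shape, seq_predictions.shape)
```

The Functional API is usually the simpler choice when query and value come from different tensors (cross-attention); the wrapper only covers the self-attention case.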