TypeError: Failed to convert elements of (None, -1, 3, 1) to Tensor. Consider casting elements to a supported type

I have a custom TensorFlow layer that works fine and produces an output when called directly, but it throws an error when used with the Keras functional API. Here is the code:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input

# --------- Custom Layer -------
def scaled_dot_product_attention(query, key, value, mask=None):
  key_dim = tf.cast(tf.shape(key)[-1], tf.float32)
  scaled_scores = tf.matmul(query, key, transpose_b=True) / np.sqrt(key_dim)

  if mask is not None:
    scaled_scores = tf.where(mask==0, -np.inf, scaled_scores)

  softmax = tf.keras.layers.Softmax()
  weights = softmax(scaled_scores) 
  return tf.matmul(weights, value), weights

class MultiHeadSelfAttention(tf.keras.layers.Layer):
  def __init__(self, d_model, num_heads):
    super(MultiHeadSelfAttention, self).__init__()
    self.d_model = d_model
    self.num_heads = num_heads

    self.d_head = self.d_model // self.num_heads

    self.wq = tf.keras.layers.Dense(self.d_model)
    self.wk = tf.keras.layers.Dense(self.d_model)
    self.wv = tf.keras.layers.Dense(self.d_model)

    # Linear layer to generate the final output.
    self.dense = tf.keras.layers.Dense(self.d_model)
  
  def split_heads(self, x):
    batch_size = x.shape[0]

    split_inputs = tf.reshape(x, (batch_size, -1, self.num_heads, self.d_head))
    return tf.transpose(split_inputs, perm=[0, 2, 1, 3])
  
  def merge_heads(self, x):
    batch_size = x.shape[0]

    merged_inputs = tf.transpose(x, perm=[0, 2, 1, 3])
    return tf.reshape(merged_inputs, (batch_size, -1, self.d_model))

  def call(self, q, k, v, mask):
    qs = self.wq(q)
    ks = self.wk(k)
    vs = self.wv(v)

    qs = self.split_heads(qs)
    ks = self.split_heads(ks)
    vs = self.split_heads(vs)

    output, attn_weights = scaled_dot_product_attention(qs, ks, vs, mask)
    output = self.merge_heads(output)

    return self.dense(output)

# ----- Testing with simulated data ------- 
x = np.random.rand(1,2,3)
values_emb = MultiHeadSelfAttention(3, 3)(x,x,x, mask = None)
print(values_emb)

This generates the following output:

tf.Tensor(
[[[ 0.50706375 -0.3537539  -0.23286441]
  [ 0.5081617  -0.3548487  -0.23382033]]], shape=(1, 2, 3), dtype=float32)

But when I use it in the Keras functional API it doesn’t work. Here is the code:

x = Input(shape=(2,3))
values_emb = MultiHeadSelfAttention(3, 3)(x,x,x, mask = None)
model = Model(x, values_emb)
model.summary()

This is the error:

TypeError: Failed to convert elements of (None, -1, 3, 1) to Tensor. Consider casting elements to a supported type.

Does anyone know why this happens and how it can be fixed?


Hi @Amin_Shn,

Welcome to the forum!

This error occurs because, while the functional model is being built, the batch dimension of the symbolic input is unknown. As a result, x.shape[0] returns None and tf.reshape cannot convert the target shape (None, -1, 3, 1) to a tensor. I suggest using tf.shape() to obtain batch_size and seq_len dynamically in the split_heads and merge_heads functions, and updating the input shape to (None, None, 3) for compatibility with variable batch sizes and sequence lengths. I’ve added a working gist of the above with the required changes for your reference. Please let us know if you have any questions.
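
As a rough illustration of the tf.shape() change (a minimal sketch, not the contents of the gist), the layer below keeps only the split and merge steps from the question. The class name HeadSplitter is a hypothetical name used here for illustration; the attention computation and the Dense projections are left out on purpose.

```python
import tensorflow as tf


class HeadSplitter(tf.keras.layers.Layer):
    """Hypothetical layer: splits the last axis into heads, then merges it back."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_head = d_model // num_heads

    def split_heads(self, x):
        # tf.shape(x)[0] is a scalar tensor resolved at run time, whereas
        # x.shape[0] is None while the functional model is being built.
        batch_size = tf.shape(x)[0]
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.d_head))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def merge_heads(self, x):
        batch_size = tf.shape(x)[0]
        x = tf.transpose(x, perm=[0, 2, 1, 3])
        return tf.reshape(x, (batch_size, -1, self.d_model))

    def call(self, x):
        # Splitting and then merging returns the input unchanged.
        return self.merge_heads(self.split_heads(x))


# Building a functional model no longer fails on the unknown batch dimension.
inputs = tf.keras.Input(shape=(2, 3))
outputs = HeadSplitter(3, 3)(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```

In the full layer, np.sqrt(key_dim) in scaled_dot_product_attention would probably also need to become tf.math.sqrt(key_dim), since NumPy functions cannot operate on symbolic tensors.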

Thank You.