Keras Transformer Implementation Error

Dear All,

I am new to implementing a Transformer with Keras, and I cannot figure out the cause of the following error.

Here are my input tokens:

(base) $ cat trainingSet_7mer_1.token.txt
18,0,11,10,7,14,2
18,0,1,6,7,18,2
13,0,11,15,7,13,2
18,0,1,12,7,5,2
10,15,11,9,7,10,2
10,15,11,9,7,5,2
13,0,1,16,7,3,2
13,0,11,16,7,2,2
13,15,1,10,2,10,2
0,15,6,13,12,16,13
13,15,1,0,7,3,2
18,15,11,0,7,5,2
13,15,1,0,7,13,2
13,15,1,18,7,15,2
18,0,1,5,2,16,2
12,19,15,16,2,7,19
0,11,15,16,2,7,9
19,7,15,0,8,2,17
18,15,11,5,2,16,2
10,0,1,15,7,3,2
9,10,15,16,2,7,9
19,13,15,16,2,7,9
10,10,15,16,2,7,1
18,0,1,5,7,5,2
18,15,1,15,7,9,2
10,15,11,19,7,3,2
18,15,11,0,2,16,2
7,6,15,3,1,11,17
10,15,11,9,7,19,2
10,0,11,15,7,5,2
18,0,11,15,7,5,2
18,15,1,19,2,6,2
18,15,11,2,2,8,2
10,18,15,16,2,7,9
18,0,1,5,7,12,2
10,15,11,19,7,15,2
6,12,7,7,7,15,17
13,0,11,15,7,8,2
18,0,11,15,2,12,2
18,0,11,0,2,12,2
3,10,15,8,0,11,17
7,6,7,9,19,1,17
18,15,11,7,2,3,2
18,0,1,19,7,5,2
15,19,15,16,2,7,1
18,15,11,8,2,19,2
18,0,1,0,2,19,2
10,8,11,10,16,19,15
10,0,11,5,7,13,2
13,19,15,16,2,7,9
13,15,1,9,2,3,2
13,0,1,0,7,12,2
7,15,7,9,5,2,17
18,19,15,16,2,7,19
10,15,11,0,7,13,2
18,15,11,7,2,12,2
18,0,11,16,7,5,2
16,1,2,7,2,9,0
5,19,15,16,2,7,19
13,8,15,16,2,7,9
16,1,9,7,2,2,0
13,19,1,15,3,9,11
13,15,11,16,7,14,2
10,15,11,10,7,14,2
(base) $ python ~/bin/train_transformer2.py 7 20 trainingSet_7mer_1.token.txt 64 trainingSet_7mer_1.value.txt testSet_7mer_obs_1.token.txt 64 testSet_7mer_obs_1.value.txt testSet_7mer_obs_1.value.predicted.txt 64 150 8 0.25 0.004 transformer.1
2025-01-12 19:13:26.420643: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-12 19:13:26.458060: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-12 19:13:26.458352: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-12 19:13:27.326611: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[[18 0 11 10 7 14 2]
[18 0 1 6 7 18 2]
[13 0 11 15 7 13 2]
[18 0 1 12 7 5 2]
[10 15 11 9 7 10 2]
[10 15 11 9 7 5 2]
[13 0 1 16 7 3 2]
[13 0 11 16 7 2 2]
[13 15 1 10 2 10 2]
[ 0 15 6 13 12 16 13]
[13 15 1 0 7 3 2]
[18 15 11 0 7 5 2]
[13 15 1 0 7 13 2]
[13 15 1 18 7 15 2]
[18 0 1 5 2 16 2]
[12 19 15 16 2 7 19]
[ 0 11 15 16 2 7 9]
[19 7 15 0 8 2 17]
[18 15 11 5 2 16 2]
[10 0 1 15 7 3 2]
[ 9 10 15 16 2 7 9]
[19 13 15 16 2 7 9]
[10 10 15 16 2 7 1]
[18 0 1 5 7 5 2]
[18 15 1 15 7 9 2]
[10 15 11 19 7 3 2]
[18 15 11 0 2 16 2]
[ 7 6 15 3 1 11 17]
[10 15 11 9 7 19 2]
[10 0 11 15 7 5 2]
[18 0 11 15 7 5 2]
[18 15 1 19 2 6 2]
[18 15 11 2 2 8 2]
[10 18 15 16 2 7 9]
[18 0 1 5 7 12 2]
[10 15 11 19 7 15 2]
[ 6 12 7 7 7 15 17]
[13 0 11 15 7 8 2]
[18 0 11 15 2 12 2]
[18 0 11 0 2 12 2]
[ 3 10 15 8 0 11 17]
[ 7 6 7 9 19 1 17]
[18 15 11 7 2 3 2]
[18 0 1 19 7 5 2]
[15 19 15 16 2 7 1]
[18 15 11 8 2 19 2]
[18 0 1 0 2 19 2]
[10 8 11 10 16 19 15]
[10 0 11 5 7 13 2]
[13 19 15 16 2 7 9]
[13 15 1 9 2 3 2]
[13 0 1 0 7 12 2]
[ 7 15 7 9 5 2 17]
[18 19 15 16 2 7 19]
[10 15 11 0 7 13 2]
[18 15 11 7 2 12 2]
[18 0 11 16 7 5 2]
[16 1 2 7 2 9 0]
[ 5 19 15 16 2 7 19]
[13 8 15 16 2 7 9]
[16 1 9 7 2 2 0]
[13 19 1 15 3 9 11]
[13 15 11 16 7 14 2]
[10 15 11 10 7 14 2]]
Traceback (most recent call last):
  File "/home/kggx609/bin/train_transformer2.py", line 331, in <module>
    main(args)
  File "/home/kggx609/bin/train_transformer2.py", line 292, in main
    model = create_transformer_model((time_steps), feature_channels, time_steps, number_of_heads, d_k, d_v, d_model, d_ff, n, dropout)
  File "/home/kggx609/bin/train_transformer2.py", line 222, in create_transformer_model
    x = Encoder(enc_vocab_size, input_seq_length, h, d_k, d_v, d_model, d_ff, n, dropout_rate)(inputs)
  File "/home/kggx609/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/scratch/tmp_user_data/kggx609/__autograph_generated_file359jy56n.py", line 26, in tf__call
    ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.ld(self).encoder_layer,), None, fscope), None, loop_body, get_state, set_state, ('x',), {'iterate_names': '(i, layer)'})
  File "/scratch/tmp_user_data/kggx609/__autograph_generated_file359jy56n.py", line 23, in loop_body
    x = ag__.converted_call(ag__.ld(layer), (ag__.ld(x), None, True), None, fscope)
  File "/scratch/tmp_user_data/kggx609/__autograph_generated_fileb0k9akbt.py", line 10, in tf__call
    multihead_output = ag__.converted_call(ag__.ld(self).multihead_attention, (ag__.ld(x), ag__.ld(x), ag__.ld(x), ag__.ld(padding_mask)), None, fscope)
  File "/scratch/tmp_user_data/kggx609/__autograph_generated_filejshn3x2p.py", line 13, in tf__call
    o_reshaped = ag__.converted_call(ag__.ld(self).attention, (ag__.ld(q_reshaped), ag__.ld(k_reshaped), ag__.ld(v_reshaped), ag__.ld(self).d_k, ag__.ld(mask)), None, fscope)
  File "/scratch/tmp_user_data/kggx609/__autograph_generated_file9753gee4.py", line 10, in tf__call
    scores = (ag__.converted_call(ag__.ld(tf).matmul, (ag__.ld(queries), ag__.ld(keys)), dict(transpose_b=True), fscope) / ag__.converted_call(ag__.ld(math).sqrt, (ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(d_k), ag__.ld(np).float16), None, fscope),), None, fscope))
TypeError: Exception encountered when calling layer 'encoder' (type Encoder).

in user code:

File "/home/kggx609/bin/train_transformer2.py", line 216, in call  *
    x = layer(x, None, True)
File "/home/kggx609/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
    raise e.with_traceback(filtered_tb) from None
File "/scratch/tmp_user_data/kggx609/__autograph_generated_fileb0k9akbt.py", line 10, in tf__call
    multihead_output = ag__.converted_call(ag__.ld(self).multihead_attention, (ag__.ld(x), ag__.ld(x), ag__.ld(x), ag__.ld(padding_mask)), None, fscope)
File "/scratch/tmp_user_data/kggx609/__autograph_generated_filejshn3x2p.py", line 13, in tf__call
    o_reshaped = ag__.converted_call(ag__.ld(self).attention, (ag__.ld(q_reshaped), ag__.ld(k_reshaped), ag__.ld(v_reshaped), ag__.ld(self).d_k, ag__.ld(mask)), None, fscope)
File "/scratch/tmp_user_data/kggx609/__autograph_generated_file9753gee4.py", line 10, in tf__call
    scores = (ag__.converted_call(ag__.ld(tf).matmul, (ag__.ld(queries), ag__.ld(keys)), dict(transpose_b=True), fscope) / ag__.converted_call(ag__.ld(math).sqrt, (ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(d_k), ag__.ld(np).float16), None, fscope),), None, fscope))

TypeError: Exception encountered when calling layer 'encoder_layer' (type EncoderLayer).

in user code:

    File "/home/kggx609/bin/train_transformer2.py", line 175, in call  *
        multihead_output = self.multihead_attention(x, x, x, padding_mask)
    File "/home/kggx609/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
        raise e.with_traceback(filtered_tb) from None
    File "/scratch/tmp_user_data/kggx609/__autograph_generated_filejshn3x2p.py", line 13, in tf__call
        o_reshaped = ag__.converted_call(ag__.ld(self).attention, (ag__.ld(q_reshaped), ag__.ld(k_reshaped), ag__.ld(v_reshaped), ag__.ld(self).d_k, ag__.ld(mask)), None, fscope)
    File "/scratch/tmp_user_data/kggx609/__autograph_generated_file9753gee4.py", line 10, in tf__call
        scores = (ag__.converted_call(ag__.ld(tf).matmul, (ag__.ld(queries), ag__.ld(keys)), dict(transpose_b=True), fscope) / ag__.converted_call(ag__.ld(math).sqrt, (ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(d_k), ag__.ld(np).float16), None, fscope),), None, fscope))

    TypeError: Exception encountered when calling layer 'multi_head_attention' (type MultiHeadAttention).

    in user code:

        File "/home/kggx609/bin/train_transformer2.py", line 124, in call  *
            o_reshaped = self.attention(q_reshaped, k_reshaped, v_reshaped, self.d_k, mask)
        File "/home/kggx609/.local/lib/python3.8/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
            raise e.with_traceback(filtered_tb) from None
        File "/scratch/tmp_user_data/kggx609/__autograph_generated_file9753gee4.py", line 10, in tf__call
            scores = (ag__.converted_call(ag__.ld(tf).matmul, (ag__.ld(queries), ag__.ld(keys)), dict(transpose_b=True), fscope) / ag__.converted_call(ag__.ld(math).sqrt, (ag__.converted_call(ag__.ld(tf).cast, (ag__.ld(d_k), ag__.ld(np).float16), None, fscope),), None, fscope))

        TypeError: Exception encountered when calling layer 'dot_product_attention' (type DotProductAttention).

        in user code:

            File "/home/kggx609/bin/train_transformer2.py", line 68, in call  *
                scores = tf.matmul(queries, keys, transpose_b=True) / math.sqrt(tf.cast(d_k, np.float16))

            TypeError: must be real number, not Tensor


        Call arguments received by layer 'dot_product_attention' (type DotProductAttention):
          • queries=tf.Tensor(shape=(None, 8, 7, None), dtype=float32)
          • keys=tf.Tensor(shape=(None, 8, 7, None), dtype=float32)
          • values=tf.Tensor(shape=(None, 8, 7, None), dtype=float32)
          • d_k=64
          • mask=None


    Call arguments received by layer 'multi_head_attention' (type MultiHeadAttention):
      • queries=tf.Tensor(shape=(None, 7, 512), dtype=float32)
      • keys=tf.Tensor(shape=(None, 7, 512), dtype=float32)
      • values=tf.Tensor(shape=(None, 7, 512), dtype=float32)
      • mask=None


Call arguments received by layer 'encoder_layer' (type EncoderLayer):
  • x=tf.Tensor(shape=(None, 7, 512), dtype=float32)
  • padding_mask=None
  • training=True

Call arguments received by layer 'encoder' (type Encoder):
  • input_sentence=tf.Tensor(shape=(None, 7), dtype=float32)

The inputs are tokens of 20 dimensions (values 0 to 19); the input array has shape (64, 7, 20) and dtype np.int8. I appreciate your help. Thanks in advance!
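
From the traceback, the failure seems to come from passing a Tensor to Python's math.sqrt on line 68: math.sqrt only accepts plain numbers, so the tf.cast(d_k, np.float16) result cannot be converted while the call is being traced by AutoGraph. Below is a minimal sketch of the failing expression and two tensor-safe alternatives I am considering (the shapes are illustrative stand-ins taken from the traceback, not my real data):

```python
import math

import numpy as np  # only referenced by the original np.float16 cast shown below
import tensorflow as tf

# Illustrative stand-ins with the shapes from the traceback: (batch, heads, seq_len, d_k).
d_k = 64
queries = tf.random.normal((2, 8, 7, d_k))
keys = tf.random.normal((2, 8, 7, d_k))

# Failing line 68: under AutoGraph, math.sqrt() receives a symbolic Tensor and
# raises "TypeError: must be real number, not Tensor".
# scores = tf.matmul(queries, keys, transpose_b=True) / math.sqrt(tf.cast(d_k, np.float16))

# Alternative 1: tf.math.sqrt operates on tensors directly.
scores = tf.matmul(queries, keys, transpose_b=True) / tf.math.sqrt(tf.cast(d_k, tf.float32))

# Alternative 2: keep math.sqrt but pass it a plain Python number, since d_k is a known int.
scores = tf.matmul(queries, keys, transpose_b=True) / math.sqrt(float(d_k))
```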

Hi @QichengMa, if possible, could you please provide standalone code to reproduce the issue? Also, please let us know the TensorFlow and Keras versions you are using. Thank you.
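
For example, a quick way to check the versions:

```python
import tensorflow as tf
import keras

print(tf.__version__)
print(keras.__version__)
```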