`key_dim` in multihead attention layer

Hey all,

I am looking at the documentation of the MultiHeadAttention layer, and I do not really understand the use of the `key_dim` parameter.

In the doc it says:

key_dim: Size of each attention head for query and key.

Thanks in advance :slight_smile:

Hi @ariG23498, `key_dim` is the size of the query/key projection for each attention head, i.e. the vector length each head actually works with when computing attention scores. Following the original Transformer convention, it is usually set to `embed_dim / num_heads`: for example, with an embedding dimension of 10 and `num_heads` of 5, `key_dim` would be 2. Note that in Keras the parameter is independent of the embedding dimension, so this relation is a common choice rather than a requirement. Thank You.
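To make this concrete, here is a minimal sketch using `tf.keras.layers.MultiHeadAttention` with the toy numbers above (embedding dimension 10, `num_heads=5`, `key_dim=2`). The input tensor `x` and its shapes are illustrative assumptions, not anything from the docs:

```python
import tensorflow as tf

# Toy batch: 4 sequences of length 8, each token embedded in 10 dimensions.
x = tf.random.normal((4, 8, 10))

# Each of the 5 heads projects queries and keys down to key_dim=2,
# matching the embed_dim / num_heads convention (10 / 5 = 2).
mha = tf.keras.layers.MultiHeadAttention(num_heads=5, key_dim=2)

# Self-attention: query and value are the same tensor here.
out = mha(query=x, value=x)

# The final output projection maps back to the query's last dimension,
# so the output shape matches the input: (4, 8, 10).
print(out.shape)
```

You can change `key_dim` independently of the embedding dimension (e.g. `key_dim=16` with the same input) and the layer still works; only the per-head projection size, and hence the parameter count, changes.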