Hey all,
I am looking at the documentation of the MultiHeadAttention
layer, and I do not really understand what the key_dim
parameter controls.
In the doc it says:
key_dim: Size of each attention head for query and key.
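For context, here is a minimal runnable sketch of how I am using the layer (the shapes, num_heads, and key_dim values are just placeholders I picked for illustration):

```python
import tensorflow as tf

# Placeholder shapes: batch of 2, sequence length 8, embedding dim 16.
query = tf.random.normal((2, 8, 16))
value = tf.random.normal((2, 8, 16))

# key_dim sets the per-head size of the query/key projections;
# it does not have to match the input's last dimension.
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)

output = mha(query, value)
print(output.shape)  # (2, 8, 16): projected back to the query's last dim by default
```

This runs fine with key_dim values that differ from the embedding dimension, which is part of what confuses me about how key_dim should be chosen.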
Thanks in advance