The MultiHeadAttention documentation states:
> When using `MultiHeadAttention` inside a custom layer, the custom layer must implement its own `build()` method and call `MultiHeadAttention`'s `_build_from_signature()` there.
Is this guidance up to date? I don't see this advice followed in any of the examples that use this layer in the TensorFlow documentation, like this one.
If it is up to date, can anyone share an example? The signature of the build method is `def build(self, input_shape)`, whereas the multi-head attention layer has `def _build_from_signature(self, query, value, key=None)`. How do I get values for `query`, `value`, and `key` from `input_shape`?
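For context, here is a minimal sketch of what I *think* the advice might mean, assuming TF 2.x (where `_build_from_signature` is a private method that accepts either tensors or `TensorShape`s) and a self-attention setup, so that `query` and `value` both come from the same `input_shape`:

```python
import tensorflow as tf

class SelfAttentionBlock(tf.keras.layers.Layer):
    """Custom layer wrapping MultiHeadAttention for self-attention."""

    def __init__(self, num_heads=2, key_dim=16, **kwargs):
        super().__init__(**kwargs)
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=key_dim
        )

    def build(self, input_shape):
        # Self-attention: query, value (and key, which defaults to value)
        # all share the same shape, so input_shape is passed for both.
        self.mha._build_from_signature(query=input_shape, value=input_shape)
        super().build(input_shape)

    def call(self, x):
        # Query and value are the same tensor in self-attention.
        return self.mha(x, x)
```

I'm not sure this is the intended usage, though, since the method is private (leading underscore), and newer Keras versions may have removed or renamed it.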