When using embed_content to request embeddings, there is a task_type argument you can specify: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, or CLUSTERING.
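For reference, a minimal sketch of how the argument is passed, assuming the google.generativeai Python SDK (the model name here is just an example):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Same content either way; the intended downstream use is declared via task_type.
result = genai.embed_content(
    model="models/text-embedding-004",    # example model name
    content="How do task embeddings work?",
    task_type="retrieval_query",          # or retrieval_document, semantic_similarity,
                                          # classification, clustering
)
embedding = result["embedding"]           # a list of floats
```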
Unless I’m mistaken, an embedding is an embedding is an embedding – I’m not sure why it cares how I intend to use it/them?
This cookbook example (a Google Colab notebook) uses CLUSTERING to achieve the task result. I deliberately changed it to the other options, and it clearly doesn't work as well (it's obvious from the visualization), so there is some value in having this knob to control the behaviour of the embeddings. The documentation would be more helpful if there were more description than the half-line of text accompanying each type definition.
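One quick way to see that the knob matters is to embed the same texts under two task types and compare the resulting vectors. This is only a sketch, not the cookbook's code; the model name and the texts are placeholders:

```python
import numpy as np
import google.generativeai as genai  # assumes genai.configure(api_key=...) was called

def embed(texts, task_type):
    # Embed each text under the given task_type and stack into a matrix.
    return np.array([
        genai.embed_content(model="models/text-embedding-004",
                            content=t, task_type=task_type)["embedding"]
        for t in texts
    ])

texts = ["The quarterly report shows revenue growth.",
         "Photosynthesis converts light into chemical energy."]

clustering_vecs = embed(texts, "clustering")
similarity_vecs = embed(texts, "semantic_similarity")

# The same inputs end up with different geometry under different task types.
cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print("clustering:", cos(*clustering_vecs))
print("semantic_similarity:", cos(*similarity_vecs))
```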
> Unless I’m mistaken, an embedding is an embedding is an embedding – I’m not sure why it cares how I intend to use it/them?
Generally speaking this is true, but these models were trained with the task type as an input, so from a strictly technical point of view the inputs are different and you should expect different outputs. Wikipedia covers this kind of approach as it was used in the T5 models.
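As an illustration of the task-as-input idea (a sketch using a Hugging Face transformers T5 checkpoint, unrelated to the embedding API itself): the same model behaves differently depending on the task prefix it was trained with.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "The house is wonderful and has a large garden."

# T5 was trained with a task prefix prepended to the input text,
# so the "task" is literally part of the model's input.
for prefix in ["translate English to German: ", "summarize: "]:
    ids = tokenizer(prefix + text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(prefix, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```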
One example of where the difference is clear is RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT. A document is typically quite long (e.g. a whole wiki page), while a query is typically a few words (e.g. "t5 model") or maybe a sentence (e.g. "how do task embeddings work"), yet the embeddings need to be geometrically similar so that a query can be used to retrieve a document.
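In code terms the asymmetry looks roughly like this; again a sketch assuming the google.generativeai SDK, with made-up documents and query:

```python
import numpy as np
import google.generativeai as genai  # assumes genai.configure(api_key=...) was called

documents = {
    "T5": "T5 is a text-to-text transformer that casts every NLP task as text generation ...",
    "BERT": "BERT is a bidirectional encoder pretrained with masked language modelling ...",
}

# Long documents are embedded with the retrieval_document task type ...
doc_vecs = {
    name: np.array(genai.embed_content(model="models/text-embedding-004",
                                       content=text,
                                       task_type="retrieval_document",
                                       title=name)["embedding"])
    for name, text in documents.items()
}

# ... while the short query uses retrieval_query, yet both land in a space
# where a simple dot product ranks relevance.
query_vec = np.array(genai.embed_content(model="models/text-embedding-004",
                                         content="t5 model",
                                         task_type="retrieval_query")["embedding"])

best = max(doc_vecs, key=lambda name: float(doc_vecs[name] @ query_vec))
print("Best match:", best)
```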