Keras Hub and Universal Sentence Encoder

I have a binary classification model that uses Universal Sentence Encoder as a preprocessing layer to convert email subject lines to fixed-length embeddings. (The layer is trainable so that it can learn from my corpus of training data.) I’m currently loading the Keras layer from TensorFlow Hub, but I wonder if I can load it from Keras Hub.

import tensorflow as tf
import tensorflow_hub as tfh

use_layer = tfh.KerasLayer("", trainable=True)
subject_line_featurizer = tf.keras.Sequential([
  tf.keras.layers.Input(shape=(), dtype=tf.string, name="input_subject_line"),
  use_layer,
], name="subject_line_featurizer")

I don’t see Universal Sentence Encoder listed as one of the available models in Keras Hub. Am I missing something?

Also, should I try another prebuilt model for this layer? Gemma, perhaps?

I came across this description of the keras_hub.tokenizers.GemmaTokenizer class. Can we use it as a preprocessing layer to convert sentences to fixed-length embeddings?

Here is some suggested code for creating a layer that outputs embeddings from a Gemma model.

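import keras
import keras_hub
import numpy as np

class GemmaEncoder(keras.Layer):

    def __init__(self, **kwargs):
        # Initialize the base Layer before assigning any sublayers.
        super().__init__(**kwargs)
        self.gemma_lm = keras_hub.models.GemmaCausalLM.from_preset("gemma2_2b_en")

    def call(self, inputs):
        # Tokenize and pad the raw strings, then look up one embedding per token.
        preprocessed = self.gemma_lm.preprocessor.generate_preprocess(inputs)
        embeddings = self.gemma_lm.backbone.token_embedding(preprocessed["token_ids"])
        return embeddings

encoder = GemmaEncoder()
encoder(np.array(["i ate a lemon", "i ate an orange"]))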

However, I’m having some trouble swapping this layer in for the Universal Sentence Encoder layer I was previously using.

The Universal Sentence Encoder layer outputs one vector per sentence, a tensor with shape=(2, 512), but this layer outputs one vector per token, a tensor with shape=(2, 1024, 2304).
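
One way I could imagine collapsing the per-token output into one vector per sentence is masked mean pooling: average the token vectors along the sequence axis while skipping padded positions. The sketch below is only an illustration on toy NumPy data, not something I have running against Gemma. The function name masked_mean_pool and the toy shapes are made up for the example, and I am assuming the dictionary returned by generate_preprocess also carries a "padding_mask" entry that could play the role of padding_mask here. Even if this works, the pooled vectors would have 2304 dimensions rather than 512, so whatever follows the layer would need to accept that size.

```python
import numpy as np

def masked_mean_pool(token_embeddings, padding_mask):
    """Average token vectors over the sequence axis, ignoring padded positions.

    token_embeddings: float array of shape (batch, seq_len, dim)
    padding_mask:     bool array of shape (batch, seq_len), True for real tokens
    """
    # Broadcast the mask over the embedding dimension: (batch, seq_len, 1).
    mask = padding_mask.astype(token_embeddings.dtype)[..., np.newaxis]
    # Sum only the vectors of real tokens: (batch, dim).
    summed = (token_embeddings * mask).sum(axis=1)
    # Count real tokens per sentence; clamp to 1 to avoid dividing by zero.
    counts = np.maximum(mask.sum(axis=1), 1.0)
    return summed / counts

# Toy stand-in for the (2, 1024, 2304) tensor: 2 sentences, 8 positions, 4 dims.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 8, 4)).astype("float32")
padding_mask = np.array(
    [[1, 1, 1, 0, 0, 0, 0, 0],
     [1, 1, 1, 1, 1, 0, 0, 0]],
    dtype=bool,
)

sentence_embeddings = masked_mean_pool(token_embeddings, padding_mask)
print(sentence_embeddings.shape)  # (2, 4)
```

Inside the layer, the same arithmetic would presumably be written with keras.ops instead of NumPy so that it stays differentiable.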

My notebook crashes with an out-of-memory error when running the Gemma encoder.
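
To get a feel for where the memory might be going, here is a rough back-of-the-envelope calculation. It assumes float32 values (4 bytes each) and a parameter count of roughly 2.6 billion for the preset, which is my approximation rather than a measured figure. The helper tensor_bytes and the constant ASSUMED_PARAM_COUNT are names I made up for this estimate.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element

# Size of the layer output reported above, assuming float32.
output_bytes = tensor_bytes((2, 1024, 2304))

# Assumption: about 2.6 billion parameters stored as float32.
ASSUMED_PARAM_COUNT = 2_600_000_000
weight_bytes = ASSUMED_PARAM_COUNT * 4

print(f"output tensor: {output_bytes / 2**20:.1f} MiB")  # 18.0 MiB
print(f"model weights: {weight_bytes / 2**30:.1f} GiB")  # 9.7 GiB
```

If these assumptions are roughly right, the output tensor itself is small, and the crash is more likely caused by loading the full causal language model, even though the layer only uses its token embedding table.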