Oversized Embedding Tables in TensorFlow

Hi
I’m currently exploring options for running on-device inference with TensorFlow and have a question regarding the handling of large embedding tables. In TFLite, it’s possible to store large embedding tables on disk and perform inference using them. However, I’m unclear about how TensorFlow manages large embedding tables for on-device inference.

Does TensorFlow require that all embedding tables be loaded into memory before running inference? If so, what happens when the required memory exceeds the available device memory? Can TensorFlow also support loading embedding tables from disk during inference, or is there a different recommended approach for handling large models on devices with limited memory?

Hello @rita19991020,

TensorFlow typically loads embedding tables into memory, together with the rest of the model's variables, before running inference; if the tables exceed the available device memory, model loading will fail with an out-of-memory error rather than paging data in from disk. For large models on memory-constrained devices, the recommended approach is to reduce the model's size with optimization techniques such as quantization or pruning before deploying it. For more detailed instructions, see the TensorFlow Model Optimization guide.
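
As an illustration, here is a minimal sketch of post-training dynamic-range quantization with the TFLite converter. The SavedModel path `./my_model` is a placeholder for your own model; with this setting the converter stores weights, including embedding tables, as 8-bit values, roughly quartering their size relative to float32.

```python
import tensorflow as tf

# Placeholder path: replace with the directory of your own SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("./my_model")

# Dynamic-range quantization: weights (including embedding tables) are
# stored as 8-bit integers, shrinking the on-disk and in-memory footprint.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

If 8-bit weights are still too large, you may also want to look at pruning (via the tensorflow_model_optimization package) or reducing the embedding dimensionality before training.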

Thank you.