Hi,
I’m currently exploring options for running on-device inference with TensorFlow and have a question about how large embedding tables are handled. In TFLite, it’s possible to keep large embedding tables on disk and run inference against them, but I’m unclear how TensorFlow itself manages such tables for on-device inference.
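For reference, here is roughly the TFLite path I’m referring to — a minimal sketch, assuming a model file named `embedding_model.tflite` whose single input is a batch of ids and whose output is the corresponding embedding vectors. My understanding is that the flatbuffer is memory-mapped, so the tables are paged in from disk rather than copied wholesale into RAM:

```python
import numpy as np
import tensorflow as tf

# Load the .tflite flatbuffer from disk. As far as I understand, the file is
# memory-mapped, so large constant tensors such as embedding tables can be
# paged in on demand instead of being copied entirely into RAM up front.
interpreter = tf.lite.Interpreter(model_path="embedding_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Look up embeddings for a small batch of ids (placeholder values; the real
# shape and dtype come from the model's input signature).
ids = np.array([[1, 5, 42]], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], ids)
interpreter.invoke()
embeddings = interpreter.get_tensor(output_details[0]["index"])
```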
Does TensorFlow require all embedding tables to be loaded into memory before running inference? If so, what happens when the required memory exceeds what the device has available? Does TensorFlow support loading embedding tables from disk during inference, or is there a different recommended approach for handling large models on memory-constrained devices?
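For context, the plain-TensorFlow path I had in mind looks something like the sketch below (the SavedModel directory and the signature input name are placeholders). My assumption is that `tf.saved_model.load` restores every variable, including the full embedding table, into memory before inference can start, which is exactly the behavior I’d like confirmed:

```python
import tensorflow as tf

# Load a SavedModel exported from the training pipeline (path is a placeholder).
# My understanding is that this restores all variables, including the full
# embedding table, into memory up front.
model = tf.saved_model.load("exported_embedding_model")
infer = model.signatures["serving_default"]

# Run a lookup for a small batch of ids. The signature's input name ("ids")
# is an assumption for illustration.
outputs = infer(ids=tf.constant([[1, 5, 42]], dtype=tf.int64))
```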