Oversized Embedding Tables in TensorFlow

Hi
I’m currently exploring options for running on-device inference with TensorFlow and have a question regarding the handling of large embedding tables. In TFLite, it’s possible to store large embedding tables on disk and perform inference using them. However, I’m unclear about how TensorFlow manages large embedding tables for on-device inference.

Does TensorFlow require that all embedding tables be loaded into memory before running inference? If so, what happens when the required memory exceeds the available device memory? Can TensorFlow also support loading embedding tables from disk during inference, or is there a different recommended approach for handling large models on devices with limited memory?

Hello @rita19991020,

TensorFlow typically loads embedding tables into memory, together with the rest of the model's variables, before running inference; if the tables exceed the available device memory, model loading will fail with an out-of-memory error rather than paging data in from disk. For large models on memory-constrained devices, the recommended approach is to reduce the model's size with optimization techniques such as quantization or pruning before deploying it. For more detailed instructions, see the TensorFlow Model Optimization guide.
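
As an illustration, here is a minimal sketch of post-training dynamic-range quantization with the TFLite converter. The SavedModel path `./my_model` is a placeholder for your own model; with this setting the converter stores weights, including embedding tables, as 8-bit values, roughly quartering their size relative to float32.

```python
import tensorflow as tf

# Placeholder path: replace with the directory of your own SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("./my_model")

# Dynamic-range quantization: weights (including embedding tables) are
# stored as 8-bit integers, shrinking the on-disk and in-memory footprint.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

If 8-bit weights are still too large, you may also want to look at pruning (via the tensorflow_model_optimization package) or reducing the embedding dimensionality before training.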

Thank you.