Hi,
I am running a model through a PluggableDevice plugin I am developing for a new inference accelerator.
When tracing the calls into the plugin, I see TensorFlow keeping layers' intermediate results on the device longer than they are needed, so bigger models end up running out of device memory.
To be clear, I am talking about intermediate results that are consumed exactly once and never referenced again.
Is this expected behaviour?
Why doesn't TensorFlow free intermediate results as soon as they have been consumed?
In my kernels' compute functions, I tend to use `TF_ForwardInputOrAllocateOutput` whenever possible, and `TF_AllocateOutput` otherwise.
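Concretely, a compute function of mine looks roughly like the sketch below (names such as `MyOpCompute`, the single input/output at index 0, and the rank cap of 8 are just illustrative):

```c
#include "tensorflow/c/kernels.h"
#include "tensorflow/c/tf_status.h"
#include "tensorflow/c/tf_tensor.h"

static void MyOpCompute(void* kernel, TF_OpKernelContext* ctx) {
  (void)kernel;  // unused in this sketch
  TF_Status* status = TF_NewStatus();

  TF_Tensor* input = NULL;
  TF_GetInput(ctx, 0, &input, status);

  if (TF_GetCode(status) == TF_OK) {
    // Output has the same shape as the input (illustrative; assumes rank <= 8).
    int64_t dims[8];
    int num_dims = TF_NumDims(input);
    for (int i = 0; i < num_dims; ++i) dims[i] = TF_Dim(input, i);

    // Try to reuse input 0's buffer for output 0; the runtime falls back
    // to a fresh device allocation when it cannot forward the input.
    int candidates[] = {0};
    int forwarded = -1;
    TF_Tensor* output = TF_ForwardInputOrAllocateOutput(
        ctx, candidates, /*num_candidate_input_indices=*/1,
        /*output_index=*/0, dims, num_dims, &forwarded, status);

    if (TF_GetCode(status) == TF_OK) {
      // ... launch the device computation writing into TF_TensorData(output) ...
      TF_DeleteTensor(output);
    }
  }

  if (input != NULL) TF_DeleteTensor(input);
  TF_DeleteStatus(status);
}
```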
Should I force reuse of the input by calling `TF_SetOutput` on the input `TF_Tensor*`?
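That would look something like the following (again a hypothetical sketch, using the same headers as above; it unconditionally overwrites input 0's device buffer and publishes it as output 0):

```c
static void MyOpComputeInPlace(void* kernel, TF_OpKernelContext* ctx) {
  (void)kernel;  // unused in this sketch
  TF_Status* status = TF_NewStatus();

  TF_Tensor* input = NULL;
  TF_GetInput(ctx, 0, &input, status);

  if (TF_GetCode(status) == TF_OK) {
    // ... launch the device computation writing in place into
    //     TF_TensorData(input) ...

    // Publish the mutated input tensor as output 0, so no new device
    // buffer is ever allocated for this op's result.
    TF_SetOutput(ctx, 0, input, status);
  }

  if (input != NULL) TF_DeleteTensor(input);
  TF_DeleteStatus(status);
}
```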