Calling model inference in C/C++ with inputs already allocated in GPU memory

Dear all,

I’d like to use a TF model in scientific simulation code written in C++. This code runs simulations on the GPU, so all the necessary input data may already reside in GPU memory.

In order to call the TF model, I’m planning to use TF_NewTensor.

Now my question is: is it possible to control where a TF_Tensor is placed? Can I just wrap it around an existing on-GPU array to avoid a CPU-to-GPU memory transfer?
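
For context, this is roughly how I would wrap a host-side buffer today (the function name, shape, and buffer are just illustrative); as far as I can tell, TF_NewTensor never asks where the pointer actually lives:

```cpp
#include <cstdint>
#include <tensorflow/c/c_api.h>

// The buffer is owned by the simulation code, so TensorFlow must not free it.
static void NoOpDeallocator(void* data, size_t len, void* arg) {}

// Wraps an existing host buffer without copying. The question is whether the
// same can be done with a pointer that lives in GPU memory.
TF_Tensor* WrapHostBuffer(float* host_data, int64_t n) {
  const int64_t dims[1] = {n};
  return TF_NewTensor(TF_FLOAT, dims, 1,
                      host_data, static_cast<size_t>(n) * sizeof(float),
                      NoOpDeallocator, nullptr);
}
```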

Thank you very much in advance!


Hi @Yury_Lysogorskiy,

  • TF_NewTensor doesn’t allow direct control over tensor placement (CPU vs. GPU).
  • The standard TensorFlow C API doesn’t provide a way to wrap existing GPU arrays without a memory transfer.

Possible alternatives:

  • Create a TensorFlow custom device
  • Use CUDA-aware TensorFlow builds for CUDA integration
  • Develop a custom TensorFlow operation

For your use case, CUDA integration or a custom op might be the most promising way to interface your GPU-resident data with TensorFlow; a rough sketch of the custom-op route is shown below.
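
To make the custom-op route more concrete, here is a minimal sketch of what such a kernel could look like. The op name, the idea of passing the raw device pointer as an int64 attribute, and the blocking cudaMemcpy are illustrative assumptions for this sketch, not an established TensorFlow interface:

```cpp
#include <cstdint>
#include <cuda_runtime.h>

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"

using namespace tensorflow;

// Graph-level definition: the device pointer and element count are baked in
// as attributes for simplicity.
REGISTER_OP("ImportDeviceBuffer")
    .Attr("ptr: int")            // raw CUDA device pointer, as an int64 value
    .Attr("num_elements: int")
    .Output("output: float");

class ImportDeviceBufferOp : public OpKernel {
 public:
  explicit ImportDeviceBufferOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
    OP_REQUIRES_OK(ctx, ctx->GetAttr("ptr", &ptr_));
    OP_REQUIRES_OK(ctx, ctx->GetAttr("num_elements", &n_));
  }

  void Compute(OpKernelContext* ctx) override {
    // The output is allocated by TensorFlow in GPU memory, because the kernel
    // is registered for DEVICE_GPU below.
    Tensor* out = nullptr;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({n_}), &out));

    // Device-to-device copy from the simulation's buffer; no host round trip.
    // A production kernel should enqueue the copy on the op's CUDA stream and
    // check errors, rather than using a blocking default-stream cudaMemcpy.
    const void* src = reinterpret_cast<const void*>(ptr_);
    cudaMemcpy(out->flat<float>().data(), src,
               static_cast<size_t>(n_) * sizeof(float),
               cudaMemcpyDeviceToDevice);
  }

 private:
  int64_t ptr_ = 0;
  int64_t n_ = 0;
};

REGISTER_KERNEL_BUILDER(Name("ImportDeviceBuffer").Device(DEVICE_GPU),
                        ImportDeviceBufferOp);
```

With this kind of op the data is still copied once, but the copy stays on the device, so the CPU-to-GPU transfer the question is trying to avoid never happens.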

Thank you.