A good way to detect memory violations in TensorFlow

Will_Sun · February 5, 2024, 6:06am

Hi there. I’m new to this community, and I’m not sure whether this is the right place to raise this topic. If not, please forgive me and point me to the correct channel. Many thanks!

Problem

Our project uses TensorFlow as its backbone and implements its main functions as custom ops.
We usually allocate a lot of temporary and output memory, which is managed by TensorFlow.
However, operating on raw data pointers is dangerous and error-prone.

Question

Does TensorFlow provide any way to help detect memory violations, for example by notifying the user when a read or write falls outside the memory of a tensor?
Are there any suggestions for detecting or preventing memory violations as early as possible?

Kiran_Sai_Ramineni · February 21, 2024, 8:16am

Hi @Will_Sun, could you please take a look at the Profiling document? Profiling helps you understand the hardware resource consumption (such as time and memory) of the various TensorFlow operations (ops) in your model. Thank you.

Will_Sun · February 26, 2024, 7:38am

Hi Kiran, thanks for your reply. I will check the profiling tool to see if it meets my requirements.
My problem is more often related to incorrect results than to memory resource consumption.
Usually, we have quite a few run-time variables defined on the GPU. When an operation on one variable causes a memory violation, it may corrupt the memory of another variable, which eventually fails a sanity check at some point. Unfortunately, this sanity check failure usually happens quite late and unpredictably.
That’s really painful. I would like the violation to be caught right when it happens. One idea is to allocate some extra space and write tags there that can be checked later. The problem is that memory allocation is handled entirely by TensorFlow, and I don’t know whether TensorFlow provides a built-in solution.
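
The "extra space plus tags" idea above can be sketched as follows. This is only a minimal illustration on host memory, not a TensorFlow feature: `GuardedBuffer`, `kGuardValue`, `kGuardElems` and `FillOnes` are hypothetical names made up for the example. In a custom op, the padded buffer would presumably be requested through the op's normal temporary allocation (for instance `OpKernelContext::allocate_temp` with a shape enlarged by the padding), but that wiring is an assumption and is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag value and guard size, chosen only for this sketch.
constexpr float kGuardValue = -123456.0f;
constexpr std::size_t kGuardElems = 16;

// Hypothetical helper: a payload of `count` floats with a tagged guard band
// in front of it and behind it.
class GuardedBuffer {
 public:
  explicit GuardedBuffer(std::size_t count)
      : count_(count), storage_(count + 2 * kGuardElems, kGuardValue) {
    assert(count > 0);
    // Zero the payload so that only the guard bands carry the tag.
    std::fill(data(), data() + count_, 0.0f);
  }

  // Raw pointer to the payload, as a kernel would receive it.
  float* data() { return storage_.data() + kGuardElems; }
  std::size_t size() const { return count_; }

  // Returns true if both guard bands still hold the tag, i.e. no write
  // landed just before or just after the payload.
  bool GuardsIntact() const {
    const float* front = storage_.data();
    const float* back = storage_.data() + kGuardElems + count_;
    for (std::size_t i = 0; i < kGuardElems; ++i) {
      if (front[i] != kGuardValue || back[i] != kGuardValue) return false;
    }
    return true;
  }

 private:
  std::size_t count_;
  std::vector<float> storage_;
};

// Stand-in for a kernel: writes `n` elements starting at `out`.
// Passing an `n` larger than the payload simulates an off-by-one bug.
void FillOnes(float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = 1.0f;
}
```

Calling `GuardsIntact()` right after each suspect op narrows the failure down to the op that caused it, instead of waiting for a late sanity check. The check only catches writes that land inside the guard bands and that change the tag, so a larger `kGuardElems` trades memory for coverage. For GPU buffers the same check would need to run in a kernel or after copying the guard bands back to the host.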
