Perform TimeDistributed step by step to save VRAM

Niklas · January 19, 2022, 4:08pm

We are using TensorFlow/Keras to analyse data from an astroparticle physics experiment that consists of a grid of detectors. The input to our network has shape (n, n, 360, embed_dim), where n*n is the number of detectors in the grid.
As a first step, we use two TimeDistributed layers to analyse the signals of each detector individually, followed by a convolution over all detectors to combine the results.
As we increase n, we find that when we use transformers the required VRAM exceeds the 24 GB available on our GPU cluster. TensorFlow seems to perform all n*n individual detector analyses at once, which results in huge tensors for large values of n.
Since the operations are independent of each other, is it possible to force TensorFlow to perform them step by step instead?
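
To make the scaling concrete, here is a rough back-of-the-envelope estimate of the memory taken by the attention-score tensors alone when all n*n detectors are processed at once. The thread does not state the number of heads, the batch size, or the dtype, so `num_heads`, `batch_size` and `bytes_per_value` below are purely hypothetical values; the sketch also assumes the axis of length 360 is the sequence axis the transformer attends over.

```python
def attention_scores_bytes(n, seq_len=360, num_heads=4, batch_size=32,
                           bytes_per_value=4):
    """Bytes for the attention scores of ONE attention layer, all detectors at once.

    num_heads, batch_size and bytes_per_value (float32) are assumptions,
    not values taken from the thread.
    """
    detectors = n * n
    # One (seq_len x seq_len) score matrix per sample, per detector, per head.
    return batch_size * detectors * num_heads * seq_len * seq_len * bytes_per_value


for n in (3, 5, 7, 9):
    gib = attention_scores_bytes(n) / 2**30
    print(f"n={n}: about {gib:.2f} GiB per attention layer")
```

The estimate grows with n*n, and it covers a single layer only; activations kept for backpropagation and further layers come on top, so the real footprint would be larger.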

Bhack · January 19, 2022, 6:18pm

What TF version are you using?

Niklas · January 19, 2022, 7:02pm

We are using version 2.4.1.

Bhack · January 19, 2022, 7:04pm

Can you test with the latest TF version?

Niklas · January 19, 2022, 7:22pm

Thank you for your answer. The behavior is the same using TensorFlow version 2.7.0. We assumed the behavior is intended, to speed up the calculations.
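
For readers with the same question, one possible way to process detectors one after another is to iterate over the detector axis with `tf.map_fn` and `parallel_iterations=1`. This is only a minimal sketch, not a confirmed fix from this thread: the wrapper class `SequentialOverDetectors` is a hypothetical name, the n x n grid is assumed to be flattened into a single detector axis, and the inner layer is assumed to return a float32 tensor of rank 3 per detector. Whether it actually lowers peak VRAM would need to be measured; during training, the activations of every step are likely still kept for the backward pass, so the saving may mostly apply to inference.

```python
import tensorflow as tf


class SequentialOverDetectors(tf.keras.layers.Layer):
    """Hypothetical wrapper: applies `inner` to one detector at a time.

    Expects input of shape (batch, detectors, steps, features).
    """

    def __init__(self, inner, **kwargs):
        super().__init__(**kwargs)
        self.inner = inner

    def build(self, input_shape):
        # Create the inner layer's weights up front, outside the loop body.
        shape = tuple(input_shape)
        self.inner.build((shape[0], shape[2], shape[3]))

    def call(self, x):
        # Put the detector axis first so tf.map_fn iterates over it.
        x = tf.transpose(x, [1, 0, 2, 3])
        # parallel_iterations=1 asks TensorFlow to run one slice at a time.
        y = tf.map_fn(self.inner, x, parallel_iterations=1)
        # Restore (batch, detectors, ...); assumes a rank-3 output per detector.
        return tf.transpose(y, [1, 0, 2, 3])


# Example: a 3x3 grid flattened to 9 detectors, 360 steps, embed_dim of 8.
layer = SequentialOverDetectors(tf.keras.layers.Dense(4))
out = layer(tf.random.normal((2, 9, 360, 8)))
```

The weights are shared across detectors, as with TimeDistributed; only the order of evaluation is meant to differ.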
