I am using Google Colab to implement a Physics-Informed Neural Network (PINN). I upgraded my Colab runtime to 12.7 GB of system RAM and 107.7 GB of disk. However, when computing the loss function, I get the following error: "Your session crashed after using all available RAM."
This is my code:
`
import tensorflow as tf
from time import time

def compute_loss(model, grid_points, real_OSM_values_Circle_tensor):
    # Compute physics-based loss
    physics_based_loss = tf.reduce_mean(tf.square(get_residual(model, grid_points)))
    # Initialise loss
    loss = physics_based_loss
    # Add physics_based_loss and data loss, one observation point at a time
    for i in range(len(real_OSM_values_Circle_tensor)):
        u_pred = model(grid_points[i:i+1, :])
        loss += tf.reduce_mean(tf.square(u_pred - real_OSM_values_Circle_tensor[i]))
    return loss
model = init_model()
#compute_loss(model, grid_points, real_OSM_values_Circle_tensor)
def get_grad(model, grid_points):
    with tf.GradientTape(persistent=True) as tape:
        # This tape is for derivatives with respect to trainable variables
        tape.watch(model.trainable_variables)
        loss = compute_loss(model, grid_points, real_OSM_values_Circle_tensor)
    g = tape.gradient(loss, model.trainable_variables)
    del tape
    return loss, g
lr = 0.001
# Choose the optimizer
optim = tf.keras.optimizers.Adam(learning_rate=lr)
# Define one training step as a TensorFlow function to increase speed of training
@tf.function
def train_step():
    # Compute current loss and gradient w.r.t. parameters
    loss, grad_theta = get_grad(model, grid_points)
    # Perform gradient descent step
    optim.apply_gradients(zip(grad_theta, model.trainable_variables))
    return loss
# Number of training epochs
N = 1
hist = []

# Start timer
t0 = time()

for i in range(N+1):
    loss = train_step()
    # Append current loss to hist
    hist.append(loss.numpy())
    """
    # Output current loss after 50 iterates
    if i % 50 == 0:
        print('It {:05d}: loss = {:10.8e}'.format(i, loss))
    """
    print('It {:05d}: loss = {:10.8e}'.format(i, loss))
# Print computation time
print('\nComputation time: {} seconds'.format(time() - t0))
`
When I try initialising a model and running it, the code seems to get stuck on the line `loss = train_step()`. I tried setting up a really simple NN with the number of epochs set to 1 just to make sure the code runs, but even then I get the same error. Should I upgrade the RAM, or is there a way to make my code more efficient? (len(real_OSM_values_Circle_tensor) = 22500)
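In case it is relevant, this is the kind of batched data loss I was considering instead of the per-point loop. It is only a rough sketch: I am assuming that real_OSM_values_Circle_tensor lines up row-by-row with grid_points, that the model can take the whole batch in one call, and I am not sure whether averaging over the batch is equivalent to my original sum of per-point errors.
`
# Hypothetical batched alternative (not part of my current code)
def compute_loss_batched(model, grid_points, real_OSM_values_Circle_tensor):
    # Physics-based loss, same as before
    physics_based_loss = tf.reduce_mean(tf.square(get_residual(model, grid_points)))
    # Data loss: one forward pass over all points instead of 22500 separate calls
    u_pred = model(grid_points)
    targets = tf.reshape(real_OSM_values_Circle_tensor, tf.shape(u_pred))
    data_loss = tf.reduce_mean(tf.square(u_pred - targets))
    return physics_based_loss + data_loss
`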
This is my first time on this forum, so please let me know if this is not the appropriate place and where I can post this question instead. Thank you!