Browser Crashes with CONTEXT_LOST_WEBGL

Hi, I finally ran out of excuses to learn ML and TensorFlow.js. I’m building my first model and ran into an error whose source I don’t quite understand.

I have a basic model that takes an image and calculates a certain value. The input layer is a cropping layer that removes a number of pixels from the top and bottom of the image. The model works well, but when I change the settings of the cropping layer (say, remove 25 pixels from the top instead of 75), the browser (Chrome) flickers and outputs the following error:

NOTE: Right before the above error, it prints out the following message *Couldn't parse line number in error* followed by what appears to be GLSL code.

The same happens if I remove the cropping layer altogether.

I’m using tfjs v3.8.0 but also tested with v2.0.0 with similar outcomes. This is my model:

const model = tf.sequential();

// Cropping Layer
model.add(
  tf.layers.cropping2D({
    // If I change 75 to anything below 50, it crashes before completing the first epoch,
    // If this layer is removed, it crashes almost immediately after training starts
    cropping: [
      [75, 25],
      [0, 0]
    ],
    // image height, width, depth
    inputShape: [160, 320, 3]
  })
);

model.add(
  tf.layers.conv2d({
    filters: 16,
    kernelSize: [3, 3],
    strides: [2, 2],
    activation: 'relu',
  })
);

model.add(
  tf.layers.maxPool2d({
    poolSize: [2, 2]
  })
);

model.add(
  tf.layers.conv2d({
    filters: 32,
    kernelSize: [3, 3],
    strides: [2, 2],
    activation: 'relu'
  })
);

model.add(
  tf.layers.maxPool2d({
    poolSize: [2, 2]
  })
);

model.add(tf.layers.flatten());
model.add(tf.layers.dense({ units: 1024, activation: 'relu' }));
model.add(tf.layers.dropout({ rate: 0.25 }));
model.add(tf.layers.dense({ units: 128, activation: 'relu' }));
model.add(tf.layers.dense({ units: 1, activation: 'linear' }));

model.compile({
  optimizer: 'adam',
  loss: 'meanSquaredError',
  metrics: [
    'accuracy',
  ],
});

Am I doing something obviously wrong?

Thanks.

Welcome to the forum and thank you for sharing your work. Let me look into this with our team to find out what may be going on here. You may have found a bug. In the meantime, would you be able to share a live working (or, in this case, not working) demo of the bug, e.g. on Codepen.io or Glitch.com, so we can inspect it live?

Thank you!

Thanks @Jason,

After more testing, I realized the error might be related to the fact that I’m using a Mac with an M1 chip (it is strange, though, that it happens after a few iterations and not immediately). I tested it on a different Apple machine (Quad-Core Intel Core i7) and the error did not show up.

You can find the sample code here: Glitch. The bug shows up when you click on the “Train Without Cropping” button.

It would be great if someone out there with an M1 chip could confirm.

Regards,

a.

I will try to hunt down someone with an M1 chip to see if they can replicate the issue. Thanks for reporting! If you happen to replicate the issue on an M1 machine before I do, please open a bug on our GitHub so we can track it, and feel free to reference this forum discussion as context.

Answering my own question just in case it helps anyone else.

The problem turned out to be my computer running out of GPU memory. I managed to replicate the same issue on a Windows machine with a Radeon RX 570 graphics card.

Running out of memory explains why removing the cropping layer or reducing the number of pixels cropped triggered the issue. It looks like I was right at the limit of the memory I had available, and just a few pixels were making the difference.
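To give a rough feel for how much the crop changes the model size, here is a back-of-the-envelope sketch in plain JavaScript. It assumes the tfjs defaults for the layers in the question ('valid' padding, pooling strides equal to poolSize); `validOut` and `flattenedSize` are hypothetical helper names, and the numbers are parameter counts, not measured GPU memory.

```javascript
// Output length along one axis for a 'valid' (unpadded) conv or pooling window.
function validOut(size, kernel, stride) {
  return Math.floor((size - kernel) / stride) + 1;
}

// Walks the layer stack from the question for a given top/bottom crop and
// returns the number of values that reach the flatten layer.
// Assumption: default 'valid' padding, pool strides equal to poolSize.
function flattenedSize(cropTop, cropBottom) {
  let h = 160 - cropTop - cropBottom; // inputShape height after cropping2D
  let w = 320;                        // width is not cropped
  for (let block = 0; block < 2; block++) {
    h = validOut(h, 3, 2); w = validOut(w, 3, 2); // conv2d: 3x3 kernel, stride 2
    h = validOut(h, 2, 2); w = validOut(w, 2, 2); // maxPool2d: 2x2 window, stride 2
  }
  return h * w * 32; // 32 filters in the last conv layer
}

for (const [top, bottom] of [[75, 25], [25, 25], [0, 0]]) {
  const flat = flattenedSize(top, bottom);
  console.log(`crop [${top}, ${bottom}]: flatten = ${flat}, dense(1024) weights = ${flat * 1024}`);
}
// crop [75, 25]: flatten = 1824, dense(1024) weights = 1867776
// crop [25, 25]: flatten = 3648, dense(1024) weights = 3735552
// crop [0, 0]: flatten = 5472, dense(1024) weights = 5603328
```

Under these assumptions, dropping the crop entirely roughly triples the weights of the first dense layer, and every intermediate activation grows with it, once per image in the batch.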

Interestingly enough, monitoring the output of tf.memory() did not help, because the problem is NOT the tensors created by the application but the ones allocated internally by TF.js. Printing these stats at every batchEnd or epochEnd did not show anything abnormal.
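For anyone who wants to repeat that check, the logging can be sketched roughly as follows. `makeMemoryCallbacks` is a hypothetical helper name, and `tf` is assumed to be the TensorFlow.js namespace (for example the global created by the script tag). As noted above, these numbers may look perfectly normal right up until the context is lost.

```javascript
// Builds model.fit() callbacks that log tf.memory() after each batch and epoch.
// Assumption: `tf` is the TensorFlow.js namespace; `log` defaults to console.log.
function makeMemoryCallbacks(tf, log = console.log) {
  const report = (label) => {
    // numTensors and numBytes only cover tensors tracked by the engine.
    const { numTensors, numBytes } = tf.memory();
    const megabytes = (numBytes / (1024 * 1024)).toFixed(1);
    log(`${label}: ${numTensors} tensors, ${megabytes} MB`);
  };
  return {
    onBatchEnd: async (batch) => report(`batch ${batch}`),
    onEpochEnd: async (epoch) => report(`epoch ${epoch}`),
  };
}

// Usage (inside an async function, with xs/ys being your training tensors):
//   await model.fit(xs, ys, { epochs: 10, callbacks: makeMemoryCallbacks(tf) });
```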

I managed to get around the problem in a couple of ways: reducing the batchSize, which limited the number of tensors allocated by my application, and reducing the inputShape to ingest a smaller image.
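As a minimal sketch of the first workaround, the batch size is set through the config object passed to model.fit(). `trainWithSmallBatches` is a hypothetical helper name, `xs` and `ys` are assumed to be the training tensors (images and target values), and the values 8 and 10 are placeholders to tune for your own hardware.

```javascript
// Hypothetical helper: trains with an explicit, small batch size.
// Assumption: `model` is the compiled model from the question, and
// `xs` / `ys` are the image and target tensors.
async function trainWithSmallBatches(model, xs, ys, batchSize = 8) {
  return model.fit(xs, ys, {
    batchSize,     // tfjs uses 32 when this option is omitted
    epochs: 10,    // placeholder value
    shuffle: true, // reshuffle the training data each epoch
  });
}

// Usage (inside an async function):
//   const history = await trainWithSmallBatches(model, xs, ys, 4);
```

If the context is still lost, halving the batch size again is a cheap next step before shrinking the input image.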

Hope that helps someone! It took me weeks to track it down.

Cheers.

Thanks for following up. Is this answer also the answer to the other thread we have going about GPU memory usage? I assume these are related?

@Jason, yes, they are definitely related. I just posted a follow-up there.

I guess we can “mark” this one as “solved” :wink:.

Thanks.

Wonderful, thank you!

Can anyone help me reduce the batch size? I am getting the same error: “CONTEXT_LOST_WEBGL” tfjs:17.

Please consider checking my tutorials in my free course here that explain how to set batch size etc:

https://goo.gle/learn-tfjs