Tensorflow crashing before running the first epoch

TB_4240 · September 15, 2023, 3:47am

I’m very new to TensorFlow and to Linux, so I would greatly appreciate explanations that assume basically no prior knowledge. When I start running the code, it prints "Epoch 1/n" and then, after a few minutes, it seems to crash and the kernel restarts. I have tried several times to fix this without success, including uninstalling and reinstalling TensorFlow, and I think I have narrowed down where the issue could be coming from. I believe it’s either a problem with my installation of TensorFlow or a related package, or a problem with my laptop’s hardware (but I could be wrong about that; as I said, I’m very new). I don’t believe it’s an issue with the code, because I took the code directly from online examples, as I first wanted to test that the packages worked for me. The two sets of code are from this site and the RNN example on the TensorFlow website.

I am using TensorFlow with GPU support, as that seemed like the correct option for me. My GPU is an NVIDIA GeForce GTX 1650 Ti, which I believe meets TensorFlow’s requirements, and I am running Linux through WSL.
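A quick, hedged way to confirm that this TensorFlow install under WSL actually sees the GTX 1650 Ti is to ask it directly. This is only a diagnostic sketch: `gpu_report` is a hypothetical helper name, and the import is guarded so the script still runs (and reports nothing) if TensorFlow is missing from the active environment.

```python
# Hedged diagnostic sketch: report what this TensorFlow build can see.
try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed in this environment
    tf = None


def gpu_report():
    """Return a small dict describing TensorFlow's view of the GPU(s).

    gpu_report is a hypothetical helper, not part of TensorFlow.
    """
    if tf is None:
        return {"tensorflow": None, "built_with_cuda": False, "gpus": []}
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": bool(tf.test.is_built_with_cuda()),
        "gpus": [gpu.name for gpu in gpus],  # e.g. names like "/physical_device:GPU:0"
    }


if __name__ == "__main__":
    print(gpu_report())
```

An empty `gpus` list would suggest TensorFlow is falling back to the CPU, which points at the CUDA/driver setup rather than at the example code.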

Some of my more recent attempts at running the code also produced the error “IOStream.flush timed out”, which, from the searching around I did, seems to have something to do with too much output being cached at once. However, I couldn’t see how to implement a fix in the code if that was the issue. I also get messages about missing NUMA support, which from my searching doesn’t seem to be anything to worry about. I also get a warning that TensorRT could not be found, but I am unsure whether that is even a problem.
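If the “IOStream.flush timed out” message is related to the volume of console output (an assumption, not something confirmed in this thread), one low-risk experiment is to make Keras print one summary line per epoch instead of a live progress bar, via the `verbose=2` argument of `model.fit`. The sketch below trains a tiny throwaway model on random toy data; `fit_with_quiet_logging` and the toy shapes are hypothetical, and in the real script you would only change the `verbose` argument of the existing `fit` call.

```python
# Hedged sketch: reduce console output from Keras training with verbose=2.
try:
    import numpy as np
    import tensorflow as tf
except ImportError:  # TensorFlow (or NumPy) not installed here
    np = tf = None


def fit_with_quiet_logging(epochs=2, batch_size=8):
    """Train a tiny hypothetical model, logging one line per epoch."""
    if tf is None:
        return None
    # Hypothetical toy data: 32 samples, 4 features, 1 regression target.
    x = np.random.rand(32, 4).astype("float32")
    y = np.random.rand(32, 1).astype("float32")
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    # verbose=2: one summary line per epoch instead of a progress bar
    history = model.fit(x, y, epochs=epochs, batch_size=batch_size, verbose=2)
    return history.history


if __name__ == "__main__":
    print(fit_with_quiet_logging())
```

`verbose=0` silences training output entirely, which can also help isolate whether console output plays any role in the crash.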

Any help or guidance with this issue would be greatly appreciated, and I am happy to provide more information if I have left anything important out.

tagoma · September 15, 2023, 10:04am

Hi @TB_4240 & welcome to the TensorFlow forum.
It seems that you are running your code in a notebook. Do you get any error message? Any hints in the prompt?
Also, did you try running this model on your machine in an alternative way, e.g. from a Python .py file or a code editor?

TB_4240 · September 16, 2023, 10:06pm

Sorry, I completely forgot to say this in the original post, but in both cases I copied the code into .py files and have been running them in Spyder rather than in notebooks. The only error messages I get are “restarting kernel” and “IOStream.flush timed out”, plus the warnings about NUMA support and not being able to find TensorRT. I can also run it again and attach screenshots of the kernel output if that would help.

And while I’m adding things I forgot to mention: I have already tried reducing the batch size, with no success; it still crashes every time.

tagoma · September 17, 2023, 7:53am

Hi, I’m not familiar with Spyder, but your issue seems to be related to IPython kernels.
Could you slightly modify your code to run it as an executable?
Do you have a multithreading loop somewhere in your code where you could make it sleep for, say, 1 second?
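The suggestion about a multithreading loop that sleeps for about a second can be sketched with the standard library as below. This is only an illustration of the pattern under stated assumptions: `run_with_pauses` and `task` are hypothetical names, and nothing here comes from the poster’s actual code.

```python
# Hedged sketch: run a task repeatedly on a worker thread, pausing between steps.
import threading
import time


def run_with_pauses(task, n_steps, pause_s=1.0):
    """Call task(step) n_steps times on a worker thread.

    Sleeps pause_s seconds after each call (including the last one).
    run_with_pauses is a hypothetical helper for illustration only.
    """
    results = []

    def worker():
        for step in range(n_steps):
            results.append(task(step))
            time.sleep(pause_s)  # give other threads (e.g. I/O) a chance to run

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # wait for the worker to finish before returning
    return results


if __name__ == "__main__":
    print(run_with_pauses(lambda step: step * step, 3, pause_s=0.1))
```

With the short pause in the usage line, the call returns `[0, 1, 4]`; in a real diagnostic the pause would be the 1 second mentioned above.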

TB_4240 · September 18, 2023, 5:30pm

What would running it as an executable do differently? I’m sure I can figure out how to do that, but I am unsure why it would make a difference.

There’s no explicit multithreading loop that I can see, so if there is one, I’m either missing it or it’s implicit in something else. How would I go about adding one, and what difference would it hopefully make?

Thank you for trying to help me with this.

tagoma · September 18, 2023, 6:42pm

Hi @TB_4240. The idea is to identify the origin of your issue. Spyder (if I understand correctly) relies on IPython, which is, roughly speaking, an interactive interface to the Python language. If your code runs successfully from a .py file (that is, directly with the Python interpreter on your machine), then you’ll know the issue is Spyder (for whatever reason). The basic idea is to get rid of every additional “layer” and run the program closer to core Python. Just an idea. FWIW.
Anyway, hopefully you’ll get your code running in the end.
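To make the “remove the IPython layer” test explicit, a script can report which interpreter is running it and whether an IPython shell (such as Spyder’s console) is active. The sketch relies on IPython making `get_ipython` available while its shell is running; in a plain `python3 train.py` run from the WSL terminal the name is undefined. `train.py` and `running_under_ipython` are hypothetical names.

```python
# Hedged sketch: detect whether this code is running inside an IPython shell.
import sys


def running_under_ipython():
    """Return True if an IPython shell (e.g. Spyder's console) is active.

    running_under_ipython is a hypothetical helper name.
    """
    try:
        get_ipython  # provided by IPython while its shell runs; undefined otherwise
    except NameError:
        return False
    return True


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("under IPython:", running_under_ipython())
```

If the training code reaches later epochs when run this way but not from Spyder, that points at the IPython kernel rather than at TensorFlow or the GPU.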
