Spektral GNN Model Training Freezes/Hangs After 1st Epoch - Please, Help Me

Pedro_Satorre · October 6, 2023, 5:04am

Please, I need urgent help!

Hello there! I am writing here because I need urgent help… If anyone knows how to solve the following issue, I would be immeasurably grateful!

I am working on my Data Science master’s thesis, and I have run into a very frustrating training issue that leaves me clueless about what is going wrong in my code:

→ My Spektral GNN model trains the 1st epoch (on the training set of the inner loop of my Nested Stratified Group K-Fold Cross-Validation strategy), but then training freezes/hangs. The cell keeps running for hours and hours, yet it never starts the 2nd epoch, never prints any further output (even though I placed multiple print statements at strategic points in the code to track how far the execution has progressed), and never raises an error message, until I interrupt and stop the kernel. The screenshots below show this; the very last one shows what the output looks like after tens of hours of running.

(Spektral is a GNN library built on top of Keras & TensorFlow)
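For context on how a training loop can run "forever" without printing anything: Spektral loaders iterate indefinitely when created with their default `epochs=None`, and the Keras `Model.fit` documentation notes that when `validation_steps` is `None`, validation runs until the validation data is exhausted, which an endlessly repeating source never is. The pure-Python sketch below is a toy stand-in, not Spektral code (`toy_loader`, `steps_per_epoch` and `consume` are hypothetical names); it only illustrates why every pass over an endless batch generator needs an explicit step count.

```python
from itertools import islice
from typing import Iterator, List, Optional, Tuple


def toy_loader(n_samples: int, batch_size: int,
               epochs: Optional[int] = None) -> Iterator[List[int]]:
    """Toy stand-in for a graph loader: yields batches of sample indices.

    With epochs=None it cycles over the data forever (hypothetical helper,
    not the Spektral implementation).
    """
    epoch = 0
    while epochs is None or epoch < epochs:
        for start in range(0, n_samples, batch_size):
            yield list(range(start, min(start + batch_size, n_samples)))
        epoch += 1


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    # Number of batches needed to see every sample once (ceiling division).
    return -(-n_samples // batch_size)


def consume(batches: Iterator[List[int]], steps: int) -> Tuple[int, int]:
    """Read exactly `steps` batches and return (batches_read, samples_seen).

    Without the `steps` bound, a loop over an endless generator never
    finishes, which from the outside looks like a frozen cell.
    """
    n_batches = 0
    n_seen = 0
    for batch in islice(batches, steps):
        n_batches += 1
        n_seen += len(batch)
    return n_batches, n_seen


val_steps = steps_per_epoch(n_samples=10, batch_size=4)
print(val_steps)                               # 3
print(consume(toy_loader(10, 4), val_steps))   # (3, 10)
```

In Spektral's own examples the equivalent bound is passed to Keras explicitly, e.g. `model.fit(loader_tr.load(), steps_per_epoch=loader_tr.steps_per_epoch, validation_data=loader_va.load(), validation_steps=loader_va.steps_per_epoch, ...)`, where `loader_tr` and `loader_va` are placeholder names for the training and validation loaders.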

The context of my project:

I have soccer tracking data, which records the position, speed, velocity and many other features of each player and of the ball. The data is recorded at 25 frames per second of play, where each frame is one row of a pandas DataFrame. I have tracking data for 361 soccer matches.

My aim is to use a GNN to predict a binary target variable that states whether a goal will be scored within the next 10 seconds (1 = goal within the next 10 s, 0 = no goal within the next 10 s). Due to the nature of the sport, my data is heavily imbalanced (far more 0s than 1s in the target variable), as only a few goals are scored per match, and sometimes none at all.
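To make the target definition concrete: at 25 frames per second, a 10-second look-ahead is 25 × 10 = 250 frames. The sketch below labels each frame assuming a goal is identified by the frame index at which it is scored (`goal_frames` is a hypothetical input, and all function names are illustrative). The post does not say whether the goal frame itself counts, so here a frame is positive if the goal falls 1 to 250 frames after it. It also computes "balanced" class weights, one common heuristic for the imbalance described above.

```python
from typing import Dict, Iterable

import numpy as np

FPS = 25                       # tracking frames per second (from the post)
HORIZON_S = 10                 # look-ahead window in seconds
HORIZON = FPS * HORIZON_S      # 250 frames


def goal_within_horizon(n_frames: int, goal_frames: Iterable[int],
                        horizon: int = HORIZON) -> np.ndarray:
    """Return 1 for frame t if some goal frame g satisfies t < g <= t + horizon, else 0.

    `goal_frames` is a hypothetical input: the frame indices at which goals are scored.
    """
    labels = np.zeros(n_frames, dtype=np.int64)
    for g in goal_frames:
        start = max(g - horizon, 0)  # earliest frame at most `horizon` frames before g
        labels[start:g] = 1          # frames start .. g-1 precede the goal by 1..horizon frames
    return labels


def balanced_class_weights(y: np.ndarray) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * count_c); absent classes are skipped."""
    counts = np.bincount(y, minlength=2)
    return {c: float(len(y) / (2 * counts[c])) for c in range(2) if counts[c] > 0}


y = goal_within_horizon(n_frames=2000, goal_frames=[1000])
print(int(y.sum()))                 # 250 positive frames
print(balanced_class_weights(y))    # {0: 0.5714285714285714, 1: 4.0}
```

Keras `Model.fit` accepts such a dictionary through its `class_weight` argument; whether reweighting is the right remedy for this dataset is a separate modelling choice.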

I converted each frame/row of the DataFrame into graph data using the NetworkX library and saved all this raw graph data into multiple pickle files (multiple because of RAM constraints). I then reload these pickle files and concatenate them in the right order to preserve the original order of the data, transform each graph into a Spektral Graph object, and finally wrap the list of all these Spektral Graph objects in a Spektral Dataset container in Mixed Mode, named Soccer_Tracking_Graph_Mixed_Data_Mode in my code.
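The chunked save-and-reload step above has one easy-to-miss pitfall: sorting file names as strings puts `chunk_10` before `chunk_2`, which silently scrambles the frame order. A minimal standard-library sketch, assuming a hypothetical `chunk_<n>.pkl` naming scheme and plain Python objects standing in for the NetworkX graphs:

```python
import pickle
import re
import tempfile
from pathlib import Path
from typing import Any, List


def save_in_chunks(items: List[Any], out_dir: Path, chunk_size: int) -> None:
    """Write `items` to out_dir/chunk_0.pkl, chunk_1.pkl, ... (hypothetical naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, start in enumerate(range(0, len(items), chunk_size)):
        with open(out_dir / f"chunk_{i}.pkl", "wb") as f:
            pickle.dump(items[start:start + chunk_size], f, protocol=pickle.HIGHEST_PROTOCOL)


def chunk_index(path: Path) -> int:
    # Extract the integer in "chunk_<n>.pkl" so that chunk_10 sorts after chunk_9.
    return int(re.fullmatch(r"chunk_(\d+)\.pkl", path.name).group(1))


def load_chunks(in_dir: Path) -> List[Any]:
    """Reload every chunk in numeric (not lexicographic) order and concatenate."""
    items: List[Any] = []
    for path in sorted(in_dir.glob("chunk_*.pkl"), key=chunk_index):
        with open(path, "rb") as f:
            items.extend(pickle.load(f))
    return items


with tempfile.TemporaryDirectory() as tmp:
    frames = list(range(23))                          # stand-in for per-frame graph objects
    save_in_chunks(frames, Path(tmp), chunk_size=2)   # 12 files: chunk_0 .. chunk_11
    restored = load_chunks(Path(tmp))
    print(restored == frames)                         # True
    # Plain string sorting would put chunk_10 right after chunk_1:
    print(sorted(p.name for p in Path(tmp).glob("*.pkl"))[:3])
    # ['chunk_0.pkl', 'chunk_1.pkl', 'chunk_10.pkl']
```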

I am working in JupyterLab (Python 3.9.16), launched via the Anaconda Desktop App, on my laptop, because I have not been allowed to use the cloud servers of the soccer club I am writing my thesis for:

Processor: 11th Gen Intel(R) Core™ i7-1185G7 @ 3.00 GHz
RAM: 16GB
64-bit operating system, x64-based processor
Windows 11 Home, Version 22H2

This Spektral Dataset has the following properties:

There are 675,127 graphs in the dataset
Each graph is fully connected; since the dataset is in Mixed Mode, a single global adjacency matrix is shared by all the individual graphs
Each graph has 23 nodes (index 0 = ball, indices 1-22 = the 22 players of both teams: 1-11 = home team players, 12-22 = away team players)
Each graph has 19 node features per node in its node-feature matrix
Each graph has 253 edges
Each edge has 2 edge features
Each graph has one global (graph-level) target variable, as explained earlier
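The graph-size figures above can be sanity-checked by building the shared adjacency with NumPy. This sketch assumes an undirected graph without self-loops, and the constants simply mirror the node layout described in the list. Note that 253 = 23 × 22 / 2 counts each undirected edge once, while a symmetric adjacency matrix stores each edge in both directions (506 non-zero entries); whether a library expects one edge-feature row per undirected edge or per stored direction is worth checking against its documentation.

```python
import numpy as np

N_NODES = 23                   # index 0 = ball, 1-11 = home players, 12-22 = away players
HOME = range(1, 12)
AWAY = range(12, 23)

# Shared (mixed-mode) adjacency: every node linked to every other node, no self-loops.
A = np.ones((N_NODES, N_NODES), dtype=np.float32) - np.eye(N_NODES, dtype=np.float32)

# Undirected edge list (i < j): one row per edge.
rows, cols = np.triu_indices(N_NODES, k=1)
edges = np.stack([rows, cols], axis=1)

print(edges.shape[0])          # 253 undirected edges = 23 * 22 / 2
print(int(A.sum()))            # 506 non-zero entries: each edge stored in both directions
print(len(HOME), len(AWAY))    # 11 11
```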

Screenshots Showing the Spektral Dataset Info:

Screenshot Showing Each Individual Graph’s Info (Within the Dataset):

Screenshots Showing Some User-Defined Functions Used Throughout My Code:

Screenshots Showing the Rest of My Code:

Apart from the main issue, if you spot any other mistakes or errors in my code, absolutely any feedback is immensely appreciated!

Please help me out if you are able to. I have been trying unsuccessfully to solve this issue for the whole past month, and this is basically my last resort.

I only have a month or so of practical work left before my thesis deadline.

Renu_Patel · October 6, 2023, 7:25am

Hi @Pedro_Satorre

Welcome to the TensorFlow Forum!

The code description is difficult to follow. Please share minimal reproducible code as formatted text rather than as screenshots, so we can replicate and understand the issue. Thank you.
