Spektral GNN Model Training Freezes/Hangs After the 1st Epoch - Please Help Me 🙏

Please, I need urgent help! :sos::pray:

If anyone knows how to solve the following issue, I will be immeasurably grateful!

I’m working on my Data Science master’s thesis, and I have run into a very frustrating training issue that leaves me clueless about what is going wrong in the code:

→ My Spektral GNN model trains the 1st epoch (of the training set in the inner loop of my CV strategy), but then the training process freezes/hangs: the cell keeps running for hours and hours without starting the 2nd epoch, printing any further output (even though I placed multiple print statements at strategic points in the code to track how far execution gets), or raising any error message, until I interrupt and stop the kernel.


Code From the Python Script Module With User-Defined Functions:

# UNIVERSAL IMPORTS USED THROUGHOUT THE MODULE #

import numpy as np
import pandas as pd
import pickle
import random
import os
import gc
from spektral.data import Dataset, Graph
from spektral.layers import CrystalConv, GlobalAvgPool
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.initializers import GlorotUniform

import Thesis_Spektral_GNN_Model_Functions as TSGNNFs


# DATA-RELATED FUNCTIONS #

def Read_All_Raw_Graph_Data_From_Pickle_Files( num_files, spektral_data_mode = "Mixed Mode" ):
    """
    Function That Reads In the Raw Graph Data Created From the DataFrame Rows From All 19 Pickle `.pkl` Files
    Input: num_files = Number of Pickle `.pkl` Files To Read In
    Output: Combined List of Raw Graph Data
    """
    List_of_Raw_Graph_Pickle_Data = []
    for File_Num in range(1, num_files + 1):
        if spektral_data_mode == "Disjoint Mode":
            File_Name = f"Graph-Format Data/Final_Graph_Data_4_GCN_Spektral_Model_{File_Num}.pkl"
        elif spektral_data_mode == "Mixed Mode":
            File_Name = f"Graph-Format Data/Final_Graph_Data_4_GCN_Spektral_Model_{File_Num}_Mixed_Mode.pkl"
        else:
            raise ValueError( f"Unknown `spektral_data_mode` --> '{spektral_data_mode}'" )
        if os.path.exists(File_Name):
            with open(File_Name, 'rb') as f:
                Chunk_Data = pickle.load(f)
            List_of_Raw_Graph_Pickle_Data.extend(Chunk_Data)
            del Chunk_Data
            gc.collect()
            print(f"Data For Chunk #{File_Num}  -->  Successfully Concatenated", '\n')
            print("_____", '\n')
            if File_Num == num_files:
                print("FINISHED", '\n')
        else:
            print(f"WARNING: '{File_Name}' Not Found")
    return List_of_Raw_Graph_Pickle_Data

class Soccer_Tracking_Graph_Dataset( Dataset ):
    """
    Class That Loads Soccer/Football Tracking Raw Graph-Data From a Pickle `.pkl` File & Converts It Into a List of Spektral Graph Objects In a Spektral Mixed Data Mode Structure
         Every Graph Object Within the List Represents a Snapshot of a Frame of a Match (i.e. a Row of Tracking DataFrame Data)
    """
    def __init__( self, num_files = None, list_of_graphs = None, sublist_of_graphs = False, global_adjacency_matrix = None, **kwargs ):
        self.Num_Files = num_files
        self.List_of_Graphs = list_of_graphs
        self.SubList_of_Graphs = sublist_of_graphs
        super().__init__( **kwargs )
        # If the List of Graphs Is NOT a Sub-List/Filtered List of Graphs --> Initialize a Single Same Adjacency Matrix For the Entire Dataset  -->  Converting From Disjoint Data Mode Into a Mixed Data Mode
        if not self.SubList_of_Graphs and self.List_of_Graphs is not None and len(self.List_of_Graphs) > 0:
            self.a = self.List_of_Graphs[0].a   # Assuming All Graphs Share the Same Adjacency Matrix --> Store It Once On the Dataset (It Is Converted Into a TensorFlow Sparse Tensor Later, In the Notebook)
            for Individual_Graph in self.List_of_Graphs:
                Individual_Graph.a = None   # `a = None` As It's Common Across All Graphs         
        # If the List of Graphs IS a Sub-List/Filtered List of Graphs --> Explicitly Provide a Global Adjacency Matrix
        elif self.SubList_of_Graphs and global_adjacency_matrix is not None:
            self.a = global_adjacency_matrix   # Setting the Global Adjacency Matrix
            for Individual_Graph in self.List_of_Graphs:
                Individual_Graph.a = None   # `a = None` As It's Common Across All Graphs
        else:
            self.a = None

    def read( self ):
        if self.Num_Files is None and self.List_of_Graphs is not None:
            return self.List_of_Graphs
        if self.Num_Files is not None:
            # If List of Graphs/Graph Objects Are Not Provided --> Read From Pickle Files
            # Use `self.Num_Files` To Read & Load the Raw Graph Data From All 19 `.pkl` Files
            Raw_Graph_Pickle_Data = TSGNNFs.Read_All_Raw_Graph_Data_From_Pickle_Files( num_files = self.Num_Files, spektral_data_mode = "Mixed Mode" )
            List_of_Spektral_Graph_Objects = [ Graph( x = X, a = A, e = E, y = y ) for (X, A, E, y) in Raw_Graph_Pickle_Data ]
            return List_of_Spektral_Graph_Objects

def Create_Dictionary_of_Match_ID_To_Starting_and_Ending_Indices():
    """
    Function That Creates a Dictionary Whose Keys = Match IDs (0 - 360) & Whose Values = Tuples of (Starting_Index, Ending_Index)
    """
    # List of #Frames Per Match
    Graphs_Per_Match = [1918, 2125, 1664, 1709, 2093, 1809, 1943, 1763, 2132, 2301, 1686, 1519, 1842, 1876, 1868, 1996, 1760, 239, 657, 566, 1844, 1792, 1744, 1545, 1894, 1730, 1932, 2055, 1651, 1843, 1869, 1849, 1933, 2003, 188, 1834, 816, 1596,
                        1801, 1925, 1889, 1513, 1686, 2010, 401, 2133, 2041, 1946, 1675, 1622, 1829, 1939, 1567, 2104, 1827, 1927, 1884, 2198, 1870, 532, 1780, 2025, 1875, 1720, 1981, 1480, 1648, 2185, 2006, 1824, 1803, 2263, 2045, 1628, 1888, 1736,
                        2205, 1934, 1941, 1691, 1853, 637, 1669, 1687, 293, 2041, 1980, 733, 2033, 1863, 1675, 624, 1540, 2261, 1953, 2405, 1701, 1866, 2398, 1918, 1985, 1362, 1631, 1843, 2155, 2096, 2300, 2089, 1359, 1776, 1520, 1911, 1896, 2057,
                        1899, 2000, 1898, 1936, 1994, 2164, 2119, 1875, 331, 1863, 1858, 2067, 1805, 2041, 1984, 1811, 1945, 1716, 1898, 2102, 1504, 2214, 1941, 1890, 2234, 1991, 1949, 383, 1626, 755, 1727, 1938, 1947, 1800, 1659, 1861, 1691, 2084,
                        1967, 1915, 1533, 1957, 1988, 2125, 1899, 1399, 2012, 2316, 2045, 2185, 1937, 2068, 1969, 2067, 1750, 1815, 1701, 1853, 1742, 2063, 2010, 2006, 1780, 2037, 2163, 1949, 2052, 1847, 1839, 1628, 2026, 1893, 2041, 1799, 1565, 2076,
                        1886, 2214, 1966, 1803, 2251, 1863, 2262, 1699, 1667, 1770, 2240, 1947, 2098, 1920, 1915, 2276, 1784, 1873, 1672, 1795, 1710, 2046, 1857, 2019, 2230, 1918, 2047, 1908, 1923, 1922, 2102, 1435, 2111, 2178, 2282, 1871, 2042, 1912,
                        1938, 2226, 2101, 1626, 1913, 1769, 2076, 1892, 1940, 1619, 2019, 2173, 1939, 1761, 2150, 1919, 1709, 2055, 2199, 1934, 1735, 2024, 1958, 1946, 2234, 2145, 2309, 1903, 1990, 1949, 2167, 2091, 1713, 1806, 2113, 1716, 2035, 1851,
                        1997, 1961, 2151, 2180, 1771, 1808, 1902, 2223, 2014, 2172, 1988, 1786, 1897, 2134, 1844, 1969, 1992, 1510, 2050, 1833, 1950, 1903, 2175, 1857, 2091, 2031, 2166, 1731, 2072, 1715, 2012, 1717, 1786, 2431, 2099, 2058, 2020, 1990,
                        1760, 1904, 1983, 1897, 1906, 1683, 2065, 2019, 2046, 1850, 1953, 2111, 1890, 1834, 1551, 1948, 1794, 2063, 2125, 2138, 2202, 1854, 1579, 1963, 1831, 1766, 2057, 2065, 2077, 2120, 2129, 1824, 2233, 2038, 1980, 2340, 1916, 1668,
                        1960, 1721, 1880, 1834, 1903, 1997, 2204, 1760, 2366, 1719, 1873, 1987, 1650, 2087, 1868, 1825, 2335, 1846, 1919]  # Length 361 (0 To 360)  -  38 Matches Per Line In This List
    print(f"#Matches In the Match ID <--> Index-Tuple's Dictionary = {len(Graphs_Per_Match)}", "\n")
    print(f"#Graphs In the Match ID <--> Index-Tuple's Dictionary = {sum(Graphs_Per_Match)}", "\n")
    MatchID_Index_dict = {}
    Start_Index = 0
    for Match_ID, Num_Graphs in enumerate(Graphs_Per_Match):
        End_Index = Start_Index + Num_Graphs
        MatchID_Index_dict[Match_ID] = (Start_Index, End_Index - 1)  # -1 As It's Inclusive
        Start_Index = End_Index  # Update Start_Index For Next Iteration
    return MatchID_Index_dict
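# For example, with the `Graphs_Per_Match` list above, the first two entries come out as
# (hand-derived from the frame counts 1918 & 2125):
#   MatchID_Index_dict[0] == (0, 1917)
#   MatchID_Index_dict[1] == (1918, 4042)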

def Get_Graphs_of_the_Match( match_id, matchID__index_tuple__dict, Spektral_dataset ):
    """
    Function That Fetches the Graphs/Frames of a Specific Match (With Match ID `match_id`), Which Will Help In Slicing the Spektral Dataset According To Match ID & Its Respective Starting & Ending Index Within the Spektral Dataset
    """
    Start_Index, End_Index = matchID__index_tuple__dict[match_id]
    return Spektral_dataset[Start_Index : End_Index + 1]
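# Usage sketch (names `d` & `ds` are placeholders for the dictionary & dataset built in the notebook):
#   Get_Graphs_of_the_Match( match_id = 0, matchID__index_tuple__dict = d, Spektral_dataset = ds )
#   --> the 1918 graphs/frames of match 0, i.e. dataset indices 0 To 1917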

def Set_Seed(seed = 7):
    """
    Function That Fixes a Random Seed On Every Library Used Where Some Randomness Could Occur, So We ALWAYS Obtain Same Results, No Matter How Many Times We Restart & Rerun the Kernel
    Input: seed == Arbitrary Seed Value ; Default Value = 7
    """
    import os
    import random
    import numpy as np
    import tensorflow as tf
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

# GNN MODEL-RELATED CLASSES & FUNCTIONS #

class Spektral_GNN( Model ):
    '''
    Class That Builds the Spektral CrystalConvolutional GNN (Graph Neural Network) Architecture/Configuration With Model As the Parent Class From the Spektral Library
    '''
    def __init__( self, n_layers, n_channels, n_target_variables ):
        '''
        Constructor Code For Setting Up the Layers Needed (the Architecture) For Training the Model When Initializing the GNN Model
        '''
        super().__init__()
        self.Conv_1 = CrystalConv( kernel_initializer = GlorotUniform(seed = 0) )
        self.Convolutions = [ CrystalConv( kernel_initializer = GlorotUniform(seed = Layer) ) for Layer in range(1, n_layers) ]
        self.Pool = GlobalAvgPool()
        self.Dense_1 = Dense( n_channels, activation = "relu", kernel_initializer = GlorotUniform(seed = n_layers) )
        self.DropOut = Dropout(0.5)
        self.Dense_2 = Dense( n_channels, activation = "relu", kernel_initializer = GlorotUniform(seed = n_layers + 1) )
        self.Dense_3 = Dense( n_target_variables, activation = "sigmoid", kernel_initializer = GlorotUniform(seed = n_layers + 2) )
    
    def call( self, inputs ):
        '''
        Forward Pass of the GNN Architecture
        '''
        x, a, e = inputs   # For `MixedLoader()`
        x = self.Conv_1( [x, a, e] )
        for Convolution in self.Convolutions:
            x = Convolution( [x, a, e] )
        x = self.Pool(x)   # For `MixedLoader()`
        x = self.Dense_1(x)
        x = self.DropOut(x)
        x = self.Dense_2(x)
        x = self.DropOut(x)
        return self.Dense_3(x)


Notebook Code:

import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix
import sys
from itertools import chain
from itertools import product
from bisect import bisect_left
from sklearn.model_selection import GroupKFold, StratifiedGroupKFold
from spektral.data import Dataset, Graph, MixedLoader
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.metrics import AUC
from sklearn.metrics import roc_curve, auc, roc_auc_score
from sklearn.calibration import calibration_curve
import Thesis_Spektral_GNN_Model_Functions as TSGNNFs
%matplotlib inline

# Convert To Spektral Graph Objects
Soccer_Tracking_Graph_Disjoint_Data_Mode = TSGNNFs.Soccer_Tracking_Graph_Dataset( num_files = 19, list_of_graphs = None, sublist_of_graphs = False, global_adjacency_matrix = None )

Soccer_Tracking_Graph_Mixed_Data_Mode = TSGNNFs.Soccer_Tracking_Graph_Dataset( num_files = None, list_of_graphs = list( Soccer_Tracking_Graph_Disjoint_Data_Mode ), sublist_of_graphs = False, global_adjacency_matrix = None )

# Spektral Dataset & Graphs' Properties
print(f"Dataset Instantiation Info. of `Soccer_Tracking_Graph_Mixed_Data_Mode` = {Soccer_Tracking_Graph_Mixed_Data_Mode}", "\n")
Num_Graph_Nodes = max( Graph.n_nodes for Graph in Soccer_Tracking_Graph_Mixed_Data_Mode )
print(f"#Graph-Nodes = {Num_Graph_Nodes}", "\n")

Dense_Global_Adjacency_Matrix = Soccer_Tracking_Graph_Mixed_Data_Mode.a
print(f"Common (Dense) Adjacency Matrix For All Graphs In the Dataset - DataType = {Dense_Global_Adjacency_Matrix.dtype}", "\n")
print(f"Dimensions of the (Dense) Adjacency Matrix For All Graphs In the Dataset = {Dense_Global_Adjacency_Matrix.shape}", "\n")
print(f"(Dense) Adjacency Matrix For All Graphs In the Dataset = \n{Dense_Global_Adjacency_Matrix}", "\n")

Sparse_COO_Row_Major_Global_Adjacency_Matrix = np.array( np.where( Dense_Global_Adjacency_Matrix > 0 ) )
# print(f"Common Adjacency Matrix For All Graphs In the Dataset - DataType = {Sparse_COO_Row_Major_Global_Adjacency_Matrix.dtype}", "\n")
# print(f"Dimensions of the Adjacency Matrix For All Graphs In the Dataset = {Sparse_COO_Row_Major_Global_Adjacency_Matrix.shape}", "\n")
# print(f"Adjacency Matrix For All Graphs In the Dataset = \n{Sparse_COO_Row_Major_Global_Adjacency_Matrix}", "\n")

Sparse_Tensor_Global_Adjacency_Matrix = tf.sparse.from_dense( Dense_Global_Adjacency_Matrix )
print(f"Common Adjacency Matrix (That Is Actually Used In the Code) For All Graphs In the Dataset - DataType = {Sparse_Tensor_Global_Adjacency_Matrix.dtype}", "\n")
print(f"Dimensions of the Adjacency Matrix (That Is Actually Used In the Code) For All Graphs In the Dataset = {Sparse_Tensor_Global_Adjacency_Matrix.shape}", "\n")
print(f"Adjacency Matrix (That Is Actually Used In the Code) For All Graphs In the Dataset = \n{Sparse_Tensor_Global_Adjacency_Matrix}", "\n")

Num_Node_Features = Soccer_Tracking_Graph_Mixed_Data_Mode.n_node_features
print(f"#Node-Features Per Node In the Node-Features' Matrix = {Num_Node_Features}", "\n")

Num_Edge_Features = Soccer_Tracking_Graph_Mixed_Data_Mode.n_edge_features
print(f"#Edge-Features Per Edge In the Edge-Features' Matrix = {Num_Edge_Features}", "\n")
        
# RESHAPING Every Edge-Feature Matrix From a Dense (n_nodes, n_nodes, n_edge_features) NumPy Array Into a Row-Major (n_edges, n_edge_features) Edge List
for Individual_Graph in Soccer_Tracking_Graph_Mixed_Data_Mode:
    Individual_Graph.e = Individual_Graph.e[ Sparse_COO_Row_Major_Global_Adjacency_Matrix[0], Sparse_COO_Row_Major_Global_Adjacency_Matrix[1] ]
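# To illustrate what the indexing above does - a minimal toy sketch with made-up numbers
# (3 nodes & 4 edges here; my real graphs have 23 nodes, 506 edges & 2 edge-features):
Toy_Adjacency = np.array( [ [0, 1, 1], [1, 0, 0], [1, 0, 0] ] )    # Dense (n_nodes, n_nodes) Adjacency Matrix With 4 Edges
Toy_Edge_Features = np.arange( 3 * 3 * 2 ).reshape( 3, 3, 2 )      # Dense (n_nodes, n_nodes, n_edge_features) Edge-Features
Toy_Rows, Toy_Cols = np.where( Toy_Adjacency > 0 )                 # Row-Major COO Indices of the 4 Edges
print( Toy_Edge_Features[Toy_Rows, Toy_Cols].shape, "\n" )         # (4, 2) = (n_edges, n_edge_features)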

Num_Target_Variables = Soccer_Tracking_Graph_Mixed_Data_Mode.n_labels
print(f"#Target Variables = {Num_Target_Variables}", "\n")

Num_Graphs_In_Dataset = len(Soccer_Tracking_Graph_Mixed_Data_Mode)
print(f"Number of Graphs/Frames In the Dataset = {Num_Graphs_In_Dataset}", "\n")

# Extracting Info of 1st Graph In the Dataset
Graph_Num_1 = Soccer_Tracking_Graph_Mixed_Data_Mode[0]
print(f"1st Graph Instantiation Info. of `Graph_Num_1` = {Graph_Num_1}", "\n")

Num_Graph_Nodes_1st_Graph = Graph_Num_1.n_nodes
print(f"#Graph-Nodes In 1st Graph = {Num_Graph_Nodes_1st_Graph}", "\n")

Num_Node_Features_1st_Graph = Graph_Num_1.n_node_features
print(f"#Node-Features Per Node In the Node-Features' Matrix In 1st Graph = {Num_Node_Features_1st_Graph}", "\n")

Node_Features_1st_Graph = Graph_Num_1.x
print(f"Node-Features' Matrix In 1st Graph - DataType = {Node_Features_1st_Graph.dtype}", "\n")
print(f"Dimensions of the Node-Features' Matrix = {Node_Features_1st_Graph.shape}", "\n")
# print(f"Node-Features' Matrix In 1st Graph = \n{Node_Features_1st_Graph}", "\n")

Num_Edge_Features_1st_Graph = Graph_Num_1.n_edge_features
print(f"#Edge-Features Per Edge In the Edge-Features' Matrix In 1st Graph = {Num_Edge_Features_1st_Graph}", "\n")

Edge_Features_1st_Graph = Graph_Num_1.e
print(f"Edge-Features' Matrix In 1st Graph - DataType = {Edge_Features_1st_Graph.dtype}", "\n")
print(f"Dimensions of the Edge-Features' Matrix = {Edge_Features_1st_Graph.shape}", "\n")    # ORGINALLY: (23, 23, 2) = (n_nodes, n_nodes, n_edge_features_per_node)  -->  NOW: (506, 2) = ( (23 x 22), 2 )
# print(f"Edge-Features' Matrix In 1st Graph = \n{Edge_Features_1st_Graph}", "\n")

Num_Target_Variables_1st_Graph = Graph_Num_1.n_labels
print(f"#Target Variables In 1st Graph = {Num_Target_Variables_1st_Graph}", "\n")
Target_Variable_1st_Graph = Graph_Num_1.y
print(f"Target Variable In 1st Graph - DataType = {Target_Variable_1st_Graph.dtype}", "\n")
print(f"Dimensions of the Target Variable's Matrix = {Target_Variable_1st_Graph.shape}", "\n")
# print(f"Target Variable In 1st Graph = {Target_Variable_1st_Graph}", "\n")

# Relating Each Graph To Its Respective Match ID
MatchID__Index_Tuple__dict = TSGNNFs.Create_Dictionary_of_Match_ID_To_Starting_and_Ending_Indices()

Sorted_Start_Indices = sorted( (Start, Match_ID) for Match_ID, (Start, End) in MatchID__Index_Tuple__dict.items() )
Start_Indices, Match_ID_Keys = zip( *Sorted_Start_Indices )
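# For clarity - a minimal helper expressing the index --> Match ID mapping I want (the CV-fold loops
# further below do this inline with `bisect_left` plus a range double-check instead):
from bisect import bisect_right
def Lookup_Match_ID_of_Graph_Index( graph_index ):
    # The Match Owning `graph_index` = the Match With the Largest Starting Index <= `graph_index`
    return Match_ID_Keys[ bisect_right( Start_Indices, graph_index ) - 1 ]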

Outer_Training_Match_IDs_set = set()
Outer_Test_Match_IDs_set = set()
Inner_Training_Match_IDs_set = set()
Inner_Validation_Match_IDs_set = set()

MatchID_Array = []
for Match_ID, (Start_Index, End_Index) in MatchID__Index_Tuple__dict.items():
    MatchID_Array.extend( [Match_ID] * (End_Index - Start_Index + 1) )

# Hyper-Parameter Tuning Space
HyperParameter_Space_Dict = { "Learning_Rate" : [0.001, 0.01],
                              "Num_Epochs" : [50],
                              "Batch_Size" : [512], #[256, 512],
                              "Num_Channels" : [64, 128],
                              "Num_Layers" : [3, 5] }

Hyper_Parameters_Combinations = [ dict( zip( HyperParameter_Space_Dict, Product ) ) for Product in product( *HyperParameter_Space_Dict.values() ) ]
print(f"#Sets of Hyper-Parameter Combinations = {len(Hyper_Parameters_Combinations)}", "\n")

# Fix Random Seed - Obtain Same Results
TSGNNFs.Set_Seed(42)

# Nested CV Loop
# Outer CV = Stratified Group K-Fold  ->  Split Data Into 19 Groups (19 Matches Each Group), To Ensure No Data Leakage, & Stratify These Groups, To Ensure That Target Variable's Classes/Labels Are As Balanced As Possible Across All Groups
   # - Outer-K = 19  ->  19 Groups of 19 Matches
# Inner CV = Group K-Fold  ->  Split Group's Data Into 19 Single Matches, To Ensure No Data Leakage From Match Into Another
   # - Inner-K = 19  ->  Within Each Outer Group, Split Them Further Into 19 Single Matches

Outer_CV = StratifiedGroupKFold( n_splits = 19, shuffle = False )
Inner_CV = GroupKFold( n_splits = 19 )
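# Toy sanity check (made-up arrays, not my real data) that group-aware splitting keeps whole groups/matches together:
for Toy_Train_Indices, Toy_Test_Indices in GroupKFold( n_splits = 3 ).split( X = np.zeros(6), groups = [0, 0, 1, 1, 2, 2] ):
    print( Toy_Train_Indices, Toy_Test_Indices, "\n" )   # Each Test Fold = Exactly 1 Whole Group --> No Frames Leak Across Folds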

# Train Spektral GNN
# - Loss-Function = Log-Loss / Binary CrossEntropy
# - Optimizer Algorithm = Adam
# - Validation Metric = ROC-AUC

# Evaluate ROC-AUC
# - Calculate AUC
# - Plot ROC

# Evaluate Calibration Curve - Expected Calibration Error (ECE)
# - Calculate ECE
# Distribute Outcomes Into K Bins & Compute the Difference Between the Average Prediction & the Average Observed Outcome For the Examples In Each Bin
# - Plot Cal. Curves
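# For reference, the ECE computation I have in mind - a minimal sketch; the K = 10 equal-width
# probability bins are my own assumption, not a library implementation:
def Expected_Calibration_Error( y_true, y_predicted_probabilities, n_bins = 10 ):
    y_true = np.asarray( y_true, dtype = float ).ravel()
    y_predicted_probabilities = np.asarray( y_predicted_probabilities, dtype = float ).ravel()
    Bin_IDs = np.minimum( (y_predicted_probabilities * n_bins).astype(int), n_bins - 1 )   # Probabilities of Exactly 1.0 Fall Into the Last Bin
    ECE = 0.0
    for Bin_Num in range(n_bins):
        In_Bin = Bin_IDs == Bin_Num
        if In_Bin.any():
            # Weight Each Bin By Its Share of Samples & Compare Average Predicted Probability vs Average Observed Outcome
            ECE += In_Bin.mean() * abs( y_predicted_probabilities[In_Bin].mean() - y_true[In_Bin].mean() )
    return ECE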

# Store Best Validation Hyper-Parameters, Validation Loss & the Best Validation AUC Score
Best_Validation_Parameters = {}
Best_Validation_Loss = float("inf")
Best_Validation_AUC = 0
# Initialize Accumulators For All Outer Folds
All_y_True_Probabilities = []
All_y_Predicted_Probabilities = []
All_ECEs_df = pd.DataFrame( columns = ["Original Predicted Probability", "Predicted Probability", "Target", "Result"] )

# Outer Loop For Stratified Group K-Fold CV  -->  Into 19 Groups of 19 Matches Each
for Outer_Training_Indices, Outer_Test_Indices in Outer_CV.split( X = [ Graph.x for Graph in Soccer_Tracking_Graph_Mixed_Data_Mode ], y = [ Graph.y for Graph in Soccer_Tracking_Graph_Mixed_Data_Mode ], groups = MatchID_Array ):
    # Extract the Match IDs Used For Outer Training & Testing
    for Index in Outer_Training_Indices:
        Position = bisect_left( Start_Indices, Index )
        if Position != len(Start_Indices):
            # Double-Check If the Index Actually Falls Into the Range
            Match_ID = Match_ID_Keys[Position]
            Start_Index, End_Index = MatchID__Index_Tuple__dict[Match_ID]
            if Start_Index <= Index <= End_Index:
                Outer_Training_Match_IDs_set.add(Match_ID)
    Outer_Training_Match_IDs = list( Outer_Training_Match_IDs_set )
    for Index in Outer_Test_Indices:
        Position = bisect_left( Start_Indices, Index )
        if Position != len(Start_Indices):
            # Double-Check If the Index Actually Falls Into the Range
            Match_ID = Match_ID_Keys[Position]
            Start_Index, End_Index = MatchID__Index_Tuple__dict[Match_ID]
            if Start_Index <= Index <= End_Index:
                Outer_Test_Match_IDs_set.add(Match_ID)
    Outer_Test_Match_IDs = list( Outer_Test_Match_IDs_set )
    # Concatenate Training Graphs
    Outer_Training_Graphs_List = list( chain.from_iterable( [ TSGNNFs.Get_Graphs_of_the_Match( match_id = Match_ID, matchID__index_tuple__dict = MatchID__Index_Tuple__dict, Spektral_dataset = Soccer_Tracking_Graph_Mixed_Data_Mode ) for Match_ID in Outer_Training_Match_IDs ] ) )
    Outer_Training_Graphs = TSGNNFs.Soccer_Tracking_Graph_Dataset( num_files = None, list_of_graphs = Outer_Training_Graphs_List, sublist_of_graphs = True, global_adjacency_matrix = Sparse_Tensor_Global_Adjacency_Matrix )
    Outer_Training_Targets = [ Graph.y for Graph in Outer_Training_Graphs ]
    # Derive the groups array for Inner_CV.split()
    Filtered_Training_Groups = np.array( [ MatchID_Array[Index] for Index in Outer_Training_Indices if MatchID_Array[Index] in Outer_Training_Match_IDs ] )
    # Concatenate Test Graphs
    Outer_Test_Graphs_List = list( chain.from_iterable( [ TSGNNFs.Get_Graphs_of_the_Match( match_id = Match_ID, matchID__index_tuple__dict = MatchID__Index_Tuple__dict, Spektral_dataset = Soccer_Tracking_Graph_Mixed_Data_Mode ) for Match_ID in Outer_Test_Match_IDs ] ) )
    Outer_Test_Graphs = TSGNNFs.Soccer_Tracking_Graph_Dataset( num_files = None, list_of_graphs = Outer_Test_Graphs_List, sublist_of_graphs = True, global_adjacency_matrix = Sparse_Tensor_Global_Adjacency_Matrix )
    Outer_Test_Targets = [ Graph.y for Graph in Outer_Test_Graphs ]
    # Inner Loop For Group K-Fold  -  Balancing 'Will_Be_a_Goal' Values As Much As Possible
    for Inner_Training_Indices, Inner_Validation_Indices in Inner_CV.split( X = [ Graph.x for Graph in Outer_Training_Graphs ], y = Outer_Training_Targets, groups = Filtered_Training_Groups ):
        # Extract the Match IDs Used For Inner Training & Validation
        for Index in Inner_Training_Indices:
            Position = bisect_left(Start_Indices, Index)
            if Position != len(Start_Indices):
                # Double-Check If the Index Actually Falls Into the Range
                Match_ID = Match_ID_Keys[Position]
                Start_Index, End_Index = MatchID__Index_Tuple__dict[Match_ID]
                if Start_Index <= Index <= End_Index:
                    Inner_Training_Match_IDs_set.add(Match_ID)
        Inner_Training_Match_IDs = list( Inner_Training_Match_IDs_set )
        for Index in Inner_Validation_Indices:
            Position = bisect_left(Start_Indices, Index)
            if Position != len(Start_Indices):
                # Double-Check If the Index Actually Falls Into the Range
                Match_ID = Match_ID_Keys[Position]
                Start_Index, End_Index = MatchID__Index_Tuple__dict[Match_ID]
                if Start_Index <= Index <= End_Index:
                    Inner_Validation_Match_IDs_set.add(Match_ID)
        Inner_Validation_Match_IDs = list( Inner_Validation_Match_IDs_set )
        Inner_Training_Graphs_List = list( chain.from_iterable( [ TSGNNFs.Get_Graphs_of_the_Match( match_id = Match_ID, matchID__index_tuple__dict = MatchID__Index_Tuple__dict, Spektral_dataset = Soccer_Tracking_Graph_Mixed_Data_Mode ) for Match_ID in Inner_Training_Match_IDs ] ) )
        Inner_Training_Graphs = TSGNNFs.Soccer_Tracking_Graph_Dataset( num_files = None, list_of_graphs = Inner_Training_Graphs_List, sublist_of_graphs = True, global_adjacency_matrix = Sparse_Tensor_Global_Adjacency_Matrix )
        Inner_Training_Targets = [ Graph.y for Graph in Inner_Training_Graphs ]
        Inner_Validation_Graphs_List = list( chain.from_iterable( [ TSGNNFs.Get_Graphs_of_the_Match( match_id = Match_ID, matchID__index_tuple__dict = MatchID__Index_Tuple__dict, Spektral_dataset = Soccer_Tracking_Graph_Mixed_Data_Mode ) for Match_ID in Inner_Validation_Match_IDs ] ) )
        Inner_Validation_Graphs = TSGNNFs.Soccer_Tracking_Graph_Dataset( num_files = None, list_of_graphs = Inner_Validation_Graphs_List, sublist_of_graphs = True, global_adjacency_matrix = Sparse_Tensor_Global_Adjacency_Matrix )
        Inner_Validation_Targets = [ Graph.y for Graph in Inner_Validation_Graphs ]
        tf.print("Starting the Inner Loop For Hyper-Parameter Tuning, Training & Validation", "\n")
        tf.print("_____", "\n")
        tf.print("_____", "\n")
        # Hyper-Parameter Tuning, Model Training & Validation
        for Parameters in Hyper_Parameters_Combinations:
            GNN_Model = TSGNNFs.Spektral_GNN( n_layers = Parameters["Num_Layers"], n_channels = Parameters["Num_Channels"], n_target_variables = Num_Target_Variables )
            Optimizer = Adam( learning_rate = Parameters["Learning_Rate"] )
            Loss_Function = BinaryCrossentropy()
            Inner_Early_Stopping = EarlyStopping( monitor = "val_loss", patience = 10, restore_best_weights = True )
            GNN_Model.compile( optimizer = Optimizer, loss = Loss_Function, metrics = ["AUC"] )
            # GNN_Model.summary()
            Inner_Training_Loader = MixedLoader( Inner_Training_Graphs, batch_size = Parameters["Batch_Size"], epochs = Parameters["Num_Epochs"], shuffle = False )
            Inner_Validation_Loader = MixedLoader( Inner_Validation_Graphs, batch_size = Parameters["Batch_Size"], shuffle = False )
            tf.print(f"Starting the Inner Training For the Hyper-Parameter Combination --> {Parameters}  &  Steps Per Epoch = {Inner_Training_Loader.steps_per_epoch}", "\n")tf.print("_____", "\n")
            Training_and_Validation_History_of_Metrics = GNN_Model.fit( Inner_Training_Loader.load(), steps_per_epoch = Inner_Training_Loader.steps_per_epoch, validation_data = Inner_Validation_Loader.load(), class_weight = { 0 : 1.0, 1 : 1.5 }, epochs = Parameters["Num_Epochs"], batch_size = Parameters["Batch_Size"], callbacks = [Inner_Early_Stopping] ) # , verbose = 2 )
            tf.print(f"Inner Training Done For the Hyper-Parameter Combination --> {Parameters}  &  Steps Per Epoch = {Inner_Training_Loader.steps_per_epoch}", "\n")
            tf.print(f"Starting Inner Validation Evaluation For the Hyper-Parameter Combination --> {Parameters}", "\n")
            tf.print("_____", "\n")
            # Evaluation & Hyper-Parameter Tuning Logic
            Validation_Loss = Training_and_Validation_History_of_Metrics.history["val_loss"][-1]  # Get the Last Validation Loss From the Training History
            Validation_Predictions = GNN_Model.predict( Inner_Validation_Loader.load(), steps = Inner_Validation_Loader.steps_per_epoch )
            ROC_AUC = roc_auc_score( [ Inner_Validation_Graph.y for Inner_Validation_Graph in Inner_Validation_Graphs ], Validation_Predictions )
            tf.print(f"Evaluation Done For the Hyper-Parameter Combination --> {Parameters}", "\n")
            tf.print("_____", "\n")
            if ROC_AUC > Best_Validation_AUC:
                Best_Validation_AUC = ROC_AUC
                Best_Validation_Parameters = Parameters
                Best_Validation_Loss = Validation_Loss
            K.clear_session()  # Clear the TensorFlow Backend Session At the End of Each Hyper-Parameter Setting
    tf.print("Inner Loop Done With Hyper-Parameter Tuning, Training & Validation", "\n")
    tf.print("_____", "\n")
    tf.print("_____", "\n")
    K.clear_session()  # Clear the TensorFlow Backend Session At the End of the Inner Loop
    # Training the Best Validation GNN Model On the Full Training Set (Considering Outer Training Fold)
    Best_GNN_Model = TSGNNFs.Spektral_GNN( n_layers = Best_Validation_Parameters["Num_Layers"], n_channels = Best_Validation_Parameters["Num_Channels"], n_target_variables = Num_Target_Variables )
    Best_Optimizer = Adam( learning_rate = Best_Validation_Parameters["Learning_Rate"] )
    Loss_Function = BinaryCrossentropy()
    Outer_Early_Stopping = EarlyStopping( monitor = "val_loss", patience = 10, restore_best_weights = True )
    Best_GNN_Model.compile( optimizer = Best_Optimizer, loss = Loss_Function, metrics = ["AUC"] )
    # Best_GNN_Model.summary()
    Outer_Training_Loader = MixedLoader( Outer_Training_Graphs, batch_size = Best_Validation_Parameters["Batch_Size"], epochs = Best_Validation_Parameters["Num_Epochs"], shuffle = False )
    Outer_Test_Loader = MixedLoader( Outer_Test_Graphs, batch_size = Best_Validation_Parameters["Batch_Size"], shuffle = False )
    tf.print(f"Starting the Outer Training For the Best Hyper-Parameter Combination --> {Best_Validation_Parameters}  &  Steps Per Epoch = {Outer_Training_Loader.steps_per_epoch}", "\n")
    tf.print("_____", "\n")
    Best_GNN_Model.fit( Outer_Training_Loader.load(), steps_per_epoch = Outer_Training_Loader.steps_per_epoch, validation_data = Outer_Test_Loader.load(), class_weight = { 0 : 1.0, 1 : 1.5 }, epochs = Best_Validation_Parameters["Num_Epochs"], batch_size = Best_Validation_Parameters["Batch_Size"], callbacks = [Outer_Early_Stopping] ) # , verbose = 2 )
    tf.print(f"Outer Training Done For the Hyper-Parameter Combination --> {Best_Validation_Parameters}  &  Steps Per Epoch = {Outer_Training_Loader.steps_per_epoch}", "\n")
    tf.print(f"Starting Outer Testing Evaluation For the Hyper-Parameter Combination --> {Best_Validation_Parameters}", "\n")
    tf.print("_____", "\n")
    # ROC-AUC Evaluation
    Outer_y_True_Probabilities = []
    Outer_y_Predicted_Probabilities = []
    for Batch in Outer_Test_Loader:
        Inputs, Target = Batch
        Predicted_Probability = Best_GNN_Model( inputs = Inputs, training = False ).numpy()
        Outer_y_True_Probabilities.append( Target.numpy() )
        Outer_y_Predicted_Probabilities.append( Predicted_Probability )
    All_y_True_Probabilities.extend( Outer_y_True_Probabilities )
    All_y_Predicted_Probabilities.extend( Outer_y_Predicted_Probabilities )
    # ECE Evaluation
    ECE_df = pd.DataFrame( columns = ["Original Predicted Probability", "Predicted Probability", "Target", "Result"] )
    for Batch in Outer_Test_Loader:
        Inputs, Target = Batch
        Predicted_Probability = Best_GNN_Model( inputs = Inputs, training = False ).numpy()
        # Threshold Set To 0.5
        Predicted_Value = 1 if Predicted_Probability >= 0.5 else 0
        ECE_df.loc[ len(ECE_df) ] = [Predicted_Probability, Predicted_Value, Target.numpy()[0][0], 1 if Predicted_Value == Target.numpy()[0][0] else 0]
    All_ECEs_df = pd.concat( [All_ECEs_df, ECE_df], ignore_index = True )
tf.print("Outer Loop Done With Training & Testing Processes", "\n")
tf.print("_____", "\n")
tf.print("_____", "\n")
tf.print("_____", "\n")
tf.print("FINISHED", '\n')

1 Epoch Run Time = 30-45 mins ↑

Full Run Time = ??? s ≡ ??? hrs ↑ (the cell never finishes - I always have to interrupt the kernel)




Apart from my main issue - if you find any other mistakes or errors in my code, absolutely any feedback is immensely appreciated!

If you can & are able to, please help me out - I have been unsuccessfully trying to solve this issue all this last month & this is basically my last resource/bullet.

I only have about a month of practical work left before my thesis deadline.

:sos::pray:

Follow-up questions from the thread:

- Additional info: Don’t you get any error message?
- Training setup issue: Did you try running your code in a different setup (e.g. a Python script instead of a notebook, another computer, …)?
- Memory overflow: Did you try reducing your dataset? The batch size?