TensorFlow crashes without an error message when training on GPU

Hi all,
My program terminated unexpectedly during training with no error message and no red text, just a lone exit code. Thanks in advance.

The program code and output are as follows.
Program code:

import ctypes
import os
cuda_path = r'*path*'
cudnn_path = r'*path*'

ctypes.windll.LoadLibrary(os.path.join(cuda_path, 'cudart64_110.dll'))
ctypes.windll.LoadLibrary(os.path.join(cuda_path, 'cublas64_11.dll'))
ctypes.windll.LoadLibrary(os.path.join(cuda_path, 'cublasLt64_11.dll'))
ctypes.windll.LoadLibrary(os.path.join(cuda_path, 'cufft64_10.dll'))
ctypes.windll.LoadLibrary(os.path.join(cuda_path, 'cusparse64_11.dll'))
ctypes.windll.LoadLibrary(os.path.join(cudnn_path, 'cudnn64_8.dll'))

import tensorflow as tf

import keras
import matplotlib.pyplot as plt
import pickle
import matplotlib
matplotlib.use('TkAgg')
cls_nums = 2
IMG_HEIGHT = 224
IMG_WIDTH = 224
batch_size = 2
root = r"C:\Users\Drwalkinlong\Desktop\tf"
train_dir = os.path.join(root, 'new_datasets', 'train')
validation_dir = os.path.join(root, 'new_datasets', 'validation')

train_image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rescale=1/255,
    rotation_range=20
)
validation_image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1/255,
)

train_data_gen = train_image_generator.flow_from_directory(
    directory=train_dir,
    batch_size=batch_size,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    shuffle=True,
    class_mode='categorical',
)

val_data_gen = validation_image_generator.flow_from_directory(
    directory=validation_dir,
    batch_size=batch_size,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    shuffle=True,
    class_mode='categorical',
)

AlexNet = keras.Sequential([
    keras.layers.Conv2D(filters=96, kernel_size=(11, 11), strides=(4, 4), padding='valid', activation='relu'),
    keras.layers.MaxPool2D(pool_size=(3, 3), strides=2),
    keras.layers.Conv2D(filters=256, kernel_size=(5, 5), strides=1, padding='same', activation='relu'),
    keras.layers.MaxPool2D(pool_size=(3, 3), strides=2),
    keras.layers.Conv2D(filters=384, kernel_size=(3, 3), strides=1, padding='same', activation='relu'),
    keras.layers.Conv2D(filters=384, kernel_size=(3, 3), strides=1, padding='same', activation='relu'),
    keras.layers.Conv2D(filters=256, kernel_size=(3, 3), strides=1, padding='same', activation='relu'),
    keras.layers.MaxPool2D(pool_size=(3, 3), strides=2),
    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(rate=0.25),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(rate=0.25),
    keras.layers.Dense(units=2, activation='softmax'),
])
AlexNet.build(input_shape=[batch_size, IMG_HEIGHT, IMG_WIDTH, 3])
tf.debugging.set_log_device_placement(True)
losses = keras.losses.CategoricalCrossentropy()
AlexNet.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
    loss=losses, metrics=['acc']
)
history = AlexNet.fit(train_data_gen, epochs=10)

Output:

D:\python3.9\python.exe C:\Users\Drwalkinlong\Desktop\tf\04A.py 
Found 17500 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.
2024-05-16 23:23:05.806397: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-16 23:23:06.084890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3584 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2024-05-16 23:23:06.300133: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10

 -1073740791 (0xC0000409)
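
The number on the last line is the process exit code as a signed 32-bit integer, with the same value in hexadecimal in parentheses. The short sketch below only shows how the two forms relate; the helper name `to_ntstatus` is made up for illustration. In the Windows NTSTATUS table, 0xC0000409 is STATUS_STACK_BUFFER_OVERRUN, a fail-fast termination raised in native code. That would be consistent with the missing Python traceback, but it does not by itself say which library crashed.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex string.

    `to_ntstatus` is a hypothetical helper name, not part of any library.
    """
    # Masking with 0xFFFFFFFF maps the negative value onto its
    # two's-complement unsigned 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    print(to_ntstatus(-1073740791))  # 0xC0000409
```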

Are you using an old version of TF?
I mean, do you really need to load the CUDA DLLs manually with ctypes?
Have you tried running your code without those imports?
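
To answer the version question, something like the following could be run in the same interpreter (D:\python3.9\python.exe in the output above). It is only a sketch: it reads the installed package versions from package metadata without importing TensorFlow, so it should still work if the import itself is what crashes. The helper name `installed_version` is made up here, and checking `keras` alongside `tensorflow` is an assumption based on the script importing both.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent.

    `installed_version` is a hypothetical helper name used for illustration.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The script imports both `tensorflow` and the standalone `keras`
    # package, so report both versions side by side.
    for name in ("tensorflow", "keras"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the two versions differ, or if TensorFlow is older than expected for the installed CUDA and cuDNN DLLs, that would be worth mentioning in a follow-up.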