When I run the following code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

cnn5 = Sequential()
# input layer
cnn5.add(Conv2D(32, kernel_size=(21,21), strides=(1,1), padding='same', activation='relu', input_shape=(256,256,1)))
# convolutional layer
cnn5.add(Conv2D(64, kernel_size=(15,15), strides=(1,1), padding='same', activation='relu'))
cnn5.add(MaxPool2D(pool_size=(2,2)))
cnn5.add(Conv2D(128, kernel_size=(9,9), strides=(1,1), padding='same', activation='relu'))
cnn5.add(MaxPool2D(pool_size=(2,2)))
# add another layer
cnn5.add(Conv2D(256, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
cnn5.add(MaxPool2D(pool_size=(2,2)))
# flatten output of conv
cnn5.add(Flatten())
# dense connected layers
cnn5.add(Dense(1000))  # activation='relu' commented out, so this layer is linear
# output layer
cnn5.add(Dense(10, activation='softmax'))
#Model compiling and fitting
cnn5.compile(optimizer='adam',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
run_cnn5 = cnn5.fit(x_train, y_train_onehot, epochs=20, validation_data=(x_test, y_test_onehot))
I get an accuracy of 9% after the first epoch, and it then stays at 6.5% for all of the remaining 19 epochs without any improvement. The images I use for training are spectrograms created from this audio dataset (lewtun/music_genres_small · Datasets at Hugging Face).
How can one determine what kind of issue a CNN has when the accuracy does not change even slightly after the first epoch? What could be the issue here? Changing kernel sizes, or adding/removing layers or channels, does not change anything.
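For context, accuracy pinned near 1/num_classes (about 10% for 10 genres) that never moves usually points to a data or label problem rather than the architecture. A minimal first diagnostic, sketched below with synthetic stand-ins for the post's `x_train` and `y_train_onehot` (those names are taken from the question; the random data is illustrative only), is to check input scaling, one-hot validity, and class balance before touching the model:

```python
import numpy as np

# Synthetic stand-ins for the post's x_train / y_train_onehot:
# 32 fake 256x256 single-channel "spectrograms" with raw pixel values.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(32, 256, 256, 1)).astype('float32')
y_train_onehot = np.eye(10)[rng.integers(0, 10, size=32)]

# 1) Input scale: raw values in [0, 255] often stall training;
#    rescale to [0, 1] before calling fit().
print('input range:', x_train.min(), x_train.max())
if x_train.max() > 1.0:
    x_train = x_train / 255.0

# 2) Label sanity: every row of a one-hot matrix must sum to exactly 1,
#    otherwise categorical_crossentropy is being fed garbage.
assert np.allclose(y_train_onehot.sum(axis=1), 1.0)

# 3) Class balance: a heavily skewed label distribution can pin
#    accuracy at the majority-class frequency.
print('class counts per genre:', y_train_onehot.sum(axis=0))
```

If these checks pass, a common next step is to confirm the model can overfit a handful of samples (train on ~10 images for many epochs and expect near-100% training accuracy); if it cannot, suspect the architecture or learning rate — for instance, the unusually large 21×21 first-layer kernel and the `Dense(1000)` layer whose activation is commented out are both worth revisiting.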