Having problems with training my neural nets

Hey folks,

I came across this TensorFlow tutorial: https://www.tensorflow.org/tutorials/keras/classification
It does a great job of explaining how to configure a neural network to classify 10 different labels / classes from the Fashion MNIST dataset. It inspired me to design a neural network for music classification.

With the code below I want to feed an algorithm two folders, each containing audio files of a different music genre, create a spectrogram for each audio file, and use those spectrogram images to train the neural network, just like in the Keras classification example above. So instead of images of 10 different fashion articles, I am using images of two different types of spectrograms. The only difference is that I want my neural network to be completely linear, so without the additional ReLU-activated dense layer in the middle. To keep things simple I started with just two folders, so for now it is a classification task that distinguishes between two musical genres, but my goal is to add more genres later.

import numpy as np 
import librosa
import librosa.display
import datetime
import math
import os
import tensorflow as tf
from pathlib import Path

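# Spectrogram
def prepare_song(song_path):
  list_matrices = []
  # Load the first 10 seconds of the file, resampled to 22050 Hz
  y,sr = librosa.load(song_path,sr=22050,duration=10)
  # Power spectrogram: squared magnitude of the STFT
  D = np.abs(librosa.stft(y))**2
  # Mel-scaled power spectrogram
  S = librosa.feature.melspectrogram(S=D, sr=sr)
  list_matrices.append(S)
  return list_matrices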

audio_tracks = []
genre = []

#Genre 1
path = '/Users/Laulito/Desktop/Samplepack der Genres/House'
pathlist = Path(path).glob('**/*.wav')
for path in pathlist:
     path_in_str = str(path)
     song_pieces = prepare_song(path_in_str)
     audio_tracks += song_pieces
     genre += ([0]*len(song_pieces)) # puts zeros into target / train-labels array

#Genre 2
path2 = '/Users/Laulito/Desktop/Samplepack der Genres/Drum & Bass'
pathlist2 = Path(path2).glob('**/*.wav')
for path2 in pathlist2:
     path_in_str2 = str(path2)
     song_pieces = prepare_song(path_in_str2)
     audio_tracks += song_pieces
     genre += ([1]*len(song_pieces)) # puts ones into target / train-labels array

# Initialise
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(np.array(audio_tracks), 
                                                    np.array(genre),
                                                    test_size=0.2,
                                                    train_size=0.8,
                                                    random_state=42)

X_val, X_test, y_val, y_test = train_test_split(X_test, 
                                                y_test,
                                                test_size=0.5,
                                                random_state=42)

# Linear Model
from keras import datasets, layers, models
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(128, 440)), # 128x440 is the size of a spectrogram-image 
    tf.keras.layers.Dense(2) #Dense(2) because there are just two genres
])

model.summary()

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.2,
    decay_steps=15,
    decay_rate=0.9)

model.compile(optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=[tf.keras.metrics.Accuracy()])

model.fit(x=X_train, y=y_train, epochs=5, validation_split=0.2)

model.evaluate(x=X_test, y=y_test)

That code failed at model.fit(), with an error in the terminal saying that shapes (None, 1) and (None, 2) are incompatible. I guess it has to do with the last dense layer, tf.keras.layers.Dense(2), which produces an output of shape (None, 2), while my label array has shape (None, 1). That surprised me, because the target in the Keras example above is also one-dimensional and its last dense layer has 10 units, so the shapes there would be (None, 10) and (None, 1) …
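A note on why the Keras tutorial gets away with labels of shape (None, 1) against an output of shape (None, 10): it compiles the model with SparseCategoricalCrossentropy(from_logits=True), a loss that expects one integer class index per sample rather than one value per output unit. MeanSquaredError and tf.keras.metrics.Accuracy, as used in the code above, instead compare predictions and labels element by element. The NumPy sketch below only mimics what a sparse cross-entropy computes; the function name and the toy numbers are made up for illustration, and this is not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, logits):
    """Mean cross-entropy for integer labels (N,) and raw scores (N, C)."""
    # Numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Each integer label selects the log-probability of its own class
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

labels = np.array([0, 1, 1])        # shape (3,), like the `genre` array
logits = np.array([[2.0, -1.0],
                   [0.2, 0.9],
                   [-3.0, 1.0]])    # shape (3, 2), like the output of Dense(2)

loss = sparse_categorical_crossentropy(labels, logits)
predicted = logits.argmax(axis=1)   # class with the highest score per sample
accuracy = (predicted == labels).mean()
```

With a loss of this kind the labels can stay a plain array of 0s and 1s, and accuracy is computed from the argmax of the two outputs instead of from exact equality.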

Anyway I modified the code as follows:

a = 0
b = 1

#Genre 1
path = '/Users/Laulito/Desktop/Samplepack der Genres/House'
pathlist = Path(path).glob('**/*.wav')
for path in pathlist:
     path_in_str = str(path)
     song_pieces = prepare_song(path_in_str)
     audio_tracks += song_pieces
     array = [a,b]
     genre += ([array]*len(song_pieces))

#Genre 2
path2 = '/Users/Laulito/Desktop/Samplepack der Genres/Drum & Bass'
pathlist2 = Path(path2).glob('**/*.wav')
for path2 in pathlist2:
     path_in_str2 = str(path2)
     song_pieces = prepare_song(path_in_str2)
     audio_tracks += song_pieces
     array = [b,a]
     genre += ([array]*len(song_pieces))

With this change the code at least runs, because genre now has shape (None, 2) as well, but the resulting model has a loss of “nan” and an accuracy of 0 … I might have messed something up along the way … maybe someone can help me figure out where I went wrong.
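One plausible cause of the “nan” loss (an assumption, not verified against the original audio) is the input scale: a power mel spectrogram can contain very large values, and SGD with a learning rate of 0.2 on 128x440 unscaled inputs can make the weights overflow within a few steps. The accuracy of 0 is also expected with tf.keras.metrics.Accuracy, which counts a prediction as correct only when it equals the label exactly. A minimal sketch of one common remedy, log scaling followed by standardization, could look like this; the function name and the synthetic data are hypothetical, and librosa.power_to_db would be an alternative for the decibel step.

```python
import numpy as np

def log_scale_and_standardize(power_spec, floor=1e-10):
    """Convert a power spectrogram to decibels, then to zero mean and unit variance."""
    # Clip at a small floor so log10 never receives zero
    db = 10.0 * np.log10(np.maximum(power_spec, floor))
    # Standardize per spectrogram; dataset-wide statistics are another option
    return (db - db.mean()) / (db.std() + 1e-8)

# Synthetic stand-in for one mel spectrogram (hypothetical data, not real audio)
rng = np.random.default_rng(0)
fake_spec = rng.exponential(scale=1e4, size=(128, 440))
scaled = log_scale_and_standardize(fake_spec)
```

Applying such a function to each matrix before it is appended in prepare_song, together with a much smaller learning rate, would be a reasonable first thing to try.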

Hi @Laulito,

Welcome to the TensorFlow Forum!

You can use convolutional layers in your model, along with the Adam optimizer and the BinaryCrossentropy() / SparseCategoricalCrossentropy() loss function, since you are trying to do image classification.

Because you want to classify images of two musical genres, you can use a binary image classification model: add some Conv2D layers, define the last dense layer of the model as tf.keras.layers.Dense(1, activation='sigmoid'), and compile with optimizer='adam' and loss=tf.keras.losses.BinaryCrossentropy().

Please have a look at this link for a reference example. Thank you.
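For illustration, the setup described in this reply could be sketched roughly as follows. The layer sizes, the function name and the dummy batch are assumptions chosen only to keep the example small; they are not taken from the linked example. Note that Conv2D expects a channel axis, so spectrograms of shape (128, 440) would need to be reshaped to (128, 440, 1).

```python
import numpy as np
import tensorflow as tf

def build_binary_cnn(input_shape=(128, 440, 1)):
    """Small binary image classifier: Conv2D layers followed by Dense(1, sigmoid)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(4),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        # One sigmoid unit: the output is the probability of class 1
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=["accuracy"])
    return model

cnn = build_binary_cnn()
# Dummy batch of two blank "spectrograms", only to check the output shape
dummy_batch = np.zeros((2, 128, 440, 1), dtype="float32")
probabilities = cnn.predict(dummy_batch, verbose=0)
```

With this output layer the labels stay a one-dimensional array of 0s and 1s, so no one-hot encoding is needed.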


Hi @Renu_Patel,

thank you for your help and the corresponding link!

I was just wondering: would it be possible to implement a linear model for this task at all?

To be more specific: the final goal is to compare different models (for example linear, CNN, MLP and so on) by their performance on the genre classification task. It is something I have to do for university, so maybe you can help me set up the most basic linear model I could implement for this task! Performance does not have to be good at all, but it would be interesting to compare the different architectures. I already have a functioning CNN genre classifier at hand, but the linear one is still missing.

Thanks again!
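A linear baseline is possible in Keras. One way to sketch it, under the assumption that "linear" means a single dense layer acting directly on the flattened spectrogram, is logistic regression: Flatten followed by Dense(1, activation='sigmoid'). The sigmoid only turns the linear score into a probability, so the decision boundary stays linear. The function name, the learning rate and the random stand-in data below are hypothetical and serve only to make the example runnable; real inputs would be the spectrogram arrays, ideally scaled to a moderate range.

```python
import numpy as np
import tensorflow as tf

def build_linear_classifier(input_shape=(128, 440)):
    """Logistic regression on flattened spectrograms: no hidden layers."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Flatten(),
        # A single unit: weighted sum of all pixels plus a bias
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=[tf.keras.metrics.BinaryAccuracy()])
    return model

# Random stand-in data (hypothetical): 16 "spectrograms" with 0/1 genre labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(16, 128, 440)).astype("float32")
y_demo = rng.integers(0, 2, size=16).astype("float32")

linear_model = build_linear_classifier()
history = linear_model.fit(X_demo, y_demo, epochs=2, batch_size=8, verbose=0)
```

For more than two genres, the same idea would use Dense(number_of_genres) without an activation together with SparseCategoricalCrossentropy(from_logits=True), which keeps the integer labels as they are.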