Deep Learning NN for audio: data preparation, loss, evaluation question

Andrii_Tsemko · November 3, 2023, 4:12am

Hi All!

I work on audio-processing neural networks for tasks such as noise suppression.
Currently, I am investigating existing NNs for such tasks and trying to reproduce the whole training flow to get similar results.

I have some questions about data preparation and loss calculation principles, because it is hard to find advice or results on them.

Most applications use an STFT or mel-cepstrum audio representation and process it with the NN. But my NN approach works on audio in the time domain. To simplify, the structure is as follows: Audio_Signal_With_Noise → NN → Audio_Signal

So the NN block should process a time-domain audio signal and output a new time-domain signal without noise. Let's say that I will use the SDR metric as the loss function.
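For reference, an SDR-based loss on time-domain signals could be sketched as below. This is only an illustration: it assumes the plain energy-ratio definition, SDR = 10·log10(‖s‖² / ‖s − ŝ‖²), not the BSS Eval variant, and the function names `sdr_db` and `neg_sdr_loss` are hypothetical. NumPy is used here for clarity; an actual training loop would need the same arithmetic written with differentiable framework ops.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB (assumed definition):
    10 * log10(||s||^2 / ||s - s_hat||^2)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero and log(0) on silent or perfect inputs
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))

def neg_sdr_loss(reference, estimate):
    """Loss to minimize: a higher SDR is better, so negate it."""
    return -sdr_db(reference, estimate)

# Toy signals: one second of a 440 Hz sine at 16 kHz
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
sdr_small_error = sdr_db(clean, 1.1 * clean)  # estimate is 10% too loud
sdr_half_scale = sdr_db(clean, 0.5 * clean)   # estimate has the right shape, half the amplitude
```

With these toy signals, `sdr_small_error` is 20 dB and `sdr_half_scale` is about 6.02 dB. Under this definition a pure scaling error in the output counts as distortion, so the scale of the target signal matters for the loss.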

My questions are as follows:

Should I train my NN to produce an output signal normalized to the -1:1 range? If yes, what is the most efficient approach for training: normalize the output of the neural network to the -1:1 range and then calculate the loss during training? Or just pass normalized X and Y to the NN and train it like that?
What about the mean of the audio? Should I normalize the audio to -1:1 and expect a zero-mean output signal, or fix the output of the NN by subtracting the mean, like this: nn_output = nn_output - mean(nn_output)? Or train the NN on data normalized from 0 to 1?
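The preprocessing options in these questions can be sketched as follows. This is a minimal illustration, not a recommendation from the thread: the helper names `remove_dc` and `peak_normalize_pair` are hypothetical, and applying one shared gain to the noisy input and the clean target is an assumption made here so that their relative levels are preserved.

```python
import numpy as np

def remove_dc(x):
    """Subtract the mean, i.e. x - mean(x), so the signal is zero-mean."""
    x = np.asarray(x, dtype=np.float64)
    return x - np.mean(x)

def peak_normalize_pair(noisy, clean, eps=1e-8):
    """Scale the noisy input (X) and clean target (Y) by one shared gain
    so that both stay inside the -1:1 range."""
    noisy = np.asarray(noisy, dtype=np.float64)
    clean = np.asarray(clean, dtype=np.float64)
    # Use the larger of the two peaks so neither signal exceeds 1 in magnitude
    peak = max(np.max(np.abs(noisy)), np.max(np.abs(clean)))
    gain = 1.0 / (peak + eps)
    return noisy * gain, clean * gain

# Toy example with a DC offset and peaks outside -1:1
noisy = np.array([0.7, -1.8, 1.2, 0.3])
clean = np.array([0.5, -1.0, 0.9, 0.0])

noisy_n, clean_n = peak_normalize_pair(remove_dc(noisy), remove_dc(clean))
```

Here the normalization is applied to the training pair before it reaches the network. The same `remove_dc` step could instead be applied to the network output before the loss is computed, which corresponds to the nn_output - mean(nn_output) option.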

Renu_Patel · December 18, 2023, 12:21pm

Hi @Andrii_Tsemko

Welcome to the TensorFlow Forum!

You can prepare the audio signal data by trimming the noise and then feeding it into the model. Please refer to this Simple Audio Recognition model for more information. Thank you.
