How can I classify the language of voice data?

I have voice recordings in which English, Japanese, French, Italian, and Russian are spoken. I want to train a model on this data so that it can classify which language is being spoken in new voice data. What kind of preprocessing, feature extraction, and model selection should I use?

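Hi @ruorch, the use case you are trying to implement falls under the audio classification task. Please refer to this tutorial to learn how to implement image classification using CNNs; the same approach applies to audio once each clip is converted to a spectrogram, which can be treated as an image. Thank you.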
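
To make the feature-extraction step more concrete, here is a minimal sketch of turning a waveform into a log-power spectrogram, i.e. the 2-D "image" a CNN would take as input. It is only an illustration under some assumptions: the function names, the frame length and hop size, and the toy sine-wave input are all made up for this example, and it uses a slow naive DFT from the standard library so that it runs without extra packages. In practice you would more likely compute mel spectrograms or MFCCs with an audio library such as librosa or torchaudio.

```python
import cmath
import math

# The five languages from the question, mapped to integer class labels.
LANGUAGES = ["English", "Japanese", "French", "Italian", "Russian"]
LABEL_OF = {name: i for i, name in enumerate(LANGUAGES)}


def frame_signal(samples, frame_len, hop):
    """Split a 1-D list of samples into overlapping frames (hypothetical helper)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_power_spectrogram(samples, frame_len=64, hop=32):
    """Return one row per frame, each row holding log-power per frequency bin.

    Uses a naive O(n^2) DFT for clarity; a real pipeline would use an FFT.
    The frame_len and hop defaults are illustrative assumptions.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    spec = []
    for frame in frame_signal(samples, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        row = []
        for k in range(n_bins):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(windowed))
            # Small constant avoids log(0) on silent bins.
            row.append(math.log(abs(coeff) ** 2 + 1e-10))
        spec.append(row)
    return spec


# Toy input: a 1 kHz sine sampled at 8 kHz, standing in for a real recording.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = log_power_spectrogram(signal)
```

Each clip then becomes a (frames x frequency bins) array paired with a label such as `LABEL_OF["Japanese"]`, and those pairs are what a CNN classifier with five output classes would be trained on. Resampling every recording to one sample rate and cutting clips to a fixed length beforehand keeps the input shapes consistent.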