I am starting a new project and want to build a custom audio classifier using the great Teachable Machine website and its speech-commands model.
I have collected audio samples from a personal user interface, and now I have many WAV files available.
My goal is to upload them to the Teachable Machine website, following its audio zip format, which is just a zip file containing:
- all the sound sample files for a class, concatenated into a single webm file
- a JSON file describing, for each sound sample, its audio characteristics
Using a simple Node.js script, I managed to concatenate my files and create the JSON file.
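For reference, the concatenation step can be done roughly like this (a minimal sketch, assuming ffmpeg is installed and the class samples live in a ./le folder; file names are illustrative):

```js
// Sketch: concatenate all wav samples of one class into a single webm file.
// Assumes ffmpeg is on the PATH; paths and names are illustrative.
const { execFileSync } = require("child_process");
const fs = require("fs");

const wavFiles = fs.readdirSync("./le").filter((f) => f.endsWith(".wav"));

// The concat demuxer reads a text file listing the inputs in order.
fs.writeFileSync("list.txt", wavFiles.map((f) => `file './le/${f}'`).join("\n"));

execFileSync("ffmpeg", [
  "-f", "concat", "-safe", "0", // allow relative paths in list.txt
  "-i", "list.txt",
  "-c:a", "libopus",            // audio codec supported by the webm container
  "le.webm",
]);
```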
The last missing piece of information for each file is an attribute called “frequencyFrames”. For a sample I downloaded from TM (created online using the microphone), it is an array of arrays.
Does anyone know how I can get this information for each WAV file?
Can you share the link where you read those instructions?
I think frequencyFrames might be the frame rate (sample rate) of the audio files. That is defined when you record them (e.g. 16 kHz, 24 kHz, …).
The test project is linked to my Google Drive, so I cannot share it on the web.
And when I export it, the samples.json file inside the zip contains an array with the 8 samples of the first class “le”. For each sample, frequencyFrames is an array of arrays, roughly like the sketch below:
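(An illustrative sketch; apart from frequencyFrames, the field names and values here are my own placeholders, not copied from the real file.)

```js
// samples.json (sketch): one entry per recorded sample of the class "le".
[
  {
    "label": "le",               // hypothetical field name
    "frequencyFrames": [         // an array of arrays of numbers
      [-61.3, -58.9 /* ... */],
      [-60.2, -57.4 /* ... */]
      /* ... more frames ... */
    ]
  }
]
```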
Not really. I am trying to use the Teachable Machine interface with WAV files I have collected for a side project, and to train a model with these sounds.
There is only one way to import sounds into Teachable Machine: sounds previously recorded with the interface and downloaded to my computer, for example.
I have tried (as explained in my first message) to reproduce the structure of the zip file I get when I download a Teachable Machine class.
I have found some interesting classes in the tfjs speech-commands source code (tfjs-models/speech-commands/src at master · tensorflow/tfjs-models · GitHub), but didn't find a way to recreate the frequencyFrames data.
I have all my WAV files offline, and a Node.js script that loops over them and recreates the needed Teachable Machine zip file (the zipping step is sketched below).
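The zipping part itself is straightforward; a sketch of it, assuming the adm-zip package (the webm file name is just my own convention):

```js
// Sketch: rebuild the class zip from the generated files.
// Assumes the adm-zip package; file names are my own conventions.
const AdmZip = require("adm-zip");

const zip = new AdmZip();
zip.addLocalFile("le.webm");      // concatenated class audio
zip.addLocalFile("samples.json"); // per-sample metadata (incl. frequencyFrames)
zip.writeZip("le.zip");
```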
Hi Vincent, sorry for the delay, but I got some clarification for you:
The zip file that is generated includes the audio solely for playback purposes; the FFT data has already been extracted beforehand, and that is the missing “frequencyFrames”.
Here's a comment from the code that processes it:
/**
* The number of frames of frequency data to represent one sample,
* for speech-commands this is 43, it corresponds to the models input shape
* speech-commands input shape is [null, 43, 232, 1]
*/
Each “sample” is 43 frames, each consisting of the first 232 numbers in the FFT array. Another important detail: these numbers come from the Web Audio API's AnalyserNode, and using anything else is likely to shift performance.
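If you want to try reproducing that extraction yourself, an untested sketch of the idea in the browser could look like this (fftSize = 1024 and the frame timing are assumptions consistent with the 232-bin truncation; only the WebAudio Analyser itself is guaranteed to match what TM records):

```js
// Untested sketch: replay a decoded wav through an OfflineAudioContext and
// sample the AnalyserNode 43 times to rebuild a "frequencyFrames"-like array.
async function extractFrequencyFrames(wavArrayBuffer) {
  const NUM_FRAMES = 43; // model input shape is [null, 43, 232, 1]
  const NUM_BINS = 232;  // keep the first 232 FFT values per frame

  const decoded = await new AudioContext().decodeAudioData(wavArrayBuffer);
  const ctx = new OfflineAudioContext(1, decoded.length, decoded.sampleRate);

  const source = ctx.createBufferSource();
  source.buffer = decoded;
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 1024;             // assumption, not confirmed from the TM source
  analyser.smoothingTimeConstant = 0;  // no averaging between frames
  source.connect(analyser);
  source.start();

  const frames = [];
  const hop = decoded.duration / NUM_FRAMES;
  for (let i = 0; i < NUM_FRAMES; i++) {
    // Suspend rendering mid-frame so we can read the analyser at that time.
    ctx.suspend((i + 0.5) * hop).then(() => {
      const bins = new Float32Array(analyser.frequencyBinCount);
      analyser.getFloatFrequencyData(bins);
      frames.push(Array.from(bins.slice(0, NUM_BINS)));
      ctx.resume();
    });
  }
  await ctx.startRendering();
  return frames; // 43 arrays of 232 numbers each
}
```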
That said, my suggestion is, if you want to do this with your own audio, it might be simpler to:
1. change their code (it's open source) to enable a better input
2. try using one of the samples here:
Hi, I am Maxwell
I am currently working on an ML audio project, and I got stuck on the same problem: Teachable Machine is not allowing me to upload my own audio dataset. I have referred to your code, and I want to confirm whether it will work or not.
Thank you