Is there a quick and easy import tool for custom voice data?
Is there a free speech-to-text tool that trains locally on custom raw voice data (including exporting the model for tf.js)?
Can I use tf.js to automatically learn unknown speech sounds and integrate them into existing example models?
P.S. I don’t want to train on my custom data through a paid cloud-based service.
Welcome to the community. If you just need sound recognition, you can try Teachable Machine, which makes it easy to recognize short-form sounds (e.g. about 1 second long). I have not seen full voice recognition converted to TensorFlow.js yet, as those models tend to be quite large in file size, but sound recognition is most certainly possible. Check out:
Then select Audio Project. If you like what it trains in the browser, you can click the download option at the top right and save the generated model files to your computer. All training is done in the browser using TensorFlow.js, so no server is involved other than to deliver the initial web page, and your sounds are never sent to a server.
If you want to do voice recognition in JavaScript, it actually already exists via the Web Speech API:
You do not need TensorFlow.js to use that. It is part of the browser implementation and will use whatever speech recognition the browser provides, which may be an OS-level engine or, in some browsers, a cloud service.
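For following longer conversations rather than short commands, here is a minimal sketch of continuous captioning with the Web Speech API. It assumes a browser that exposes `SpeechRecognition` (Chrome uses the prefixed `webkitSpeechRecognition`). The small interfaces at the top are my own stand-ins for the DOM types so the logic can be tested outside a browser, and `collectTranscript` and `startCaptions` are hypothetical helper names.

```typescript
// Minimal stand-ins for the parts of the Web Speech API used here, declared
// locally so the logic can be type-checked and tested outside a browser.
interface RecognitionAlternative {
  transcript: string;
}
interface RecognitionResult {
  isFinal: boolean;
  readonly length: number;
  [index: number]: RecognitionAlternative;
}
interface RecognitionResultEvent {
  resultIndex: number; // lowest index whose result changed in this event
  results: { readonly length: number; [index: number]: RecognitionResult };
}
interface Recognizer {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onend: (() => void) | null;
  start(): void;
}

// Splits the changed results into finished text and still-changing (interim) text.
function collectTranscript(event: RecognitionResultEvent): { final: string; interim: string } {
  let final = "";
  let interim = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // top alternative only
    if (result.isFinal) final += text;
    else interim += text;
  }
  return { final, interim };
}

// Configures continuous captioning; onCaption receives all finished text so far
// plus the interim text of the phrase currently being spoken.
function startCaptions(
  recognizer: Recognizer,
  onCaption: (finished: string, interim: string) => void,
  lang: string = "en-US",
): void {
  let finished = "";
  recognizer.continuous = true; // keep listening across pauses
  recognizer.interimResults = true; // show words before the phrase is final
  recognizer.lang = lang;
  recognizer.onresult = (event) => {
    const { final, interim } = collectTranscript(event);
    finished += final;
    onCaption(finished, interim);
  };
  // Browsers may end a session on their own (e.g. after silence); restart it.
  recognizer.onend = () => recognizer.start();
  recognizer.start();
}
```

In a page you would call something like `startCaptions(new SR(), (done, pending) => { captionEl.textContent = done + pending; })`, where `SR` is `window.SpeechRecognition ?? window.webkitSpeechRecognition` and `captionEl` is any element you choose. A real app would also handle `onerror` and give the user a way to stop, since restarting in `onend` otherwise keeps listening indefinitely. Recognition quality, and whether audio leaves the device, depend on the browser rather than on this code.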
I ask because I have a hearing impairment, and the real-life recognition rate of such products is very low, with no self-learning feature to improve them through training.
So I want to find out whether tf.js has a self-learning (unsupervised) capability that could improve the recognition rate.
Recognizing only short voice commands is not helpful for hearing-impaired people.
So our short-form audio detection would be good for informing you of sounds like a fire alarm, a gunshot or a doorbell: things that repeat or are distinctive. In that sense it could be useful for that sort of task, for example triggering a push alert on your phone to notify you that something needs attention which might otherwise be missed if you cannot hear it.
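To make the fire alarm / doorbell idea concrete, here is a hedged sketch of the decision logic only: given one frame of class probabilities from an audio classifier (such as a model exported from Teachable Machine), decide when to raise an alert. `SoundAlerter`, its config fields and the example labels are hypothetical, and how the alert reaches your phone is out of scope.

```typescript
// Hypothetical alert rule for distinctive sounds: alert when a target label is
// the model's top class with probability >= threshold for `minFrames`
// consecutive frames, then ignore the next `cooldownFrames` frames so one long
// alarm does not trigger a flood of alerts.
interface AlertConfig {
  targets: string[]; // labels worth an alert, e.g. ["Doorbell"]
  threshold: number; // minimum probability for a frame to count
  minFrames: number; // consecutive qualifying frames required
  cooldownFrames: number; // frames to skip after an alert
}

class SoundAlerter {
  private readonly labels: string[];
  private readonly config: AlertConfig;
  private streakLabel: string | null = null;
  private streak = 0;
  private cooldown = 0;

  constructor(labels: string[], config: AlertConfig) {
    this.labels = labels;
    this.config = config;
  }

  // Feed one frame of class probabilities (same order as `labels`).
  // Returns the label to alert on, or null.
  push(scores: ArrayLike<number>): string | null {
    if (this.cooldown > 0) {
      this.cooldown--;
      return null;
    }
    let best = 0;
    for (let i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) best = i;
    }
    const label = this.labels[best];
    if (!this.config.targets.includes(label) || scores[best] < this.config.threshold) {
      this.streakLabel = null; // streak broken by another sound or low confidence
      this.streak = 0;
      return null;
    }
    this.streak = label === this.streakLabel ? this.streak + 1 : 1;
    this.streakLabel = label;
    if (this.streak < this.config.minFrames) return null;
    this.streak = 0;
    this.streakLabel = null;
    this.cooldown = this.config.cooldownFrames;
    return label;
  }
}
```

If you run the exported model through the `@tensorflow-models/speech-commands` library (which I believe is what Teachable Machine's export snippet uses), each `recognizer.listen` callback provides a `scores` array that you could pass to `push`. Requiring several consecutive frames trades a little latency for fewer false alarms.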
In terms of voice recognition, right now the API above is the best bet for JavaScript, as on-device voice models are, to the best of my knowledge, gigabytes in size. Maybe @lgusm knows more about voice recognition models, or knows someone who does?
Thank you for your reply.
The fact is that I need to communicate with hearing people in everyday life.
Recognizing short phrases can’t help me understand what people normally say.
I would like tf.js to support voice training on long sentences.
My PC: i5-3470 CPU, no GPU.
OS: Windows 10 Pro
Env: Miniconda
I wrote the code following the instructions (GitHub - flashlin/deep_learning), but it shows this error message:
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Fused conv implementation does not support grouped convolutions for now.
[[{{node StatefulPartitionedCall/wav2vec2/encoder/pos_conv_embed/conv/Conv1DWithWeightNorm}}]] [Op:__inference_restored_function_body_39909]
Function call stack:
restored_function_body
But if I use Google Colab,
How do I automatically collect unrecognized sounds on the client side, perform training automatically to enhance learning, and merge the result into the trained model?
Google Colab is typically used to try out Python code via the browser. lgusm’s suggestion above seems to be Python-based, not JavaScript, and Colab actually runs the code on a server, so gathering sensor data from the device may be trickier than with JS, since the code is not running on the front end on the device.
If you want to do the data collection on the client side, you would need to make your own custom version of Teachable Machine that generates data in the right form to retrain the model @lgusm suggested, which you could then perhaps convert to TensorFlow.js format with our converter. @lgusm, do you know if that model is compatible with conversion, or whether it has a JS implementation?
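On collecting unrecognized sounds on the client side: one possible first step, sketched here under assumptions, is to keep the audio features of frames where the classifier's top probability is low and save them for someone to label. `UncertainSampleCollector` and its parameters are hypothetical. Note that this is not unsupervised learning: the clips still need labels before they can be used to retrain, whether in a custom Teachable Machine or in the Python model mentioned above.

```typescript
// Hypothetical client-side collector for sounds the model is unsure about.
// It only stores data; a person still has to label the clips before retraining.
interface UncertainClip {
  timestamp: number; // capture time in ms
  topLabel: string; // the model's best guess
  topScore: number; // probability of that guess
  features: number[]; // e.g. one flattened spectrogram window
}

class UncertainSampleCollector {
  private readonly clips: UncertainClip[] = [];
  private readonly labels: string[];
  private readonly maxScore: number;
  private readonly capacity: number;

  // maxScore: frames whose top probability is below this value are kept.
  // capacity: maximum clips held in memory; the oldest are dropped first.
  constructor(labels: string[], maxScore: number, capacity: number) {
    this.labels = labels;
    this.maxScore = maxScore;
    this.capacity = capacity;
  }

  // Returns true if the frame was kept because the model was unsure.
  offer(scores: ArrayLike<number>, features: ArrayLike<number>, timestamp: number = Date.now()): boolean {
    let best = 0;
    for (let i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) best = i;
    }
    if (scores[best] >= this.maxScore) return false; // confident enough: skip
    this.clips.push({
      timestamp,
      topLabel: this.labels[best],
      topScore: scores[best],
      features: Array.from(features),
    });
    if (this.clips.length > this.capacity) this.clips.shift(); // drop the oldest clip
    return true;
  }

  get size(): number {
    return this.clips.length;
  }

  // Returns pending clips as JSON (e.g. to save as a file for labeling) and clears them.
  drain(): string {
    const json = JSON.stringify(this.clips);
    this.clips.length = 0;
    return json;
  }
}
```

The saved JSON could then be labeled and added to the training data for the next round of training. The capacity limit matters on a phone, where memory is limited.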
I like how easy Teachable Machine is to use.
However, Teachable Machine has no place to upload previously trained models so that I can improve them.
How do I view the source code of the Teachable Machine audio project?
Or can I customize a project?
The model I shared has just been published, so there’s no TFJS version yet, and it’s a little big (200+ MB). I shared it because it’s state of the art for automatic speech recognition and can give you some ideas.
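For a sense of scale, the size arithmetic is simple: weights dominate a model's download, so size ≈ number of weights × bytes per weight. The sketch below assumes the roughly 200 MB figure above is all float32 weights (an approximation; a saved model also stores its graph) and shows what storing the same weights at lower precision would give. The TensorFlow.js converter has weight-quantization options for this (`--quantize_float16` and `--quantize_uint8` in recent versions, if I recall correctly), usually at some cost in accuracy; whether this particular model converts at all is a separate question.

```typescript
// Rough size arithmetic: size ≈ number of weights × bytes per weight.
const BYTES_PER_WEIGHT = { float32: 4, float16: 2, uint8: 1 } as const;
type WeightFormat = keyof typeof BYTES_PER_WEIGHT;

// Approximate number of weights in a file of `sizeMB` megabytes (10^6 bytes).
function weightCount(sizeMB: number, format: WeightFormat): number {
  return (sizeMB * 1e6) / BYTES_PER_WEIGHT[format];
}

// Approximate size if the same weights were stored in another format.
function resizedMB(sizeMB: number, from: WeightFormat, to: WeightFormat): number {
  return (sizeMB * BYTES_PER_WEIGHT[to]) / BYTES_PER_WEIGHT[from];
}

const originalMB = 200; // the approximate figure above, assumed to be float32
console.log(weightCount(originalMB, "float32")); // 50000000
console.log(resizedMB(originalMB, "float32", "float16")); // 100
console.log(resizedMB(originalMB, "float32", "uint8")); // 50
```

So even with 8-bit weights, a model of this size would still be around 50 MB to download, which is large for a web page but far smaller than the gigabyte-scale models mentioned earlier.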
As for uploading previously saved training data to Teachable Machine, I believe it does allow you to open data saved from other TM-produced models, if you have access to it. Just click the three lines at the top left to open the file menu, for example on this page: Teachable Machine
Check out @lgusm’s suggestions for accessing the raw code of TM, though. There is also a fun codelab on how to make your own Teachable Machine for images, and since audio classification is treated as an image problem, it may also help you out:
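To unpack why audio classification is an image problem: the sound is first turned into a spectrogram, a time × frequency grid of magnitudes, and a convolutional network then classifies that grid much like an image. The sketch below computes a basic magnitude spectrogram with a Hann window and a naive DFT. The frame size and hop are arbitrary parameters, and this is a generic illustration, not the exact feature pipeline Teachable Machine uses.

```typescript
// Turns raw audio samples into a magnitude spectrogram: one row per time frame,
// one column per frequency bin. A naive O(N^2) DFT keeps it dependency-free;
// real implementations use an FFT and are much faster.
function spectrogram(samples: ArrayLike<number>, frameSize: number, hop: number): number[][] {
  const bins = Math.floor(frameSize / 2) + 1; // non-negative frequencies only
  // Periodic Hann window: tapers frame edges to reduce spectral leakage.
  const hann = Array.from({ length: frameSize }, (_, n) => 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / frameSize));
  const rows: number[][] = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    const row: number[] = [];
    for (let k = 0; k < bins; k++) {
      let re = 0;
      let im = 0;
      for (let n = 0; n < frameSize; n++) {
        const x = samples[start + n] * hann[n];
        const angle = (2 * Math.PI * k * n) / frameSize;
        re += x * Math.cos(angle);
        im -= x * Math.sin(angle);
      }
      row.push(Math.hypot(re, im)); // magnitude of frequency bin k
    }
    rows.push(row);
  }
  return rows;
}
```

For example, a 1-second clip at 16 kHz with `frameSize` 512 and `hop` 256 gives 61 rows of 257 bins: a 61 × 257 grid that an image-style classifier can work on.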