At the beginning of this year, I spent a lot of time learning about handwriting and speech recognition, but there wasn’t any out-of-the-box solution in TFJS, since most of the solutions rely on the CTC loss calculation algorithm. I found this in the issue list: https://github.com/tensorflow/tfjs/issues/1759 and also some hints that “it would be good to have it in TFJS”.
Since I had time, I prepared a naive implementation of the original paper, which currently fits into the TFJS ecosystem - I mean, it’s callable, it runs a single sample fine, it handles batches as well, it fits into the layered model’s call hierarchy, and it calculates the loss and the gradient so that model.fit() works.
I’m struggling with the tests. The obvious ones (the prediction and the label are the same, so it returns the expected zero gradient) are there, but I couldn’t find any tests in the Python implementation to port. So if somebody could help me out with that, I’d be really grateful.
Also, I’m planning to donate the code to the TFJS project, but I wouldn’t want to commit faulty code.
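For readers who have not met CTC before, here is a minimal sketch of the forward pass that such a loss computes. It is not the code from the post: it works on plain arrays rather than TFJS tensors, handles a single sample, assumes the inputs are already softmax probabilities, and assumes the blank symbol has index 0. The names `ctcLoss` and `logSumExp` are made up for illustration.

```typescript
// Plain-TypeScript sketch of the CTC forward pass (no TFJS dependency).
// `ctcLoss` and `logSumExp` are hypothetical names, not part of any library.

/** Numerically stable log(exp(a) + exp(b)). */
function logSumExp(a: number, b: number): number {
  if (a === -Infinity) return b;
  if (b === -Infinity) return a;
  const m = Math.max(a, b);
  return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
}

/**
 * Negative log-likelihood of `label` given per-timestep class probabilities.
 * probs[t][k] is the probability of class k at time t (assumes at least one
 * timestep); `blank` is the index of the blank symbol (assumed 0 here).
 */
function ctcLoss(probs: number[][], label: number[], blank = 0): number {
  const T = probs.length;

  // Extended label: blank, l1, blank, l2, ..., blank
  const ext: number[] = [blank];
  for (const c of label) ext.push(c, blank);
  const S = ext.length;

  // alpha[s] = log-probability of all prefixes ending in ext[s] at time t.
  let alpha: number[] = new Array<number>(S).fill(-Infinity);
  alpha[0] = Math.log(probs[0][ext[0]]);
  if (S > 1) alpha[1] = Math.log(probs[0][ext[1]]);

  for (let t = 1; t < T; t++) {
    const next: number[] = new Array<number>(S).fill(-Infinity);
    for (let s = 0; s < S; s++) {
      let acc = alpha[s];                               // stay on the same symbol
      if (s >= 1) acc = logSumExp(acc, alpha[s - 1]);   // advance by one
      // Skip over a blank only when the two surrounding labels differ.
      if (s >= 2 && ext[s] !== blank && ext[s] !== ext[s - 2]) {
        acc = logSumExp(acc, alpha[s - 2]);
      }
      next[s] = acc + Math.log(probs[t][ext[s]]);
    }
    alpha = next;
  }

  // Valid paths end on the last label or on the trailing blank.
  const total = S > 1 ? logSumExp(alpha[S - 1], alpha[S - 2]) : alpha[S - 1];
  return -total;
}
```

With two timesteps, two classes and uniform probabilities, the label `[1]` can be produced by three of the four possible paths, so the loss should come out as `-Math.log(0.75)`. A label that is too long for the number of timesteps yields `Infinity`.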
Thanks for sharing your enthusiasm and contributing to the TensorFlow.js ecosystem! This sounds like a really interesting project. Just a heads up: a lot of our engineers are out of office for the holidays this week, so it may be best to pick this up in the new year.
On a somewhat related note, however (I know this is not handwriting recognition), I have seen a promising library for OCR come out in TFJS recently from a group of folk over in France. They may have some pointers for you in the meantime, since it is a similar domain.
PS: you said you had a demo somewhere - is it available for me to try for handwriting / speech recognition?
Thanks Jason, I’ll keep the holiday season in mind. About the OCR stuff - it’s an interesting project (and the documentation is very straightforward, which is great), and they have CTC in place for calculating the loss, but they are using it in the “usual” way: build and train the model in Python, then run inference in TypeScript. So unfortunately this doesn’t help my case.
About the demo - nah, I don’t have it yet, I’m concentrating on the basics for now. I realized that you can’t really train for any recognition task without CTC in place (it’s just so powerful), and since I’m working in pure TS, this is something that needs to be programmed from scratch.
I’ll think about a way to share the code as it is. Meanwhile, if somebody has a good idea for testing, don’t hesitate to reply.
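One testing idea, offered as a sketch rather than a vetted method: for very small inputs, the CTC probability can be computed by brute force, enumerating every possible path, collapsing it (merge repeats, then drop blanks) and summing the probabilities of the paths that collapse to the label. That gives a slow but simple reference value to compare an implementation against. The names `collapse` and `bruteForceCtcLoss` are hypothetical, the inputs are assumed to be softmax probabilities in plain arrays, and the blank index is assumed to be 0.

```typescript
// Brute-force CTC reference for tiny inputs only: cost grows as K^T.
// `collapse` and `bruteForceCtcLoss` are hypothetical helper names.

/** Collapse a path: merge repeated symbols, then remove blanks. */
function collapse(path: number[], blank: number): number[] {
  const out: number[] = [];
  let prev = -1;
  for (const c of path) {
    if (c !== prev && c !== blank) out.push(c);
    prev = c;
  }
  return out;
}

/**
 * Negative log of the summed probability of every path that collapses
 * to `label`. probs[t][k] is the probability of class k at time t.
 */
function bruteForceCtcLoss(probs: number[][], label: number[], blank = 0): number {
  const T = probs.length;
  const K = probs[0].length;
  const numPaths = Math.pow(K, T);
  const path: number[] = new Array<number>(T).fill(0);
  let total = 0;

  for (let n = 0; n < numPaths; n++) {
    // Decode n as a base-K number to get one path, and multiply up its probability.
    let rest = n;
    let p = 1;
    for (let t = 0; t < T; t++) {
      path[t] = rest % K;
      rest = Math.floor(rest / K);
      p *= probs[t][path[t]];
    }
    const c = collapse(path, blank);
    if (c.length === label.length && c.every((v, i) => v === label[i])) {
      total += p;
    }
  }
  return -Math.log(total);
}
```

For example, with three timesteps and three classes at uniform probability, five of the 27 paths collapse to `[1, 2]`, so the reference loss is `-Math.log(5 / 27)`. Random small probability matrices checked this way can catch indexing mistakes that the "prediction equals label" case misses.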
Thanks for the reply. Ah OK, thanks for pointing that out. Yes, they are converting. It would indeed be nice to have everything in JS. I have added this forum post to the bug that @Bhack linked to as well, to see if that nudges anything, but it may also be worth following up in the new year once people are likely back.
Please do nudge this thread / me in Jan and I can also follow up internally to see what the status is for this one.
Will do. In the meantime, I dug through the repository and found some Python test cases, so I’ll work with those. Also, as promised, I shared my code on GitHub. Feel free to check it out; any comments are welcome.
Thank you for sharing! A fitting end-of-year holiday present to the community. I like how open you are in your write-up about your learning experience; it is very refreshing to read. I think a lot of newer folk could benefit from detailed write-ups like this. In terms of finding others from the community who may be interested in looking into this, I have a personal Discord that I use for events and our working group, where a few folk have gathered to keep in touch. It may be worth posting there too, just in case anyone is interested (no promises, but maybe someone is free over the holidays).
Just pushed v0.0.3 to the repository. I’ve documented lots of stuff regarding design choices and performance metrics. Feel free to use it. I’ll work on proper running tests (Jasmine), and try to bring it to NPM for easy integration - it will be my first time, but hey, man’s gotta start somewhere.