So, we want to build an application for some machine learning work, and we are having trouble deciding which of the APIs to choose, so I would be very happy if we could discuss the pros and cons of each choice. We are quite open-minded about the application architecture, but we do have some preferences.
First of all, as I understood from my reading, TensorFlow Core is for local applications such as desktop applications and will use the CPU or GPU of the machine that runs it. Is that correct?
TensorFlow.js, on the other hand, allows us to do ML tasks directly in the browser using the resources of the client machine. Or we can use the server's resources to do the calculations (with a Node backend) and then display the results in the browser. Is that correct?
Other questions that come to my mind:
Is the Core API running locally much faster than the JS API doing calculations server side with a Node backend?
I saw that the Core API can be used from C++. Is C++ a first-class language for it, like Python? That would be one of our preferences: the team is used to C++ but not really to Python or JavaScript.
Is it OK to build a client/server application and have the server use the Core API for the calculations, instead of going for the JS API? Again, this is for convenience. We are used to building applications with a server and different clients, so could everything be in C++ using the Core API? It would also be nice because we could implement a local client/server for now and, if the need to go online arises, not much would change.
Another possibility would be to buy access to a high-performance computing cloud service, so that we use the Core API on their servers and have a simple interface on our side, I suppose?
Feel free to make suggestions and correct me. I am completely new to all this, so these are basically all ideas and guesses.
Both TensorFlow Core and tf.js can use GPU acceleration, through CUDA and WebGL respectively. If you run tf.js in Node, however, you can also use CUDA. tf.js will usually be around 10-15x slower at training models than TensorFlow, though.
I don’t think you can use TensorFlow’s C++ API very well. You can’t really construct models or do that many calculations with it, and it is usually reserved for very low-level ops.
It really depends on the size and latency of your model. If you think it is small enough to run on most devices without much lag, then feel free to load the model via tf.js. However, if you feel like your model might be too large or laggy on most devices, I highly suggest you work with a server.
I would also get the opinions of other people on the forum, but that’s my two cents
So to compare TensorFlow.js with TensorFlow Python you must consider the following:
Are you planning to execute client side or server side? If server side, then you can compare TensorFlow.js Node with TensorFlow Python directly. They are essentially the same, as they are both just wrappers around the C++ core that TensorFlow itself is written in. For that reason they both have CUDA and AVX support, and speeds are the same for the model inference alone. HOWEVER, if you use TensorFlow.js Node you can potentially get 2x faster end-to-end processing times once you take into account the pre/post-processing, as JavaScript with its JIT compiler is faster than Python. This is documented by Hugging Face here:
In this server-side case you would then need to expose an API to send requests to the server and receive results to update the client GUI etc. You may want to use WebSockets for this, which Node.js has excellent support for and which are easy to use in a few lines of code with Socket.IO or similar - another reason to use Node. Of course, server-side execution is fine if you do not need the benefits of client-side execution in the browser detailed below.
If you are considering client side inference then your only option is TensorFlow.js in the web browser. In this case you gain the following advantages over server side (Python or C++) execution:
Lower latency - no round trip to the server and back to wait for a result. On a mobile connection this can be quite significant, for example out in a field or anywhere with poor connectivity.
Privacy - all inference is performed on device with direct access to sensors like camera, mic etc. No data is sent to cloud for inference preserving user privacy of sensor data etc. We have seen a lot of startups use TensorFlow.js in fitness, healthcare, and sports precisely for this reason.
Lower cost - no need to rent expensive GPUs and keep them running 24/7 to provide a service. You can instead just use a CDN to deliver static assets, which is much cheaper.
The reach and scale of the web - anyone anywhere can try your demo without needing to set up a server-side Linux environment with TF and CUDA installed correctly, plus all the dependencies you may have, and then clone your GitHub repo, etc. A user can simply visit a website and it just works.
Zero install - no app install is required, creating a frictionless experience for the end user that can run on desktop, mobile, tablet or anywhere a web page can be opened.
It should be noted that if you are running client side in browser then the performance you will get will depend on the device you are running on. However for our new models that is usually pretty fast! For example our MoveNet pose estimation model can run at over 120 FPS on a 1070 GPU via WebGL in the browser! Most webcams only run at 30 FPS so this is more than enough horsepower for almost any task you throw at it for this model.
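To put rough numbers on that, here is a small Python sketch of the per-frame time budget. The 120 FPS and 30 FPS figures come from the post above; the 50 ms network round trip is purely an assumed value for illustration, not a measurement, and the server case assumes each frame waits for its reply before the next one is sent.

```python
def frame_time_ms(fps: float) -> float:
    """Time per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps


model_ms = frame_time_ms(120)   # MoveNet figure quoted in the post
camera_ms = frame_time_ms(30)   # typical webcam frame interval
headroom = camera_ms / model_ms  # how many times faster than needed

# Assumption for illustration only: a 50 ms round trip to a server.
ASSUMED_ROUND_TRIP_MS = 50.0
server_total_ms = model_ms + ASSUMED_ROUND_TRIP_MS

client_keeps_up = model_ms <= camera_ms
server_keeps_up = server_total_ms <= camera_ms

print(f"model: {model_ms:.1f} ms/frame, camera: {camera_ms:.1f} ms/frame")
print(f"headroom in browser: {headroom:.1f}x, keeps up: {client_keeps_up}")
print(f"server with round trip: {server_total_ms:.1f} ms, keeps up: {server_keeps_up}")
```

With these assumed numbers the in-browser path fits comfortably inside the camera's frame interval, while a strictly sequential request/response loop to a server would not. Batching or pipelining requests would change that picture.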
If you are interested in learning more about the differences and going from zero to hero with TensorFlow.js, I just launched a new free course on edX that you can sign up to right now - Chapters 1 and 2 will probably answer most of your initial questions and you can get through those pretty fast. Enjoy! https://www.edx.org/course/google-ai-for-javascript-developers-with-tensorflowjs
Also, some preprocessing can now be embedded in the model itself, as with Keras preprocessing layers, so that it is just native compositional ops.
I don’t know whether TF.js Node also supports non-CUDA/NVIDIA GPUs (e.g. AMD/ROCm) like Python does.
It does depend on the model you are using, of course. The model referenced above was for natural language (Hugging Face’s DistilBERT implementation), and the performance boost I was referring to comes only from the pre/post-processing written in the language of choice, Python or JS, where the JS JIT compiler can have an advantage over plain old Python code that does the same thing. It does not apply to training / graph execution.
Yes. Also for inference, in a server/client setup we need to weigh the pros and cons of other solutions such as TF Serving (TFX) against the specific context/use case.
It very much depends on the skill set of the end user and their needs, of course. I am unsure whether this user wants near real time or not, so I was giving my view of what would enable the fastest response, based on my prior dabbling in this area.
My perspective above comes from the point of view of a web engineer, which is where most of my background is. Most TensorFlow.js folk I have worked with so far tend to favour that sort of setup when keeping things server side if they need something to work closer to real time. I believe over 50% of people use Node as their server-side framework of choice, as per the Stack Overflow survey in 2020.
Also, I much prefer WebSockets over the REST-based APIs that TF Serving uses if you want any sort of real-time stream of data being classified. However, if you do not need low latency and high throughput, then of course anything will do that can take data in and call a binary of your choice, whether Node, Python or C++, and something like TFX is set up to make a nice pipeline for that if one does not have a web engineering team to set up a custom pipeline.
I wrote this article a long time ago, before Cloud AutoML was a thing, on how to train and deploy ML models in the cloud from a front end using a REST approach, which may be of interest to folk:
If I rewrote this today, I would use WebSockets instead, hence the recommendation. It depends how much control the user needs over their end-to-end pipeline. A custom one is certainly more work to maintain, so if you do not need faster results then an off-the-shelf packaged solution is probably just fine.
Firstly, thanks a lot for all your responses. They confirm some of my reading and explain other parts, so great!
To give you a bit of context, because I did not explain it at all and it could point towards which architecture to choose: we want to do image processing to identify defects in materials. The use case must be versatile.
We must be able to train the model whenever we want with our own data.
We must be able to do all kinds of prediction, from “feeding the trained model a single image” to “feeding the trained model a constant stream of images coming from live video and saving the predictions for each frame (some queue will do just fine)”.
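As a rough illustration of the "some queue will do just fine" idea, here is a minimal Python sketch using only the standard library. The `predict` function is a hypothetical stand-in for the trained model (it just thresholds the mean pixel value), and `run_stream`, `worker` and `max_pending` are names made up for this example. The same `predict` call serves both the single-image case and the streaming case.

```python
import queue
import threading


def predict(frame):
    """Hypothetical placeholder for the trained model.

    Flags a frame as defective when its mean pixel value exceeds 0.5.
    """
    return {"defect": sum(frame) / len(frame) > 0.5}


def worker(frames, results):
    """Consume (index, frame) pairs until a None sentinel arrives."""
    while True:
        item = frames.get()
        if item is None:
            break
        index, frame = item
        results.append((index, predict(frame)))


def run_stream(frame_source, max_pending=8):
    """Feed frames through a bounded queue and collect one prediction per frame."""
    frames = queue.Queue(maxsize=max_pending)  # bounded, so a slow model applies back-pressure
    results = []
    consumer = threading.Thread(target=worker, args=(frames, results))
    consumer.start()
    for index, frame in enumerate(frame_source):
        frames.put((index, frame))
    frames.put(None)  # tell the worker there are no more frames
    consumer.join()
    return results


# Single image.
single = predict([0.9, 0.8, 0.7])

# "Live" stream, simulated here with three tiny fake frames.
stream = run_stream([[0.1, 0.2], [0.9, 0.9], [0.4, 0.7]])
print(single, stream)
```

The bounded queue blocks the producer when the model falls behind; whether to block, drop frames, or grow the queue is a design choice that depends on how "live" the video really needs to be.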
Whether this kind of work requires a light or a heavy model, I have no idea. I think the TensorFlow Object Detection API was mentioned, but I don’t know much more.
So, client or server side? Well… BOTH, haha. As we are used to software development in C++, the idea was to build a basic server/client application running entirely locally on the same machine to begin with. Everything would be completely transparent for the end user. This way, we would have something working locally with an easy way to upgrade to a local network or even the internet. Docker makes it easy to use containers that share a server's resources.
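A minimal sketch of that "local first, network later" idea, written in Python for brevity (the same shape applies in C++). Everything here is an assumption for illustration: the line-delimited JSON protocol, the `PredictHandler` and `ask` names, and the `predict` placeholder that stands in for the real model. The point is only that the client talks to an address, so moving the server to another machine means changing the host and port rather than the code.

```python
import json
import socket
import socketserver
import threading


def predict(pixels):
    """Hypothetical placeholder for the real model call."""
    return {"defect": max(pixels) > 0.5}


class PredictHandler(socketserver.StreamRequestHandler):
    """Reads one JSON request line and writes one JSON reply line."""

    def handle(self):
        request = json.loads(self.rfile.readline())
        reply = predict(request["pixels"])
        self.wfile.write((json.dumps(reply) + "\n").encode())


def ask(host, port, pixels):
    """Client side: send one request and wait for the answer."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((json.dumps({"pixels": pixels}) + "\n").encode())
        return json.loads(conn.makefile().readline())


# Server bound to localhost; port 0 lets the OS pick a free port.
server = socketserver.TCPServer(("127.0.0.1", 0), PredictHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = ask(host, port, [0.1, 0.9])
print(reply)

server.shutdown()
server.server_close()
```

A real deployment would add authentication, error handling and a binary format for images; this only shows the separation between client and server.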
From my understanding, the “classic” Tensorflow seems to fit the need.
tfjs client side could be an easy start but is less “upgradable”; it feels like there would be a lot of changes if we wanted to go server/client afterwards.
tfjs using Node on the server could be a solution, but Bhack raised an interesting point about threads, and anyway we are not really a web development team, so why bother if TensorFlow is pretty much equivalent?
If you have better ideas or think I missed a point, I am all ears, obviously!
Assuming we are going for TensorFlow, I have some questions concerning the best/easiest API to use.
It seems we are going to use either the C++ API or the Python API. The C++ API seems to be supported by TensorFlow, but there are no API stability promises and it is less complete than the Python one. Do you know what they mean when they say that the Python API is “more complete and easiest”? Are a lot of functionalities missing from the C++ API? Are there restrictions? Is it harder because it is lower level but can do the same things? Integrating the C++ API into software developed in C++ seems a fair idea, but I have also read a lot about trouble installing and using it in a Windows 10 x64 environment, which seems odd. And what Abhiraam_Eranti mentioned is concerning. If someone can clarify those points it would be nice, so we understand what is going to hit us.
We could probably use the Python API without too much trouble and write the whole software in Python, but then there is the question of code protection. Since Python is an interpreted language, it is very easy for our client to reuse the code. A compiled language like C++ is not totally reverse-engineering proof, but it is certainly a different kettle of fish. The only way we know would be to offer a SaaS so that all the important code stays on our side, but that forces us to run and maintain servers with GPUs, or to be a customer of a high-performance computing cloud service.
Hm, I’m not sure I understand. Are you recommending that I define precisely what I want to do and check whether a topic already discusses it, and if so, then I will be able to do it?
Because what you linked are either problems raised by users or explanations of how to get started, right?
I’m not at the step where I try to fix problems; I am at the step where I try to understand and define what we should use to avoid future problems as much as possible. For example, will I get stuck if I start using the C++ or C API instead of the Python API? If yes, why would that be? Can we handle it? That’s why we are open-minded about the architecture of our application and the languages we will use. So basically, if everyone says “don’t do C++ because of this, this and this”, then surely I will use Python.
If you follow that thread, you can start to check some of the technical solutions available when you need to run inference inside a C++ app.
One of these, a third-party library (cppflow), was created to simplify inference over the low-level TF C API, so by exploring that library you can judge for yourself how much time you would want to spend if you went directly to the low-level TF C API.
More generally, if you don’t start by setting some constraints for your project, it is hard to make choices (hardware support, model type/overhead, ops coverage for the specific runtime, etc.).
I think you could start by training your model in Python; then, once you have identified what kind of model could solve your task, you could start to reason about the best solution for deployment.
If you need to set up production support infrastructure, you could evaluate Kubeflow/TFX.
TensorFlow.js was originally designed to be used in a browser, but that does not remove the need for upstream training (I insist on this): a model trained beforehand can then be imported and used. A server is needed for that.
Thanks for the clarification, Bhack.
I started using the C API to build bindings between languages instead of using the C++ API directly, and it works, but it really does not seem to be the best way to go. The little explanation I found about what is already implemented screams that we are going to get stuck soon enough, when we need to develop something else.
It is sad that I can’t find a clear explanation of the limitations of the C++ API compared to the Python API, though.
I think we will end up with some client/server setup in Python and provide our solution as a SaaS. That seems the easiest way to go without too much trouble.
Still, I have to say I’m really surprised to see an officially provided C++ API that is so hard to install or use, with so little explanation to help understand it.
At this point it would be good to “get your hands dirty” and program a simple version of what you would like to achieve. I find that there is a big gap between what people think this technology can do and what it can actually achieve, with or without an expert programmer.
For image processing, I would start by playing around with the Colab-based examples here:
Indeed, while TFJS allows training in the browser, that is not something you want to do for complex networks like a deep CNN, which will train but very slowly. If you want to use TensorFlow.js, we recommend Node.js for training on the server side and then inference in the browser, as inference is now fast for many models in JS. The only exception is transfer learning, which can be very fast, as already proven by websites like Teachable Machine that retrain new model heads live in your browser in seconds.
AFAIU, the TF that most users interact with and code against uses Python as a wrapper around C++. For more background on the internals, check out tf.function, graphs, and eager mode in “Introduction to graphs and tf.function | TensorFlow Core” and “Better performance with tf.function | TensorFlow Core”. For more reading, you can also check out XLA, TF’s compiler for matrix-on-matrix multiplications on accelerators (tip: consider @tf.function(jit_compile=True)), and TensorFlow Eager (research paper), which is covered in one of the above-mentioned docs.
In the end, maybe it’s about the end-goal (the what and the why), while the how is the tool of choice.
There are a lot of examples on the web/on GitHub/in various docs/in up-to-date (bestselling) ML books for approaching different ML tasks that are in Python, AFAIK, irrespective of the ML framework of choice.
The main thing to remember is that “training a model” and “using the model” are two totally separate processes.
Train your model using Python. Don’t try to train a model in C++. Save the trained model as a SavedModel (tf.saved_model). The SavedModel format is language/platform agnostic.
To use the model, either use tensorflow-serving, or use the C++, JS, Python, or TFLite APIs to load and run the model.
This example loading with c++ is incomplete … I think I need to merge this PR
TensorFlow provides a C API that can be used to build bindings for other languages. The API is defined in c_api.h and designed for simplicity and uniformity rather than convenience.
Unless your goal is to build a C++ TensorFlow training library, it’s likely safer and less work to learn enough Python.
Why?
Anything is possible. But, AFAIK there isn’t even a gradient function in the C++ api, all of that is implemented in the python layer.
If you need to drive some training from a lower-level language, your best bet will be to build the model in Python and export a saved_model with a train_step(inputs, labels) signature.
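A minimal sketch of what such an export could look like, assuming TensorFlow 2.x is installed. The `DefectClassifier` module, its 64-feature input, the learning rate and the synthetic data are all hypothetical and only there to keep the example small; the `train_step(inputs, labels)` signature name follows the suggestion above. The model is reloaded in Python here just to show that the exported signatures can both train and predict; a C, C++ or other binding would call the same named signatures.

```python
import tempfile

import tensorflow as tf


class DefectClassifier(tf.Module):
    """Hypothetical tiny model: one dense layer over 64 input features."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.zeros([64, 2]), name="w")
        self.b = tf.Variable(tf.zeros([2]), name="b")
        self.lr = 0.1  # assumed learning rate, plain SGD

    @tf.function(input_signature=[tf.TensorSpec([None, 64], tf.float32)])
    def predict(self, inputs):
        return {"probs": tf.nn.softmax(tf.matmul(inputs, self.w) + self.b)}

    @tf.function(input_signature=[
        tf.TensorSpec([None, 64], tf.float32),
        tf.TensorSpec([None], tf.int32),
    ])
    def train_step(self, inputs, labels):
        with tf.GradientTape() as tape:
            logits = tf.matmul(inputs, self.w) + self.b
            loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=labels, logits=logits))
        grad_w, grad_b = tape.gradient(loss, [self.w, self.b])
        self.w.assign_sub(self.lr * grad_w)
        self.b.assign_sub(self.lr * grad_b)
        return {"loss": loss}


# Export both signatures into a temporary directory.
model = DefectClassifier()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(
    model, export_dir,
    signatures={"serving_default": model.predict,
                "train_step": model.train_step})

# Reload and drive training purely through the saved signatures.
loaded = tf.saved_model.load(export_dir)
x = tf.concat([tf.ones([4, 64]), -tf.ones([4, 64])], axis=0)  # synthetic data
y = tf.constant([1, 1, 1, 1, 0, 0, 0, 0], dtype=tf.int32)

losses = []
for _ in range(3):
    out = loaded.signatures["train_step"](inputs=x, labels=y)
    losses.append(float(out["loss"].numpy()))

probs = loaded.signatures["serving_default"](inputs=x)["probs"]
predicted = tf.argmax(probs, axis=1).numpy().tolist()
print(losses, predicted)
```

The gradient computation lives inside the exported graph, so the caller only needs to be able to load a SavedModel and run a named signature.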