I am building a C++ application that includes TensorFlow 2.6, with the aim of doing classification or detection.
I managed to build the C++ API from source using Bazel.
I then started with classification. I managed to do training and prediction. It can tell whether an image is a dog or a cat most of the time, which was already a great achievement starting from zero.
My problem now is to save and load this model.
I have tried to use WriteBinaryProto and ReadBinaryProto on a .pb file, but from my understanding that only saves the "architecture" of the model, i.e. its composition?
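For reference, here is roughly what I am doing (a sketch; the file name is just an example and "root" is the tensorflow::Scope holding my graph). As far as I can tell this only serializes the GraphDef, not the variable values:

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/platform/env.h"

// Serialize the graph structure to disk. This captures the ops/architecture only.
tensorflow::GraphDef graph_def;
TF_CHECK_OK(root.ToGraphDef(&graph_def));
TF_CHECK_OK(tensorflow::WriteBinaryProto(tensorflow::Env::Default(), "model.pb", graph_def));

// Read it back later: the ops are restored, but the trained weights are not.
tensorflow::GraphDef loaded_def;
TF_CHECK_OK(tensorflow::ReadBinaryProto(tensorflow::Env::Default(), "model.pb", &loaded_def));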
I have read about freezing a model to save the weights, or I don't really know what precisely… The trained part, I assume. (If someone can clarify, it would be much appreciated.)
But freezing a model seems to be a thing of the past, at least for TF2 with Python.
So I am not sure: is freezing a model still the way to go in TF2 with the C++ API? If not, can someone describe what I should do, or at least give me ideas/explanations of how it works now? I am a bit dry on this one.
I also read about checkpoints or something like that, but did not manage to grasp them or use them.
To conclude, I also saw references to tensorflow::ops::Restore and tensorflow::ops::Save, but again no example, and I am having trouble making them work.
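For what it's worth, here is roughly what I was attempting with those two ops (only a sketch reconstructed from TF1.x-style examples; the variable, the stored name "w" and the file path are just placeholders, and I am not sure this is the intended usage):

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"

using namespace tensorflow;
using namespace tensorflow::ops;

Scope root = Scope::NewRootScope();
auto w = Variable(root, {2, 2}, DT_FLOAT);                            // stand-in for a trained weight
auto init_w = Assign(root, w, Const(root, {{1.f, 2.f}, {3.f, 4.f}}));  // stand-in for training

// 1-D string tensor listing the names under which the values are stored in the file.
Tensor names(DT_STRING, TensorShape({1}));
names.vec<tstring>()(0) = "w";

// Save writes the listed tensors to the checkpoint; Restore reads one back by name.
auto save = Save(root, std::string("weights.ckpt"), Input(names), {Output(w)});
auto restored = Restore(root, std::string("weights.ckpt"), std::string("w"), DT_FLOAT);
auto put_back = Assign(root, w, restored);

ClientSession session(root);
TF_CHECK_OK(session.Run({}, {}, {init_w.operation}, nullptr));    // give w a value
TF_CHECK_OK(session.Run({}, {}, {save.operation}, nullptr));      // run the save op
TF_CHECK_OK(session.Run({}, {}, {put_back.operation}, nullptr));  // restore into w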
In the end, I find myself with three ideas but nothing that I managed to use, haha.
I need to do as much as possible in C++ because we need a compiled solution, so end users can't play with the code.
I have found examples of CNNs trained and saved with the C++ API in TF1.x using a frozen graph and checkpoints. From my understanding, frozen graphs are not the way to go in TF2.x, but checkpoints might be. Anyway, I don't see why the possibility to save a model and its parameters would have been totally removed. There has to be a solution.
Yes, I know that! Except it is… I managed it this morning by cheating a bit, eheh.
As I said, I read about freezing models in earlier TF1.x versions. This no longer works, as it appears the feature is not even included in the "build from source" path. BUT I changed the header and the corresponding source file I found in the git repository a bit (it seems they are not built anymore but are still there in the TF2.x repo), and instead of trying to build them from source… I added those two files directly to my C++ project.
Surprise surprise, it works. I can train, save, load and run inference without any trouble using only C++.
Still, as a developer, it feels a lot like cheating, and that can't be good practice… Like, no way. I can't believe they removed such an important feature that was working, and certainly not without providing another way to do it… That would be very odd.
Of course, I am still open to any proposition/solution that could load and save using the TF2.x C++ library without tricks!
Anyway, thanks for your help, Bhack. It was not what I was looking for in this case, but it is very nice of you to suggest other possibilities. Oh, and… yes, obfuscation is a possibility, but we are not very confident in the security it provides…
If you can load a SavedModel and run a particular signature, that should be all you need to make this work in a less hacky way. Follow the "On-Device Training" tutorial and just skip the "convert to TensorFlow Lite" part. In Python you build a model with signatures like "initialize", "train_step", "save", "load", and "inference", and then in your target environment you call those as needed.
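On the C++ side that looks roughly like this (a sketch; the export directory, the "train_step" signature key and the "x"/"loss" keys are placeholders that have to match whatever you exported from Python):

#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"

// Load the SavedModel exported from Python.
tensorflow::SavedModelBundle bundle;
TF_CHECK_OK(tensorflow::LoadSavedModel(tensorflow::SessionOptions(), tensorflow::RunOptions(),
                                       "/path/to/saved_model",
                                       {tensorflow::kSavedModelTagServe}, &bundle));

// Look up one of the exported signatures and resolve its graph tensor names.
const auto& sig = bundle.meta_graph_def.signature_def().at("train_step");
const std::string input_name = sig.inputs().at("x").name();
const std::string output_name = sig.outputs().at("loss").name();

// Build an input tensor with whatever shape/dtype your signature expects.
tensorflow::Tensor x(tensorflow::DT_FLOAT, tensorflow::TensorShape({1, 224, 224, 3}));

std::vector<tensorflow::Tensor> outputs;
TF_CHECK_OK(bundle.session->Run({{input_name, x}}, {output_name}, {}, &outputs));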
From what I am seeing in the tutorial, it is using checkpoints. That is a solution I tried without success… I did not manage to make it work in C++.
I found something like this during my research, which seems pretty close to the tutorial…
- saver_def.filename_tensor_name is supposed to be the name of the tensor you must feed with a filename when saving/restoring.
- saver_def.restore_op_name is supposed to be the name of the target operation you must run when restoring.
- saver_def.save_tensor_name is supposed to be the name of the target operation you must run when saving.
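To make that concrete, here is roughly how I understand those names would be used from C++ (only a sketch; "session" is my already-created tensorflow::Session with the graph loaded, "saver_def" is the SaverDef I pulled out of the exported MetaGraphDef, and the checkpoint path is just an example):

#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/protobuf/saver.pb.h"
#include "tensorflow/core/public/session.h"

// Scalar string tensor holding the checkpoint prefix, fed into the filename tensor.
tensorflow::Tensor ckpt_path(tensorflow::DT_STRING, tensorflow::TensorShape());
ckpt_path.scalar<tensorflow::tstring>()() = "my_model.ckpt";

// Saving: feed the filename tensor and fetch the save target.
std::vector<tensorflow::Tensor> unused;
TF_CHECK_OK(session->Run({{saver_def.filename_tensor_name(), ckpt_path}},
                         {saver_def.save_tensor_name()}, {}, &unused));

// Restoring: feed the filename tensor and run the restore target op.
TF_CHECK_OK(session->Run({{saver_def.filename_tensor_name(), ckpt_path}},
                         {}, {saver_def.restore_op_name()}, nullptr));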
But something was not working: no .ckpt files were created. Maybe I should try to replace the op names with the signatures you suggested… I don't know, because I did not find any tips on this matter. I will try, but with little hope, haha.
Dear markdaoust,
I am trying to implement on-device training by invoking my train.tflite using tensorflowlite_jni.so.
I added these two libraries in the CMakeLists.txt file:
add_library( tensorflowlite_jni SHARED IMPORTED )
set_target_properties( tensorflowlite_jni PROPERTIES IMPORTED_LOCATION ${JNI_DIR}/${ANDROID_ABI}/libtensorflowlite_jni.so )
I used the following call to invoke the train signature in my train.tflite file:
TfLiteSignatureRunnerInvoke(train_model_info.signature_info[2].runner);
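For context, the way I obtain that signature runner is roughly the following (a sketch; the model path, the "train" signature key and the input name "x" are specific to my model, and depending on the TFLite version the signature-runner functions may be declared in c_api_experimental.h instead):

#include "tensorflow/lite/c/c_api.h"

// Load the flatbuffer model and build an interpreter.
TfLiteModel* model = TfLiteModelCreateFromFile("train.tflite");
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Get the runner for the "train" signature and allocate its tensors.
TfLiteSignatureRunner* runner = TfLiteInterpreterGetSignatureRunner(interpreter, "train");
TfLiteSignatureRunnerAllocateTensors(runner);

// Copy a training batch into the signature's input tensor, then invoke.
float batch[28 * 28] = {0};  // dummy buffer; size/shape must match the exported signature
TfLiteTensor* x = TfLiteSignatureRunnerGetInputTensor(runner, "x");
TfLiteTensorCopyFromBuffer(x, batch, sizeof(batch));
TfLiteSignatureRunnerInvoke(runner);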
However, I encountered the following error:
Select TensorFlow op(s), included in the given model, is(are) not supported by this interpreter.
Make sure you apply/link the Flex delegate before inference.
For the Android, it can be resolved by adding “org.tensorflow:tensorflow-lite-select-tf-ops” dependency.
Node number 1409 (FlexBroadcastGradientArgs) failed to prepare.
I am unable to use libtensorflowlite_flex_jni.so to support this operation.
If I directly remove libtensorflowlite_jni.so from my project, I encounter the following error:
undefined reference to `TfLiteSignatureRunnerInvoke’
I would like to know how to use these two libraries together when using the C API for on-device training. How can I make libtensorflowlite_flex_jni.so provide the operations that libtensorflowlite_jni.so does not support?