Load AND save model + weights with the TF2 C++ API

Hi everyone,

I am building a C++ application that uses TensorFlow 2.6 for classification or detection.

I managed to install the C++ API from source using Bazel.

I then started with classification. I managed to do training and prediction. It can tell whether it is a dog or a cat most of the time, which was already a great achievement starting from zero.

My problem now is to save and load this model.
I have tried to use WriteBinaryProto and ReadBinaryProto on a .pb file, but from my understanding, that only saves the "architecture" of the model, i.e. its composition?
I have read about freezing a model to save the weights, or something like that; I don't really know precisely what… The trained part, I assume. (If someone can clarify, it would be much appreciated.)
But freezing a model seems to be a thing of the past, at least for TF2 with Python.
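To clarify what "freezing" means: a frozen graph is the GraphDef with each variable node replaced by a constant node holding its trained value, so a single .pb file carries both the structure and the weights. The sketch below only models that idea; `ToyNode` and `FreezeToy` are hypothetical names, not TensorFlow types, so it compiles without TensorFlow. In TF1 the real work was done by tools such as freeze_graph / convert_variables_to_constants, which also prune nodes that inference does not need.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy node, NOT tensorflow::NodeDef: just enough to show the idea.
struct ToyNode {
  std::string name;
  std::string op;            // e.g. "VariableV2", "Placeholder", "MatMul", "Const"
  std::vector<float> value;  // payload, only used by "Const" nodes
};

// Toy "freeze": replace every variable node whose trained value is known with
// a constant node of the same name. Consumers refer to inputs by name, so the
// rest of the graph is untouched, but the weights now live inside the graph.
std::vector<ToyNode> FreezeToy(
    const std::vector<ToyNode>& graph,
    const std::map<std::string, std::vector<float>>& trained_values) {
  std::vector<ToyNode> frozen;
  frozen.reserve(graph.size());
  for (const ToyNode& node : graph) {
    const auto it = trained_values.find(node.name);
    if (node.op == "VariableV2" && it != trained_values.end()) {
      frozen.push_back(ToyNode{node.name, "Const", it->second});
    } else {
      frozen.push_back(node);  // non-variable nodes are copied unchanged
    }
  }
  return frozen;
}
```

In real TF1 code, the trained values come from evaluating the variables in a live session. This is also why WriteBinaryProto on an unfrozen graph loses the weights: they live in the session's variable state, not in the GraphDef.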

So I am not sure: is freezing a model still the way to go in TF2 with the C++ API? If not, can someone describe what I should do, or at least give me ideas/explanations on how it works now? I am a bit stuck on this one.
I also read about checkpoints, but did not manage to grasp them or use them.
Finally, I also saw tensorflow::ops::Restore and tensorflow::ops::Save. But again, there are no examples, and I am having trouble making them work.
In the end, I find myself with 3 ideas but none that I managed to use, haha.

Thank you for your help and ideas.

Why do you need to train the model in C++?

If I remember correctly, we only have the SavedModel loader API in C++:

I need to do as much as possible in C++ because we need a compiled solution, so the end user can't play with the code.

I have found an example of a CNN trained and saved with the C++ API in TF1.x using a frozen graph and checkpoints. From my understanding, frozen graphs are not the way to go in TF2.x, but checkpoints might be. Anyway, I don't see why the ability to save a model and its parameters would have been totally removed. There has to be a solution.

It has already been suggested that TF is not a training-ready C++ library:

https://tensorflow-prod.ospodiscourse.com/t/discuss-pros-cons-between-tensorflow-core-and-tensorflow-js-please/8138/30?u=bhack

What is your real problem here? Do you need to obfuscate your training code from the customer?

If you need on-device fine-tuning instead, you could use:

See also our thread at:

Yes, I know that! Except it is… I managed it this morning by cheating a bit, heh.

As I said, I read about freezing models in previous TF1.x versions. This no longer works, as it appears it is not even included in the "build from source" route. BUT I changed the header and the corresponding source file I found in the Git repository a bit (it seems they are not built but are still present in the TF2.x repo), and instead of trying to build them from source… I added those 2 files directly to my C++ project.

Surprise surprise, it works. I can do train, save, load and inferences without any trouble using only C++.

Still, as a developer, it feels a lot like cheating, and that can't be good practice… Like, no way, wtf. I can't believe they removed such an important feature that was working. And certainly not without providing another way to do it… That would be very odd.

Of course, I am still open to any proposal/solution that can load and save using the TF2.x C++ library without tricks!

Anyway, thanks for your help, Bhack. It was not what I was looking for in this case, but it is very nice of you to give me other possibilities. Oh, and… yes, obfuscation is a possibility, but we are not very confident in the security it provides…

Ah, that makes more sense.

If you can load a SavedModel and run a particular signature, that should be all you need to make this work in a less hacky way. Follow that "On-Device training" tutorial, and just skip the "convert to tensorflow-lite" part. In Python you build a model with signatures like "initialize", "train_step", "save", "load", and "inference", and then in your target environment you call those as needed.

From what I see in the tutorial, it uses checkpoints. That is a solution I tried without success… I did not manage to make it work with C++.
I found something like this during my research, which seems pretty close to the tutorial…

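#include "tensorflow/core/protobuf/meta_graph.pb.h"
#include "tensorflow/core/public/session.h"
// (tensor name, value) pairs to feed into Session::Run
using tensor_dict = std::vector<std::pair<std::string, tensorflow::Tensor>>;

// save (graph_def must be a MetaGraphDef; a plain GraphDef has no saver_def())
tensorflow::Tensor checkpointPathTensor(tensorflow::DT_STRING, tensorflow::TensorShape());
checkpointPathTensor.scalar<tensorflow::tstring>()() = "some/path";  // TF 2.x string tensors hold tstring
tensor_dict feed_dict = {{graph_def.saver_def().filename_tensor_name(), checkpointPathTensor}};
std::vector<tensorflow::Tensor> outputs;
// save_tensor_name is fetched as an output, the way Python's Saver.save runs it
status = sess->Run(feed_dict, {graph_def.saver_def().save_tensor_name()}, {}, &outputs);

// restore (same MetaGraphDef; restore_op_name is a node run as a target)
tensorflow::Tensor checkpointPathTensor(tensorflow::DT_STRING, tensorflow::TensorShape());
checkpointPathTensor.scalar<tensorflow::tstring>()() = "some/path";
tensor_dict feed_dict = {{graph_def.saver_def().filename_tensor_name(), checkpointPathTensor}};
status = sess->Run(feed_dict, {}, {graph_def.saver_def().restore_op_name()}, nullptr);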

saver_def.filename_tensor_name is supposed to be the name of the tensor you must feed with a filename when saving/restoring.
saver_def.restore_op_name is supposed to be the name of the target operation you must run when restoring.
saver_def.save_tensor_name is supposed to be the name of the target operation you must run when saving.
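The three SaverDef fields mix two kinds of names. filename_tensor_name and save_tensor_name are tensor names of the form node_name:output_index (a Saver built in Python with TF1 typically produces values like "save/Const:0" and "save/control_dependency:0"), while restore_op_name is a plain node name (typically "save/restore_all"). These fields are only populated when a Saver added save/restore ops to the graph and its SaverDef was exported in a MetaGraphDef; a graph built purely with the C++ ops API has no saver ops at all. The helper below, `SplitTensorName`, is hypothetical and only illustrates the naming convention; I believe TensorFlow's own parser is ParseTensorName in tensorflow/core/graph/tensor_id.h.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not a TensorFlow API): splits "node_name:output_index"
// into its node name and output index. A bare node name refers to output 0.
std::pair<std::string, int> SplitTensorName(const std::string& tensor_name) {
  const std::size_t colon = tensor_name.rfind(':');
  if (colon == std::string::npos || colon + 1 == tensor_name.size()) {
    return {tensor_name, 0};  // no ":index" suffix, so this is a node name
  }
  const std::string index_part = tensor_name.substr(colon + 1);
  for (char c : index_part) {
    if (c < '0' || c > '9') return {tensor_name, 0};  // suffix is not numeric
  }
  return {tensor_name.substr(0, colon), std::stoi(index_part)};
}
```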
But something was not working: no .ckpt files were created. Maybe I should try replacing the op names with the signatures you suggested… I don't know, because I did not find any tips on this matter. I will try, but with little hope, haha.
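One possible reason for not seeing .ckpt files: with the V2 checkpoint format (the default for TF1's Python Saver, and also the format TF2's tf.train.Checkpoint writes), the path fed as the filename is a prefix, not a file name. Saving to "some/path" produces "some/path.index" plus data shards such as "some/path.data-00000-of-00001", and nothing ending in .ckpt unless the prefix itself does. The hypothetical helpers below (`ExpectedCheckpointFiles`, `CheckpointExists`, not TensorFlow APIs) reproduce that naming convention and check whether the files exist, which is a quick way to confirm whether a Run call actually wrote anything.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: files a V2 checkpoint written with `prefix` should
// produce, following the "<prefix>.index" / "<prefix>.data-%05d-of-%05d" scheme.
std::vector<std::string> ExpectedCheckpointFiles(const std::string& prefix,
                                                 int num_shards = 1) {
  std::vector<std::string> files;
  files.push_back(prefix + ".index");
  for (int shard = 0; shard < num_shards; ++shard) {
    char suffix[40];
    std::snprintf(suffix, sizeof(suffix), ".data-%05d-of-%05d", shard, num_shards);
    files.push_back(prefix + suffix);
  }
  return files;
}

// Hypothetical helper: true only if every expected file can be opened.
bool CheckpointExists(const std::string& prefix, int num_shards = 1) {
  for (const std::string& file : ExpectedCheckpointFiles(prefix, num_shards)) {
    if (!std::ifstream(file).good()) return false;
  }
  return true;
}
```

It also helps to check `status.ok()` after each `sess->Run(...)` call and print `status.ToString()` on failure, since a failed save otherwise fails silently in the snippet above.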

We use the C API to save models in TF-Java, but it's quite involved. You can trace things through from here, which is our top-level save method: java/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java at master · tensorflow/java · GitHub.


Thanks. I suppose this is close to @markdaoust's advice:

Dear markdaoust,
I am trying to implement on-device training by invoking my train.tflite using tensorflowlite_jni.so.
I added these two libraries in the CMakeLists.txt file:
add_library( tensorflowlite_jni SHARED IMPORTED )
set_target_properties( tensorflowlite_jni PROPERTIES IMPORTED_LOCATION ${JNI_DIR}/${ANDROID_ABI}/libtensorflowlite_jni.so )

add_library( tensorflowlite_flex_jni SHARED IMPORTED )
set_target_properties( tensorflowlite_flex_jni PROPERTIES IMPORTED_LOCATION ${JNI_DIR}/${ANDROID_ABI}/libtensorflowlite_flex_jni.so )

I used the following call to invoke the train signature in my train.tflite file:
TfLiteSignatureRunnerInvoke(train_model_info.signature_info[2].runner);

However, I encountered the following error:
Select TensorFlow op(s), included in the given model, is(are) not supported by this interpreter.
Make sure you apply/link the Flex delegate before inference.
For the Android, it can be resolved by adding "org.tensorflow:tensorflow-lite-select-tf-ops" dependency.
Node number 1409 (FlexBroadcastGradientArgs) failed to prepare.

I am unable to get libtensorflowlite_flex_jni.so to support this operation.
If I remove libtensorflowlite_jni.so from my project entirely, I get the following error:
undefined reference to `TfLiteSignatureRunnerInvoke'

I would like to know how to use these two libraries together for on-device training with the C API. How can I make libtensorflowlite_flex_jni.so provide the operations that libtensorflowlite_jni.so does not support?

Thank you.