Hello everybody!
When I use the ModifyGraphWithDelegate() function to accelerate inference on the GPU, it takes a long time, about 3-4 s. Is there any way to avoid calling ModifyGraphWithDelegate(), or to reduce the time it takes?
My setup:
- TF-Lite 2.65
- OS: Ubuntu 18.04
- Chip: ARM Mali-G series GPU
This is my code snippet:
// Headers assumed for this snippet: <chrono>, <cstdio>, <cstdlib>, <iostream>, <memory>, <stdexcept>, <string>,
// "tensorflow/lite/interpreter.h", "tensorflow/lite/kernels/register.h",
// "tensorflow/lite/model.h", "tensorflow/lite/delegates/gpu/delegate.h" (plus "using namespace std;").
{
    unique_ptr<tflite::FlatBufferModel> m_model;
    unique_ptr<tflite::Interpreter> interpreter;
    string modelFileName = "Path to model.tflite";

    // Load the model from disk.
    m_model = tflite::FlatBufferModel::BuildFromFile(modelFileName.c_str());
    if (m_model == nullptr) {
        fprintf(stderr, "Failed to load model\n");
        exit(EXIT_FAILURE);
    }

    // Configure the GPU delegate: OpenCL backend, latency-first priorities, quantized model support.
    TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
    options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
    options.inference_priority2 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_MEMORY_USAGE;
    options.inference_priority3 = TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION;
    options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT;
    options.inference_preference = TFLITE_GPU_INFERENCE_PREFERENCE_FAST_SINGLE_ANSWER;
    options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_CL_ONLY;
    auto theGpuDelegate = tflite::Interpreter::TfLiteDelegatePtr(
        TfLiteGpuDelegateV2Create(&options), TfLiteGpuDelegateV2Delete);

    // Build the interpreter, then time how long ModifyGraphWithDelegate() takes.
    tflite::ops::builtin::BuiltinOpResolver resolver;
    tflite::InterpreterBuilder(*m_model, resolver)(&interpreter);
    auto start = chrono::steady_clock::now();
    if (interpreter->ModifyGraphWithDelegate(theGpuDelegate.get()) != kTfLiteOk)
        throw std::runtime_error("Failed to modify graph with GPU delegate");
    auto end = chrono::steady_clock::now();
    if (interpreter->AllocateTensors() != kTfLiteOk)
        throw std::runtime_error("Failed to allocate tensors");
    cout << "Init time in milliseconds: "
         << chrono::duration_cast<chrono::milliseconds>(end - start).count()
         << " ms" << endl;
}
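One direction I noticed in the GPU delegate options but have not tried yet is the serialization cache (serialization_dir / model_token plus the TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_SERIALIZATION flag). Would something like the untested sketch below reduce the ModifyGraphWithDelegate() time on later runs, assuming my TF-Lite version supports it? The cache directory and the model token are just placeholders.

// Untested sketch: enable the GPU delegate's serialization cache so the compiled
// program can be reused on subsequent runs (directory and token are placeholders).
TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_SERIALIZATION;
options.serialization_dir = "/tmp/tflite_gpu_cache";  // must already exist and be writable
options.model_token = "my_model_v1";                  // unique token per model file
auto theGpuDelegate = tflite::Interpreter::TfLiteDelegatePtr(
    TfLiteGpuDelegateV2Create(&options), TfLiteGpuDelegateV2Delete);
// The rest of the setup (InterpreterBuilder, ModifyGraphWithDelegate, AllocateTensors)
// would stay the same as in my code above.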
Thank you.