Embedding the TensorFlow model vs. calling a TensorFlow Serving server

byounguk.lee · July 20, 2023, 8:39am

Hi, I want to use a TensorFlow model in my application server,
and I have to decide how to deploy it.

[My question]
My question is whether calling a TensorFlow Serving server is always better than embedding the TensorFlow model in my application server.

[In detail]
I found that there are two ways to use a TensorFlow model in a production environment.

(1) Load the SavedModel in my application server and run inference in the same process.

(2) Set up a dedicated model server such as TensorFlow Serving, and have my application server send inference requests to it.
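
To make the two options concrete, here is a minimal sketch of what each might look like from the application side. The model directory, host, model name, and the "serving_default" signature are assumptions for illustration; the REST path and the "instances"/"predictions" fields follow the TensorFlow Serving REST API, and 8501 is its usual REST port.

```python
import json
import urllib.request


def predict_embedded(model_dir, inputs):
    """Option (1): load the SavedModel in-process and run inference locally."""
    import tensorflow as tf  # imported lazily so option (2) needs no TensorFlow

    # A real server would load the model once at startup, not per call.
    model = tf.saved_model.load(model_dir)
    # "serving_default" is the usual default signature name; yours may differ.
    infer = model.signatures["serving_default"]
    return infer(**{name: tf.constant(value) for name, value in inputs.items()})


def build_serving_request(host, model_name, instances, port=8501):
    """Option (2): build a REST predict request for a TensorFlow Serving server."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict_remote(host, model_name, instances, timeout=5.0):
    """Send the request and return the "predictions" field of the response."""
    request = build_serving_request(host, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]
```

With option (1) the application process needs TensorFlow installed and holds the model in its own memory; with option (2) it only needs an HTTP (or gRPC) client.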

When I googled it, I found that most people use option (2).

I think that is because option (2) has some advantages:

a. Acceleration on dedicated hardware
b. Easier CI/CD for TensorFlow models by leveraging features of TensorFlow Serving
c. Easy to scale out when the TensorFlow Serving server runs short of resources
d. … and so on.

[My question again]
But I think (1) is also a good option, because (2) adds network latency, even though (2) can use an efficient protocol such as gRPC.
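
One rough way to check whether the extra network hop matters for a given workload is to time both call paths with the same inputs. The helper below is only a sketch: the function name, the repeat counts, and the choice of median and 95th percentile are my own assumptions, not something taken from TensorFlow or TensorFlow Serving.

```python
import statistics
import time


def measure_latency_ms(call, repeats=100, warmup=5):
    """Return (median, p95) latency in milliseconds for a zero-argument callable."""
    for _ in range(warmup):
        call()  # warm-up runs are discarded (lazy init, connection setup)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95
```

Passing a callable that wraps the in-process call and another that wraps the remote call gives two pairs of numbers to compare under the same input size and batch size.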

Could you share your experience with using TensorFlow models in production environments?
