TensorFlow Serving parallelism

TensorFlow Serving provides two parameters for utilizing the CPU (tensorflow_intra_op_parallelism and tensorflow_inter_op_parallelism). Tuning these parameters can have a great impact on model server performance (throughput, latency). I couldn’t find good documentation for them on the TensorFlow Serving website. My main question is:

  • How do these thread pools relate to rest_api_num_threads? Are the thread pools shared between the ops of all requests on the model server?

Hello @Mehran_S

Thank you for using TensorFlow

tensorflow_intra_op_parallelism and tensorflow_inter_op_parallelism are parameters used by TensorFlow during model execution, while rest_api_num_threads controls the number of threads available for handling incoming REST API requests in Serving. These thread pools are not directly shared.
Tuning tensorflow_intra_op_parallelism and tensorflow_inter_op_parallelism may directly improve model inference performance, while rest_api_num_threads determines how many concurrent client requests the server can accept at once.
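To make the distinction concrete, here is a minimal sketch of launching tensorflow_model_server with all three flags set. The model name and base path are placeholders; the thread counts shown are illustrative starting points, not recommendations:

```shell
# Sketch: serve a model with explicit threading configuration.
# my_model and /models/my_model are placeholder values.
tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --tensorflow_intra_op_parallelism=4 \
  --tensorflow_inter_op_parallelism=2 \
  --rest_api_num_threads=16
```

With a configuration like this, up to 16 threads accept and parse REST requests, while the actual graph execution for each request is scheduled on the TensorFlow thread pools: inter-op parallelism bounds how many independent ops run concurrently, and intra-op parallelism bounds how many threads a single op (e.g. a large matmul) may use internally.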