Details of inter_op and intra_op parallelism threads

Mehran_S · December 19, 2022, 9:30am

TensorFlow Serving has two configuration parameters for CPU utilization, tensorflow_intra_op_parallelism and tensorflow_inter_op_parallelism. Tuning them can have a large impact on model server performance (throughput and latency). I could not find good documentation for them on the TensorFlow Serving website. My main question is:

How are these thread pools related to rest_api_num_threads? Are the thread pools shared between the different requests going to the model server?
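
For context, here is a minimal sketch of how the three settings in question are passed to the server at launch. It only assembles the command line and does not start anything. The model name, model path, port, and thread counts are placeholder values I made up for illustration, not recommendations, and the helper function build_server_command is hypothetical.

```python
import shlex


def build_server_command(intra_op, inter_op, rest_threads,
                         model_name="my_model",               # placeholder name
                         model_base_path="/models/my_model",  # placeholder path
                         rest_api_port=8501):                  # placeholder port
    """Return the argv list for launching tensorflow_model_server.

    The thread-count values are supplied by the caller; nothing here
    implies a particular relationship between the three settings.
    """
    return [
        "tensorflow_model_server",
        f"--rest_api_port={rest_api_port}",
        f"--model_name={model_name}",
        f"--model_base_path={model_base_path}",
        # Threads used inside a single op (e.g. a large matmul).
        f"--tensorflow_intra_op_parallelism={intra_op}",
        # Threads used to run independent ops concurrently.
        f"--tensorflow_inter_op_parallelism={inter_op}",
        # Threads handling incoming REST requests.
        f"--rest_api_num_threads={rest_threads}",
    ]


if __name__ == "__main__":
    # Example values only; the right numbers depend on the model and hardware.
    cmd = build_server_command(intra_op=4, inter_op=2, rest_threads=16)
    print(shlex.join(cmd))
```

What I am trying to understand is how the first two values interact with the third once several requests arrive at the same time.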
