TensorFlow Serving provides two parameters to utilize the CPU: `tensorflow_intra_op_parallelism` and `tensorflow_inter_op_parallelism`. Tuning these parameters can have a great impact on model server performance (throughput and latency), but I couldn't find good documentation for them on the TensorFlow Serving website. My main question is:
- How do these thread pools relate to `rest_api_num_threads`? Are the thread pools shared between the ops of all requests on the model server?
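For context, here is a sketch of how I'm passing these flags to the model server. The values, model name, and path are placeholders, not recommendations:

```shell
# Illustrative invocation; model_name/model_base_path and all values are placeholders.
tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --tensorflow_intra_op_parallelism=4 \
  --tensorflow_inter_op_parallelism=2 \
  --rest_api_num_threads=16
```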