TensorFlow Serving has two configuration parameters that control CPU utilization, tensorflow_intra_op_parallelism and tensorflow_inter_op_parallelism. Tuning them can have a large impact on model server performance (throughput and latency), but I could not find good documentation for them on the TensorFlow Serving website. My main question is:
- How are these thread pools related to rest_api_num_threads? Are the thread pools shared between different requests going to the model server?
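
For reference, here is a minimal sketch of how the three flags in question might be passed together when launching tensorflow_model_server. The helper name, model name, model path, port, and thread counts are all placeholders chosen for illustration, not recommended values.

```python
import shlex


def build_server_command(intra_op_threads, inter_op_threads, rest_threads):
    """Build an argv list for tensorflow_model_server (hypothetical helper)."""
    return [
        "tensorflow_model_server",
        "--rest_api_port=8501",                # placeholder port
        "--model_name=my_model",               # placeholder model name
        "--model_base_path=/models/my_model",  # placeholder path
        # The two thread-pool flags the question is about.
        f"--tensorflow_intra_op_parallelism={intra_op_threads}",
        f"--tensorflow_inter_op_parallelism={inter_op_threads}",
        # The REST API thread count the question compares them with.
        f"--rest_api_num_threads={rest_threads}",
    ]


# Example values only; suitable numbers depend on the model and the machine.
command = build_server_command(intra_op_threads=4, inter_op_threads=2, rest_threads=16)
print(shlex.join(command))
```

The resulting list could be handed to subprocess.run, or the printed line pasted into a shell, to try different combinations while measuring throughput and latency.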