I would like to know how I can calculate the throughput of CNN models.
When you are serving a model, throughput depends on several factors:
- The hardware that you’re running on
- What else is running on that hardware
- Whether you’re doing online or batch inference
- The configuration of your serving framework (such as TF Serving or the BulkInferrer component)
If you fix a single set of hardware and a single configuration, you can compare different models against each other. Beyond that, it works like any other throughput estimate: run a known number of inferences, time them, and divide the count by the elapsed time.
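As a rough illustration, here is a minimal sketch of that kind of measurement in plain Python. The names `measure_throughput` and `dummy_model` are hypothetical, and `dummy_model` is only a stand-in for a real CNN forward pass; swap in your own predict call. The warm-up count and iteration count are arbitrary assumptions, not recommended values.

```python
import time

def measure_throughput(predict_fn, batch, batch_size, warmup=3, iterations=20):
    """Return inferences per second for predict_fn on a fixed batch."""
    # Warm-up calls are not timed, so one-off startup costs don't skew the result.
    for _ in range(warmup):
        predict_fn(batch)

    start = time.perf_counter()
    for _ in range(iterations):
        predict_fn(batch)
    elapsed = time.perf_counter() - start

    # Total inputs processed divided by wall-clock seconds.
    return (iterations * batch_size) / elapsed

def dummy_model(batch):
    # Hypothetical stand-in for a CNN forward pass.
    return [sum(x) for x in batch]

batch = [[0.0] * 1024 for _ in range(32)]
throughput = measure_throughput(dummy_model, batch, batch_size=len(batch))
print(f"{throughput:.1f} inferences/sec")
```

Run it for each model on the same machine with the same batch size to get comparable numbers. If your framework executes on a GPU asynchronously, you would also need to wait for the device to finish before stopping the timer, otherwise the result will look too high.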