I have built an image classification model using TF2, and I need to do batch prediction. I'm using a p3.8xlarge EC2 instance, which has 4 GPUs and 32 vCPU cores. My model size is 40 MB.
My question: is it possible to attach the model to all 4 GPUs? That is, replicate the model onto each of the 4 GPUs and run inference in parallel.
Say I pass 16 samples to the prediction endpoint, and on that machine the model has been loaded onto all 4 GPUs. Can I split the 16 samples into 4 batches of 4 and run the predictions at the same time?
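For context, this is roughly what I had in mind (a minimal sketch; the model path and input shape are placeholders for my actual saved model, and I'm not sure `MirroredStrategy` is the right tool for inference, which is part of what I'm asking):

```python
import numpy as np
import tensorflow as tf

# One replica per visible GPU; on a p3.8xlarge this should report 4.
strategy = tf.distribute.MirroredStrategy()
print("Replicas:", strategy.num_replicas_in_sync)

# Loading under the strategy scope replicates the model variables to each GPU.
with strategy.scope():
    model = tf.keras.models.load_model("my_model/")  # placeholder path

# 16 samples; with a global batch size of 16, each replica should get 4.
samples = np.random.rand(16, 224, 224, 3).astype("float32")  # placeholder shape
preds = model.predict(samples, batch_size=16)
print(preds.shape)
```

Is this the intended way to split a batch across GPUs for prediction, or is there a better approach for pure inference?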