Serving image models can be hard. One major problem is sending raw images (or raw pixel tensors) in the request payloads, which bloats them. Then there’s the problem of training/serving skew, where the preprocessing at inference time drifts from what the model saw during training.
In my latest blog post, I show how to locally deploy a ViT (Base-16) model from Transformers in a way that addresses both issues:
- We send the compressed image (e.g., its JPEG bytes) as a base64-encoded string rather than a raw pixel tensor, reducing the payload size considerably.
- We embed the preprocessing and postprocessing ops within the serving model itself to reduce training/serving discrepancy.
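To make the first point concrete, here is a minimal sketch of the client-side encoding. The `instances` / `image_bytes` / `b64` keys follow TF Serving's JSON convention; the exact payload shape your server expects may differ, so treat the field names as illustrative:

```python
import base64
import json


def make_payload(image_path: str) -> str:
    """Build a JSON request body from an image file on disk.

    We read the already-compressed bytes (JPEG/PNG) directly, so there is
    no need to decode the image into a large float tensor before sending.
    """
    with open(image_path, "rb") as f:
        raw = f.read()
    b64 = base64.b64encode(raw).decode("utf-8")
    # Hypothetical payload shape, modeled on TF Serving's b64 convention.
    return json.dumps({"instances": [{"image_bytes": {"b64": b64}}]})
```

Note that base64 by itself inflates data by roughly a third; the savings come from sending compressed image bytes instead of raw pixel values, with base64 only making those bytes JSON-safe.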
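And to illustrate the second point, a framework-agnostic sketch of what "embedding pre/post-processing in the serving model" means: the wrapper class, constants, and stand-in model below are all hypothetical, but the idea is that the same normalization and label-mapping code ships with the model instead of being reimplemented by each client:

```python
from typing import Callable, List

# Illustrative normalization constants (ViT processors often use mean = std = 0.5).
MEAN, STD = 0.5, 0.5


class ServingModel:
    """Wraps a core model so that preprocessing (scaling + normalization) and
    postprocessing (argmax + label lookup) travel with the model itself.
    Clients send raw pixel values and receive a label string back, which
    removes one common source of training/serving skew."""

    def __init__(self, core: Callable[[List[float]], List[float]], labels: List[str]):
        self.core = core          # the trained model, taken as a black box here
        self.labels = labels      # class-index -> human-readable label

    def preprocess(self, pixels: List[float]) -> List[float]:
        # Scale to [0, 1], then normalize exactly as done at training time.
        return [((p / 255.0) - MEAN) / STD for p in pixels]

    def postprocess(self, logits: List[float]) -> str:
        # Argmax over logits, mapped to its label.
        return self.labels[max(range(len(logits)), key=logits.__getitem__)]

    def __call__(self, pixels: List[float]) -> str:
        return self.postprocess(self.core(self.preprocess(pixels)))
```

In a real deployment these steps would live inside the exported model graph (e.g., a serving signature) rather than a Python class, but the contract is the same: raw inputs in, final predictions out.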
Next up, we’ll learn how to scale these kinds of deployments with Docker and Kubernetes. And if, like me, you’re a fan of truly serverless infra, there will be a piece on doing this with Vertex AI too. Stay tuned!