My worker microservice runs TFJS inference on video frames, in a container running on a cluster of VMs on Google Kubernetes Engine (GKE). The container is GPU-enabled and built on top of the tensorflow/tensorflow-nightly-gpu image. That image is 2.67 GB, and it takes several minutes to start up after my worker VM is ready. The NVIDIA CUDA libs appear to be the bulk of that, at 1.78 GB + 624 MB.
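For context, my Dockerfile looks roughly like this (simplified; the Node version and the `worker.js` entry point are placeholders for what I actually use):

```dockerfile
FROM tensorflow/tensorflow-nightly-gpu

# Node.js runtime for the worker (version is illustrative)
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && curl -fsSL https://deb.nodesource.com/setup_14.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
# @tensorflow/tfjs-node-gpu downloads a prebuilt libtensorflow that
# dynamically links against the CUDA/cuDNN libs in the base image
RUN npm ci --production
COPY . .

CMD ["node", "worker.js"]
```

As far as I understand, the npm install of `@tensorflow/tfjs-node-gpu` pulls in a prebuilt CUDA-linked libtensorflow binary, which is why I assumed the base image's CUDA libraries still matter.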
Can I minimize the CUDA installation in any way, given that I'm only using TFJS for prediction/inference (not training) through the tfjs-node-gpu backend? Are there any smaller base images that will support TFJS prediction?
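To make the question concrete, this is the kind of swap I'm imagining, though it's hypothetical and untested: the tag is a guess at whatever CUDA/cuDNN versions the tfjs-node-gpu binary is actually built against, and I don't know whether a "runtime" flavor base carries every library it loads at inference time:

```dockerfile
# Hypothetical: start from a CUDA *runtime* base instead of the full TF image.
# The tag below is a guess -- it would need to match the CUDA/cuDNN versions
# that the prebuilt tfjs-node-gpu binary expects.
FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && curl -fsSL https://deb.nodesource.com/setup_14.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY . .
CMD ["node", "worker.js"]

# Open question: does libtensorflow find everything it needs here, or are
# there CUDA libs only present in the fat tensorflow image?
```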