I feel like this might not be the right place to post this, but I don’t know where else to go. If you have any suggestions about where I should be posting, please let me know.
I am working on a project where I am running 7 instances of a MoveNet model to analyze live camera footage from 7 cameras (!). I am using:
- libtensorflow-gpu-windows-x86_64-2.7.0
- cuda_11.2.0_460.89_win106
- cppflow with a toolkit called openFrameworks
I have run it on 2 different setups:
1. Asus ROG laptop: NVIDIA GeForce RTX 2060, Intel Core i7-9750 @ 2.6GHz, 16GB memory
   I get around 12fps from each instance
2. Custom Tower: NVIDIA Quadro RTX 5000, 2x Intel Xeon E5-2640 @ 2.5GHz (yes, the mobo has 2 cpu sockets!), 16GB memory
   I get around 3fps from each instance
I would have expected the Quadro RTX 5000 to blow the RTX 2060 out of the water. According to userbenchmark.com, you should expect +53% performance from the Quadro RTX 5000. But in practice it runs at about a quarter of the speed (3fps vs. 12fps per instance). So clearly there are other factors at play here that affect performance more than the GPU, even though I am using the “GPU only” version of TensorFlow. I’d very much like to understand what these other factors are so that I can design the optimal hardware to ultimately run my software. Obviously the CPU in setup #2 is inferior, but I didn’t think that should have as much of an impact because I’m using the GPU version of TensorFlow.
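To put numbers on the gap, here is a quick sketch of the arithmetic in C++. The constant and function names are just illustrative, and it assumes (probably too optimistically) that the +53% benchmark figure would carry over directly to frames per second:

```cpp
#include <cstdio>

// Measured numbers from the two setups described above.
constexpr int    kInstances     = 7;     // one MoveNet instance per camera
constexpr double kLaptopFps     = 12.0;  // RTX 2060, per instance
constexpr double kTowerFps      = 3.0;   // Quadro RTX 5000, per instance
constexpr double kBenchmarkGain = 0.53;  // +53% figure from userbenchmark.com

// Total frames per second summed over all instances.
constexpr double aggregateFps(double perInstanceFps) {
    return perInstanceFps * kInstances;
}

// Per-instance fps the tower would reach IF the benchmark gain
// translated linearly into fps (an assumption, not a measurement).
constexpr double expectedTowerFps() {
    return kLaptopFps * (1.0 + kBenchmarkGain);
}

// Tower speed relative to the laptop, as actually measured.
constexpr double measuredRatio() {
    return kTowerFps / kLaptopFps;
}

// Prints the comparison in a readable form.
void printSummary() {
    std::printf("laptop total: %.0f fps\n", aggregateFps(kLaptopFps));
    std::printf("tower total:  %.0f fps\n", aggregateFps(kTowerFps));
    std::printf("tower expected per instance: %.2f fps\n", expectedTowerFps());
    std::printf("tower measured / laptop measured: %.2f\n", measuredRatio());
}
```

Calling `printSummary()` shows the laptop handling 84 frames per second in total against 21 for the tower, where the benchmark figure would have suggested roughly 18fps per instance on the tower instead of 3.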
Thanks in advance!