My code runs fine on my machine, doing signal filtering and inference in about 2 minutes. The same code takes about 8 minutes on GCP. Everything is slower, including, for example, calls to scipy.signal functions. The delay seems to be in PyCapsule.TFE_Py_Execute. Both machines run TensorFlow 2.15.1, and the numpy, scipy, scikit-learn, and nvidia* package versions are the same. The only difference I see that might be relevant is that the Python build on GCP is from conda-forge.
Any insights greatly appreciated!
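To compare the two environments beyond package versions, a minimal sketch like the following could be run on both machines and the output diffed. It uses only the standard library. The function name environment_report and the choice of packages and environment variables are illustrative assumptions, not part of the original code. On conda-forge builds, sys.version typically includes a "packaged by conda-forge" marker, so the interpreter difference should show up in the first field.

```python
import os
import platform
import sys
from importlib import metadata


def environment_report(packages=("tensorflow", "numpy", "scipy", "scikit-learn")):
    """Collect interpreter, CPU, and package details for a side-by-side comparison."""
    report = {
        # sys.version carries the build string, which shows who packaged the interpreter
        "python": sys.version.replace("\n", " "),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),
        # Logical CPUs visible to the OS (8 vCPUs on the GCP instance, more locally)
        "logical_cpus": os.cpu_count(),
        # Environment variables that can limit threads or hide GPUs; None means unset
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Any field that differs between the two reports is a candidate to look at first. Matching reports would not rule out hardware differences such as CPU clock speed or GPU model.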
My machine (i9-13900k, RTX A4500):
└─ 82.053 RawClassifier.classify ../../src/module/classifier.py:209
├─ 71.303 Model.predictions ../../src/module/model.py:135
│ ├─ 43.145 Model.process ../../src/module/model.py:78
│ │ ├─ 24.823 load_model keras/src/saving/saving_api.py:176
│ │ │ [5 frames hidden] keras
│ │ └─ 17.803 error_handler keras/src/utils/traceback_utils.py:59
│ │ [22 frames hidden] keras, tensorflow, <built-in>
│ ├─ 15.379 Model.process ../../src/module/model.py:78
│ │ ├─ 6.440 load_model keras/src/saving/saving_api.py:176
│ │ │ [5 frames hidden] keras
│ │ └─ 8.411 error_handler keras/src/utils/traceback_utils.py:59
│ │ [12 frames hidden] keras, tensorflow, <built-in>
│ └─ 12.772 Model.process ../../src/module/model.py:78
│ ├─ 6.632 load_model keras/src/saving/saving_api.py:176
│ │ [6 frames hidden] keras
│ └─ 5.580 error_handler keras/src/utils/traceback_utils.py:59
Compared to GCP (8 vCPU, T4):
└─ 262.203 RawClassifier.classify ../../module/classifier.py:212
├─ 226.644 Model.predictions ../../module/model.py:129
│ ├─ 150.693 Model.process ../../module/model.py:72
│ │ ├─ 25.310 load_model keras/src/saving/saving_api.py:176
│ │ │ [6 frames hidden] keras
│ │ └─ 123.869 error_handler keras/src/utils/traceback_utils.py:59
│ │ [22 frames hidden] keras, tensorflow, <built-in>
│ ├─ 42.631 Model.process ../../module/model.py:72
│ │ ├─ 6.830 load_model keras/src/saving/saving_api.py:176
│ │ │ [2 frames hidden] keras
│ │ └─ 34.270 error_handler keras/src/utils/traceback_utils.py:59
│ │ [16 frames hidden] keras, tensorflow, <built-in>
│ └─ 33.308 Model.process ../../module/model.py:72
│ ├─ 7.387 load_model keras/src/saving/saving_api.py:176
│ │ [2 frames hidden] keras
│ └─ 24.427 error_handler keras/src/utils/traceback_utils.py:59
Here is more detail on the GCP run. Note the second-to-last line, which calls PyCapsule.TFE_Py_Execute:
├─ 262.203 RawClassifier.classify ../../module/classifier.py:212
│ ├─ 226.644 Model.predictions ../../module/model.py:129
│ │ ├─ 226.633 Model.process ../../module/model.py:72
│ │ │ ├─ 182.566 error_handler keras/src/utils/traceback_utils.py:59
│ │ │ │ ├─ 182.372 Functional.predict keras/src/engine/training.py:2451
│ │ │ │ │ ├─ 170.326 error_handler tensorflow/python/util/traceback_utils.py:138
│ │ │ │ │ │ └─ 170.326 Function.__call__ tensorflow/python/eager/polymorphic_function/polymorphic_function.py:803
│ │ │ │ │ │ └─ 170.326 Function._call tensorflow/python/eager/polymorphic_function/polymorphic_function.py:850
│ │ │ │ │ │ ├─ 141.490 call_function tensorflow/python/eager/polymorphic_function/tracing_compilation.py:125
│ │ │ │ │ │ │ ├─ 137.241 ConcreteFunction._call_flat tensorflow/python/eager/polymorphic_function/concrete_function.py:1209
│ │ │ │ │ │ │ │ ├─ 137.240 AtomicFunction.flat_call tensorflow/python/eager/polymorphic_function/atomic_function.py:215
│ │ │ │ │ │ │ │ │ ├─ 137.239 AtomicFunction.__call__ tensorflow/python/eager/polymorphic_function/atomic_function.py:220
│ │ │ │ │ │ │ │ │ │ ├─ 137.233 Context.call_function tensorflow/python/eager/context.py:1469
│ │ │ │ │ │ │ │ │ │ │ ├─ 137.230 quick_execute tensorflow/python/eager/execute.py:28
│ │ │ │ │ │ │ │ │ │ │ │ ├─ 137.190 PyCapsule.TFE_Py_Execute <built-in>
│ │ │ │ │ │ │ │ │ │ │ │ └─ 0.040 <listcomp> tensorflow/python/eager/execute.py:54
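In both summary traces above, the first Model.process call is much slower than the later ones (43.145 s vs 15.379 s and 12.772 s locally; 150.693 s vs 42.631 s and 33.308 s on GCP). A small timing helper could separate first-call cost from steady-state cost for whatever ends up in TFE_Py_Execute. This is only a sketch: time_calls is a hypothetical helper, and the idea that one-time setup explains part of the gap is an assumption, not something the profiles confirm.

```python
import time


def time_calls(fn, repeats=5):
    """Time a zero-argument callable several times.

    Returns (first_call_seconds, list_of_later_call_seconds) so that
    one-time setup cost can be told apart from steady-state cost.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations[0], durations[1:]
```

A call might look like first, rest = time_calls(lambda: model.predict(batch)), where model and batch stand in for the loaded Keras model and one input batch. If the later calls are also about three times slower on GCP, the gap is more likely raw compute (CPU or GPU) than setup.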