Hello,
I’m using a server with 72 CPU cores (no GPUs). I install the pip install "intel-tensorflow-avx512==2.8.0"
package as the server supports AVX512, and also set export TF_ENABLE_ONEDNN_OPTS=1
(I guess this isn’t required anyway with the Intel pkg). The driver setup works, i.e., the 72 CPUs work at 100%, when computing a toy example in python
import tensorflow as tf
A = tf.random.normal([int(1e5), int(1e5)])
B=tf.multiply(A,A)
However, when I execute the actual training script (It’s too long to post it here) nothing happens. It’s a Keras Functional API model, trains as Siamese net, the Dataset generator connects to Cassandra, etc.
htop
shows that 72 processes are allocated but only 1 process is running (i.e., 71 processes are idle at 0%). At least all the big tf.keras.layers.Dense
layers should perform parallel matrix multiplication.
Is there some sort of TF command, function or method I need to trigger within the python script?