Parallel computations in general TensorFlow without a model.fit context

Goal

What I ultimately want to accomplish is something with the functionality of the following:

import time
from multiprocessing import Process

def test_parallel():
    print("here1")
    time.sleep(5)
    print("here2")

processes = []
for _ in range(3):
    p = Process(target=test_parallel)
    p.start()
    processes.append(p)
for process in processes:
    process.join()

With the expected output:

here1
here1
here1
here2
here2
here2
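Side note: on platforms where the default multiprocessing start method is spawn rather than fork (e.g. Windows), the process creation additionally has to sit under a main guard, roughly like this:

if __name__ == "__main__":
    processes = []
    for _ in range(3):
        p = Process(target=test_parallel)
        p.start()
        processes.append(p)
    for process in processes:
        process.join()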

Issue

Both multiprocessing and joblib seem to have issues in combination with TensorFlow, so I'm looking for a TensorFlow-native alternative or another solution.

More extensive explanation

I have a few objects of a class that holds a TensorFlow Model as the property "self.model", and I want to run a certain method, "simulate", on each object; a minimal sketch of such a class is below, followed by my attempts.
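For concreteness, a hypothetical stand-in for the class (the name Simulator, the layer sizes, and the input size are made up; the real model is whatever each object wraps):

import tensorflow as tf

class Simulator:
    def __init__(self, n_features=4):
        # hypothetical model; the real self.model is larger
        self.model = tf.keras.Sequential([
            tf.keras.Input(shape=(n_features,)),
            tf.keras.layers.Dense(8, activation="relu"),
            tf.keras.layers.Dense(1),
        ])

    def simulate(self, x):
        # placeholder for the real (tf.function-accelerated) simulation
        return self.model(x)

list_of_objects = [Simulator() for _ in range(3)]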

Using joblib

This raises an error because obj is not picklable, and it is passed in the args:

from joblib import Parallel, delayed

Parallel(n_jobs=-1)(delayed(obj.simulate)(obj, args, kwargs) for obj in list_of_objects)
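A workaround I have been considering, but have not verified on my real class, is to avoid pickling the object entirely and to rebuild the model inside each worker from picklable pieces (a JSON config string and the weight arrays). The names simulate_from_spec and x_input below are made up, and the last line of the worker stands in for the real simulate() body:

import numpy as np
from joblib import Parallel, delayed

def simulate_from_spec(model_json, weights, x):
    import tensorflow as tf  # imported inside the worker process
    model = tf.keras.models.model_from_json(model_json)
    model.set_weights(weights)
    return model(x).numpy()  # stand-in for the real simulate() body

# Only picklable pieces cross the process boundary: a JSON string and numpy arrays.
specs = [(obj.model.to_json(), obj.model.get_weights()) for obj in list_of_objects]
x_input = np.ones((1, 4), dtype="float32")  # placeholder input for the hypothetical model above
results = Parallel(n_jobs=-1)(
    delayed(simulate_from_spec)(model_json, weights, x_input)
    for model_json, weights in specs
)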

Using multiprocessing

I've run into deadlocks when using TensorFlow functionality with more than a single process. A more extensive explanation of the issue is here.
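One mitigation that is commonly suggested, sketched here under the assumption that the deadlocks come from forking a process in which TensorFlow has already been initialized, is to use the spawn start method and to import TensorFlow only inside the child processes:

import multiprocessing as mp

def worker(i):
    # Import TensorFlow only in the child, so it does not inherit
    # thread/GPU state from an already-initialized parent interpreter.
    import tensorflow as tf
    return float(tf.reduce_sum(tf.ones((i + 1,))))

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # spawn instead of fork to avoid fork-related deadlocks
    with ctx.Pool(processes=3) as pool:
        print(pool.map(worker, range(3)))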

Context / First attempt

I have a workflow where I use TensorFlow functions, for example @tf.function, to speed up my matrix computations. Now I want to extend this workflow to parallel computation. Here is a small example and (bad) first attempt using the "test_parallel" function defined earlier:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
idx = [i for i in range(3)]
dataset = tf.data.Dataset.from_tensor_slices(idx)
dist_dataset = strategy.experimental_distribute_dataset(dataset)
with strategy.scope():
    for x in dist_dataset:
        # test_parallel would need to accept the dataset element x here
        strategy.run(test_parallel, args=(x,))

However, the output here is of the form

here1
here2
here1
here2
here1
here2

instead of

here1
here1
here1
here2
here2
here2

and hence my attempt clearly failed.

Conclusion

In general, I want to run tasks that use TensorFlow functionality in parallel. I was wondering whether TensorFlow offers something of this kind that I have overlooked. Another solution would be a working version with joblib or the Python multiprocessing library.

Thanks in advance!
Cedric

Hello @Cedric_Van_Heck

Thank you for working with TensorFlow.
In this case the MirroredStrategy runs sequentially: with only a single replica it processes one dataset element after the other, calling the function once per element. Your other example with Python multiprocessing is much closer to real parallel processing.
To achieve this in TensorFlow, please configure multiple GPUs and multiple workers for the desired output. Here is the gist.
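For reference, a minimal sketch of such a multi-worker setup (the addresses, ports, and tensor shapes below are placeholders; the same script has to be started once per worker, each with its own task index, and it will block until all workers have joined):

import json
import os
import tensorflow as tf

# Placeholder two-worker cluster; on the second worker set "index" to 1.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["localhost:12345", "localhost:23456"]},
    "task": {"type": "worker", "index": 0},
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()

def replica_fn(x):
    # Runs once per replica, i.e. in parallel across the workers.
    return tf.reduce_sum(x)

dataset = tf.data.Dataset.from_tensor_slices(tf.ones((6, 3))).batch(2)
dist_dataset = strategy.experimental_distribute_dataset(dataset)
for batch in dist_dataset:
    per_replica_result = strategy.run(replica_fn, args=(batch,))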