Hi.
a) Does the following run in parallel on all available GPUs?
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    strategy.run(tf.sparse.sparse_dense_matmul, args=(sparseMat, denseVec))
b) If not, is it possible to run sparse-dense matrix multiplication in parallel in TensorFlow on all GPUs?
c) Does the following run in parallel on all available TPUs?
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    strategy.run(tf.sparse.sparse_dense_matmul, args=(sparseMat, denseVec))
d) If not, is it possible to run sparse-dense matrix multiplication in parallel in TensorFlow on all TPUs?
Thank you.
Don’t Know.
Hi @dontknow,
Welcome to the Google AI forum.
The provided code using tf.distribute.MirroredStrategy() does not automatically parallelize tf.sparse.sparse_dense_matmul across GPUs. strategy.run does distribute the computation to each GPU, but it does not split a single sparse-dense matrix multiplication across multiple GPUs; each replica simply runs the same op on the same operands. For parallel execution across GPUs, you would need to manually distribute the data and handle the aggregation yourself.
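For example, here is a minimal sketch of what strategy.run actually does in that snippet: every GPU executes the same multiplication on the same operands, so you get one identical result per replica rather than a sharded computation. The toy operands (sparse_mat, dense_vec) are illustrative, not the ones from your post.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Toy operands; in a real workload these would be your sparseMat / denseVec.
sparse_mat = tf.sparse.from_dense(tf.eye(4))
dense_vec = tf.constant([[1.0], [2.0], [3.0], [4.0]])

@tf.function
def step():
    # The operands are captured rather than passed through `args`, which
    # sidesteps questions about passing composite tensors (SparseTensor)
    # through strategy.run.
    return tf.sparse.sparse_dense_matmul(sparse_mat, dense_vec)

per_replica = strategy.run(step)

# One identical (4, 1) result per GPU: replication, not sharding.
print(strategy.experimental_local_results(per_replica))
```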
To perform sparse-dense matrix multiplication in parallel on all GPUs, you need to manually implement a parallelization strategy. This typically involves distributing the input data across GPUs, performing computations in parallel, and then aggregating the results. TensorFlow’s standard API does not directly support this for sparse-dense operations across multiple GPUs.
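A minimal sketch of one such manual scheme follows, assuming eager execution, at least one visible GPU, and a row count that divides evenly across the GPUs. The row-sharding approach and all names here are illustrative; TensorFlow has no built-in multi-GPU sparse matmul API.

```python
import tensorflow as tf

gpus = tf.config.list_logical_devices("GPU")  # assumes len(gpus) >= 1

# Toy operands: a sparse (M x K) matrix and a dense (K x 1) vector.
sparse_mat = tf.sparse.from_dense(
    tf.constant([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0, 0.0],
                 [0.0, 0.0, 3.0, 0.0],
                 [0.0, 0.0, 0.0, 4.0]]))
dense_vec = tf.constant([[1.0], [2.0], [3.0], [4.0]])

num_rows, num_cols = (int(d) for d in sparse_mat.dense_shape)
rows_per_gpu = num_rows // len(gpus)  # assumes num_rows % len(gpus) == 0

partials = []
for i, gpu in enumerate(gpus):
    # Give each GPU a contiguous block of rows of the sparse matrix.
    row_block = tf.sparse.slice(
        sparse_mat,
        start=[i * rows_per_gpu, 0],
        size=[rows_per_gpu, num_cols])
    with tf.device(gpu.name):
        # Each GPU multiplies its row block by the (replicated) dense vector.
        partials.append(tf.sparse.sparse_dense_matmul(row_block, dense_vec))

# Aggregate: concatenating the row blocks reproduces the full product.
result = tf.concat(partials, axis=0)
```

The kernels are dispatched asynchronously from the same Python loop, so the per-device multiplications can overlap; whether this actually pays off depends on the matrix sizes and on host-to-device transfer costs.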
Using tf.distribute.TPUStrategy() with strategy.run does not automatically parallelize tf.sparse.sparse_dense_matmul across all TPU cores either. While strategy.run distributes computation across the TPU cores, the tf.sparse.sparse_dense_matmul function itself does not natively support multi-core parallel execution in this context without further optimization.
Sparse-dense matrix multiplication can be parallelized on TPUs, but it generally requires a custom implementation or optimization to use the TPU cores efficiently. TensorFlow's support for sparse operations on TPUs is limited, so distributing a sparse-dense matrix multiplication across TPU cores may need additional steps or TPU-specific operations, along the lines of the row-sharding sketch below.
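As a hedged sketch of one possible approach (not an established TensorFlow recipe): shard the sparse matrix by rows on the host, densify each shard, and let each TPU core multiply its own block with a plain dense matmul, which is well supported on TPU. The resolver setup depends on your environment, and all other names are illustrative.

```python
import numpy as np
import tensorflow as tf

# Standard TPU setup; the resolver arguments depend on your environment.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

num_replicas = strategy.num_replicas_in_sync

# Toy operands: an (8 x 8) sparse matrix and an (8 x 1) dense vector,
# with the row count divisible by the number of TPU cores.
sparse_mat = tf.sparse.from_dense(tf.eye(8))
dense_vec = tf.reshape(tf.range(8, dtype=tf.float32), (8, 1))

num_rows, num_cols = (int(d) for d in sparse_mat.dense_shape)
rows_per_core = num_rows // num_replicas

def value_fn(ctx):
    # Build each core's densified block of rows on the host.
    block = tf.sparse.slice(
        sparse_mat,
        start=[ctx.replica_id_in_sync_group * rows_per_core, 0],
        size=[rows_per_core, num_cols])
    return tf.sparse.to_dense(block)

row_blocks = strategy.experimental_distribute_values_from_function(value_fn)

@tf.function
def matmul_step(block):
    # Plain dense matmul on each core; the small dense vector is replicated.
    return tf.linalg.matmul(block, dense_vec)

per_replica = strategy.run(matmul_step, args=(row_blocks,))

# Fetch the per-core results to the host and stitch the rows back together.
blocks = [t.numpy() for t in strategy.experimental_local_results(per_replica)]
result = np.concatenate(blocks, axis=0)
```

Whether densifying each shard is acceptable depends on the sparsity and size of the matrix; for very large, very sparse operands this sketch would not scale and a different layout would be needed.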
However, there is an important caveat for both the GPU and TPU approaches: the tf.sparse.sparse_dense_matmul operation itself may not be optimized for parallel execution across multiple devices.
Thank you.