How to set the device for the tensor?

Stonepia · June 17, 2021, 2:45pm

I am curious about how to set the device in TF. I want to implement a custom distributed data-parallel algorithm. For example, I want to split an input tensor x into three parts and transfer each part to a different device.

So basically, I want to do this:

x0, x1, x2 = tf.split(x, num_or_size_splits=3, axis=1)
x0 = x0.to('device:0')
x1 = x1.to('device:1')
x2 = x2.to('device:2')

But this does not seem to be possible in TF.

I found something about colocation_graph. Should I use that?

lgusm · June 17, 2021, 4:14pm

You can do that using the with tf.device(...) context manager.

Does that help?
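
A minimal sketch of that approach, assuming TF 2.x with eager execution. The helper name split_across_devices is hypothetical, and the round-robin fallback is only there so the snippet also runs on a machine with fewer than three devices:

```python
import tensorflow as tf


def split_across_devices(x, num_parts=3, axis=1):
    """Split `x` into `num_parts` pieces and copy each piece to a device.

    Hypothetical helper for illustration, not part of the TensorFlow API.
    """
    # Prefer GPUs; fall back to the CPU device(s) when no GPU is visible.
    devices = (tf.config.list_logical_devices("GPU")
               or tf.config.list_logical_devices("CPU"))
    parts = tf.split(x, num_or_size_splits=num_parts, axis=axis)

    placed = []
    for i, part in enumerate(parts):
        # Reuse devices round-robin if there are fewer devices than parts.
        device_name = devices[i % len(devices)].name
        with tf.device(device_name):
            # In eager mode, tf.identity run under tf.device produces
            # its output on that device.
            placed.append(tf.identity(part))
    return placed


x = tf.random.uniform((4, 6))
x0, x1, x2 = split_across_devices(x)
for t in (x0, x1, x2):
    # Eager tensors expose the device they live on.
    print(t.shape, t.device)
```

In eager mode each resulting tensor reports its placement through its device attribute, which is the closest TF equivalent to the x.to(...) calls in the question.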

Stonepia · June 18, 2021, 12:57am

Thanks for the reply, and sorry for the unclear question. As far as I can tell, the with context manager only works in Python.

However, to implement data parallelism I would have to rewrite one of TF’s default passes. In that case, how would I handle this in C++? As far as I know, a TF tensor does not carry device information.

lgusm · June 18, 2021, 9:43am

Hmm, I don’t know.

Is this for the training step?
I lack the background, but maybe the guide “Distributed training with TensorFlow | TensorFlow Core” can give some insights.
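
For orientation, here is a small sketch of the built-in data-parallel path that guide describes, assuming TF 2.x. It uses tf.distribute.MirroredStrategy, which is a stock strategy and not the custom one asked about; the function replica_step and the toy dataset are made up for illustration:

```python
import tensorflow as tf

# MirroredStrategy replicates the computation on every visible GPU
# and falls back to a single CPU replica when no GPU is available.
strategy = tf.distribute.MirroredStrategy()


@tf.function
def replica_step(x):
    # Runs once per replica on that replica's shard of the batch.
    return tf.reduce_sum(x)


# Toy input: a single global batch of 6 rows with 4 columns of ones.
dataset = tf.data.Dataset.from_tensor_slices(tf.ones((6, 4))).batch(6)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

for batch in dist_dataset:
    # The strategy splits the global batch across replicas and
    # places each shard on its replica's device.
    per_replica = strategy.run(replica_step, args=(batch,))
    # Combine the per-replica partial sums into one tensor.
    total = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)
    print(float(total))
```

Here the strategy decides how the batch is sharded and where each shard lives, so the user code never assigns devices by hand.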

Bhack · June 18, 2021, 2:11pm

Are you looking to create your own custom distribution strategy?

Because I don’t think that we officially support this:

https://github.com/tensorflow/tensorflow/issues/32355

Stonepia · June 20, 2021, 2:13pm

Thanks for the reply! Yes, I am trying to create my own custom distributed strategy, but it seems that doing this in TF is causing a lot of trouble…

Bhack · June 20, 2021, 3:57pm

You can try to look at

https://github.com/tensorflow/tensorflow/blob/c35883e15a675767c15b8f3c5ed619bd9e051af4/tensorflow/python/distribute/distribute_lib.py#L16:L25


      
          """Library for running a computation across multiple devices.
          
          The intent of this library is that you can write an algorithm in a stylized way
          and it will be usable with a variety of different `tf.distribute.Strategy`
          implementations. Each descendant will implement a different strategy for
          distributing the algorithm across multiple devices/machines.  Furthermore, these
          changes can be hidden inside the specific layers and other library classes that
          need special treatment to run in a distributed setting, so that most users'
          model definition code can run unchanged. The `tf.distribute.Strategy` API works
          the same way with eager and graph execution.
