Weird error on MirroredStrategy

Hi community,
I have an issue that is driving me crazy. Please, can you confirm whether this is a real bug, or whether I'm just imagining it…

I wrote a small script that uses MirroredStrategy to test my 2xGPU configuration.
My target values are num_features = 13 and num_output = 4.

But any time I lower num_features (or num_output) from 25 to any smaller value, NCCL does not start and the process hangs, so I have to kill it manually. It is not a configuration, driver, or version issue, because the code otherwise works. I have tried different machines, even the Vast.ai service.

My real data will eventually have num_features = 13, and I cannot change that. Can you explain why this happens, or how to fix it?
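
In case it helps with diagnosis, here is a minimal sketch of how the all-reduce could be switched away from NCCL for comparison, using the standard cross_device_ops argument of tf.distribute.MirroredStrategy. I am not presenting it as a fix, only as a way to check whether the hang is specific to the NCCL collective:

import tensorflow as tf

# Same MirroredStrategy, but with an all-reduce implementation that does
# not go through NCCL, to see whether the hang still occurs.
strategy_no_nccl = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())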

Thanks

import numpy as np
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

if not tf.config.list_physical_devices('GPU'):
    raise ValueError('No GPU detected.')

# Define a MirroredStrategy.
strategy = tf.distribute.MirroredStrategy()

print('Device nums: {}'.format(strategy.num_replicas_in_sync))

# Random data
num_samples = 24000
num_features = 25
num_output = 4

X_random = np.random.random((num_samples, num_features))
y_random = np.random.randint(num_output, size=num_samples)

# Create the model with MirroredStrategy
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(num_features,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(num_output, activation='sigmoid')
    ])

    optimizer = Adam(learning_rate=0.001)
    model.compile(loss='mse', optimizer=optimizer)
    
# Train the model
model.fit(X_random, y_random, epochs=5, batch_size=64)
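
For reference, NCCL's own debug logging can be turned on with the standard NCCL_DEBUG environment variable, set before TensorFlow touches the GPUs; this is how I would expect to capture more detail about where the initialization stalls:

import os

# Must be set before the first GPU / collective op runs so NCCL picks it up.
os.environ['NCCL_DEBUG'] = 'INFO'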