Failing to implement SCAFFOLD in TensorFlow

For my current research, I want to add the federated learning strategy SCAFFOLD (Karimireddy et al.) to my experiment. I am using TensorFlow for my research; however, I could only find implementations in PyTorch online, and I ran into some issues implementing it myself.

Below I attach the algorithm from the paper:

Basically, it just adds (code: line 10, -ci + c) some additional values (control variates) to the gradient before applying it to the model. There is a local control variate (c_i), which is different for each client, and a global control variate (c), which gets updated every round after model aggregation.

Here the model weights also get aggregated the same way as in FedAvg.
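Conceptually, the corrected local update in line 10 is y_i <- y_i - lr * (g_i(y_i) - c_i + c). Below is a minimal sketch of how I understand that step as a TensorFlow optimizer (simplified, with placeholder names like ScaffoldSGD; my actual code is in the Pastebin link below, and the control variates are assumed to be lists of tensors aligned with model.trainable_variables):

import tensorflow as tf

class ScaffoldSGD(tf.keras.optimizers.SGD):
    """Plain SGD, but every gradient is corrected with (c - c_i) before the update."""

    def set_controls(self, server_controls, client_controls):
        # one tensor per trainable variable, in the same order as model.trainable_variables
        self.server_controls = [tf.convert_to_tensor(c) for c in server_controls]
        self.client_controls = [tf.convert_to_tensor(c) for c in client_controls]

    def apply_gradients(self, grads_and_vars, **kwargs):
        # y_i <- y_i - lr * (g_i(y_i) - c_i + c)
        corrected = [
            (grad + c - c_i, var)
            for (grad, var), c, c_i in zip(grads_and_vars,
                                           self.server_controls,
                                           self.client_controls)
        ]
        return super().apply_gradients(corrected, **kwargs)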

I tried to write this as an optimizer in TensorFlow; due to the length of the code, I put it on Pastebin: https://pastebin.com/ZtX4DvhG

Everything works and converges when I directly use the tf.keras.optimizers.SGD optimizer with the same learning rate. But when I use my implementation, it doesn't converge and the model weights explode after a few rounds (approx. 5-7 aggregations); the weights and loss become NaN.

On the client side, I do the following (I simplified it to keep it short):

# The optimizer Scaffold implementation is in the pastebin link above.
optimizer = Scaffold(learning_rate=hyperparams.SGD_learning_rate)
# load global model weights and global/local control variates to the optimizer
optimizer.set_controls(global_model.weights, hyperparams.scaffold)
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
model.fit()

# compute new local control variates and store them locally for the next round
local_controls = optimizer.get_new_client_controls(
    global_model.get_weights(),
    model.get_weights(),
    option=option,
)

# compute the local control variate difference and send it to the server
local_controls_diff = (
    [new - old for new, old in zip(local_controls, old_local_controls)]
    if old_local_controls
    else local_controls
)
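For reference (this is not my actual Pastebin code), the following is roughly what I expect get_new_client_controls to compute, following the two options in the paper; here num_local_steps stands for the number of local SGD steps K and lr for the local learning rate:

def get_new_client_controls(global_weights, local_weights, c, c_i,
                            num_local_steps, lr, option="II"):
    if option == "I":
        # Option I: c_i+ is the gradient of the local loss evaluated at the
        # global weights (needs an extra pass over the local data)
        raise NotImplementedError("requires an extra forward/backward pass")
    # Option II: c_i+ = c_i - c + (x - y_i) / (K * lr)
    return [
        ci - cg + (gw - lw) / (num_local_steps * lr)
        for ci, cg, gw, lw in zip(c_i, c, global_weights, local_weights)
    ]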

The aggregation looks like this:

from functools import reduce
import numpy as np

total_client_count = self.total_client_count
selected_clients_count = len(client_parameters)
global_params = self.global_weights
global_controls = self.global_controls
global_lr = self.global_lr

delta_weights = [
    [ c_layer - g_layer  for g_layer, c_layer in zip(global_params, client_i) ]
    for client_i in client_parameters
]

delta_avg_weights = [
    reduce(np.add, layer_updates) / selected_clients_count
    for layer_updates in zip(*delta_weights)
]

delta_controls = [
    [ c_layer - g_layer for g_layer, c_layer in zip(global_controls, client_i) ]
    for client_i in client_controls
]

delta_avg_controls = [
    reduce(np.add, layer_updates) / selected_clients_count
    for layer_updates in zip(*delta_controls)
]

# calc new global weights for next round
# x = x + lr_g * delta_x
new_global_weights = [
    global_layer + global_lr * delta_avg
    for global_layer, delta_avg in zip(global_params, delta_avg_weights)
]        

# calc new global control variates for next round
# c = c + |S|/N * avg(delta_c_i)
new_global_controls = [
    global_layer + (selected_clients_count/total_client_count) * delta_avg
    for global_layer, delta_avg in zip(global_controls, delta_avg_controls)
]                

The new model weights and global control variates will be sent to each client.

The only implementations I could find online are in PyTorch (ki-ljl/Scaffold-Federated-Learning (ScaffoldOptimizer.py) and Xtra-Computing/NIID-Bench (Federated Learning on Non-IID Data Silos: An Experimental Study, ICDE 2022), both on GitHub).

Of course, if there were a TensorFlow implementation available, I wouldn't need my own; however, I couldn't find any.

But my implementation doesn't deliver the expected results. I tried debugging it in depth but didn't find the issue, and I don't know what else I can do or how to debug it further. I suspect the issue lies in my optimizer implementation rather than in other parts, but I checked multiple times and didn't find anything.

Hi AJ_LMU,

I tried to implement the federated learning strategy SCAFFOLD in TensorFlow under the Intel OpenFL framework, which is also expected to use the SCAFFOLD optimizer, but I failed.

So I would like to ask whether you have implemented the SCAFFOLD optimizer successfully, or whether you have used other methods to implement it successfully in TensorFlow.

Has anyone implemented it correctly so far? I tried to implement it and the control variates keep exploding, which ultimately causes the model to get stuck at 7-10% overall accuracy. I tried clipping the variables and tried both options for calculating the control variates, but nothing helped. If someone has implemented it and is able to run experiments correctly under non-IID conditions, please share.