I'm trying to fine-tune a 7-billion-parameter model in TF/Keras, but I can't distribute the model across multiple GPUs.
I've read a bit about the parameter server strategy, but I can't really get it to work the way I want.
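In case it helps: here's roughly the sketch I'm working from, adapted from the TF parameter server training tutorial. It spins up an in-process cluster just for illustration (the ports, layer sizes, and dummy data are placeholders, not my actual 7B setup; in a real run the workers and ps tasks would be separate processes on separate machines):

```python
import tensorflow as tf

# Placeholder in-process cluster: 1 worker + 1 ps (real setup uses separate hosts).
cluster_dict = {
    "worker": ["localhost:22000"],
    "ps": ["localhost:22001"],
}
cluster_spec = tf.train.ClusterSpec(cluster_dict)

# Start the worker and ps servers inside this process (illustration only).
for job in ("worker", "ps"):
    for i in range(len(cluster_dict[job])):
        tf.distribute.Server(
            cluster_spec, job_name=job, task_index=i,
            protocol="grpc", start=True)

resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    cluster_spec, rpc_layer="grpc")

# variable_partitioner shards large variables across the ps tasks,
# which is the part that matters for a model too big for one GPU.
strategy = tf.distribute.ParameterServerStrategy(
    resolver,
    variable_partitioner=tf.distribute.experimental.partitioners.MinSizePartitioner(
        min_shard_bytes=256 << 10, max_shards=len(cluster_dict["ps"])))

# Variables created under the scope are placed on (and sharded across) the ps tasks.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

def dataset_fn(input_context):
    # Dummy data standing in for my real fine-tuning dataset.
    x = tf.random.uniform((64, 8))
    y = tf.random.uniform((64, 1))
    return tf.data.Dataset.from_tensor_slices((x, y)).repeat().batch(8)

# With ParameterServerStrategy, model.fit takes a DatasetCreator
# and requires steps_per_epoch.
history = model.fit(
    tf.keras.utils.experimental.DatasetCreator(dataset_fn),
    epochs=1, steps_per_epoch=4)
```

This runs for me on toy sizes, but I'm not sure it's the right approach for sharding a 7B model's weights rather than just its optimizer state.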
If anyone has any insights, they would be greatly appreciated.