I would like to build a neural network that removes text from images (manga, comics).
Unfortunately, I couldn't find any examples online of generating only a part of an image, where the rest of the image (the background) and the mask of the removed object are given to the network as additional context.
Are there any examples (preferably with a detailed explanation) of an architecture that can be used for this?
I looked into GANs, but they only generate a full image.
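
To make the setup concrete, here is a minimal sketch of the inputs and output I have in mind (this kind of task is usually called image inpainting). It only shows the data flow, not a model: the function names `make_inpainting_input` and `composite` are hypothetical, the mask convention (1 = region to fill) is an assumption, and the constant `prediction` is just a stand-in for whatever a network would output.

```python
import numpy as np

def make_inpainting_input(image, mask):
    """Build the network input from an image and a removal mask.

    image: float array of shape (H, W, C) with values in [0, 1]
    mask:  array of shape (H, W), 1 where text was removed, 0 elsewhere
           (this convention is an assumption for the sketch)
    """
    mask = mask.astype(image.dtype)[..., None]       # (H, W, 1)
    masked = image * (1.0 - mask)                     # blank out the region to fill
    return np.concatenate([masked, mask], axis=-1)    # (H, W, C + 1)

def composite(image, mask, prediction):
    """Keep the known background, take generated pixels only inside the mask."""
    mask = mask.astype(image.dtype)[..., None]
    return image * (1.0 - mask) + prediction * mask

# Toy data: a random "page" and a rectangular mask over a speech bubble.
image = np.random.default_rng(0).random((64, 64, 3))
mask = np.zeros((64, 64))
mask[20:30, 10:50] = 1

net_input = make_inpainting_input(image, mask)        # what the network would see
prediction = np.full_like(image, 0.5)                 # stand-in for the network output
result = composite(image, mask, prediction)
```

With this layout the network always sees the background and the mask as context, and only the masked pixels of its output end up in the final image.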