Hello. I’m new to NLP and I’m trying to follow TensorFlow’s tutorial on Image Captioning (Image captioning with visual attention | Text | TensorFlow), but I ran into a problem when trying to preprocess the images with InceptionV3. As shown in the tutorial, I have a preprocessing function:
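For reference, the function is essentially the tutorial’s load_image; I’m reproducing it here from the tutorial, so treat it as a sketch that may differ slightly from my local copy:

import tensorflow as tf

def load_image(image_path):
    # Read and decode the image, resize to InceptionV3's expected 299x299,
    # and scale pixel values to the [-1, 1] range InceptionV3 expects.
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, image_path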
Then I use it to build a BatchDataset (I end up with a <BatchDataset shapes: ((None, 299, 299, 3), (None,)), types: (tf.float32, tf.string)>):
# Get unique images
encode_train = sorted(set(img_name_vector))
# Feel free to change batch_size according to your system configuration
image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(16)
Up to this point, everything works, but then when I try
for img, path in image_dataset:
#Do something
Either nothing happens or the kernel dies. Is there a way to fix or circumvent that issue?
Thanks for the reply, @lgusm.
I don’t think this is a memory issue. I purposefully reduced the number of images down to 64 so that memory can’t be the problem. I also checked my task manager and didn’t see anything unusual there either.
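(Concretely, limiting things to 64 images just means truncating the list of image paths before building the dataset; this is only a sketch of the kind of cell I mean, not necessarily the exact code in my notebook:)

# Keep only the first 64 unique images to rule out memory pressure
encode_train = sorted(set(img_name_vector))[:64]
image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(16)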
Could there be a problem with the TensorFlow version I’m using? It’s probably not the cause, but I think I have an earlier version than the one used on Colab.
You may try running the tutorial with v2.5 on your local machine and check if that fixes your issue.
The notebook example runs end-to-end in Colab, and Colab is loaded with the current latest version (TF 2.5).
This might be due to the compute/memory requirements of the example, but we don’t have enough information about your setup to fully debug this. As you’re probably aware, it’s not a small dataset for a typical demo:
“… large download ahead. You’ll use the training set, which is a 13GB file…”
And the “Caching the features extracted from InceptionV3” step can be compute intensive (a rough sketch of that loop is included below). It comes with a warning in the tutorial:
“You will pre-process each image with InceptionV3 and cache the output to disk. Caching the output in RAM would be faster but also memory intensive, requiring 8 * 8 * 2048 floats per image. At the time of writing, this exceeds the memory limitations of Colab (currently 12GB of memory).”
Also keeping in mind that, as the doc says:
“Performance could be improved with a more sophisticated caching strategy (for example, by sharding the images to reduce random access disk I/O), but that would require more code.”
Maybe some or all of these factors are contributing to the issue.
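For context, the caching loop referred to above is roughly the following (a sketch based on the tutorial; the model and variable names are the tutorial’s and may differ from your local copy):

import numpy as np
import tensorflow as tf

# InceptionV3 without its classification head, as in the tutorial;
# each image yields an 8 x 8 x 2048 feature map.
image_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
image_features_extract_model = tf.keras.Model(image_model.input, image_model.layers[-1].output)

for img, path in image_dataset:
    batch_features = image_features_extract_model(img)
    # Flatten the spatial grid to (batch, 64, 2048); each map is
    # 8*8*2048 = 131,072 float32 values, roughly 0.5 MB per image.
    batch_features = tf.reshape(batch_features, (batch_features.shape[0], -1, batch_features.shape[3]))
    for bf, p in zip(batch_features, path):
        path_of_feature = p.numpy().decode('utf-8')
        np.save(path_of_feature, bf.numpy())  # writes <image path>.npy alongside the image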
Let us know if upgrading to TF 2.5 fixes anything.
Yes, the amount of memory needed is probably very large, which is why I restricted the number of images to 64 just to see if the rest of the code works.
I managed to upgrade TensorFlow to the latest version (TF 2.5), and now if I write
for img, path in image_dataset:
pass
the kernel doesn’t die anymore, but I get an error:
I fixed the problem, so in case someone runs into the same issue, here’s how I solved it:
I found out that the file “captions_train2014.json” contains image IDs that do not exist in the “train2014” folder, so the error occurred when iterating over the images. More precisely, there are 82783 distinct IDs, but I only have 74891 images. I fixed it by verifying that the image path exists before opening the image. I have no idea why it works in Colab, though (maybe my download just went wrong).
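A minimal sketch of the check (one way to do it; the variable names follow the tutorial’s parallel lists, and I’m filtering the lists up front rather than inside load_image):

import os

# Drop caption/image pairs whose image file is missing from the train2014 folder
# (train_captions and img_name_vector are the tutorial's parallel lists).
pairs = [(cap, p) for cap, p in zip(train_captions, img_name_vector) if os.path.exists(p)]
train_captions, img_name_vector = map(list, zip(*pairs))

encode_train = sorted(set(img_name_vector))

After this, the loop over image_dataset only ever sees files that are actually on disk.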