How to manage image data for Deep Learning?

I would like to learn the common best practices for managing image data for Deep Learning.

Currently my scenario is the following:

  • Retrieving different collections of images
  • Labeling the images using a labeling tool like CVAT
  • Question: how to manage the data after the labeling step?

What is the best way to manage labeled data that comes from different sources? Do you store it in a database, or just export it to the file system?

What are your workflows for retrieving, labeling and managing image data?

Hi @ai2ys,

After labeling images with a tool like CVAT, store the data on the file system for smaller datasets, or in a database for larger ones. Organize the images in structured directories and keep metadata files alongside them. Use a data versioning tool such as DVC to track changes. Convert the labels to a standard format (COCO, PASCAL VOC, YOLO) for compatibility.
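To make the "structured directories with metadata files" part concrete, here is a minimal sketch that walks a dataset folder and writes a manifest with one record per image. The layout (one sub-directory per source collection), the file name `manifest.json`, and the function names are my own assumptions for illustration, not something CVAT or DVC prescribes.

```python
import hashlib
import json
from pathlib import Path

# Assumption: these are the image types present in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_manifest(root):
    """Return one record per image found under root."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            rel = path.relative_to(root)
            records.append({
                "path": rel.as_posix(),
                # Assumption: the first directory level names the source collection.
                "source": rel.parts[0] if len(rel.parts) > 1 else "",
                # The checksum makes silent changes to an image detectable.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return records


def write_manifest(root, manifest_name="manifest.json"):
    """Write the manifest into root and return the path of the written file."""
    root = Path(root)
    out = root / manifest_name
    out.write_text(json.dumps(build_manifest(root), indent=2))
    return out
```

A small text file like this is easy to review in a diff, while the images themselves can be tracked by a tool such as DVC.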

Create preprocessing pipelines for data loading and augmentation. Implement regular backups and consider cloud storage for collaboration.
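One piece of a loading pipeline that is worth fixing early is the train/validation split. The sketch below assigns each image to a split from a hash of its path, so the assignment does not change when new images are added later. The function name `assign_split` and the 20% default are assumptions of mine, and this is only one possible way to split.

```python
import hashlib


def assign_split(image_path, val_fraction=0.2):
    """Map an image path to "train" or "val" based on a hash of the path."""
    digest = hashlib.sha256(image_path.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "val" if bucket < val_fraction else "train"
```

Because the result depends only on the path, two people working on the same dataset get the same split without exchanging a split file. Note that renaming or moving an image can change its split.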

A typical workflow involves:
1. Image retrieval: collect images from various sources and store the raw images in a structured file system or cloud storage.
2. Labeling: CVAT
3. Post-labeling management: storage, versioning, validation
4. Preprocessing and training: TensorFlow
5. Continuous improvement: updating the dataset
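For the validation part of step 3, a minimal sketch of a consistency check on a COCO-style export could look like this. It assumes the annotations have already been loaded into a Python dict (for example with `json.load`) and that every annotation carries a `bbox`; the function name `validate_coco` and the specific checks are my own choices, not part of CVAT or the COCO tooling.

```python
def validate_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        image = images.get(ann.get("image_id"))
        if image is None:
            problems.append(f"annotation {ann_id}: unknown image_id")
            continue
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        # COCO boxes are [x, y, width, height] in pixels.
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann_id}: empty bbox")
        elif x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append(f"annotation {ann_id}: bbox outside image")
    return problems
```

Running a check like this after every export catches broken references before they reach training.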

Hope it helps,

Thank you.
