I need some help. I've been trying to create a working model with my own pictures for a while now, and so far there has been a lot of trouble and little progress. The closest I have gotten is this: with the confidence threshold under 0.20 the entire screen is more or less filled with bounding boxes, but as soon as I go over that there is nothing.
Well, on to the question:
I was wondering about the images for training an ssd_mobilenet v1 model in TensorFlow. I read in the OpenVINO docs that their model wanted BGR channel order; is it the same for TF? What bit depth (I'm not sure if that is the right word for it) should the images have (e.g. 8-bit or 24-bit)? From what I understand they should be in JPEG format, right?
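As far as I know, TensorFlow's image decoding (e.g. `tf.io.decode_jpeg`) works in RGB, so the BGR channel order is an OpenVINO/OpenCV deployment detail, and ordinary 8-bit-per-channel (24-bit) RGB JPEGs are what the tutorial scripts expect. A quick way to sanity-check what the files actually are, using Pillow (a sketch; the mode-to-bits mapping below only covers common modes):

```python
# Map common Pillow image modes to bits per pixel ("bit depth").
MODE_TO_BITS = {"1": 1, "L": 8, "P": 8, "RGB": 24, "RGBA": 32, "I;16": 16}

def describe_image(path):
    """Report format, mode, bit depth, and size of one image file.
    Requires Pillow (pip install Pillow); imported lazily so the
    mapping above is usable on its own."""
    from PIL import Image
    with Image.open(path) as im:
        return {
            "format": im.format,                # e.g. "JPEG"
            "mode": im.mode,                    # e.g. "RGB"
            "bits": MODE_TO_BITS.get(im.mode),  # e.g. 24
            "size": im.size,                    # (width, height)
        }
```

Running this over the training folder makes it easy to spot stray grayscale or RGBA files.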
I'm trying to trace potential errors, taking it from the top.
I'm trying to train a model that should detect two classes. As the camera will be mounted and the objects move only along one axis, I'm having some trouble: it's hard to get pictures that are not almost identical to each other. The environment is also (color-wise) almost grayscale by nature, so everything sort of blends together. Does anyone know of an object detection model that is good at identifying round shapes in a “gray-on-gray” environment?
Another quick question: would it be OK to post a more specific question here with my settings (command-line arguments for the scripts, general config settings) and my working procedure for training a model, so that (hopefully) someone would read it and find what I'm doing wrong?
I have read about fine-tuning but not really understood when and how to do it, so as far as I know I have not done any fine-tuning, besides manually flipping/rotating/scaling some of the pictures to make my collection of images less monotonous.
I'm having a bit of a problem with my dataset, as the subject matter of the model is by nature very monotonous. I have just over 200 images and I can't really see how to collect more without collecting more or less copies of existing images.
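On that note: instead of flipping/rotating images by hand, the Object Detection API can augment them on the fly during training, so the 200 images are seen in many random variations. A sketch of what that could look like in pipeline.config (the option names come from the API's preprocessor; which ones actually make sense depends on the data):

```
train_config {
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_adjust_contrast {
    }
  }
}
```

The brightness/contrast jitter in particular might help with the gray-on-gray problem.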
For the latest model I have trained, I have done the following:
Using labelImg to annotate the pictures (I have two classes).
Creating CSV files (training and test) using the script from the TensorFlow tutorial.
When I create a CSV file, it contains only the image names, not the image paths. I have read in another tutorial that the full path is needed. Do you know if this is the case, and if so, how to get it?
When I open a CSV file in Excel, all the information ends up in the first column. Should these values be separated into one cell per piece of information, or does the TFRecord script (also from the TensorFlow tutorial) sort this out by itself?
The next step is creating a train.record and a test.record with the script from the TensorFlow tutorial.
After this I train the model using model_main_tf2.py with:
--pipeline_config_path
--model_dir
--alsologtostderr
The pipeline is configured with paths to the model checkpoint, my label_map, and my train.record and test.record.
I had to change the batch size to 2 as I only have 2 GB of GPU RAM on my laptop (training a MobileNet v1 640x640 from TensorFlow). I feel an upgrade coming :)
Besides this, the pipeline.config is as it comes.
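For context, a minimal sketch of the parts of pipeline.config I mean (the paths below are placeholders, not my actual ones):

```
model {
  ssd {
    num_classes: 2   # my two classes
  }
}
train_config {
  batch_size: 2      # lowered to fit 2 GB of GPU RAM
  fine_tune_checkpoint: "pre-trained-model/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/test.record"
  }
}
```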
Is there anything obvious I'm doing wrong? Where would you say the most common mistakes are made? As providing the full information for the entire process might be a bit of overkill, my idea is to perhaps look a bit further into the places where most rookies fail.
Another question: is there a relatively easy way to test the model on a video?
As the images are not mine and could possibly hold information that can't be online, I can't use anything that uploads data anywhere.
This is my go-to basic tutorial on object detection.
For testing on a video, what you can do (a naive solution, as I think there might be a much better way of doing this that I don't know about) is use something like ffmpeg: extract every frame of the video, apply your model to each frame, update the frame with the prediction, then recreate a video from the frames.
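The steps above can be sketched like this; everything runs locally, so nothing is uploaded anywhere (ffmpeg is assumed to be installed; `run=False` just returns the command for inspection):

```python
import subprocess

def extract_frames(video_path, frames_dir, fps=None, run=True):
    """Dump every frame of a video as numbered JPEGs using ffmpeg."""
    cmd = ["ffmpeg", "-i", video_path]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]  # optional: sample at a fixed frame rate
    cmd.append(f"{frames_dir}/frame_%06d.jpg")
    if run:
        subprocess.run(cmd, check=True)
    return cmd

def rebuild_video(frames_dir, out_path, fps=25, run=True):
    """Reassemble the (model-annotated) frames back into a video."""
    cmd = [
        "ffmpeg",
        "-framerate", str(fps),
        "-i", f"{frames_dir}/frame_%06d.jpg",
        "-pix_fmt", "yuv420p",  # widely playable pixel format
        out_path,
    ]
    if run:
        subprocess.run(cmd, check=True)
    return cmd

# Between the two calls, loop over frames_dir, run the detector on each
# JPEG, and overwrite the frame with the predicted boxes drawn on it.
```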