I would like to create annotated image datasets to use on Ultralytics HUB.
Also, I would like to make some apps to use with the Stream Realtime feature.
We can all achieve the ASI-Godsend sooner like this.
This is available today. See the “live” examples in the cookbook.
I’m not sure which feature in AI Studio you think will accomplish this.
Most, if not all, of the code required to get the bounding boxes is in this cookbook: Google Colab
I didn’t check whether the output matches what Ultralytics datasets expect. Note that the Gemini box_2d format puts the y coordinates first: “Just be careful, the y coordinates are first, x ones afterwards contrary to common usage” (quoting directly from the cookbook). Your app can flip the coordinates if necessary.
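As a sketch of that flip: the cookbook’s box_2d values are ordered [y_min, x_min, y_max, x_max] and normalized to 0–1000, while Ultralytics/YOLO label files expect `class_id x_center y_center width height` normalized to 0–1. The conversion below assumes that layout; the class_id mapping is a placeholder you’d supply yourself.

```python
def box_2d_to_yolo(box_2d, class_id):
    """Convert a Gemini-style [y_min, x_min, y_max, x_max] box (0-1000 scale)
    into a YOLO-format label line (all values normalized to 0-1)."""
    y_min, x_min, y_max, x_max = (v / 1000 for v in box_2d)  # swap to x-first
    x_center = (x_min + x_max) / 2
    y_center = (y_min + y_max) / 2
    width = x_max - x_min
    height = y_max - y_min
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a box spanning y 100-300, x 200-600 on the 0-1000 grid
print(box_2d_to_yolo([100, 200, 300, 600], 0))
# → "0 0.400000 0.200000 0.400000 0.200000"
```

Double-check the scale and ordering against the cookbook output before trusting this for a real dataset.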
I’ll update my answer: the code you would need to automatically classify objects in, say, 500 images is in fact in the cookbook; you would only need to supply an outer loop. The model isn’t there yet, though. It does OK on images with few objects in them. The recommended system instruction limits the number of bounding boxes to 25; I think a better limit is about 10. Give Gemini 2.0 Flash Experimental too many objects, and the quality degrades.
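The “outer loop” I mean is nothing more than iterating the cookbook’s per-image call over a folder. A minimal sketch, where `get_boxes` is a hypothetical stand-in for the cookbook code that sends one image to the model and parses the returned boxes:

```python
from pathlib import Path

def get_boxes(image_path):
    # Placeholder for the cookbook's per-image Gemini call: send the image,
    # parse the JSON box_2d list from the response. Stubbed so the loop runs.
    return []

def annotate_folder(folder, pattern="*.jpg"):
    """Run the per-image annotation over every image in a folder."""
    results = {}
    for image_path in sorted(Path(folder).glob(pattern)):
        results[image_path.name] = get_boxes(image_path)
    return results
```

For 500 images you would also want basic rate limiting and retries around the model call, which the sketch omits.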
A picture is worth a thousand words, they say. This is what the model came up with:
There are 13 cupcakes in the sample image. The model generated 12 bounding boxes with labeled descriptions, so it missed one. One bounding box does not enclose the object it is supposed to represent at all (the googly-eyed cupcake in the bottom row); it drifted off to the side. Another only partially encloses its object and is half off. That leaves 9 or 10 reasonably good bounding boxes. So you would either need to regenerate until you get a good set (which means manual supervision) or accept that your training data will contain inaccurate entries.