Hi all! I usually work with web application development, but I’ve been asked by a potential customer whether it’s possible to create a CNN that can recognize thousands of individual fish. I know that e.g. Google Photos does the same for faces and even individual pets, so I know that it’s possible. What I don’t know is whether it’s possible to do so without Google’s resources.
All the examples I’ve read/watched so far on face recognition are about how the CNN can recognize the faces of persons it already knows (i.e. you teach it from 20 labeled images that George W. Bush looks like this, and then it can identify him in any photo after that). But how does Google do it when they don’t know the names of the persons (the images are not labeled), and they still manage to group the images of each individual person/pet so that the user can label them afterwards?
My guess is that they’ve trained their CNN on millions of images, and if the image at hand doesn’t match any of the pre-trained individuals, they put it in a new group called “Unknown person nr xxx”. If they later come across a second image of the same face, it will be placed in that same group. Is this a correct assumption? If yes, does this mean that to be able to identify thousands of individual fish, you need to train a model with xxx images of yyy individual fish before the model is ready for use? If we assume that individual fish are as diverse as human faces, how many training images would one need per fish, and how many individual fish, if you wanted an accuracy > 80%?
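To make my guess concrete, here is a minimal sketch of the grouping logic I have in mind. It assumes a trained CNN already turns each photo into an embedding vector such that photos of the same individual give similar vectors; that part is not shown, and the vectors below are simulated with random numbers. The function names, the 0.8 similarity threshold and the 128-dimensional embeddings are made up for illustration. I don't know whether this is what Google Photos actually does.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_to_group(embedding, groups, threshold=0.8):
    """Put the embedding into the best-matching group, or open a new one.

    groups is a list of lists of embeddings and is modified in place.
    threshold is a hypothetical value; a real one would have to be tuned.
    Returns the index of the group the embedding ended up in.
    """
    best_index, best_score = None, -1.0
    for index, members in enumerate(groups):
        # Score a group by its most similar member.
        score = max(cosine_similarity(embedding, m) for m in members)
        if score > best_score:
            best_index, best_score = index, score

    if best_index is not None and best_score >= threshold:
        groups[best_index].append(embedding)
        return best_index

    # No existing group is close enough: this is "Unknown individual nr N".
    groups.append([embedding])
    return len(groups) - 1


# Simulated embeddings standing in for CNN output (assumption, not real data).
rng = random.Random(0)
DIM = 128


def random_vector():
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def noisy_copy(vector, noise=0.05):
    """Another 'photo' of the same individual: same vector plus small noise."""
    return [x + rng.gauss(0.0, noise) for x in vector]


fish_a = random_vector()
fish_b = random_vector()

photos = [noisy_copy(fish_a), noisy_copy(fish_b), noisy_copy(fish_a)]

groups = []
labels = [assign_to_group(photo, groups) for photo in photos]
print(labels)  # first and third photo share a group, second gets its own
```

If this is roughly right, the CNN would not need to have seen a particular fish during training. It would only need to have learned embeddings that separate individuals well, and new individuals would simply become new groups.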