The documentation for tf.image.non_max_suppression says that:
this algorithm is invariant to orthogonal transformations and translations of the coordinate system
Does this mean that the result shouldn't change if the input is given as [x1, y1, x2, y2] instead of the required [y1, x1, y2, x2]?
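
To make the question concrete, here is a minimal pure-Python sketch of greedy NMS that I used to check the idea. It is not TensorFlow's implementation: the helpers `iou` and `greedy_nms` and the sample boxes are my own assumptions for illustration. Swapping x and y is a reflection across the line y = x, which leaves box areas and overlaps unchanged, so I would expect the same indices to be selected in both orderings.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (p1, q1, p2, q2), p1 <= p2, q1 <= q2."""
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_p = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_q = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_p * inter_q
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Hypothetical greedy NMS: keep the best-scoring box, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Made-up sample boxes in the documented [y1, x1, y2, x2] order.
boxes_yx = [
    (0.0, 0.0, 10.0, 20.0),
    (1.0, 1.0, 11.0, 21.0),   # overlaps the first box heavily
    (50.0, 5.0, 60.0, 40.0),  # far away from the other two
]
scores = [0.9, 0.8, 0.7]

# The same boxes with the axes swapped: [x1, y1, x2, y2].
boxes_xy = [(x1, y1, x2, y2) for (y1, x1, y2, x2) in boxes_yx]

keep_yx = greedy_nms(boxes_yx, scores)
keep_xy = greedy_nms(boxes_xy, scores)
print(keep_yx, keep_xy)
```

In this sketch both calls select the same indices, which is what I would expect if the quoted statement covers swapping the axes. I have not confirmed that the real tf.image.non_max_suppression behaves identically in every case.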