Hello,
I am currently porting different object detection models to Android using TFLite. For RT-DETRv2, YOLOv8, and YOLOv11 it works flawlessly. YOLOX, on the other hand, runs into a strange issue: although the preprocessing pipeline on Android yields exactly the same values as the one in Python, the model output is different. I verified this both by visualizing the preprocessed image as a bitmap and by checking reference values at multiple points in the tensor.
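(To illustrate the spot check: on the Android side I read individual values back out of the float buffer with a small helper along these lines. valueAt is just for illustration, and the indexing assumes the flat NHWC layout used further down.)

import java.nio.ByteBuffer

// Illustrative helper: read the float at (y, x, channel) from a rewound,
// flat NHWC float32 buffer, for comparison against
// padded_img[y, x, channel] on the Python side.
fun valueAt(buf: ByteBuffer, width: Int, y: Int, x: Int, channel: Int, channels: Int = 3): Float {
    val index = (y * width + x) * channels + channel
    return buf.asFloatBuffer().get(index)
}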
All the code referenced here, as well as the rest of the test code, can be found on GitHub: RNoahG/YoloXPythonAndroid (an implementation of YoloX inference on both Android and Python).
Python dependencies are listed in Requirements.txt, and the Android app should build in Android Studio, since all dependencies are declared in build.gradle.
Here is the Python code that matters:
import cv2
import numpy as np
import tensorflow as tf

# Letterbox preprocessing: scale while keeping the aspect ratio, pad with 114
if len(img.shape) == 3:
    padded_img = np.ones((input_size[0], input_size[1], 3), dtype=np.uint8) * 114
else:
    padded_img = np.ones(input_size, dtype=np.uint8) * 114
r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
resized_img = cv2.resize(
    img,
    (int(img.shape[1] * r), int(img.shape[0] * r)),
    interpolation=cv2.INTER_LINEAR,
).astype(np.uint8)
padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img
padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)

# Run inference on the preprocessed image
interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], padded_img[None, :, :, :])
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
This produces the correct output, while the following Kotlin code for Android does not.
(I've stitched together code that is actually spread across function calls in MainActivity.kt and YoloDetector.kt.)
// Decode the test image from assets
val imgStream = assets.open("TestImages/$imagePath")
val decode = BitmapFactory.decodeStream(imgStream)
val imgmat = Mat()
Utils.bitmapToMat(decode, imgmat)

// The Bitmap decodes to RGBA; convert to BGR to match cv2.imread in Python
val imgmat3 = Mat()
Imgproc.cvtColor(imgmat, imgmat3, Imgproc.COLOR_RGBA2BGR)

// Letterbox: resize with preserved aspect ratio, then pad the bottom with 114
// (source size is hard-coded to 1920x1080 for the test image)
val resizedmat = Mat()
val paddedmat = Mat()
val size = Size((1920F * ratio).toDouble(), (1080F * ratio).toDouble())
val scalar = Scalar(114.0, 114.0, 114.0)
Imgproc.resize(imgmat3, resizedmat, size, 0.0, 0.0, INTER_LINEAR)
Core.copyMakeBorder(resizedmat, paddedmat, 0, (imsize - (1080 * ratio)).toInt(), 0, 0, Core.BORDER_CONSTANT, scalar)

// Copy the Mat's pixel data into a direct ByteBuffer
val array = ByteArray(paddedmat.rows() * paddedmat.cols() * paddedmat.channels())
paddedmat.get(0, 0, array)
val buffer = ByteBuffer.allocateDirect(array.size)
buffer.order(ByteOrder.nativeOrder())
buffer.put(array)
buffer.rewind()

// Wrap the raw bytes in a UINT8 tensor with NHWC shape
val tensorbuf = TensorBuffer.createFixedSize(intArrayOf(1, tensorWidth, tensorHeight, 3), DataType.UINT8)
tensorbuf.loadBuffer(buffer, intArrayOf(1, tensorWidth, tensorHeight, 3))

// tensorproc only converts the data type from uint8 to float32 (see the sketch below)
val proctensor = tensorproc.process(tensorbuf)
val imageBuffer = proctensor.buffer

val output = TensorBuffer.createFixedSize(intArrayOf(1, numChannel, numElements), DataType.FLOAT32)
interpreter.run(imageBuffer, output.buffer)
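For completeness, tensorproc is elided above; it does nothing but cast the data type. It is essentially equivalent to this minimal sketch using the support library's CastOp:

import org.tensorflow.lite.DataType
import org.tensorflow.lite.support.common.TensorProcessor
import org.tensorflow.lite.support.common.ops.CastOp

// A TensorProcessor whose only op casts the uint8 tensor to float32,
// with no normalization (matching the Python pipeline).
val tensorproc = TensorProcessor.Builder()
    .add(CastOp(DataType.FLOAT32))
    .build()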
If you ravel the padded_img NumPy array, its values are exactly the same as those in the buffer on the Kotlin side, which leads me to believe the error must happen when the buffer is turned into a tensor. Endianness has already been accounted for, so I suspect the discrepancy stems from the way TensorFlow reshapes the flat buffer into the input tensor. Sadly, I am not well versed enough in the inner workings of TensorFlow. Does anyone have some insight on the topic?
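One experiment I am considering, to isolate whether the TensorBuffer/TensorProcessor path is at fault, is to hand-build the float32 input and feed it to the interpreter directly. This is an untested sketch; it assumes the model expects float32 NHWC input and reuses paddedmat, interpreter, numChannel, and numElements from above:

// Bypass TensorBuffer/TensorProcessor: convert the padded Mat to float32
// by hand and pass the raw ByteBuffer straight to the interpreter.
val bytes = ByteArray(paddedmat.rows() * paddedmat.cols() * paddedmat.channels())
paddedmat.get(0, 0, bytes)

val input = ByteBuffer.allocateDirect(4 * bytes.size) // 4 bytes per float32
input.order(ByteOrder.nativeOrder())
for (b in bytes) {
    // Kotlin's Byte is signed, so mask with 0xFF to recover the 0..255 pixel
    // value before casting to float (no normalization, as in the Python code).
    input.putFloat((b.toInt() and 0xFF).toFloat())
}
input.rewind()

val output = TensorBuffer.createFixedSize(intArrayOf(1, numChannel, numElements), DataType.FLOAT32)
interpreter.run(input, output.buffer)

If this variant matches the Python output, the problem lies somewhere in the TensorBuffer/cast path; the signed-byte mask is one detail worth double-checking, since without it any pixel value above 127 would turn negative.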
Best regards,
Noah