How do I prepare a .wav or .amr file for the yamnet.tflite model in Kotlin or Java? I have checked the example project on GitHub, but it only does real-time classification using the mic; I need to know how to prepare a WAV or AMR file for this model. Thanks.
Take a look at this article, which explains how to use the YAMNet model on Android. There is also a GitHub link at the end. I hope you find it useful.
Sir, thanks for your answer. I also tried the source you provided, but it too uses only the mic and is very hard to follow. I'm using this project as an example.
I'm using this code to prepare the WAV file; please check it out:
import java.io.BufferedInputStream
import java.io.DataInputStream
import java.io.File
import java.io.FileInputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder

object AudioConverter {

    // YAMNet's classification model expects 0.975 s of 16 kHz mono audio, i.e. 15600 samples.
    private const val FRAME_SIZE = 15600

    // Reads a PCM WAV file and returns its samples normalized to [-1, 1].
    fun readAudioSimple(path: File): FloatArray {
        val buff = ByteArray(path.length().toInt())
        DataInputStream(BufferedInputStream(FileInputStream(path))).use { it.readFully(buff) }
        // Skip the 44-byte canonical WAV header so only the PCM payload is converted.
        if (buff.size <= 44) return FloatArray(0)
        return floatMe(shortMe(buff.sliceArray(44 until buff.size)))
    }

    // Splits the waveform into FRAME_SIZE-sample frames. step controls the
    // overlap between consecutive frames; step == 0 means no overlap.
    fun FloatArray.sliceTo(step: Int): List<FloatArray> {
        val slicedAudio = arrayListOf<FloatArray>()
        val stepSize = if (step != 0) (FRAME_SIZE * (1f / (2 * step))).toInt() else 0
        var startAt = 0
        while (startAt + FRAME_SIZE <= this.size) {
            slicedAudio.add(this.copyOfRange(startAt, startAt + FRAME_SIZE))
            startAt += FRAME_SIZE - stepSize
        }
        return slicedAudio
    }

    // Reinterprets little-endian 16-bit PCM bytes as shorts.
    private fun shortMe(bytes: ByteArray): ShortArray {
        val out = ShortArray(bytes.size / 2)
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out)
        return out
    }

    // The model input must be normalized to floats between -1 and +1.
    // 16-bit PCM spans [-32768, 32767], so divide by 2^15 = 32768 (MAX_ABS_INT16).
    private fun floatMe(pcms: ShortArray): FloatArray {
        val floats = FloatArray(pcms.size)
        pcms.forEachIndexed { index, sh ->
            floats[index] = sh.toFloat() / 32768.0f
        }
        return floats
    }
}
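For completeness, here is a rough sketch of how the frames produced above could be fed to the model with the plain TFLite Interpreter. It assumes the classification variant of YAMNet (fixed input of 15600 float samples, output of 521 class scores) bundled in assets as yamnet.tflite; the asset name and the copy-to-cache step are my own choices, not from the original post.

import android.content.Context
import org.tensorflow.lite.Interpreter
import java.io.File
import java.io.FileOutputStream

// Sketch: runs each 15600-sample frame through yamnet.tflite and returns
// the index of the top-scoring class per frame.
fun classifyFrames(context: Context, frames: List<FloatArray>): List<Int> {
    // Copy the model out of assets so the Interpreter can open it as a File.
    val modelFile = File(context.cacheDir, "yamnet.tflite")
    context.assets.open("yamnet.tflite").use { input ->
        FileOutputStream(modelFile).use { output -> input.copyTo(output) }
    }
    Interpreter(modelFile).use { interpreter ->
        return frames.map { frame ->
            // Classification variant: input shape [15600], output shape [1, 521].
            val scores = Array(1) { FloatArray(521) }
            interpreter.run(frame, scores)
            var best = 0
            for (i in 1 until scores[0].size) {
                if (scores[0][i] > scores[0][best]) best = i
            }
            best
        }
    }
}

Each returned index can then be mapped to a human-readable label via the class map CSV that ships with YAMNet.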
I'm a student, please help me. I really need this solution.
The above solution only works with specific WAV files (those that match the model's input specification, e.g., sample rate and channel count). My question is how to process WAV files that do not match the required input specification. How can I feed such a file to the model? I have tried many code samples and libraries but I'm lost.
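Since no approach was settled on in this thread, here is one dependency-free sketch of what that preprocessing could look like: read the channel count and sample rate from the WAV header, downmix to mono by averaging channels, and resample with naive linear interpolation. It assumes 16-bit PCM with a canonical 44-byte header (real files can carry extra chunks), and the interpolation is lower quality than FFmpeg or librosa:

import java.nio.ByteBuffer
import java.nio.ByteOrder

// Converts 16-bit PCM WAV bytes (any rate/channels) to 16 kHz mono floats in [-1, 1].
fun toModelInput(wavBytes: ByteArray, targetRate: Int = 16000): FloatArray {
    val header = ByteBuffer.wrap(wavBytes, 0, 44).order(ByteOrder.LITTLE_ENDIAN)
    val channels = header.getShort(22).toInt()
    val sourceRate = header.getInt(24)
    require(channels > 0) { "Malformed WAV header" }

    // Decode interleaved 16-bit samples and downmix to mono by averaging channels.
    val pcm = ByteBuffer.wrap(wavBytes, 44, wavBytes.size - 44)
        .order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    val frames = pcm.remaining() / channels
    val mono = FloatArray(frames)
    for (i in 0 until frames) {
        var sum = 0f
        for (c in 0 until channels) sum += pcm.get(i * channels + c) / 32768f
        mono[i] = sum / channels
    }
    if (sourceRate == targetRate) return mono

    // Naive linear-interpolation resampling; a proper low-pass filter would be
    // needed to fully avoid aliasing when downsampling.
    val outLen = (frames.toLong() * targetRate / sourceRate).toInt()
    val out = FloatArray(outLen)
    for (i in 0 until outLen) {
        val pos = i.toDouble() * sourceRate / targetRate
        val i0 = pos.toInt().coerceAtMost(frames - 1)
        val i1 = (i0 + 1).coerceAtMost(frames - 1)
        val frac = (pos - i0).toFloat()
        out[i] = mono[i0] * (1 - frac) + mono[i1] * frac
    }
    return out
}

The returned floats are already normalized, so they can go straight through sliceTo and into the model.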
I read your article. I think you could explain the FFmpegKit library in a little more detail and provide some links so readers can decide whether to use it. The issue with third-party libraries is that someday the authors stop supporting them and they no longer work with future Android APIs.
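For readers weighing it up: FFmpegKit is hosted at https://github.com/arthenica/ffmpeg-kit, and its README lists the Gradle coordinates (there is an audio-only variant that keeps the APK smaller than the full build). A minimal conversion of an AMR file to what YAMNet expects might look like the sketch below; the paths are placeholders:

import com.arthenica.ffmpegkit.FFmpegKit
import com.arthenica.ffmpegkit.ReturnCode

// Converts anything FFmpeg can decode (.amr included) to 16 kHz mono 16-bit PCM WAV.
val session = FFmpegKit.execute("-y -i /path/in.amr -ar 16000 -ac 1 -c:a pcm_s16le /path/out.wav")
if (!ReturnCode.isSuccess(session.returnCode)) {
    // session.output holds FFmpeg's log, useful for debugging failed conversions.
}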
I see that you are using the TensorFlow AudioClassifier… Have you tried the conversion the library provides directly?
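For context, the Task Library route being referred to would look roughly like this: a TensorAudio is created from the model's metadata and loaded with the decoded samples. This is a sketch under the assumption that the samples are already 16 kHz mono floats in [-1, 1] (e.g., from the AudioConverter above):

import android.content.Context
import org.tensorflow.lite.task.audio.classifier.AudioClassifier

// Sketch: classify pre-decoded samples with the Task Library instead of a raw Interpreter.
fun classifyWithTaskLibrary(context: Context, samples: FloatArray): String? {
    val classifier = AudioClassifier.createFromFile(context, "yamnet.tflite")
    val tensor = classifier.createInputTensorAudio()
    tensor.load(samples) // expects floats in [-1, 1]
    val results = classifier.classify(tensor)
    // Return the top category label of the first classification head, if any.
    return results.firstOrNull()?.categories?.maxByOrNull { it.score }?.label
}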
Hello sir, how do I resample WAV audio from 16000 Hz to 8000 Hz? I need to preprocess the audio to classify it with my tflite model. In my Jupyter notebook I use librosa before predicting; how do I do that on Android? I tried your Medium post and changed the FFmpegKit execute parameter from 16000 to 8000, but it didn't seem to work well. Is there any solution?
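Since 16000 to 8000 is an exact factor of two, one FFmpeg-free option is to decimate in Kotlin after decoding, as a rough stand-in for librosa.resample. This is a simplistic sketch: averaging adjacent samples is only a crude anti-aliasing filter, so expect small numeric differences from librosa. If you stay with FFmpegKit, note that -ar 8000 alone keeps the original channel count and codec, so pair it with -ac 1 and -c:a pcm_s16le and check the session's return code as in the earlier example.

// Halves the sample rate (e.g., 16 kHz -> 8 kHz) by averaging adjacent samples;
// the averaging acts as a crude low-pass filter to limit aliasing.
fun downsampleBy2(input: FloatArray): FloatArray {
    val out = FloatArray(input.size / 2)
    for (i in out.indices) {
        out[i] = (input[2 * i] + input[2 * i + 1]) / 2f
    }
    return out
}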