I am working on the simple_audio command recognition model provided by TensorFlow. I have successfully converted the model to a TFLite model, but the accuracy of the converted model has dropped drastically. I used the same test dataset for both models, and the TFLite one performs much worse. I have also tried quantisation-aware training, but it makes little difference. I used scipy's signal.stft to derive the spectrogram of the audio file, since I cannot use tf.signal.stft in the inference code. I have tried many ways to debug this, but I am stuck.
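For reference, the conversion and TFLite inference follow roughly this pattern (the saved-model directory is a placeholder, and the input shaping assumes the model takes a single-channel spectrogram):

import numpy as np
import tensorflow as tf

# Convert the trained model to TFLite (path is a placeholder).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()

# Run inference with the TFLite interpreter on one spectrogram.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def predict(spectrogram):
    # Add the batch and channel dimensions the model expects.
    x = spectrogram[np.newaxis, ..., np.newaxis].astype(np.float32)
    interpreter.set_tensor(input_details[0]["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]["index"])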
The only difference I can think of is in the process of generating the spectrogram.
In the TF model it is done like this:
import tensorflow as tf

def get_spectrogram(waveform):
    # Zero-pad the waveform to exactly 16000 samples.
    zero_padding = tf.zeros([16000] - tf.shape(waveform), dtype=tf.float32)
    waveform = tf.cast(waveform, tf.float32)
    equal_length = tf.concat([waveform, zero_padding], 0)
    spectrogram = tf.signal.stft(
        equal_length, frame_length=255, frame_step=128)
    spectrogram = tf.abs(spectrogram)
    return spectrogram
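For a 16000-sample clip this returns a (124, 129) spectrogram (124 frames, 129 frequency bins), since tf.signal.stft uses a Hann window and, by default, pads the 255-sample frames to an FFT length of 256. A quick sanity check (the random waveform is just a stand-in for a real clip):

import tensorflow as tf

waveform = tf.random.normal([16000])  # stand-in for a real 1-second clip
spec_tf = get_spectrogram(waveform)
print(spec_tf.shape)                  # (124, 129)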
While in the TFLite pipeline it is written as:
import numpy as np
from scipy import signal

def get_spectrogram1(mySound):
    # Length of the 1-D waveform.
    res = np.shape(mySound)[0]
    # Zero-pad to exactly 16000 samples, as in the TF version.
    zero_padding = np.zeros(16000 - res, dtype=np.float32)
    equal_length = np.concatenate((mySound, zero_padding), axis=0)
    f, t, Zw = signal.stft(equal_length, nperseg=247, noverlap=122)
    Zw = np.absolute(Zw)
    return Zw
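To see how far apart the two pipelines are, I have been comparing their outputs on the same waveform along these lines (just a debugging sketch; a random waveform stands in for a real clip):

import numpy as np
import tensorflow as tf

waveform = np.random.randn(16000).astype(np.float32)

spec_tf = get_spectrogram(tf.constant(waveform)).numpy()
spec_scipy = get_spectrogram1(waveform)

# tf.signal.stft returns (frames, freq_bins); scipy.signal.stft returns (freq_bins, frames),
# and the frame length / overlap differ here, so the outputs are not directly comparable.
print(spec_tf.shape, spec_scipy.shape)
print(np.abs(spec_tf).mean(), np.abs(spec_scipy).mean())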
Can someone point out what I might be doing wrong, or suggest how to debug this further?