I have audio files (in .wav) and their corresponding captions. I'm slowly researching and building a model to transcribe the audio into text. The audio files are mostly short, averaging about 12 seconds each. But how would I do that? Is there a way to create a custom TF dataset that can take the audio in one column and its captions in another?
I'd format my dataset similarly to others with the same objective, such as LibriSpeech: librispeech | TensorFlow Datasets
That will help you train a model later, as there are already many examples based on the LibriSpeech dataset.
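If your data is already a list of .wav paths and a parallel list of captions, one way to get a two-column dataset is the `tf.data` API: pair the paths with the captions and decode the audio lazily in a `map`. A minimal sketch, assuming 16-bit PCM mono .wav files (the file names and captions below are placeholders for your own data):

```python
import tensorflow as tf

# Placeholder file list and captions -- replace with your own data.
wav_paths = ["clip_000.wav", "clip_001.wav"]
captions = ["hello world", "good morning"]

def load_example(path, caption):
    # Read the file and decode the 16-bit PCM .wav into a float32 waveform.
    audio_bytes = tf.io.read_file(path)
    waveform, sample_rate = tf.audio.decode_wav(audio_bytes, desired_channels=1)
    # Drop the channel axis: [samples, 1] -> [samples].
    return tf.squeeze(waveform, axis=-1), caption

# One "column" of audio, one of captions.
ds = tf.data.Dataset.from_tensor_slices((wav_paths, captions))
ds = ds.map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
# Pad variable-length waveforms to a common length within each batch.
ds = ds.padded_batch(2, padded_shapes=([None], []))
```

Note that `from_tensor_slices` and `map` are lazy: the files are only read when you iterate the dataset, so the pipeline stays cheap to build even for a large corpus, and `padded_batch` handles the varying clip lengths.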
@Callum_Matthews Building a custom automatic speech recognition (ASR)/speech-to-text dataset is probably quite challenging. Would it help to look at the source code of some pre-made TensorFlow Datasets, such as Librispeech or speech_commands?
- Dataset: librispeech | TensorFlow Datasets
- Source code: datasets/tensorflow_datasets/audio/librispeech.py at master · tensorflow/datasets · GitHub
- Dataset: https://www.tensorflow.org/datasets/catalog/speech_commands
- Source code: datasets/tensorflow_datasets/audio/speech_commands.py at master · tensorflow/datasets · GitHub
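Whichever dataset format you settle on, the captions still need to become integer targets before you can train an ASR model. Many LibriSpeech-based examples use character-level encoding; here is a sketch using `tf.keras.layers.StringLookup` (the vocabulary below is an assumption for illustration; in practice you would derive it from your training captions):

```python
import tensorflow as tf

# Assumed character vocabulary -- build yours from the training captions.
vocab = list("abcdefghijklmnopqrstuvwxyz' ")

# StringLookup maps each character to an integer id (id 0 is reserved for OOV),
# and the inverted layer maps ids back to characters for decoding predictions.
char_to_id = tf.keras.layers.StringLookup(vocabulary=vocab)
id_to_char = tf.keras.layers.StringLookup(vocabulary=vocab, invert=True)

def encode_caption(caption):
    # Lowercase, split into unicode characters, then look up integer ids.
    chars = tf.strings.unicode_split(tf.strings.lower(caption), "UTF-8")
    return char_to_id(chars)

labels = encode_caption(tf.constant("Hello world"))
```

You could apply `encode_caption` inside the dataset's `map` step so each element becomes a (waveform, label-ids) pair ready for a CTC-style loss.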