Can I import a Hugging Face dataset into Google AI Studio?

Hello. I have a dataset on Hugging Face where the input is the path to a voice file and the output is the transcribed text of that recording. How can I import it into AI Studio? Is there any way to do this?

Hi @mohsaleh04

There is no option to import a dataset directly from Hugging Face into AI Studio.

You can download the data to your local machine and upload it from there.

Welcome to the forum. The audio files need to be in Google Cloud so that Gemini can access them. One way to accomplish that is to move them, one at a time, through your client, which takes some scripting. For example, in Python, I would create a pandas DataFrame and define a move_row function. In move_row, download the current row's audio file to a temporary file on your client, then use the File API's upload_file on that temporary file (see "Using files" in the Gemini API documentation on Google AI for Developers). The upload_file operation returns a file reference with a URI, which I would save in the current row of the DataFrame. The transcribed text from Hugging Face would go into the next column.

Run move_row in a loop over all rows, and you will have moved the audio files to where Gemini can process them. You only need to provide the URIs when prompting it, and you will have them in your DataFrame.
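The move_row loop described above could be sketched roughly like this. It is only a sketch under some assumptions: the row keys "audio_path" and "text", the fetch_audio callback (whatever returns the audio bytes for a path in your dataset), and the helper names move_all and gemini_upload are all made up for illustration. gemini_upload assumes the google-generativeai package is installed and genai.configure(api_key=...) has already been called.

```python
import os
import tempfile


def move_row(row, fetch_audio, upload):
    """Copy one row's audio through a local temp file and return its URI and text.

    row         -- dict with hypothetical keys "audio_path" and "text"
    fetch_audio -- callable(path) -> bytes; hypothetical, depends on your dataset
    upload      -- callable(local_path) -> URI string
    """
    # Keep the original extension so the uploader can guess the MIME type.
    suffix = os.path.splitext(row["audio_path"])[1] or ".wav"
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(fetch_audio(row["audio_path"]))
        tmp_path = tmp.name
    try:
        uri = upload(tmp_path)
    finally:
        os.remove(tmp_path)  # always clean up the temporary file
    return {"uri": uri, "text": row["text"]}


def move_all(rows, fetch_audio, upload):
    """Run move_row sequentially over every row."""
    return [move_row(row, fetch_audio, upload) for row in rows]


def gemini_upload(local_path):
    """Upload via the File API and return the URI.

    Assumes google-generativeai is installed and configured with an API key.
    """
    import google.generativeai as genai  # imported lazily; only needed here

    return genai.upload_file(local_path).uri
```

Calling move_all(rows, fetch_audio, gemini_upload) gives a list of {"uri", "text"} dicts, which can be passed to pandas.DataFrame if you want the two-column table described above.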

Hope that helps.

Can I use this method for fine-tuning Gemini-1.5-flash in Python?
Or is this method only useful for structured prompts?

It doesn't matter whether you want to use chat prompting with few-shot in-context learning, a structured prompt, or fine-tuning. In all cases, the media files have to be in Google Cloud, not in any other cloud, for Gemini to be able to use them.

OK, thank you for your response.

There's one more thing you should be aware of. The File API is self-cleaning (uploads are automatically deleted two days later), and it is free. If you intend to keep using the media files for longer, you probably want to use Google Drive or some other Google storage solution, which may be billed.
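Since the uploads expire, it can help to record the upload time next to each URI and check it before prompting. A minimal sketch, assuming the two-day retention mentioned above and that you store a timezone-aware upload timestamp yourself; needs_reupload and the one-hour safety margin are hypothetical choices, not part of the File API:

```python
from datetime import datetime, timedelta, timezone

# Retention period as described above (two days); adjust if the service changes.
FILE_TTL = timedelta(hours=48)


def needs_reupload(uploaded_at, now=None, margin=timedelta(hours=1)):
    """Return True if a file uploaded at `uploaded_at` has expired or is about to.

    uploaded_at -- timezone-aware datetime recorded when the upload finished
    now         -- override for the current time (defaults to the current UTC time)
    margin      -- safety buffer before the nominal expiry
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= uploaded_at + FILE_TTL - margin
```

Rows for which this returns True would need to go through the upload step again before their URIs are used in a prompt.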

You can do it from here:
https://console.cloud.google.com/vertex-ai/model-garden
From my experience, though, it will cost you a lot even when you are not using it, so I don't recommend it. I would suggest you first assess it locally, and once you are happy with it, you might want to add it to Vertex. Also make sure you choose the relevant instance for your needs: