How do I upload files with media.upload in the REST API?

I’ve figured out how to upload files through the file API in REST:

curl -H "Content-Type: text/plain" -H "X-goog-api-key: key" -X POST --data-binary @test.txt "https://generativelanguage.googleapis.com/upload/v1beta/files" > response.json

And when I ask Gemini about the contents of the file, it does seem to be able to access it.

Now, how do I upload a file with the metadata included? Like a custom ID or display name? I understand that I have to add a JSON File object, but how do I do so while also including the actual file data? It seems the Content-Type header also specifies the File type?

I’ve been able to do a multipart/related upload, with the first part containing the metadata. You also need to set the header:

"X-Goog-Upload-Protocol": "multipart"

But it hasn’t been very helpful. It won’t let me set the displayName, and there aren’t many other fields that are useful to set. Still trying to figure this out further myself.

Thanks a bunch! Have you been able to set a custom file ID through this method?

To be honest, I haven’t tried. My use cases don’t need it.

I see. Thanks for the info!

Hi! I’ve got it kinda working.

It seems the name field does not work, as it just throws an error, no matter what I do:
CreateFileRequest.file.name: File name may only contain lowercase alphanumeric characters or dashes (-) and cannot begin or end with a dash.

But, displayName does work!
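
The rule quoted in that error message can at least be checked locally before sending a request. Here is a minimal Python sketch, assuming the message describes the whole rule; the function name `is_plausible_file_id` is made up for illustration, and the server may enforce further constraints the message does not mention, so passing this check does not guarantee that the API accepts the name.

```python
import re

# Pattern derived only from the quoted error message: lowercase
# alphanumerics or dashes, with no dash at the start or at the end.
# (Hypothetical helper; the server may apply additional rules.)
_FILE_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_plausible_file_id(candidate: str) -> bool:
    """Return True if `candidate` satisfies the rule stated in the error message."""
    return _FILE_ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for name in ("my-file-1", "-abc", "abc-", "My_File"):
        print(name, is_plausible_file_id(name))
```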

For those interested, this is part of my code:

GeminiManager.cs

public async Task<TResponse> Request<TResponse>(IGeminiMultiPartPostRequest request)
{
    string requestEndpoint = request.EndpointUri;
    string requestData = request.GetUtf8EncodedData(MultiPartFormDataSeperator);

    using UnityWebRequest webRequest = UnityWebRequest.Post(requestEndpoint, requestData, $"multipart/related; boundary={MultiPartFormDataSeperator}");
    webRequest.SetRequestHeader("X-Goog-Upload-Protocol", "multipart");

    return JsonConvert.DeserializeObject<TResponse>((await ComputeRequest(webRequest)).downloadHandler.text);
}

private async Task<UnityWebRequest> ComputeRequest(UnityWebRequest webRequest)
{
    webRequest.SetRequestHeader("X-goog-api-key", _geminiApiKey);
    UnityWebRequestAsyncOperation operation = webRequest.SendWebRequest();
    while (!operation.isDone)
        await Task.Yield();

    if (webRequest.result != UnityWebRequest.Result.Success)
        throw new GeminiRequestException(webRequest);

    Debug.Log("Gemini API computation succeeded.");
    return webRequest;
}

GeminiFileUploadRequest.cs

public string GetUtf8EncodedData(string dataSeperator)
{
    StringBuilder data = new($"--{dataSeperator}\r\n");

    data.Append("Content-Disposition: form-data; name=\"metadata\"\r\n");
    data.Append("Content-Type: application/json; charset=UTF-8\r\n\r\n");
    data.Append($"{JsonConvert.SerializeObject(this)}\r\n");

    data.Append($"--{dataSeperator}\r\n");
    data.Append("Content-Disposition: form-data; name=\"file\"\r\n");
    data.Append($"Content-Type: {ContentType}\r\n\r\n");
    data.Append($"{Encoding.UTF8.GetString(RawData)}\r\n");
    data.Append($"--{dataSeperator}--\r\n");

    return data.ToString();
}

For the full code please check out UGemini: A Unity/C# wrapper for the Gemini API on GitHub.

You got displayName working? Interesting!
I may have to revisit that.

Would love to see what the JSON part you’re sending looks like.

The code is open source! It is written in C# with JSON serialization by Newtonsoft.Json (aka Json.Net). The code specifically for the media.upload request is available in GeminiFileUploadRequest.cs and GeminiFileUploadMetaData.cs.

The generated JSON should look something like {"file":{"displayName":"Hello World"}}
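
For anyone not on Unity, the same two-part body can be sketched in Python. This mirrors the part layout of the C# code above (a JSON metadata part followed by the file part) but assembles the body as bytes, so binary files are not altered by a UTF-8 decode. The function name and its parameters are illustrative, not part of any Google SDK, and I have not verified this exact body against the API.

```python
import json
import uuid
from typing import Optional, Tuple


def build_multipart_related(metadata: dict, file_bytes: bytes, mime_type: str,
                            boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for a two-part multipart/related upload."""
    # A random boundary is an assumption; any string that does not occur
    # in the payload works.
    boundary = boundary or uuid.uuid4().hex
    parts = [
        # Part 1: the JSON metadata, e.g. {"file": {"displayName": "..."}}
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="metadata"\r\n',
        b"Content-Type: application/json; charset=UTF-8\r\n\r\n",
        json.dumps(metadata).encode("utf-8"),
        b"\r\n",
        # Part 2: the raw file bytes, left untouched
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="file"\r\n',
        f"Content-Type: {mime_type}\r\n\r\n".encode("ascii"),
        file_bytes,
        b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode("ascii"),
    ]
    return b"".join(parts), f"multipart/related; boundary={boundary}"
```

The returned content type goes into the Content-Type header of the POST, alongside X-Goog-Upload-Protocol: multipart and the API key header.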

If you have gotten name working in your code, please do let me know! I’m making a package to help developers use Gemini in their Unity apps.

Hi @uralstech & @afirstenberg,

I tried translating the file upload request to Scala, but I am still getting a Bad Request. I am not sure what I am doing wrong. Here is my code; I am hoping for some direction and pointers. Thank you.

object FileAPIClient extends NooteJsonProtocol with SprayJsonSupport {

  case class Status(code: Option[Int], message: Option[String], details: Option[List[JsValue]])
  case class VideoMetadata(videoDuration: Option[String])
  case class FileMetadata(
                           name: Option[String],
                           displayName: Option[String],
                           mimeType: Option[String],
                           sizeBytes: Option[String],
                           createTime: Option[String],
                           updateTime: Option[String],
                           expirationTime: Option[String],
                           sha256Hash: Option[String],
                           uri: Option[String],
                           state: Option[String],
                           error: Option[Status],
                           videoMetadata: Option[VideoMetadata]
                         )
  case class FileWrapper(file: FileMetadata)
  case class FileUploadResponse(file: FileMetadata)
  case class FileDeleteResponse()
  case class FileDetails(file: FileMetadata)
  case class FileList(files: List[FileMetadata], nextPageToken: Option[String])

  given statusFormat: RootJsonFormat[Status] = jsonFormat3(Status.apply)
  given videoMetadataFormat: RootJsonFormat[VideoMetadata] = jsonFormat1(VideoMetadata.apply)
  given fileMetadataFormat: RootJsonFormat[FileMetadata] = jsonFormat12(FileMetadata.apply)
  given fileWrapperFormat: RootJsonFormat[FileWrapper] = jsonFormat1(FileWrapper.apply)
  given fileUploadResponseFormat: RootJsonFormat[FileUploadResponse] = jsonFormat1(FileUploadResponse.apply)
  given fileDeleteResponseFormat: RootJsonFormat[FileDeleteResponse] = jsonFormat0(FileDeleteResponse.apply)
  given fileDetailsFormat: RootJsonFormat[FileDetails] = jsonFormat1(FileDetails.apply)
  given fileListFormat: RootJsonFormat[FileList] = jsonFormat2(FileList.apply)

  private def getMimeType(filePath: Path): String = {
    val fileName = filePath.getFileName.toString.toLowerCase
    fileName match {
      case name if name.endsWith(".txt") => "text/plain"
      case name if name.endsWith(".html") => "text/html"
      case name if name.endsWith(".jpg") || name.endsWith(".jpeg") => "image/jpeg"
      case name if name.endsWith(".png") => "image/png"
      case name if name.endsWith(".pdf") => "application/pdf"
      case _ => "application/octet-stream"
    }
  }

  def uploadFile(filePath: String, config: Config)(using context: ActorContext[?], materializer: Materializer): Future[FileUploadResponse] = {
    import context.executionContext
    import context.system

    val apiKey = config.getString("noote.gemini-api-key")

    val path = Paths.get(filePath)
    val fileSource = FileIO.fromPath(path)
    val fileName = path.getFileName.toString
    val mimeType = getMimeType(path)

    val fileMetadata = FileMetadata(
      name = None,
      displayName = Some(fileName),
      mimeType = Some(mimeType),
      sizeBytes = None,
      createTime = None,
      updateTime = None,
      expirationTime = None,
      sha256Hash = None,
      uri = None,
      state = None,
      error = None,
      videoMetadata = None
    )
    val fileWrapper = FileWrapper(fileMetadata)
    val requestJson = fileWrapper.toJson

    val formData = Multipart.FormData(
      Multipart.FormData.BodyPart.fromPath(
        "file",
        ContentTypes.`application/octet-stream`,
        path
      ),
      Multipart.FormData.BodyPart(
        "metadata",
        HttpEntity(ContentTypes.`application/json`, requestJson.prettyPrint)
      )
    )

    val request = HttpRequest(
      method = HttpMethods.POST,
      uri = "https://generativelanguage.googleapis.com/upload/v1beta/files",
      entity = formData.toEntity(),
      headers = List(RawHeader("Authorization", s"Bearer $apiKey"))
    )

    Http().singleRequest(request).flatMap { response =>
      println("[Received Response]" + response.entity)
      Unmarshal(response.entity).to[FileUploadResponse]
    }.recoverWith { case ex: Unmarshaller.UnsupportedContentTypeException =>
      Http().singleRequest(request).flatMap { response =>
        Unmarshal(response.entity).to[String].map { body =>
          throw new RuntimeException(s"Failed to upload file: $response")
        }
      }
    }
  }
}

@uralstech & @afirstenberg Setting name will not get you anywhere, however much you try. It is more like a primary key that is auto-generated from your displayName. As far as I know, it is generated uniquely from the displayName and, most importantly, an alphanumeric part is added to the slugged displayName. So instead of focusing on name, which is going to be auto-generated, just focus on displayName: even if you managed to set name explicitly (which you won’t), you would have to append the alphanumeric characters to it too.

I don’t know Scala, but I can help explain what the media upload requires if you describe the steps your code takes. I don’t need an explanation of the code itself, just the logic you used. I have a working SDK in PHP where I implemented this, so if you describe your approach I can look for the issue.

Hi @derrick ,

Certainly. This code only focuses on uploading the file to the Files API.
I retrieve the file from the path as a source and get the MIME type for the file.

Once I have that, I build the request body, which should contain the file metadata; I set name to None since I will get the identifier in the response. Then I build the multipart form data: one part is the binary stream of the file read from the path, and the other is the metadata part containing the FileWrapper, which is just metadata. Both are sent in a single POST request. One thing that has changed is the headers: earlier I was using Authorization: Bearer, but now I am using

 headers = List(RawHeader("x-goog-api-key", s"$apiKey"), RawHeader("X-Goog-Upload-Protocol", "multipart"))

To summarize - Get the File from Path → Build the request body with metadata and file as binary stream and send it as a post request

@Synduex The media upload, even though it is not fully documented in the media.upload API reference on Google AI for Developers, has two parts. The part you’re trying to achieve is part 1, which is the one described in that API reference. It won’t return anything in the body, and you should not send the file stream in it. The first part only accepts the metadata, which is optional, and these headers:

return [
   'X-Goog-Upload-Protocol' => 'resumable',
   'X-Goog-Upload-Command' => 'start',
   'X-Goog-Upload-Header-Content-Length' => [THE_FILE_SIZE],
   'X-Goog-Upload-Header-Content-Type' => [THE_MIME_TYPE],
];

Forgive the PHP, but I’m sure you can send these headers in Scala as well.
What you do for that request:

  1. Calculate the total byte size of the file.
  2. Obtain the MIME type. I doubt application/octet-stream will be accepted.

Google will return an empty body for that initial request, but the response headers will contain the URL you should upload the file to, in X-Goog-Upload-URL. That takes you to the second part.
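
To make that first request concrete, here is a small Python sketch that only assembles the headers and the JSON body described in this thread. The function name `build_start_request` is hypothetical, the Content-Type: application/json header and the {"file": {...}} body shape are taken from other posts in this thread, and the API key header (x-goog-api-key) is left for the caller to add.

```python
import json
from typing import Dict, Optional, Tuple


def build_start_request(file_size: int, mime_type: str,
                        display_name: Optional[str] = None) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for the initial 'start' request of a resumable upload."""
    headers = {
        "X-Goog-Upload-Protocol": "resumable",
        "X-Goog-Upload-Command": "start",
        # Size and type describe the file that will be sent in part 2,
        # not the JSON body of this request.
        "X-Goog-Upload-Header-Content-Length": str(file_size),
        "X-Goog-Upload-Header-Content-Type": mime_type,
        "Content-Type": "application/json",
    }
    # The metadata is optional; an empty "file" object is sent when no
    # display name is given.
    file_meta = {"displayName": display_name} if display_name else {}
    body = json.dumps({"file": file_meta}).encode("utf-8")
    return headers, body
```

The result would be POSTed to https://generativelanguage.googleapis.com/upload/v1beta/files with any HTTP client.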

Part 2 is somewhat tricky, but I’ll try to explain based on what I did.

We already have the URL to upload to. This is the equivalent in my PHP code:

        $handle = fopen($filePath, 'rb');
        $chunkSize = self::CHUNK_SIZE;
        $offset = 0;

        while (!feof($handle)) {
            $chunkData = fread($handle, $chunkSize);
            $end = $offset + strlen($chunkData);
            $command = ($end < $fileSize) ? 'upload' : 'upload, finalize';
            $chunkRequest = new UploadMediaChunkRequest($uploadUrl, $chunkData, $offset, $command);
            $response = $this->connector->send($chunkRequest);

            if ($response->failed()) {
                fclose($handle);

                throw new Exception('Chunk upload failed: ' . $response->body());
            }

            $offset = $end;
        }
        fclose($handle);

This is what happens:

  1. Open the file to be uploaded in binary mode.
  2. Define the chunk size to be uploaded (8 * 1024 * 1024 bytes, i.e. 8 MiB).
  3. Initialize an offset variable to keep track of the position in the file.
  4. Start a loop that continues until you’ve read the entire file:
    a. Read a chunk of data from the file.
    b. Calculate the end position of this chunk.
    c. Determine the upload command:
    • Use ‘upload’ if there’s more data to come.
    • Use ‘upload, finalize’ for the last chunk.

Then, still inside the loop, send a POST request to the upload URL you got from Part 1 with the following:

  • The chunk data in the request body.
  • Pass these headers with the request:
return [
    'Content-Length' => strlen($this->chunkData),
    'X-Goog-Upload-Offset' => [THE_OFFSET],
    'X-Goog-Upload-Command' => [THE_COMMAND],
];

Content-Length in my request is the length of the chunk, which is the body of the request.

The last iteration, which sends the command upload, finalize, receives the File instance in its response.
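
The offset and command bookkeeping is the part that is easiest to get wrong, so here it is isolated as a Python sketch. It works on an in-memory bytes object for simplicity (an assumption; the PHP code reads from a file handle), and `iter_chunks` is a made-up name. Each yielded tuple corresponds to one POST: the offset goes into X-Goog-Upload-Offset, the command into X-Goog-Upload-Command, and the chunk is the request body.

```python
from typing import Iterator, Tuple


def iter_chunks(data: bytes,
                chunk_size: int = 8 * 1024 * 1024) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (offset, command, chunk) for each request of the chunked upload."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(data)
    offset = 0
    while offset < total:
        chunk = data[offset:offset + chunk_size]
        end = offset + len(chunk)
        # Only the final chunk also finalizes the upload.
        command = "upload" if end < total else "upload, finalize"
        yield offset, command, chunk
        offset = end


if __name__ == "__main__":
    for offset, command, chunk in iter_chunks(b"abcdefghij", chunk_size=4):
        print(offset, command, len(chunk))
```

Note that an empty input yields no chunks at all in this sketch, so a zero-byte file would need separate handling.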

Not sure that’s going to help you enough. I got the idea from the shell + curl sample in the Gemini cookbook: quickstarts/file-api/sample.sh in the google-gemini/cookbook repository on GitHub.

Here’s an AI generated pseudocode representation of my code that might help you translate to Scala:

uploadUrl = // URL obtained from initial request
fileSize = // Total file size
chunkSize = 8 * 1024 * 1024  // 8MB chunks
offset = 0
finalResponse = null

openFile(filePath)
while (not endOfFile) {
    chunkData = readChunk(chunkSize)
    endPosition = offset + chunkData.length

    // The last chunk must also finalize the upload
    command = if (endPosition < fileSize) "upload" else "upload, finalize"

    response = sendRequest(
        method: POST,
        url: uploadUrl,
        headers: {
            "Content-Length": chunkData.length,
            "X-Goog-Upload-Command": command,
            "X-Goog-Upload-Offset": offset
        },
        body: chunkData
    )

    if (response.isError) {
        closeFile()
        handleError(response)
    }

    finalResponse = response
    offset = endPosition
}
closeFile()

return finalResponse  // Contains uploaded file metadata (File instance)

Note that I didn’t use PUT; both parts used POST.

Recap: Part 1 uses the endpoint from the media.upload API reference on Google AI for Developers, but Part 2 makes its requests to the URL returned to you in Part 1.

I haven’t tried defining a larger chunk size to speed things up; that could also break. Be sure to check for errors and terminate the upload early if one occurs.

Also note that I didn’t stream either request.

Hi @derrick,

Thanks for the clarification and for providing so much information. I have a few questions:

  1. When we send the first request, are we sending it to this URI (https://generativelanguage.googleapis.com/upload/v1beta/files)?
  2. Does the first request need anything in the body, or just the headers?
  3. Is the URI the same for the second request?

Also, I am curious: am I the only one who finds this unnecessarily complicated?

Yes, the first request should go to https://generativelanguage.googleapis.com/upload/v1beta/files.
The sole purpose of the first request seems to be to give the receiving server some (optional) context about the file you’re trying to upload, and to tell it the file’s size. If you wish to provide the metadata, then the first request’s body will be:

'file' =>[
   'displayName' => 'Your readable display name'
];

The docs state it’s optional, but I think providing it will help you identify the file among those uploaded. The headers must not be omitted. I forgot to include Content-Type: application/json earlier, since my code defines it somewhere else and then merges it in.

The endpoint for the second request is the upload URL you received in the first request’s response headers. So, it’s not the same.
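
One small pitfall when reading that header: HTTP header names are case-insensitive, and some clients hand them back lowercased, so an exact-case lookup for X-Goog-Upload-URL can miss it. A minimal Python sketch of a tolerant lookup (the helper name `find_upload_url` is made up for illustration):

```python
from typing import Mapping, Optional


def find_upload_url(response_headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Goog-Upload-URL value regardless of header-name casing."""
    for name, value in response_headers.items():
        if name.lower() == "x-goog-upload-url":
            return value
    # No upload URL usually means the start request itself was rejected.
    return None
```

If this returns None, it is worth inspecting the status code and body of the first response before attempting part 2.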

As for the complication, I think a lot of people are having this issue. For instance, the endpoint for this media upload messed up my SDK structure: I was expecting [BASE_URL]/v1beta/[ENDPOINT], which is the pattern for all the other endpoints, and then it turned out to be [BASE_URL]/upload/v1beta/[ENDPOINT]. My guess is that, without simplified SDKs for the Gemini API, many will find it tiresome to integrate it into their projects.

Hi @derrick,

Thanks for the clarification. I agree about the endpoint complication. I am still working out how my backend will adapt to these new revelations, haha. I hope it works like magic after all the pain :smiley:

Hi @derrick,

I tried the first part, but I am still getting a Bad Request. Here is the request; is the entity the problem? I receive a Bad Request both with and without the entity.

val request = HttpRequest(
      method = HttpMethods.POST,
      uri = "https://generativelanguage.googleapis.com/upload/v1beta/files",
      entity = HttpEntity(ContentTypes.`application/json`, {}.toJson.prettyPrint),
      headers = List(
        RawHeader("x-goog-api-key", apiKey),
        RawHeader("X-Goog-Upload-Protocol", "resumable"),
        RawHeader("X-Goog-Upload-Command", "start"),
        RawHeader("X-Goog-Upload-Header-Content-Length", fileSize.toString),
        RawHeader("X-Goog-Upload-Header-Content-Type", mimeType),
        RawHeader("X-Goog-Upload-Protocol", "multipart")
      )
    )

I think it’s best to give your upload a displayName. If not, I guess your body would be:

{
  "file": {}
}

The file object is present but empty. What’s your reason for not giving a displayName?

Are you sure the last header in your request (the second X-Goog-Upload-Protocol, set to multipart) should be present? I don’t have it in mine.

Oh, I got the first part working. This is what it looks like:

    val fileMetadata = FileMetadata(
      name = None,
      displayName = Some(fileName),
      mimeType = Some(mimeType),
      sizeBytes = None,
      createTime = None,
      updateTime = None,
      expirationTime = None,
      sha256Hash = None,
      uri = None,
      state = None,
      error = None,
      videoMetadata = None
    )

    val request = HttpRequest(
      method = HttpMethods.POST,
      uri = "https://generativelanguage.googleapis.com/upload/v1beta/files",
      entity = s"""
          |{ "file": ${fileMetadata.toJson.prettyPrint}
        }