Sending Files With Prompt: Gemini AI API

Hi everyone,

I am stuck in a problem, I want to send files with prompts. I am using Scala and Akka. I am able to send small jpegs as base64 and it is working but sending images other than that or files such pdfs, docs, etc are returning a Bad Request or no response.

how can i send the file?

1 Like

Welcome to the forum. The base64-encoded approach works well for small content, as you have already discovered. It is called sending media inline. The name comes from the description of the object you are supplying in the Part ( REST Resource: cachedContents  |  Google AI for Developers  |  Google for Developers ). Further down in that description you see that instead of inlineData you can send fileData. Now, to make file data, you need the File API here: Method: media.upload  |  Google AI for Developers  |  Google for Developers

Python and several other popular languages have library support for dealing with the File API, if you can’t find it in the library you are using, then the REST API is the way to go.

Hope this helps!

Hi @OrangiaNebula,

Yes you are correct, the multipart form I am trying to build is wrong. I checked out the resources you provided. I tried building the multipart function but failed :D. I am not understanding the file part of my Multipart form data.

Please correct me if I am wrong

For files that cannot be sent inline

  • File needs to be upload to Google Cloud
  • The prompt is sent with the File Uri

Is this correct?

Right, the large file (too large to be sent inline) is first uploaded using the File API. Then, when you formulate the query, at the location in the prompt where the file is supposed to show up, you give it a Part that contains the uri that you got from the File API.

Files uploaded to the File API are free (no storage fee). They live there for two days and they are automatically cleaned up. From this description it should be obvious that this is not your Google Drive, it is of course in the Google cloud. You can delete them faster if you want, there is another File API function to do that adjacent in the documentation. That might be useful because there is an overall storage cap per project, so if you have many big files you could run into the maximum. It’s pretty generous, I have not come across “you have used up all your space” type restriction myself.

Thank you I got the gist of the process. For File API I am a bit confused after going through the documentation.

There is this JSON - REST Resource: files  |  Google AI for Developers  |  Google for Developers

I believe I have to convert the ByteString to Something in that File but which field I am not sure.

It has not been easy for anyone and the documentation is, well, confusing. Here is an example of people struggling and in the end successful with the File API (with enough code to help): How to upload files with media.upload in REST API?

For languages with library support, it is easier. The Python cookbook has some tidbits that are generally useful too: cookbook/quickstarts/File_API.ipynb at main · google-gemini/cookbook · GitHub

Hope that helps.

1 Like

Hi Orangia,

Thanks for the sharing the resources. I tried translating from C# to Scala but still getting bad request. Any pointers or suggestions would be great.

object FileAPIClient extends NooteJsonProtocol with SprayJsonSupport {

  case class Status(code: Option[Int], message: Option[String], details: Option[List[JsValue]])
  case class VideoMetadata(videoDuration: Option[String])
  case class FileMetadata(
                           name: Option[String],
                           displayName: Option[String],
                           mimeType: Option[String],
                           sizeBytes: Option[String],
                           createTime: Option[String],
                           updateTime: Option[String],
                           expirationTime: Option[String],
                           sha256Hash: Option[String],
                           uri: Option[String],
                           state: Option[String],
                           error: Option[Status],
                           videoMetadata: Option[VideoMetadata]
                         )
  case class FileWrapper(file: FileMetadata)
  case class FileUploadResponse(file: FileMetadata)
  case class FileDeleteResponse()
  case class FileDetails(file: FileMetadata)
  case class FileList(files: List[FileMetadata], nextPageToken: Option[String])

  given statusFormat: RootJsonFormat[Status] = jsonFormat3(Status.apply)
  given videoMetadataFormat: RootJsonFormat[VideoMetadata] = jsonFormat1(VideoMetadata.apply)
  given fileMetadataFormat: RootJsonFormat[FileMetadata] = jsonFormat12(FileMetadata.apply)
  given fileWrapperFormat: RootJsonFormat[FileWrapper] = jsonFormat1(FileWrapper.apply)
  given fileUploadResponseFormat: RootJsonFormat[FileUploadResponse] = jsonFormat1(FileUploadResponse.apply)
  given fileDeleteResponseFormat: RootJsonFormat[FileDeleteResponse] = jsonFormat0(FileDeleteResponse.apply)
  given fileDetailsFormat: RootJsonFormat[FileDetails] = jsonFormat1(FileDetails.apply)
  given fileListFormat: RootJsonFormat[FileList] = jsonFormat2(FileList.apply)

  private def getMimeType(filePath: Path): String = {
    val fileName = filePath.getFileName.toString.toLowerCase
    fileName match {
      case name if name.endsWith(".txt") => "text/plain"
      case name if name.endsWith(".html") => "text/html"
      case name if name.endsWith(".jpg") || name.endsWith(".jpeg") => "image/jpeg"
      case name if name.endsWith(".png") => "image/png"
      case name if name.endsWith(".pdf") => "application/pdf"
      case _ => "application/octet-stream"
    }
  }

  def uploadFile(filePath: String, config: Config)(using context: ActorContext[?], materializer: Materializer): Future[FileUploadResponse] = {
    import context.executionContext
    import context.system

    val apiKey = config.getString("noote.gemini-api-key")

    val path = Paths.get(filePath)
    val fileSource = FileIO.fromPath(path)
    val fileName = path.getFileName.toString
    val mimeType = getMimeType(path)

    val fileMetadata = FileMetadata(
      name = None,
      displayName = Some(fileName),
      mimeType = Some(mimeType),
      sizeBytes = None,
      createTime = None,
      updateTime = None,
      expirationTime = None,
      sha256Hash = None,
      uri = None,
      state = None,
      error = None,
      videoMetadata = None
    )
    val fileWrapper = FileWrapper(fileMetadata)
    val requestJson = fileWrapper.toJson

    val formData = Multipart.FormData(
      Multipart.FormData.BodyPart.fromPath(
        "file",
        ContentTypes.`application/octet-stream`,
        path
      ),
      Multipart.FormData.BodyPart(
        "metadata",
        HttpEntity(ContentTypes.`application/json`, requestJson.prettyPrint)
      )
    )

    val request = HttpRequest(
      method = HttpMethods.POST,
      uri = "https://generativelanguage.googleapis.com/upload/v1beta/files",
      entity = formData.toEntity(),
      headers = List(RawHeader("Authorization", s"Bearer $apiKey"))
    )

    Http().singleRequest(request).flatMap { response =>
      println("[Received Response]" + response.entity)
      Unmarshal(response.entity).to[FileUploadResponse]
    }.recoverWith { case ex: Unmarshaller.UnsupportedContentTypeException =>
      Http().singleRequest(request).flatMap { response =>
        Unmarshal(response.entity).to[String].map { body =>
          throw new RuntimeException(s"Failed to upload file: $response")
        }
      }
    }
  }
}

I don’t know the languages in question, but a few thoughts based on me implementing the File API:

  • You probably don’t want Multipart.FormData, since that usually has a very different encoding. More likely to work is Multipart.Related.
  • I always see it as the metadata portion first, and then the file content. But I’m not sure that’s required.
  • You don’t need to specify all that metadata. In fact, you don’t need to specify any. (The mime type is required, but you can specify that as the mime type of the data part in the HTTP body.)
    • That suggests that you may not even need to use mutlipart anything and can just send the data.
  • If you can use Vertex AI and Google Cloud Storage instead, that will be easier since there are usually libraries that support Google Cloud Storage already.

If something isn’t working, seeing the error may be able to help us further. Meanwhile, I’ll read over the code and my notes for the last time I did this with REST. It is possible.

Hi @afirstenberg,

You are correct, unfortunately cannot use Vertex AI since I am using Gemini from server side.

Thanks to @derrick I was able to figure out and have it working.

These are the steps I followed:

  1. Send the Post request to upload the file
  2. In response headers you receive X-Goog-Upload-Url
  3. Use the Url to send the file in the next request in chunks of size (8388608) or 8Mb

Sample script as reference: cookbook/quickstarts/file-api/sample.sh at main · google-gemini/cookbook · GitHub

I’m not sure what you are building and if this will work for you, but if you simply need the context from the files you are uploading, then consider context caching. Then you could use a language with a simpler Gemini API SDK like Python to upload your prompts/files and create a cache. Then simply use the cache id in your C# program via REST to call the model with the needed context. The only issue is it cost money, however, google is giving everyone $150 in free credits to start.

Hi @deon,

I can’t use the Gemini SDK, It is not supported in Scala. I have the interactions on the backend with Akka so I have battle it out with the REST API documentation.