Error using image and a prompt

Hi,

I uploaded an image file using the Google Generative Language API. After that, I tried calling the Google Gemini API with the file URI and a prompt, but I got an error.

I have no idea what I’m doing wrong. Please help!

File upload response: {
  "file": {
    "name": "files/6khds6cjzzwg",
    "mimeType": "application/json",
    "sizeBytes": "28631",
    "createTime": "2024-11-03T19:32:55.946958Z",
    "updateTime": "2024-11-03T19:32:55.946958Z",
    "expirationTime": "2024-11-05T19:32:55.928008659Z",
    "sha256Hash": "MTA0MDc0MmJiNTRmY2RlNjg5MDI3ZmFhM2RmOTg4NGJhOGY5NzRiMTkwYjRmZTA5OTkyMDYxNGUzNDAzYmI4NA==",
    "uri": "https://generativelanguage.googleapis.com/v1beta/files/6khds6cjzzwg",
    "state": "ACTIVE"
  }
}

std::string request_url = "https ://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-8b:generateContent?key=" + api_key;

Generation Payload:

{
  "contents": [
    {
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/6khds6cjzzwg",
            "mimeType": "image/jpeg"
          }
        }
      ],
      "role": "user"
    },
    {
      "parts": [
        {
          "text": "Analyze the content of this image."
        }
      ],
      "role": "user"
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain",
    "temperature": 1,
    "topK": 40,
    "topP": 0.95
  }
}

Error response:

{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT"
  }
}

Thanks!

Welcome to the forums!

The biggest issue I see is that you have two entries in the contents array, and each has a “user” role. I think what you want is a single Content entry with one role (the “user” role) but multiple parts: a fileData part and a text part.

This might look something like this:

  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/6khds6cjzzwg",
            "mimeType": "image/jpeg"
          }
        },
        {
          "text": "Analyze the content of this image."
        }
      ]
    }
  ],

I’m a little surprised, however, that this is the error it gives. I would have expected a different one about having two “user” roles in a row.
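
To illustrate the restructuring, here is a minimal C++ sketch that folds consecutive entries with the same role into one entry with several parts. The Part and Content structs and the mergeConsecutiveRoles function are hypothetical, simplified stand-ins made up for this example; they are not part of any Google SDK, and real code would build the JSON directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the API's Part and Content objects.
struct Part {
    std::string kind;   // e.g. "fileData" or "text"
    std::string value;  // the file URI or the prompt text
};

struct Content {
    std::string role;
    std::vector<Part> parts;
};

// Folds consecutive entries that share a role into a single entry,
// keeping the parts in their original order.
std::vector<Content> mergeConsecutiveRoles(const std::vector<Content>& contents) {
    std::vector<Content> merged;
    for (const Content& entry : contents) {
        if (!merged.empty() && merged.back().role == entry.role) {
            merged.back().parts.insert(merged.back().parts.end(),
                                       entry.parts.begin(), entry.parts.end());
        } else {
            merged.push_back(entry);
        }
    }
    // Merging can only keep or shrink the number of entries.
    assert(merged.size() <= contents.size());
    return merged;
}
```

Applied to the original payload (one “user” entry holding the fileData part, followed by another “user” entry holding the text part), this yields a single “user” entry with both parts.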

Some other things to verify:

  • Make sure you’re using one of the most recent models (such as “gemini-1.5-flash-002”), since some older models don’t support “fileData” or “responseMimeType”. It looks like you should be good, however.
  • As you’re testing, make sure you’re not using an expired file. (But, again, you should be good on that for now.)
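
On the expiry point: the upload response above reports an expirationTime two days after createTime, so a long-running test could check it before reusing a file URI. Below is a rough C++ sketch using only the standard library. It assumes the timestamp is always UTC and in the format shown in the upload response; the function names are my own and not part of any API.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>

// Days between 1970-01-01 and the given Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
long long daysFromCivil(long long year, unsigned month, unsigned day) {
    assert(month >= 1 && month <= 12);
    year -= month <= 2;
    const long long era = (year >= 0 ? year : year - 399) / 400;
    const unsigned yearOfEra = static_cast<unsigned>(year - era * 400);
    const unsigned dayOfYear = (153 * (month > 2 ? month - 3 : month + 9) + 2) / 5 + day - 1;
    const unsigned dayOfEra = yearOfEra * 365 + yearOfEra / 4 - yearOfEra / 100 + dayOfYear;
    return era * 146097 + static_cast<long long>(dayOfEra) - 719468;
}

// Parses a UTC timestamp such as "2024-11-05T19:32:55.928008659Z" into
// seconds since the Unix epoch. Fractional seconds are ignored.
std::optional<long long> parseUtcSeconds(const std::string& timestamp) {
    int y = 0, mo = 0, d = 0, h = 0, mi = 0, s = 0;
    if (std::sscanf(timestamp.c_str(), "%d-%d-%dT%d:%d:%d", &y, &mo, &d, &h, &mi, &s) != 6) {
        return std::nullopt;
    }
    if (mo < 1 || mo > 12 || d < 1 || d > 31) {
        return std::nullopt;
    }
    return daysFromCivil(y, static_cast<unsigned>(mo), static_cast<unsigned>(d)) * 86400LL
           + h * 3600LL + mi * 60LL + s;
}

// True when expirationTime is at or before nowSeconds, or when the
// timestamp cannot be parsed (an unreadable expiry is treated as unusable).
bool isExpired(const std::string& expirationTime, long long nowSeconds) {
    const std::optional<long long> expiry = parseUtcSeconds(expirationTime);
    return !expiry || *expiry <= nowSeconds;
}
```

A caller could pass static_cast<long long>(std::time(nullptr)) from <ctime> as nowSeconds, assuming time_t counts seconds since the Unix epoch, as it does on POSIX systems.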

Good luck!

Hi, thanks for your advice. I tried it, but it didn’t help. Or did I miss something? The error is the same.

File upload response: {
  "file": {
    "name": "files/x70ty3yjsjn0",
    "mimeType": "application/json",
    "sizeBytes": "96846",
    "createTime": "2024-11-03T22:12:40.495423Z",
    "updateTime": "2024-11-03T22:12:40.495423Z",
    "expirationTime": "2024-11-05T22:12:40.475363037Z",
    "sha256Hash": "ZTc2Y2U5OGUwZGM3MjYxY2VjNjY1ZDUxZjM0MDQ4NzViZTU5NjA4ZTRhMWQxYzA1NTdkZmNmMjJkM2NkMDExOA==",
    "uri": "https://generativelanguage.googleapis.com/v1beta/files/x70ty3yjsjn0",
    "state": "ACTIVE"
  }
}

Sending request to Gemini API: {
  "contents": [
    {
      "parts": [
        {
          "fileData": {
            "fileUri": "https://generativelanguage.googleapis.com/v1beta/files/x70ty3yjsjn0",
            "mimeType": "image/jpeg"
          }
        },
        {
          "text": "Analyze the content of this image."
        }
      ],
      "role": "user"
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain",
    "temperature": 1,
    "topK": 40,
    "topP": 0.95
  }
}
Gemini API Response: {
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT"
  }
}

Hmmm. I just tested the same JSON you provided with two differences (I used my own image and it was a png file) and I didn’t have any problems.

Looking at your original post, I see that you specify the URL as:

"https ://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-8b:generateContent?key=" + api_key;

and I note that you have a space in between the “https” and the colon. It seems odd that would be the problem, but you may want to check.

Can you show exactly how you’re making this call?

For the record, here is most of my bash script that I used to test it:

fileUri="https://generativelanguage.googleapis.com/v1beta/files/n3a8606l03b7"
mimeType="image/png"

#model=gemini-1.5-flash-002
model=gemini-1.5-flash-8b

curl \
  -X POST "https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${API_KEY}" \
  -H 'Content-Type: application/json' \
  -d @<(echo '{
  "contents": [
    {
      "parts": [
        {
          "fileData": {
            "fileUri": "'${fileUri}'",
            "mimeType": "'${mimeType}'"
          }
        },
        {
          "text": "Analyze the content of this image."
        }
      ],
      "role": "user"
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain",
    "temperature": 1,
    "topK": 40,
    "topP": 0.95
  }
}')

Hi, thanks for your response.

It’s not because of the whitespace in the URL. I had to add that myself because the forum doesn’t let new users post more than two URLs in a single message.

Here’s how I make the call. I use libcurl and the nlohmann JSON library.

using json = nlohmann::json;

GeminiApiClient::GeminiApiClient(Logger& logger, MediaProcessor& media_processor, DatabaseManager& database_manager)
	: logger_(logger), media_processor_(media_processor), database_manager_(database_manager)
{
	// Initialization if necessary
}

std::string GeminiApiClient::sendPhotoApiRequest(const std::string& photo_path) {
	try {
		std::string api_key = getApiToken("GOOGLE_GEMINI_API_KEY");

		// Upload the file and get fileUri
		std::string file_uri = uploadFile(photo_path, "image/jpeg");
		if (file_uri.empty()) {
			logger_.log(LogLevel::ERR, "Failed to upload image to Gemini.");
			return "[Error in Photo Analysis]";
		}

		// Prepare the request payload
		json payload = {
			{"contents", json::array({
				{
					{"role", "user"},
					{"parts", json::array({
						{
							{"fileData", {
								{"fileUri", file_uri},
								{"mimeType", "image/jpeg"}
							}}
						},
						{
							{"text", "Analyze the content of this image."}
						}
					})}
				}
			})},
			{"generationConfig", {
				{"temperature", 1},
				{"topK", 40},
				{"topP", 0.95},
				{"maxOutputTokens", 8192},
				{"responseMimeType", "text/plain"}
			}}
		};

		std::string request_url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-8b:generateContent?key=" + api_key;

		HttpClient client(request_url);
		client.addHeader("Content-Type: application/json");

		// Log the request payload
		logger_.log(LogLevel::DEBUG, "Sending request to Gemini API: " + payload.dump(2));

		std::string response = client.fetchData(payload.dump());

		// Log the response
		logger_.log(LogLevel::DEBUG, "Gemini API Response: " + response);

		// Parse the response
		json responseJson = json::parse(response);

		if (responseJson.contains("error")) {
			logger_.log(LogLevel::ERR, "API Error: " + responseJson["error"]["message"].get<std::string>());
			return "[Error in Photo Analysis]";
		}

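		// "content" is an object, not a string; the generated text is at content.parts[0].text
		std::string analysis = responseJson["candidates"][0]["content"]["parts"][0]["text"].get<std::string>();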

		return analysis;

	}
	catch (const std::exception& e) {
		logger_.log(LogLevel::ERR, "Error in sendPhotoApiRequest: " + std::string(e.what()));
		return "[Error in Photo Analysis]";
	}
}

#include <curl/curl.h> // in http_client.h
#include <iostream>    // for std::cerr in fetchData

HttpClient::HttpClient(const std::string& url) : url(url) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
}

void HttpClient::addHeader(const std::string& header) {
    headers.push_back(header);
}
std::string HttpClient::fetchData(const std::string& jsonPayload) {
    std::string responseData;
    CURL* curl = curl_easy_init();
    if (curl) {
        setCommonOptions(curl, responseData);

        // Set POST-specific options
        curl_easy_setopt(curl, CURLOPT_POST, 1L);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, jsonPayload.c_str());

        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            std::cerr << "curl_easy_perform() failed: " << curl_easy_strerror(res) << std::endl;
        }

        curl_easy_cleanup(curl);
    }
    return responseData;
}

I’ll have to look into this tomorrow. Maybe there is something wrong with how the call itself is made.

I am not terribly familiar with libcurl, and it has been a very long time since I have done C/C++ programming. However a couple of things jump out at me:

  • You’re not explicitly setting the content-type to “application/json” anywhere. The documentation on CURLOPT_POSTFIELDS says that if you don’t, and you’re doing a POST, it uses “application/x-www-form-urlencoded”, which is very much not what you want. It looks like you’re calling addHeader(), but then not doing anything with the headers attribute.
  • You’re clearly using the URL, though it isn’t clear (to me) how that is getting set. But I believe you are.

Hi, sorry, this HttpClient function got left out of my earlier post. It’s called inside fetchData, and it should set the Content-Type header.

void HttpClient::setCommonOptions(CURL* curl, std::string& responseData) {
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &responseData);

    struct curl_slist* header_list = nullptr;
    for (const auto& header : headers) {
        header_list = curl_slist_append(header_list, header.c_str());
    }
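    // Note: libcurl does not copy this list. It must stay alive until curl_easy_perform()
    // returns and should then be released with curl_slist_free_all(); this snippet never frees it.
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, header_list);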
}

But your reply got me wondering whether my solution really works, or whether the payload is still being sent in some bad form, such as URL-encoded. I have to test more.

Regarding the URL, it is passed to the HttpClient constructor inside the GeminiApiClient::sendPhotoApiRequest function:

std::string request_url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-8b:generateContent?key=" + api_key;

HttpClient client(request_url);

I really appreciate your efforts and help! Thanks a lot!

It might make sense to set up ncat or something similar to listen for connections, then point your client there to see what is actually being sent.

I’ll be honest - I’m running out of ideas.
Good luck!

Good idea with ncat. Thanks a lot for your help so far, it’s a hairy issue, and your support was welcome.

I made a PowerShell 7 script, and it gives me the same error!

However, if I delete these 6 lines, it works. I think there’s something wrong with the request payload.

                @{
                    fileData = @{
                        fileUri  = $fileUri
                        mimeType = $mimeType
                    }
                },

# ================================================
# PowerShell Script to Upload Files and Generate Content
# ================================================

# ===========================
# Configuration
# ===========================

# Your API Key
$API_KEY = "API_KEY"

# Define the model you want to use
$model = "gemini-1.5-flash-8b"  # Update as needed

# Define the files to upload with their descriptions and MIME types
$Files = @(
    @{
        Description = "Describe the image"
        FileName   = "C:\Users\lemmski\workspace\powershell scripts\my_image.png"  # Ensure this file exists, provide the full path
        MimeType   = "image/png"
    }
    # Add more files here if needed
)

# Initialize an array to store the URIs of uploaded files
$FileUris = @()

# ===========================
# Function to Upload a Single File
# ===========================

function Upload-File {
    param (
        [string]$ApiKey,
        [string]$Description,
        [string]$FilePath,
        [string]$MimeType
    )

    # Check if the file exists
    if (-Not (Test-Path $FilePath)) {
        Write-Error "File '$FilePath' does not exist."
        exit 1
    }

    # Get the file size in bytes
    $FileSize = (Get-Item $FilePath).Length

    # Create the JSON metadata
    $Metadata = @{
        file = @{
            display_name = $Description
        }
    } | ConvertTo-Json -Compress -Depth 4

    # Convert JSON metadata to bytes
    $MetadataBytes = [System.Text.Encoding]::UTF8.GetBytes($Metadata)

    # Read the binary file data
    $FileBytes = [System.IO.File]::ReadAllBytes($FilePath)

    # Combine metadata and file bytes
    $CombinedBytes = $MetadataBytes + $FileBytes

    # Initialize HttpClient
    $HttpClient = New-Object System.Net.Http.HttpClient

    # Prepare the request
    $Request = New-Object System.Net.Http.HttpRequestMessage
    $Request.Method = [System.Net.Http.HttpMethod]::Post
    $Request.RequestUri = "https://generativelanguage.googleapis.com/upload/v1beta/files?key=$ApiKey"

    # Set the required headers
    $Request.Headers.Add("X-Goog-Upload-Command", "start, upload, finalize")
    $Request.Headers.Add("X-Goog-Upload-Header-Content-Length", "$FileSize")
    $Request.Headers.Add("X-Goog-Upload-Header-Content-Type", "$MimeType")
    $Request.Content = [System.Net.Http.ByteArrayContent]::new($CombinedBytes)
    $Request.Content.Headers.ContentType = [System.Net.Http.Headers.MediaTypeHeaderValue]::Parse("application/json")

    try {
        # Send the request
        $Response = $HttpClient.SendAsync($Request).Result

        # Ensure the request was successful
        if (-not $Response.IsSuccessStatusCode) {
            $ErrorContent = $Response.Content.ReadAsStringAsync().Result
            Write-Error "Failed to upload file '$FilePath'. Status Code: $($Response.StatusCode). Response: $ErrorContent"
            exit 1
        }

        # Parse the response content
        $ResponseContent = $Response.Content.ReadAsStringAsync().Result | ConvertFrom-Json

        # Extract the file URI from the response
        if ($ResponseContent.file -and $ResponseContent.file.uri) {
            return $ResponseContent.file.uri
        }
        else {
            Write-Error "Unexpected response while uploading '$FilePath'."
            exit 1
        }
    }
    catch {
        Write-Error "Exception occurred while uploading file '$FilePath': $_"
        exit 1
    }
    finally {
        # Dispose HttpClient
        $HttpClient.Dispose()
    }
}

# ===========================
# Upload All Files
# ===========================

foreach ($file in $Files) {
    Write-Output "Uploading '$($file.FileName)'..."
    $uri = Upload-File -ApiKey $API_KEY `
                      -Description $file.Description `
                      -FilePath $file.FileName `
                      -MimeType $file.MimeType
    $FileUris += $uri
    Write-Output "Successfully uploaded. URI: $uri`n"
}

# ===========================
# Generate Content Request
# ===========================

# Ensure at least one file URI is available
if ($FileUris.Count -lt 1) {
    Write-Error "No files were uploaded successfully. Cannot proceed with generateContent request."
    exit 1
}

# Assuming you are using the first uploaded file
$fileUri = $FileUris[0]
$mimeType = $Files[0].MimeType

# Construct the JSON payload for content generation
$GenerateContentBody = @{
    contents = @(
        @{
            role  = "user"
            parts = @(
                @{
                    fileData = @{
                        fileUri  = $fileUri
                        mimeType = $mimeType
                    }
                },
                @{
                    text = "Analyze the content of this image."
                }
            )
        }
    )
    generationConfig = @{
        maxOutputTokens  = 8192
        responseMimeType = "text/plain"
        temperature      = 1
        topK             = 40
        topP             = 0.95
    }
} | ConvertTo-Json -Compress -Depth 5

# Optional: Output the JSON payload for debugging
# Write-Output "Generated JSON Payload:"
# Write-Output $GenerateContentBody

# Initialize HttpClient for generateContent request
$HttpClientGen = New-Object System.Net.Http.HttpClient

# Prepare the generateContent request with corrected variable interpolation
$GenerateRequest = New-Object System.Net.Http.HttpRequestMessage
$GenerateRequest.Method = [System.Net.Http.HttpMethod]::Post
$GenerateRequest.RequestUri = "https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${API_KEY}"
$GenerateRequest.Content = [System.Net.Http.StringContent]::new($GenerateContentBody, [System.Text.Encoding]::UTF8, "application/json")

try {
    # Send the generateContent request
    Write-Output "Sending generateContent request..."
    $GenerateResponse = $HttpClientGen.SendAsync($GenerateRequest).Result

    # Ensure the request was successful
    if (-not $GenerateResponse.IsSuccessStatusCode) {
        $ErrorContent = $GenerateResponse.Content.ReadAsStringAsync().Result
        Write-Error "Failed to generate content. Status Code: $($GenerateResponse.StatusCode). Response: $ErrorContent"
        exit 1
    }

    # Parse and output the response
    $GenerateResponseContent = $GenerateResponse.Content.ReadAsStringAsync().Result | ConvertFrom-Json
    Write-Output "Generate Content Response:"
    $GenerateResponseContent
}
catch {
    Write-Error "Exception occurred while generating content: $_"
    exit 1
}
finally {
    # Dispose HttpClient
    $HttpClientGen.Dispose()
}

Very strange.
When I did the bash script, I manually got the URL from the file upload and put it in, and it worked ok. (And I used a jpeg. But png should work fine.)

I wonder if it’s something about how you’re getting the URL. What happens if you enter it manually?
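
One more thing that stands out in the logs: both upload responses posted above report "mimeType": "application/json", while the generateContent requests declare "image/jpeg". Nothing in this thread confirms that the mismatch is what triggers the 400, but it is cheap to check before sending. Below is a rough C++ sketch using only the standard library. The function names are made up for illustration, and the regex is a quick diagnostic rather than a real JSON parser; with nlohmann::json already in the project, reading the "mimeType" field under "file" from the parsed response would be sturdier.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Pulls the first "mimeType" string value out of a JSON text.
// Quick diagnostic only: it does not handle escaped quotes or nested duplicates.
std::optional<std::string> extractMimeType(const std::string& jsonText) {
    static const std::regex pattern(R"re("mimeType"\s*:\s*"([^"]*)")re");
    std::smatch match;
    if (std::regex_search(jsonText, match, pattern)) {
        return match[1].str();
    }
    return std::nullopt;
}

// True when the MIME type stored by the upload equals the one the
// generateContent request is about to declare for the same file.
bool uploadMatchesRequest(const std::string& uploadResponse,
                          const std::string& requestedMimeType) {
    assert(!requestedMimeType.empty());  // an empty expected type is a caller bug
    const std::optional<std::string> stored = extractMimeType(uploadResponse);
    return stored && *stored == requestedMimeType;
}
```

For the first upload response in this thread, uploadMatchesRequest(response, "image/jpeg") would return false, which at least suggests looking at how the upload request itself is built.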