JSON responses and plain-text responses (with JSON) don't properly escape double quotes

Hey everyone, I’ve been coding with GPT-4o for a while and never faced this issue, but Gemini-1.5-Pro and Gemini-1.5-Flash have a pervasive problem of not escaping double quotes inside JSON string values. If you’ve been trying to get a properly structured JSON string out of Gemini, you MUST’ve hit this often: the response can’t be parsed. Whether you ask for a responseMimeType of "text/plain" or "application/json", Gemini will OFTEN put unescaped double quotes in its generated response. This breaks your code if you’re trying to parse it into a JSON object. Here’s an example of what happens:

{"type": "it is a "pizza""} → "pizza" should’ve been escaped as \"pizza\".
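To see the failure concretely, here is a small sketch that feeds both versions of the example to JSON.parse. The tryParse helper is a made-up name for illustration, not part of any SDK:

```javascript
// Hypothetical helper: report whether a string can be parsed as JSON.
function tryParse(text) {
    try {
        return { ok: true, value: JSON.parse(text) };
    } catch (e) {
        return { ok: false, error: e.name };
    }
}

const broken = '{"type": "it is a "pizza""}';      // as generated by the model
const escaped = '{"type": "it is a \\"pizza\\""}'; // inner quotes escaped

console.log(tryParse(broken));  // ok is false, error is "SyntaxError"
console.log(tryParse(escaped)); // ok is true, value.type is: it is a "pizza"
```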

So I wrote a simple Node.js function that cleans up these JSON strings by escaping the stray double quotes inside string values.

Here’s the code for anybody else facing this issue:

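function escapeJsonString(jsonString) {
    // Heuristic: inside a string, a double quote only closes the string when the
    // next non-whitespace character is a JSON delimiter (, : } ]) or the end of
    // the input. Any other double quote is treated as content and gets escaped.
    // Text such as "he said "hi", then left" will still defeat this check.
    function isClosingQuote(str, index) {
        const rest = str.slice(index + 1).trimStart();
        return rest === '' || ',:}]'.includes(rest[0]);
    }

    // Helper function to walk a JSON-like string and escape inner quotes
    function processJsonString(str) {
        let result = '';
        let inString = false;
        for (let i = 0; i < str.length; i++) {
            const char = str[i];
            if (inString && char === '\\') {
                // Copy an existing escape sequence (such as \" or \\) untouched
                result += char + (str[i + 1] || '');
                i++;
            } else if (char === '"' && !inString) {
                inString = true; // Opening quote of a key or value
                result += char;
            } else if (char === '"' && isClosingQuote(str, i)) {
                inString = false; // Closing quote of a key or value
                result += char;
            } else if (char === '"') {
                result += '\\"'; // Unescaped quote inside a string: escape it
            } else {
                result += char;
            }
        }
        return result;
    }

    // Process the input JSON string to escape unescaped quotes in string values
    const escapedJsonString = processJsonString(jsonString);

    // Parse the escaped JSON string to ensure it's valid JSON
    let jsonObj;
    try {
        jsonObj = JSON.parse(escapedJsonString);
    } catch (e) {
        throw new Error(`Invalid JSON string after escaping: ${e.message}`);
    }

    // Convert the JSON object back to a string
    return JSON.stringify(jsonObj, null, 4);
}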
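Since the cleanup rewrites the response text, one option is to run it only when a strict parse fails, so well-formed responses are never touched. A minimal sketch, assuming a hypothetical wrapper called parseModelJson that receives the repair function as a parameter (the name and signature are mine, not from any SDK):

```javascript
// Hypothetical wrapper: try a strict parse first and only fall back to a
// repair function (for example escapeJsonString from this post) on failure.
function parseModelJson(text, repair) {
    try {
        return JSON.parse(text); // well-formed responses pass through untouched
    } catch (strictError) {
        // repair is expected to return a JSON string; if the repaired text is
        // still invalid, the error from this second parse reaches the caller.
        return JSON.parse(repair(text));
    }
}
```

You would call it as parseModelJson(responseText, escapeJsonString), where responseText stands for whatever string your model call returned.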

Hope this helps. I’m also hoping the Google Gemini team looks into this issue so that future devs don’t run into it.

P.S. GPT-4o JSON responses have a somewhat similar issue: they sometimes add spaces inside the keys of key-value pairs. For example {"type ": "pizza"} → "type " should be "type". I also run a cleanup function for this on Gemini responses, just in case. Here’s the code for that issue:

const trimJSKeys = (obj) => {
    // Helper function to handle the replacement of keys
    const handleKeyReplacement = (parent, key) => {
        const cleanKey = key.trim();
        if (cleanKey !== key) {
            parent[cleanKey] = parent[key];
            delete parent[key];
        }
        trimJSKeys(parent[cleanKey]); // Recursively clean new object key if needed
    };

    // Check if it's an array and recursively call for each element
    if (Array.isArray(obj)) {
        obj.forEach(element => trimJSKeys(element));
    }
    // Otherwise, process each key in the object
    else if (obj !== null && typeof obj === 'object') {
        Object.keys(obj).forEach(key => {
            handleKeyReplacement(obj, key);
        });
    }
};
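If you have the raw response text rather than an already parsed object, the same key cleanup can be folded into parsing through the reviver argument of JSON.parse. A rough sketch, where parseWithTrimmedKeys is a made-up name and not part of any library:

```javascript
// Hypothetical alternative: trim keys while parsing, without mutating an
// existing object. The reviver runs bottom-up, so nested objects are rebuilt
// before their parents, and the root object is handled last.
function parseWithTrimmedKeys(text) {
    return JSON.parse(text, (key, value) => {
        const isPlainObject =
            value !== null && typeof value === 'object' && !Array.isArray(value);
        if (!isPlainObject) {
            return value; // primitives and arrays are kept as they are
        }
        // Rebuild the object with whitespace trimmed from each key
        return Object.fromEntries(
            Object.entries(value).map(([k, v]) => [k.trim(), v])
        );
    });
}

console.log(parseWithTrimmedKeys('{"type ": "pizza"}')); // { type: 'pizza' }
```

As with the in-place version, if two keys differ only by surrounding whitespace, the later one wins.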

LLMs are great at generating unstructured text, but structured output like JSON still has issues like these that the model creators should fix ASAP.
