Hey everyone, I’ve been coding with GPT-4o for a while and never ran into this, but Gemini-1.5-Pro and Gemini-1.5-Flash have a pervasive problem: they don’t escape double quotes inside JSON string values. If you’ve tried to get a properly structured JSON string out of Gemini, you’ve most likely hit this and been unable to parse the response. Whether you set responseMimeType to "text/plain" or "application/json", Gemini will often put unescaped double quotes in its generated response, which breaks your code as soon as you try to parse it into a JSON object. Here’s an example of what happens:
{"type": "it is a "pizza""}
→ "pizza" should have been escaped as \"pizza\".
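To make the failure concrete, here is a minimal sketch of what JSON.parse does with the example above, and with the same payload once the inner quotes are escaped. The variable names are only for illustration.

```javascript
// The raw model output from the example above, with unescaped inner quotes.
const broken = '{"type": "it is a "pizza""}';

let parseError = null;
try {
    JSON.parse(broken);
} catch (e) {
    parseError = e; // JSON.parse throws a SyntaxError on malformed input
}
console.log(parseError.name); // SyntaxError

// The same payload with the inner quotes escaped parses fine.
// (The doubled backslashes are JavaScript string escaping; the actual text is \"pizza\".)
const fixed = '{"type": "it is a \\"pizza\\""}';
const parsed = JSON.parse(fixed);
console.log(parsed.type); // it is a "pizza"
```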
So I created a simple Node.js function that cleans up these JSON strings by escaping the unescaped double quotes inside their string values.
Here’s the code for anybody else who has faced this issue:
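function escapeJsonString(jsonString) {
    // Helper function to escape unescaped double quotes inside string values.
    // Heuristic: a double quote inside a string counts as the closing quote only
    // if the next non-whitespace character is a JSON delimiter (, : } ]) or the
    // end of the input. Any other quote is treated as content and escaped.
    // Ambiguous input (an inner quote directly followed by a comma, for
    // example) can still be repaired incorrectly.
    function escapeUnescapedQuotes(str) {
        let result = '';
        let inString = false;
        for (let i = 0; i < str.length; i++) {
            const char = str[i];
            if (!inString) {
                if (char === '"') inString = true; // opening quote
                result += char;
            } else if (char === '\\') {
                // Copy an existing escape sequence (such as \" or \\) unchanged
                result += char + (str[i + 1] || '');
                i++;
            } else if (char === '"') {
                const next = str.slice(i + 1).trimStart()[0];
                if (next === undefined || ',:}]'.includes(next)) {
                    inString = false; // closing quote
                    result += char;
                } else {
                    result += '\\"'; // stray inner quote
                }
            } else {
                result += char;
            }
        }
        return result;
    }

    // Process the input JSON string to escape unescaped quotes in string values
    const escapedJsonString = escapeUnescapedQuotes(jsonString);

    // Parse the escaped JSON string to ensure it's valid JSON
    let jsonObj;
    try {
        jsonObj = JSON.parse(escapedJsonString);
    } catch (e) {
        throw new Error(`Invalid JSON string after escaping: ${e.message}`);
    }

    // Convert the JSON object back to a string
    return JSON.stringify(jsonObj, null, 4);
}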
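Since any repair step is a guess about what the model meant, it may be safer to try a plain JSON.parse first and only fall back to the cleanup when that fails. Here is a rough sketch of that idea. parseWithRepair and the toy repair function are hypothetical names made up for this example; in practice you would pass in a cleanup function like the one above.

```javascript
// Hypothetical helper: `repair` is any function that takes the raw text and
// returns a cleaned-up JSON string (for example, a quote-escaping function).
function parseWithRepair(text, repair) {
    try {
        return JSON.parse(text); // fast path: the response is already valid
    } catch (firstError) {
        try {
            return JSON.parse(repair(text)); // second attempt on the repaired text
        } catch (secondError) {
            // Report both failures so the original problem is not hidden
            throw new Error(
                `Could not parse JSON. Original error: ${firstError.message}; ` +
                `after repair: ${secondError.message}`
            );
        }
    }
}

// Toy stand-in for a real repair function, for this demo only:
// it escapes the quotes around the one known word.
const toyRepair = (text) => text.replace('"pizza"', '\\"pizza\\"');

const alreadyValid = parseWithRepair('{"type": "pizza"}', toyRepair);
const repaired = parseWithRepair('{"type": "it is a "pizza""}', toyRepair);
console.log(alreadyValid.type); // pizza
console.log(repaired.type); // it is a "pizza"
```

This way, responses that are already valid never pass through the repair logic at all.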
Hope this helps. I’m also hoping the Google Gemini team looks into this issue so that future devs don’t run into it.
P.S. GPT-4o JSON responses have a somewhat similar issue: they sometimes add stray spaces inside the keys of key-value pairs. For example: {"type ": "pizza"}
→ "type " should be "type". I also run a cleanup function on Gemini responses to cover that case, just in case. Here’s the code for it:
const trimJSKeys = (obj) => {
    // Helper function to handle the replacement of keys
    const handleKeyReplacement = (parent, key) => {
        const cleanKey = key.trim();
        if (cleanKey !== key) {
            parent[cleanKey] = parent[key];
            delete parent[key];
        }
        trimJSKeys(parent[cleanKey]); // Recursively clean the value under the new key if needed
    };

    // Check if it's an array and recursively call for each element
    if (Array.isArray(obj)) {
        obj.forEach(element => trimJSKeys(element));
    }
    // Otherwise, process each key in the object
    else if (obj !== null && typeof obj === 'object') {
        Object.keys(obj).forEach(key => {
            handleKeyReplacement(obj, key);
        });
    }
};
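As an alternative to cleaning the object after parsing, the same key trimming can probably be done during parsing with the optional reviver argument of JSON.parse. This is just a sketch of that approach, not a replacement for the function above. trimKeysReviver is a name I made up for the example.

```javascript
// Reviver that rebuilds every plain object with trimmed keys.
// JSON.parse calls the reviver bottom-up, so nested objects are handled
// before the objects that contain them; arrays and primitives pass through.
const trimKeysReviver = (key, value) => {
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        return Object.fromEntries(
            Object.entries(value).map(([k, v]) => [k.trim(), v])
        );
    }
    return value;
};

const raw = '{"type ": "pizza", " toppings": [{" name ": "basil"}]}';
const cleaned = JSON.parse(raw, trimKeysReviver);
console.log(JSON.stringify(cleaned));
// {"type":"pizza","toppings":[{"name":"basil"}]}
```

Unlike trimJSKeys, this builds new objects instead of mutating an existing one. In both versions, if two keys become identical after trimming, the later one wins.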
LLMs are great at generating unstructured text, but structured output like JSON still has issues that the model creators could fix ASAP.