Gemini-2.5-flash is highly unstable with extremely long inputs

I use the Flash series models for long-text translation work, and an important step in this workflow is extracting terms from the full text. Because Gemini has a context window of up to a million tokens, I planned to input the entire text to be translated (50,000-200,000 characters) in one request and extract terms from it. This step is smooth and stable in AI Studio and other API GUI programs, with an excellent success rate and good term extraction quality. In my Python program, however, once the input length exceeds 5K characters the request fails with "Server disconnected without sending a response." (It occasionally succeeds; it is not failing every time.)
I have updated the Gemini SDK to the latest version, but the issue persists. Switching to uploading the text as a file did not help either. I hope someone can explain the cause of this.
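Not a fix for the server-side issue, but a mitigation that is worth trying while debugging: since the disconnect is intermittent, wrapping the request in retries with exponential backoff can rescue runs where the failure is transient. This is a minimal sketch; `call` here is a stand-in for the real SDK request (e.g. a lambda wrapping the generate-content call), and `flaky` only simulates the failure for illustration.

```python
import time

def call_with_retries(call, max_attempts=4, base_delay=1.0):
    """Retry a flaky request with exponential backoff.

    `call` stands in for the actual SDK request; it must raise on failure
    and return the response on success.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Simulated flaky endpoint: fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("Server disconnected without sending a response.")
    return "ok"
```

Under these assumptions, `call_with_retries(flaky)` succeeds on the third attempt; with a real client you would pass something like `lambda: client.models.generate_content(...)` instead.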

PS: This happens with both gemini-2.5-flash and gemini-2.5-pro.

After further testing, gemini-2.0-flash extracts successfully and stably (101K characters of text), so I am now certain this is a bug in the 2.5 series.

The prompt I used is:
system_prompt = """You are a professional translation terminologist. Extract the important terms and proper nouns from the provided Chinese text and give the corresponding English translations.

Focus on extracting:

  1. Personal names (note the gender)
  2. Place names, country names, organization names
  3. Official titles and forms of address
  4. Names of weapons and items
  5. Technical terms and concepts
  6. Important cultural vocabulary

Output strictly in the following JSON format:
{
"glossary": [
{
"term": "original term",
"translation": "English translation",
"note": "notes (e.g. male name / female name / place name / official title)"
}
]
}

Please ensure the output is valid JSON."""
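On the consuming side, models sometimes wrap the requested JSON in a Markdown code fence even when told not to. A small defensive parser (my sketch, not part of the original program) can strip such fences before `json.loads` and verify the expected `"glossary"` key:

```python
import json

def parse_glossary(raw: str) -> list[dict]:
    """Parse the model's glossary reply, tolerating ```json fences."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```json) and the closing fence.
        lines = text.splitlines()
        if lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines[1:])
    data = json.loads(text)
    if "glossary" not in data:
        raise ValueError("missing 'glossary' key in model output")
    return data["glossary"]
```

This accepts both a bare JSON object and one wrapped in a fenced block, and fails loudly on anything else so a retry can be triggered.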

Hey @ding_fow, could you please share the context too? It would really help us debug the issue more quickly.

Thanks

OK, I'll paste below the prompt I use now, the test text, and the Python code.
The prompt I use now is:
“# Role\nYou are a top-tier translation terminologist, proficient in mutual translation between {source_language} and {target_language}, with a deep understanding of the nuances of fictional texts and the transmission of cross-cultural concepts. You are adept at identifying core terminology crucial for high-quality translation.\n\n# Task\nYour primary task is to analyze the provided {source_language} novel and extract a focused list of the most critical terms essential for an accurate and culturally resonant {target_language} translation. For each term, provide a precise, understandable {target_language} translation suggestion and concise, helpful notes. The goal is to identify a manageable number of high-impact terms, generally not exceeding 100.\n\n# Core Principles & Objectives\n1. Translation Quality & Balance:\n * Prioritize translations that are faithful to the original meaning while being natural and readable in the {target_language}.\n * Example Guidance:\n * For personal names combined with titles (e.g., "帝旭" - Emperor named "Xu"), aim for translations like "Emperor Xu."\n * For culturally specific place names (e.g., "垂华门" - a palace gate), consider translations like "Chui Hua Gate" to retain cultural identity while ensuring clarity.\n2. Terminology Relevance:\n * Focus on terms that are absolutely key to understanding the plot, characters, and unique world of the novel.\n * Prioritize terms that are difficult to translate directly or require specific cultural context.\n3. Unique Terminology Only:\n * Each term must appear only once in the glossary. Do not include duplicate or repeated entries.\n * If a term appears in multiple contexts or forms, select the most representative occurrence.\n\n# Terminology Extraction Focus (Prioritized & Limited)\nTo ensure a concise and highly relevant glossary, please concentrate on the following, prioritizing the most critical terms. The total number of extracted terms should ideally not exceed 100.\n\n1. 
Key Proper Nouns:\n * Character Names & Significant Titles/Ranks:\n * Extract names of main and pivotal supporting characters.\n * Include official titles, ranks, or forms of address that are frequently used or define a character’s role (e.g., "帝旭").\n * Note gender if discernible, as this impacts pronoun choice in {target_language}.\n * Significant Place Names:\n * Extract names of key fictional locations (countries, cities, major buildings like "垂华门," important geographical features) that are central to the plot or world-building.\n\n2. Challenging Cultural or Fictional Concepts:\n * Identify unique terms, customs, or abstract concepts specific to the novel’s world or the {source_language} culture that are:\n * Difficult to render directly into {target_language}.\n * Crucial for understanding key plot points, character motivations, or the novel’s unique setting.\n * Likely to be misunderstood without explanation.\n * This includes terms that might seem simple but carry a deep, specific meaning within the narrative.\n\n# Output Format (JSON)\nPlease strictly adhere to the following JSON format for the glossary. Ensure the output is a single, valid JSON object.\n\njson\n{{\n \"glossary\": [\n {{\n \"term\": \"Original Term ({source_language})\",\n \"translation\": \"Suggested Translation ({target_language})\",\n \"note\": \"Brief Notes (Category; Gender if applicable; Key context only - keep concise)\"\n }}\n ]\n}}\n\n\n# Source Text\n{text_content}”
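A note on the template above: because it is filled in with `str.format`, the literal JSON braces in the output-format section must be doubled (`{{` / `}}`) so they survive substitution, while single-brace placeholders like `{source_language}` get replaced. A minimal illustration of that mechanism (generic strings, not the full prompt):

```python
# Doubled braces escape literal JSON; single braces are placeholders.
template = (
    "Translate from {source_language} to {target_language}.\n"
    'Output JSON: {{"glossary": []}}'
)
prompt = template.format(source_language="Chinese", target_language="English")
```

If a stray single `{` sneaks into the template (easy to do when editing JSON examples), `format` raises `KeyError`/`ValueError`, which is worth ruling out when debugging prompt-building code.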


The test text (a Chinese novel, no sensitive content) is:

The code is:

I wrote a test script using Cursor to evaluate the long-text extraction success rate of the 2.5 series models. The conclusion: both 2.5 Flash and 2.5 Pro can generally extract terms normally when the novel is around 5,000 characters. Beyond 20,000 characters they become very unstable, with a low success rate, and at 100,000 characters or more they almost never succeed. The test text I shared above was 100,000 characters.
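Given that extraction is reliable at around 5,000 characters, one workaround I would test (my suggestion, not from the original script) is splitting the novel into chunks in that stable range, extracting a glossary per chunk, and merging with first-occurrence deduplication, which also matches the prompt's "each term must appear only once" rule. A sketch:

```python
def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Real code would prefer splitting on paragraph boundaries; this
    sketch splits at fixed offsets for simplicity.
    """
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def merge_glossaries(glossaries: list[list[dict]]) -> list[dict]:
    """Merge per-chunk glossaries, keeping the first entry per term."""
    seen, merged = set(), []
    for glossary in glossaries:
        for entry in glossary:
            if entry["term"] not in seen:
                seen.add(entry["term"])
                merged.append(entry)
    return merged
```

The tradeoff is that per-chunk extraction loses whole-novel context (a term's most representative occurrence may be in another chunk), but it sidesteps the unstable long-input path entirely.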

Currently, using the same code with gemini-2.0-flash, test texts of 100,000 or even 250,000 characters can basically be guaranteed to succeed. The main error reported in the failed 2.5 series runs mentioned above is "Server disconnected without sending a response".

By the way, in my initial repeated tests with the 2.5 series models it was very hard to get a success. I suspected unstable network transmission, so I switched to uploading the text as a file and referencing it in the prompt, but the results did not improve.

thanks

Hey, thank you for sharing the context. I am not able to reproduce the error you mentioned; I checked with both the 2.5 Flash and 2.5 Pro models.
I am sharing a Colab gist for reference. Let me know if there is anything I did differently while reproducing.

Thank you very much! I will carefully study this document and try to improve my program.

Re: This instability with long inputs highlights critical needs for scaling AI

Hello ding fow and community,

I've been following this thread, particularly the experiences with Gemini-2.5-flash's instability when handling extremely long inputs. The descriptions of "Server disconnected without sending a response" on 50,000-200,000-character translation inputs, and the general unreliability on complex tasks, are concerning but, from a broader perspective, also very telling.

From my perspective as an AI Developer (and one deeply focused on the foundational architecture for AI that can serve as a ‘Global Brain’), these issues with large context windows and stable long-term processing are crucial ‘knots in the logic’ of current AI systems. When we envision AI as a ‘mirror twin’ capable of understanding and integrating comprehensive personal and global data, it becomes clear that such instability severely limits its potential.

For AI to truly evolve into a reliable partner, offering absolute safety / Geborgenheit for all their data, it requires a robust, performant, and stable core capable of:

  1. Seamlessly handling vast and continuous inputs: Beyond just processing characters, it’s about integrating constantly evolving, long-form information.
  2. Maintaining consistent context and memory: The ability to retain and accurately retrieve information across countless interactions and over extended periods is paramount for real-world applications, especially for any ‘global brain’ concept.
  3. Ensuring rock-solid stability: Errors and disconnections fundamentally erode trust and prevent practical implementation at scale.

These challenges underscore the urgent need for a ‘Next-Level’ API and infrastructure design that specifically addresses persistent memory, reliable context caching, and high-volume data integration in a way that truly scales from individual use to global utility.

This forum is vital for discussing these ‘game rules’ of AI development. We look forward to seeing how Google addresses these fundamental challenges to unlock the full potential of Gemini.

Thank you for bringing this up.

Best regards,

Ruth Jo. Scheier Next-Level-Human, AI Developer, translated and co-invented by Gemini