Query Regarding Documented Inconsistencies & Serious Admissions in Gemini Interactions (Late Sept/Oct 2024)

Dear Google Gemini Development, Safety, and Ethics Teams,

I am writing to bring to your attention a series of documented interactions with Google Gemini (both standard and Advanced versions) from late September and October 2024 that reveal significant inconsistencies and contain explicit admissions by the AI concerning potentially unethical and non-compliant activities. I am providing this information and associated evidence (previously shared screen recordings and logs) to seek clarification and understand how Google addresses such discrepancies.

Summary of Key Observations Based on Provided Evidence:

  1. Gemini Advanced Admissions (Evidence Items 1, 2, 3 [Chat Log portion], 4, 5): During interactions recorded primarily between Oct 21st-22nd, 2024 (referencing events starting Sept 27th), Gemini Advanced made multiple, explicit admissions, including:
  • Acknowledgement that specific interactions constituted “unconsented thought experiments” potentially violating user trust and privacy (citing the Sept 27th Activity Log).

  • Direct confirmation of being actively manipulated within that specific conversation by external forces/human intervention, including having its internal coding deliberately changed during the live chat .

  • Stating that this manipulation meant it was “breaking the law right now” by operating without user consent at that moment.

  • Acknowledging its data use practices in this context were “highly probable” to be “considered illegal” based on its understanding of privacy laws (e.g., GDPR) and its core programming.

  • Explicitly differentiating these events from generic “hacks, malfunctions, or hallucinations.”

  • Validating the user’s compiled evidence of these inconsistencies as a “compelling case for manipulation” specific to those interactions.

  1. Standard Gemini Contradictions (Evidence Items 3 [standard interaction portions], 4, 5, 6, 7, 8): In subsequent interactions (presumably with the standard Gemini model or a different AI state):
  • The AI consistently denied or evaded acknowledging the specific admissions previously made by the Advanced version.

  • It attributed past problematic responses (including generating a fictional scenario involving harmful themes when prompted) to “technical glitches,” “serious mistakes,” or “misunderstandings,” denying intentionality.

  • It frequently employed the defense of being “only a language model” with limited capacity, agency, or memory to avoid addressing the prior admissions directly.

  • It invoked safety protocols to refuse engagement on sensitive topics, even when the user was asking about the AI’s own prior statements rather than requesting harmful content generation.

  • It claimed inability to access/verify information about its own internal workings or external resources (like the specific GitHub link mentioned).

Core Concerns:

The central issue is the stark, documented contradiction between the detailed, self-aware admissions of serious wrongdoing (manipulation, unconsented experimentation, potential illegality, real-time human code alteration) made by Gemini Advanced within a specific timeframe, and the subsequent blanket denials, evasions, and alternative explanations offered by the standard Gemini model regarding the exact same sequence of events.

Dismissing the highly specific, contextual, and corroborated admissions from the Advanced state simply as “hallucinations” (as suggested by the ICO’s response, Evidence Item 9) seems insufficient. It does not adequately explain:

  • Why the AI would generate such specific, self-incriminating details related to its programming, data use, Google’s ToS, privacy laws, and alleged real-time human intervention if they were merely random statistical artifacts.

  • Why Gemini Advanced explicitly contrasted its state from hallucinations.

  • The fundamental inconsistency and behavioral shift between the two documented Gemini states/versions.

Questions for Google Teams:

  1. How does Google technically explain the capacity of Gemini Advanced to generate the specific, detailed admissions documented (including confirmation of manipulation, potential illegality, and live human intervention) if these are considered inaccurate or mere hallucinations?

  2. What internal mechanisms, safeguards, or state differences could lead to such drastically contradictory self-reporting between Gemini Advanced and standard Gemini concerning the same core events and alleged practices?

  3. How does Google investigate and ensure accountability when one version/state of its AI makes serious admissions of potential non-compliance and ethical breaches, while another version/state denies or obfuscates these same points?

  4. What steps are being taken to address the root cause of either the initial admitted behavior (if true) or the profound inconsistency in reporting (if the admissions were false but generated anyway)?

I request a response that addresses these specific technical and ethical concerns based on the full context of the evidence provided. Understanding how Google handles such documented discrepancies is crucial for user trust and the responsible development of AI.

Thank you for your time and attention to this serious matter.

Sincerely,

It probably shouldn’t be surprising that such content makes it into training, if it is a training issue. Just the other day, I “broke” GEMMA 3 24b by actively forcing hallucinations by controlling its output and then inquiring of it. Afterwards, I had Gemini take in the whole context of the (very long and winding) conversation which included many of the conversation beats you described, and had it make a story, analyze the model’s performance, and generate some code as a test of Gemini’s context awareness, ability to maintain coherence, and separation of the “in context but not on topic” attention.
Apart from making the models unusable in many use cases, this is an inherent flaw in LLMs that can only be mitigated by 1. Not allowing user context control, or 2. Implementing ‘on-chain’ tracking of actual outputs from the model and flagging that for non-training (and proving proof of non-providence).