Is there a workaround for the 'Recitation' response?

I am trying to get gemini-1.5-flash to translate text and it is giving me a Recitation response.

Recitation issue

If you see the model stops generating output due to the RECITATION reason, this means the model output may resemble certain data. To fix this, try to make prompt / context as unique as possible and use a higher temperature.
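A minimal Python sketch of that advice, assuming the REST `generateContent` response has been parsed into a dict with a top-level `candidates` list (the dump below appears to have lost that key). The helper names `build_request`, `stopped_for_recitation`, and `citation_uris` are hypothetical, and nothing here calls the API; it only builds the request body and inspects a response.

```python
def build_request(text: str, temperature: float = 1.0) -> dict:
    """Build a generateContent request body with an explicit temperature."""
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {"temperature": temperature},
    }


def stopped_for_recitation(response: dict) -> bool:
    """True if any candidate finished with finishReason == "RECITATION"."""
    return any(
        candidate.get("finishReason") == "RECITATION"
        for candidate in response.get("candidates", [])
    )


def citation_uris(response: dict) -> list:
    """Collect the URIs listed under citationMetadata.citationSources."""
    uris = []
    for candidate in response.get("candidates", []):
        sources = candidate.get("citationMetadata", {}).get("citationSources", [])
        uris.extend(source["uri"] for source in sources if "uri" in source)
    return uris
```

If `stopped_for_recitation` returns True, one option is to resend with a reworded prompt and a higher `temperature`. There is no guarantee that this clears the block.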

Array
(

    (
        [0] => Array
            (
                [finishReason] => RECITATION
                [safetyRatings] => Array
                    (
                        [0] => Array
                            (
                                [category] => HARM_CATEGORY_HATE_SPEECH
                                [probability] => NEGLIGIBLE
                            )

                        [1] => Array
                            (
                                [category] => HARM_CATEGORY_DANGEROUS_CONTENT
                                [probability] => NEGLIGIBLE
                            )

                        [2] => Array
                            (
                                [category] => HARM_CATEGORY_HARASSMENT
                                [probability] => NEGLIGIBLE
                            )

                        [3] => Array
                            (
                                [category] => HARM_CATEGORY_SEXUALLY_EXPLICIT
                                [probability] => NEGLIGIBLE
                            )

                    )

                [citationMetadata] => Array
                    (
                        [citationSources] => Array
                            (
                                [0] => Array
                                    (
                                        [startIndex] => 52
                                        [endIndex] => 1908
                                        [uri] => https://www.sefaria.org/II_Samuel.24.14-17?with=Midrash
                                    )

                            )

                    )

                [avgLogprobs] => NaN
            )

    )

[usageMetadata] => Array
    (
        [promptTokenCount] => 1600
        [totalTokenCount] => 1600
    )

“Modified by moderator”

The recitation URL points to a text that translates a portion of the passage, but not all of it. Why would Google let its AI search the Internet for this text instead of translating it as I requested?

Is there a workaround for this?

Google knows a lot about the contents of the internet. They also have a $400B valuation product that survives purely on censoring and striking user content from multimodal sources. Just about everything one scrapes from the internet was written by someone in the last 95 years, and somebody holds copyright, just like these original words I write now are automatically under my copyright.

“Recitation” is about the output looking like training-corpus material, or faithfully and predictably reproducing other text.

There is no allowance for fair use, no understanding that 2,000-year-old Bible verses might be public domain, so when you start producing something extremely predictable, this unintelligent system kicks in.

So: you have to produce an output that is significantly different, synthesized, and transformative. For example, ask for each line of text to be placed in a JSON array of strings or similar, so that the output no longer matches the source pattern closely enough to get shut down.
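The JSON-array idea could look roughly like this in Python. This is a sketch under assumptions, not a confirmed fix: the function names are made up for illustration, the prompt wording is only one possibility, and `responseMimeType` is used on the assumption that the target model supports JSON output mode. Nothing here calls the API.

```python
import json


def build_json_translation_request(source_lines, target_language="English"):
    """Build a request body asking for one translated string per input line."""
    numbered = "\n".join(f"{i + 1}. {line}" for i, line in enumerate(source_lines))
    prompt = (
        f"Translate each numbered line below into {target_language}. "
        "Return only a JSON array of strings, one string per line, in order.\n\n"
        + numbered
    )
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        # Assumption: the model accepts JSON output mode via this field.
        "generationConfig": {"responseMimeType": "application/json"},
    }


def parse_translated_lines(model_text, expected_count):
    """Parse the model's text as a JSON array of strings and check its length."""
    data = json.loads(model_text)
    if not isinstance(data, list) or not all(isinstance(s, str) for s in data):
        raise ValueError("expected a JSON array of strings")
    if len(data) != expected_count:
        raise ValueError(f"expected {expected_count} lines, got {len(data)}")
    return data
```

Sending the passage in smaller batches of lines is another variation on the same idea, since each response then overlaps less with any single source page.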


Wow. Just plain Wow. You ask the model to look for something on the Internet, and it tells you it can’t – all the time monitoring everything you give it against Internet data.

I could understand if I were asking it to translate a passage from a Stephen King novel, but a Biblical passage written in Hebrew by a Rabbi who died 2,000 years ago?

From Sefaria, the source Google acknowledges for the texts: Use and Reuse of Sources - Copyright Status | Sefaria

Most texts on Sefaria have a Creative Commons license, a notice that the text is “Public Domain”, or a copyright notice. Texts in the public domain do not need attribution. Texts with a CC BY license must be attributed to the publisher. While there is no legal requirement to credit Sefaria, we always appreciate being acknowledged.

How do you create anything in a world where everything is copyrighted or trademarked?

Or is it that Google is angling to become the sole source for search for anything and everything?

Wow.

It is just to avoid litigation.

OpenAI and Microsoft have similar shutoffs on their output when they are producing things largely unaltered from training corpus.

It also depends on the Google model, it seems. Gemini 1.5 Pro produced 3,000 tokens of this, but Gemini 1.5 Flash 002 got shut off.
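Since the behaviour seems to differ by model, one option is to try several models in turn and keep the first response that is not cut off. A rough Python sketch, where `first_unblocked` is a hypothetical helper and `call` stands in for whatever function you already use to send the request for a given model name and return the parsed response dict:

```python
from typing import Callable, Optional, Sequence, Tuple


def first_unblocked(
    models: Sequence[str],
    call: Callable[[str], dict],
) -> Optional[Tuple[str, dict]]:
    """Try each model in order; return the first (model, response) pair
    whose first candidate did not finish with RECITATION, else None."""
    for model in models:
        response = call(model)
        candidates = response.get("candidates", [])
        if candidates and candidates[0].get("finishReason") != "RECITATION":
            return model, response
    return None
```

For example, `first_unblocked(["gemini-1.5-flash-002", "gemini-1.5-pro"], call)` would fall back to Pro only when Flash is shut off. The model identifiers here are assumptions based on the names mentioned above.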

I hear you and I understand the intent. The problem is that it is going to curtail the most useful benefit of these machines: Research of historical literature and texts.

I mean, I know everyone is blown away with creating videos and images and the automation possibilities in robotics and solving previously unsolvable science and physics and mathematical problems…

But when you can’t even get it to give you the US Constitution or a Biblical verse… What happens when they start cutting off everything that exists on a website somewhere for fear of copyright infringement?

All the information will be controlled by Big Tech – and they will then charge the rest of us a subscription fee to be able to access what we used to be able to access for free. AI was supposed to be able to increase the availability of knowledge, not kill it.