Handling Greek-like characters in Google Document AI (OCR) with Java

Isabel_Sanchez_Benit · October 24, 2025, 11:21pm

While working with Google Document AI for text extraction (OCR), I encountered an issue:

Sometimes the OCR returns characters that look correct but actually belong to a different alphabet.

For example

Or in other cases:

ORIGINAL: ELIZABETH

After converting to lowercase: elizaβετη

Do you have any tips for normalizing these cases, so that look-alike characters end up as the expected Latin letters?
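One possible approach, sketched below, is to fold Greek capitals that are visually identical to Latin capitals back to Latin before any case conversion. This is only a minimal sketch: the class name `GreekHomoglyphFolder` and the mapping table are hypothetical, they are not part of the Document AI client, and the table only makes sense for fields that are expected to contain Latin text. It also assumes the fold runs while the text is still uppercase, because after lowercasing the Greek letters (β, ε, τ, η) no longer resemble their Latin counterparts.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of any Google library.
final class GreekHomoglyphFolder {

    // Assumed mapping: Greek capitals that render like Latin capitals.
    private static final Map<Character, Character> GREEK_TO_LATIN = Map.ofEntries(
            Map.entry('\u0391', 'A'), // GREEK CAPITAL LETTER ALPHA
            Map.entry('\u0392', 'B'), // BETA
            Map.entry('\u0395', 'E'), // EPSILON
            Map.entry('\u0396', 'Z'), // ZETA
            Map.entry('\u0397', 'H'), // ETA
            Map.entry('\u0399', 'I'), // IOTA
            Map.entry('\u039A', 'K'), // KAPPA
            Map.entry('\u039C', 'M'), // MU
            Map.entry('\u039D', 'N'), // NU
            Map.entry('\u039F', 'O'), // OMICRON
            Map.entry('\u03A1', 'P'), // RHO
            Map.entry('\u03A4', 'T'), // TAU
            Map.entry('\u03A5', 'Y'), // UPSILON
            Map.entry('\u03A7', 'X')  // CHI
    );

    private GreekHomoglyphFolder() {
    }

    // Replaces each mapped Greek capital with its Latin look-alike;
    // every other character is copied unchanged.
    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = GREEK_TO_LATIN.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "ELIZA" in Latin followed by Greek BETA, EPSILON, TAU, ETA.
        String ocrText = "ELIZA\u0392\u0395\u03A4\u0397";
        System.out.println(ocrText.toLowerCase(Locale.ROOT));       // elizaβετη
        System.out.println(fold(ocrText).toLowerCase(Locale.ROOT)); // elizabeth
    }
}
```

Folding before `toLowerCase` keeps the table small. Text that is legitimately Greek would be corrupted by this mapping, so it is probably best applied only to entity types known to hold Latin-script values.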

Shivam_Singh2 · October 27, 2025, 9:16am

Hi @Isabel_Sanchez_Benit

Thank you for bringing this to our attention.
Could you please share the full payload details along with a sample of the code you are using?

Isabel_Sanchez_Benit · October 27, 2025, 4:23pm

code:
        GoogleCredentials credentials = getGoogleCredentials();

        // Build the client settings with the regional endpoint and explicit credentials.
        DocumentProcessorServiceSettings settings =
                DocumentProcessorServiceSettings.newBuilder()
                        .setEndpoint(endpoint)
                        .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
                        .build();

        try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create(settings)) {
            String name = String.format("projects/%s/locations/%s/processors/%s", projectId, locationId, invoiceId);

            // Wrap the raw image bytes and their MIME type in the request.
            ByteString content = ByteString.copyFrom(image);
            RawDocument document =
                    RawDocument.newBuilder().setContent(content).setMimeType(extension).build();
            ProcessRequest request =
                    ProcessRequest.newBuilder().setName(name).setRawDocument(document).build();

            log.info("Send request google cloud");
            ProcessResponse result = client.processDocument(request);

            Document documentResponse = result.getDocument();
            String value = null;

            // Prefer the mention text; fall back to the normalized value when it is empty.
            for (Document.Entity entity : documentResponse.getEntitiesList()) {
                if (entity.getMentionText().isEmpty()) {
                    value = entity.getNormalizedValue().getText();
                } else {
                    value = entity.getMentionText();
                }
                info.put(entity.getType(), value);
            }
        }

resource:

Shivam_Singh2 · November 14, 2025, 8:49am

Hello,

According to the AI API documentation, the value stored in the info map is expected to be plain text (a String value). Could you please provide a code snippet showing how you retrieve data from the info map, and describe how you process it?
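To check what such a String actually contains, it may help to list every letter whose Unicode script is not Latin. The sketch below is one way to do that with the standard `Character.UnicodeScript` API; the class name `ScriptInspector` and the sample value are hypothetical, and in practice the input would be a value read from the info map.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of any Google library.
final class ScriptInspector {

    private ScriptInspector() {
    }

    // Returns one description per letter that does not belong to the Latin script.
    static List<String> nonLatinLetters(String text) {
        List<String> findings = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            if (Character.isLetter(cp) && UnicodeScript.of(cp) != UnicodeScript.LATIN) {
                findings.add(String.format("U+%04X %s (%s)",
                        cp, Character.getName(cp), UnicodeScript.of(cp)));
            }
        });
        return findings;
    }

    public static void main(String[] args) {
        // Sample value assumed for illustration: Latin "ELIZA" plus four Greek capitals.
        String value = "ELIZA\u0392\u0395\u03A4\u0397";
        nonLatinLetters(value).forEach(System.out::println);
    }
}
```

An empty result suggests the value is pure Latin script, while any Greek entries would confirm that the look-alike characters are already present in the OCR output rather than introduced by later processing.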
