How can I do this? Any outline would help. I have no idea how to extract text from a user's PDF file, or how to take that text and turn it into the number of MCQs the user wants to practice with.
I'm a newbie to Flutter and to the whole Gemini idea.
Hi @Mr.Devashish_Tambade,
I can give you an example in Python of how to extract text from a PDF and use it in a prompt to Gemini. You can probably adapt the same approach to your preferred language. Hope this helps.
# Import required libraries
import json
import pathlib
import textwrap
import fitz  # PyMuPDF (pip install pymupdf)
import pandas as pd
import google.generativeai as genai
import google.ai.generativelanguage
from IPython.display import display
from IPython.display import Markdown
from google.api_core import retry
# passing the API key
try:
    from google.colab import userdata
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)
except ImportError:
    pass
# function to load the document and extract the text
def extract_text_from_pdf(file_path):
    """Extracts text content from a PDF file.

    Args:
        file_path: Path to the local PDF file.

    Returns:
        str: The extracted text content from the PDF.
    """
    text = ""
    try:
        # Open the PDF document
        doc = fitz.open(file_path)
        # Iterate over each page and extract text
        for page in doc:
            text += page.get_text("text")  # Extract text as plain text
        return text
    except Exception as e:
        print(f"Error extracting text from PDF: {e}")
        return ""
# extracting the text from the document
file_path = "<pdf_file_path>"  # replace with the path to your PDF file
extracted_text = extract_text_from_pdf(file_path)
if extracted_text:
    print("Text Extracted")
    # print(extracted_text)
else:
    print("Failed to extract text.")
# define the task-specific prompt
system_instruction = """
Extract all financial tables mentioned in the below quarterly report.
"""
# config model
generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}
# define model
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    system_instruction=system_instruction)
# the system instruction is already set on the model, so only the document text is sent here
response = model.generate_content(extracted_text)
print(response.text)
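You can experiment with the generation config. Also make sure to replace the file path placeholder with the path to your own PDF file, and change the system instruction to describe your own task.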
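For the MCQ use case from the question, the same flow could be adapted roughly as below. This is only a sketch: the helper names `build_mcq_prompt` and `parse_mcqs` and the JSON shape (`question`, `options`, `answer`) are my own assumptions, not part of any Gemini API. The idea is to ask for the questions as a JSON array and then keep only the items that are well formed, since the model's reply is not guaranteed to follow the requested shape.

```python
import json


def build_mcq_prompt(source_text: str, num_questions: int) -> str:
    """Build a prompt asking for `num_questions` MCQs as a JSON array.

    The JSON keys used here are an illustrative convention, not a Gemini requirement.
    """
    return (
        f"Write {num_questions} multiple-choice questions based only on the text below.\n"
        "Reply with a JSON array only. Each item must have the keys "
        '"question" (string), "options" (list of 4 strings) and '
        '"answer" (exactly one of the options).\n\n'
        f"TEXT:\n{source_text}"
    )


def parse_mcqs(raw_reply: str) -> list:
    """Parse a model reply and return only the well-formed questions."""
    # The reply may contain extra text around the array, so take the
    # span from the first "[" to the last "]".
    start, end = raw_reply.find("["), raw_reply.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(raw_reply[start:end + 1])
    except json.JSONDecodeError:
        return []

    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        options = item.get("options")
        if (
            isinstance(item.get("question"), str)
            and isinstance(options, list)
            and len(options) == 4
            and item.get("answer") in options
        ):
            valid.append(item)
    return valid
```

With the `model` and `extracted_text` from the example above, you would call something like `response = model.generate_content(build_mcq_prompt(extracted_text, 10))` and then `parse_mcqs(response.text)`, after changing the system instruction to suit a quiz task. Setting `"response_mime_type": "application/json"` in the generation config may also help keep the reply as plain JSON. The resulting list can be sent to your Flutter app as JSON.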
Thanks!