Feature Request: Add lxml & pdfplumber to Code Execution sandbox

Hi Gemini team and community,

I’ve been working extensively with Gemini Code Execution and want to share
a proposal that I believe can significantly expand the platform’s capabilities
for enterprise use cases.

Current Context

Gemini Code Execution already includes an excellent document processing stack:

  • python-docx, python-pptx, openpyxl (Office)
  • PyPDF2, reportlab (PDF)
  • pandas, numpy (data analysis)

However, two libraries would complement this stack well and fill gaps that
currently force users to process documents outside of Gemini.

The Proposal

1. lxml — High-performance XML/HTML processing and validation

  • Validates Office Open XML parts against XSD schemas (e.g. the published
    ECMA-376 schemas)
  • Full XPath support for complex document searches
  • Robust HTML parsing for web scraping
  • 50M+ downloads/month, used by 10,000+ packages
  • Mature: 18+ years in production
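As a rough sketch of the XPath point, assuming only lxml: the snippet builds a tiny fragment shaped like `word/document.xml` in memory (so it runs without a real file) and queries it through a namespace map. The namespace URI is the standard WordprocessingML one; the helper names `paragraph_texts` and `headings` are hypothetical.

```python
import lxml.etree as ET

# Standard WordprocessingML namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NSMAP = {"w": W_NS}

# Hypothetical sample content, shaped like a real document.xml body
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Quarterly</w:t></w:r><w:r><w:t> report</w:t></w:r></w:p>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Summary</w:t></w:r></w:p>
  </w:body>
</w:document>""".encode()


def paragraph_texts(root):
    """Join the text runs of every body paragraph."""
    return ["".join(p.xpath(".//w:t/text()", namespaces=NSMAP))
            for p in root.xpath("//w:body/w:p", namespaces=NSMAP)]


def headings(root, style="Heading1"):
    """Return the text of paragraphs whose paragraph style equals `style`."""
    # $style is an XPath variable, bound from the keyword argument below
    query = "//w:p[w:pPr/w:pStyle/@w:val=$style]"
    return ["".join(p.xpath(".//w:t/text()", namespaces=NSMAP))
            for p in root.xpath(query, namespaces=NSMAP, style=style)]


root = ET.fromstring(SAMPLE)
print(paragraph_texts(root))  # ['Quarterly report', 'Summary']
print(headings(root))         # ['Summary']
```

Passing the style as an XPath variable rather than formatting it into the query string avoids quoting problems with arbitrary style names.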

2. pdfplumber — Advanced structured data extraction from PDFs

  • Automatic table extraction from PDFs
  • Exact text coordinates for layout analysis
  • Access to annotations and, with some extra code, form field values
  • 2M+ downloads/month, 5,000+ GitHub stars
  • Built on pdfminer.six, a mature PDF parsing library
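To illustrate what exact coordinates make possible, the sketch below regroups word boxes into reading-order lines. It assumes input shaped like the dictionaries returned by pdfplumber's `page.extract_words()` (which include `text`, `x0` and `top`, with `top` measured in points from the top of the page). The sample boxes are made up so the code runs without a PDF, and `group_into_lines` with its `tolerance` parameter is a hypothetical helper.

```python
def group_into_lines(words, tolerance=3.0):
    """Group word boxes whose `top` values lie within `tolerance` points."""
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Join the current line if this word's top is close to the line's
        # first word; otherwise start a new line
        if lines and abs(word["top"] - lines[-1][0]["top"]) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, order words left to right and join their text
    return [" ".join(w["text"] for w in sorted(line, key=lambda w: w["x0"]))
            for line in lines]


# Made-up boxes in the shape of pdfplumber's extract_words() output
sample = [
    {"text": "Total", "x0": 50.0, "top": 700.2},
    {"text": "Invoice", "x0": 50.0, "top": 100.0},
    {"text": "1,250.00", "x0": 400.0, "top": 701.0},
    {"text": "#1042", "x0": 110.0, "top": 100.4},
]
print(group_into_lines(sample))  # ['Invoice #1042', 'Total 1,250.00']
```

The tolerance absorbs small vertical offsets, such as an amount printed slightly lower than its label.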

Why This Matters

Current problem:
Users can CREATE Office documents with python-docx/python-pptx, but cannot
VALIDATE them. When a generated file is malformed (for example, after direct
XML edits), nothing catches it before it fails to open in Word/PowerPoint,
which frustrates users and erodes trust in the platform.

With lxml:

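# Full flow: Create → Validate → Ship
import zipfile

import lxml.etree as ET
from docx import Document

doc = Document()
doc.add_paragraph("Quarterly report")  # ... create content
doc.save("report.docx")

# A .docx file is a ZIP package; the document body is word/document.xml
with zipfile.ZipFile("report.docx") as pkg:
    doc_xml = ET.fromstring(pkg.read("word/document.xml"))

# Validate against the WordprocessingML schema (placeholder path: the
# ECMA-376 XSD files have to be obtained and stored separately)
schema = ET.XMLSchema(file="office-schema.xsd")
if schema.validate(doc_xml):
    print("✅ Schema-valid document, ready for distribution")
else:
    print(schema.error_log)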
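The schema path in that example is a placeholder, since the OOXML schemas are a set of interlinked XSD files that must be supplied separately. The sketch below shows the same `XMLSchema` / `validate` / `error_log` mechanics against a tiny inline schema that is invented purely for illustration and is far simpler than the real WordprocessingML schema; the helper name `check` is hypothetical.

```python
import lxml.etree as ET

# Toy schema, invented for illustration: a <report> must contain one or more
# <section> elements, each with a required "title" attribute.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="section" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="title" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.XMLSchema(ET.fromstring(XSD))


def check(xml_bytes):
    """Return (is_valid, list of error messages) for one XML document."""
    doc = ET.fromstring(xml_bytes)
    ok = schema.validate(doc)
    return ok, [entry.message for entry in schema.error_log]


print(check(b'<report><section title="Summary"/></report>'))
print(check(b'<report><section/></report>'))  # missing required attribute
```

The error messages come from libxml2 and name the offending element, which is what makes this useful as a pre-distribution gate.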

With pdfplumber:

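# Automatically extract tables from invoices/reports
import pandas as pd
import pdfplumber

with pdfplumber.open("invoice.pdf") as pdf:
    tables = pdf.pages[0].extract_tables()  # list of tables, each a list of rows

if tables:
    header, *rows = tables[0]  # treat the first row as the column header
    df = pd.DataFrame(rows, columns=header)
    df.to_excel("analysis.xlsx", index=False)  # Ready for accounting systems
else:
    print("No tables found on page 1")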
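Real `extract_tables()` output is often messier than that happy path: cells can be `None`, headers can contain line breaks, and some rows come back empty. A minimal cleanup sketch, assuming the list-of-rows shape that `extract_tables()` returns; the helper name `table_to_frame` and the sample rows are hypothetical.

```python
import pandas as pd


def table_to_frame(table):
    """Turn one pdfplumber-style table (list of rows) into a DataFrame.

    Assumes the first row is the header; None cells become empty strings,
    runs of whitespace (including line breaks) collapse to single spaces,
    and fully empty rows are dropped.
    """
    if not table:
        return pd.DataFrame()

    def clean(cell):
        return " ".join((cell or "").split())

    # Give unnamed header cells a positional placeholder name
    header = [clean(c) or f"col_{i}" for i, c in enumerate(table[0])]
    rows = [[clean(c) for c in row] for row in table[1:]]
    rows = [row for row in rows if any(row)]
    return pd.DataFrame(rows, columns=header)


# Hypothetical rows in the shape extract_tables() produces
sample = [
    ["Item", "Unit\nprice", None],
    ["Widget", "12.50", "2"],
    [None, None, None],
    ["Gadget", "3.00", "10"],
]
df = table_to_frame(sample)
print(df)
```

This assumes every row has as many cells as the header; a row of a different length would make the DataFrame constructor raise.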

Real Enterprise Use Cases

  1. Legal contract automation — Validate generated documents against
    corporate templates
  2. Invoice processing — Extract tables from PDFs and export to
    accounting systems
  3. Financial reports — Check that generated Word/Excel documents are
    schema-valid before distribution
  4. Form analysis — Process government PDFs and extract structured data
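For the pre-distribution check in the financial-reports case, a cheap first step that needs no schemas at all is confirming that every XML part inside the package is well-formed. The sketch below does that with `zipfile` and lxml; it builds a small in-memory ZIP standing in for a .docx so it runs on its own, and the function name `malformed_parts` is hypothetical. Well-formedness is necessary but not sufficient: a file that passes can still be rejected by Word or Excel.

```python
import io
import zipfile

import lxml.etree as ET


def malformed_parts(package):
    """Return {part_name: error} for every XML part that fails to parse.

    `package` is a path or file-like object for a .docx/.xlsx/.pptx ZIP.
    """
    problems = {}
    with zipfile.ZipFile(package) as pkg:
        for name in pkg.namelist():
            if not name.endswith((".xml", ".rels")):
                continue  # skip images, fonts, and other binary parts
            try:
                ET.fromstring(pkg.read(name))
            except ET.XMLSyntaxError as exc:
                problems[name] = str(exc)
    return problems


# Build a tiny stand-in package in memory: one good part, one broken part
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("[Content_Types].xml", "<Types/>")
    z.writestr("word/document.xml", "<w:document xmlns:w='urn:x'><w:body>")
buf.seek(0)
print(malformed_parts(buf))
```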

Impact

For users:

  • Complete workflows inside Gemini (no external tools needed)
  • Quality assurance on generated documents
  • True end-to-end document automation

For Google:

  • Differentiation vs competitors (Claude, GPT-4)
  • Increased enterprise adoption
  • Positioning Gemini as a complete automation platform

Implementation cost:

  • ~7MB total footprint
  • Mature, secure libraries (18+ and 8+ years respectively)
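  • No conflicts with current stack (python-docx and python-pptx already
    depend on lxml)
  • No system packages to install: lxml (a compiled C extension over
    libxml2/libxslt) and pdfplumber's dependencies ship as prebuilt wheels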
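To see which of these modules a given sandbox can already import, the standard library's `importlib.util.find_spec` is enough, since it returns `None` for a top-level module that cannot be found. A minimal sketch; the module list is simply this proposal's stack (import names, not PyPI names), and `available` is a hypothetical helper.

```python
import importlib.util

# Import names for the current stack plus the two proposed additions
MODULES = ["docx", "pptx", "openpyxl", "PyPDF2", "reportlab",
           "pandas", "numpy", "lxml", "pdfplumber"]


def available(modules=MODULES):
    """Map each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, ok in available().items():
    print(f"{name:12} {'yes' if ok else 'no'}")
```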

Request

Would the Gemini team consider adding lxml and pdfplumber to the Code
Execution sandbox? I’ve prepared a detailed technical proposal including
full use cases, security analysis, and phased implementation plan —
happy to share it if there’s interest.

I’m also available for beta testing if that would help evaluate the addition.

Thanks for building such a great platform — these two additions would make
it truly complete for document automation.