
Python Developer Needed: High-Performance PDF Redaction & Anonymization API (PyMuPDF)

Remote · USA · Full-time · New today

We are a health-tech / neurotechnology platform (SaaS) looking for an experienced Python developer to build a lightweight, high-performance microservice that automates the anonymization (redaction) of medical reports (PDFs).

Our web application generates automated qEEG medical reports. Currently, historical reports are stored in a secure backup vault. When a user or system requests a historical PDF, we need a middleware/microservice to intercept the file, digitally destroy specific Patient Identifiable Information (PII) on the fly (in memory), and stream the clean PDF to the client browser in milliseconds.

We previously attempted raw byte/string replacement with pdftk and RegEx, but because of internal PDF font structures and layout kerning arrays (TJ / Tj syntax objects), raw text replacement corrupts the files. We therefore require a robust, visual-coordinate-based redaction approach using libraries like PyMuPDF (fitz) or Apache PDFBox.

Key Responsibilities:

- Develop a Python script/microservice that searches for specific visual anchor labels (e.g., "Subject ID:", "Client ID:") within a PDF document.
- Dynamically compute the visual boundaries (bounding boxes) following these anchors to cover unknown patient codes or file names.
- Physically and irreversibly destroy/redact the underlying characters using proper PDF redaction methods (e.g., page.apply_redactions() in PyMuPDF), rendering the text completely unselectable and unsearchable.
- Apply an invisible mask (white fill) over the redacted area to preserve the original, professional template design perfectly.
- Wrap this functionality in a lightweight API framework (preferably FastAPI or Flask) so our web application back-end can communicate with it via internal HTTP requests.
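For candidates who want a concrete picture of the anchor-based approach described above, here is a minimal sketch in Python. It is an illustration under stated assumptions, not our implementation: the helper names `value_rect_after_anchor` and `redact_pdf`, the one-point padding, and the rule "redact every word to the right of the anchor on the same line" are all hypothetical choices. The PyMuPDF calls used (`search_for`, `get_text("words")`, `add_redact_annot`, `apply_redactions`, `tobytes`) are part of the library; real report templates may need a tighter boundary rule than the one shown.

```python
from typing import Iterable, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def value_rect_after_anchor(anchor: Rect, words: Iterable[Rect],
                            page_width: float, pad: float = 1.0) -> Optional[Rect]:
    """Box covering every word to the right of the anchor on the same text line.

    Assumption: the value to hide sits on the anchor's line and runs to the
    end of that line. Returns None when nothing follows the anchor.
    """
    ax0, ay0, ax1, ay1 = anchor
    height = ay1 - ay0
    same_line = [
        w for w in words
        if w[0] >= ax1 - pad                                   # starts after the anchor
        and min(w[3], ay1) - max(w[1], ay0) > 0.5 * height     # overlaps it vertically
    ]
    if not same_line:
        return None
    x0 = min(w[0] for w in same_line) - pad
    x1 = min(max(w[2] for w in same_line) + pad, page_width)
    y0 = min(w[1] for w in same_line)
    y1 = max(w[3] for w in same_line)
    return (x0, y0, x1, y1)


def redact_pdf(pdf_bytes: bytes,
               anchors: Tuple[str, ...] = ("Subject ID:", "Client ID:")) -> bytes:
    """Redact the text following each anchor label, entirely in memory."""
    import fitz  # PyMuPDF, imported lazily so the helper above has no dependency

    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    for page in doc:
        # Each "words" entry starts with the word's bounding box (x0, y0, x1, y1).
        words = [tuple(w[:4]) for w in page.get_text("words")]
        marked = False
        for label in anchors:
            for hit in page.search_for(label):
                box = value_rect_after_anchor(
                    (hit.x0, hit.y0, hit.x1, hit.y1), words, page.rect.width)
                if box is not None:
                    # White fill keeps the template looking untouched.
                    page.add_redact_annot(fitz.Rect(*box), fill=(1, 1, 1))
                    marked = True
        if marked:
            page.apply_redactions()  # removes the underlying text, not just covers it
    cleaned = doc.tobytes(garbage=4, deflate=True)
    doc.close()
    return cleaned
```

A FastAPI or Flask endpoint would then only need to fetch the stored bytes, call `redact_pdf`, and return the result with an `application/pdf` content type. Keeping the geometry in a pure function makes the boundary rule easy to unit test without any PDF fixtures.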
