Need Python script today to detect image vs text PDFs

Project No. 210097

6 bids

Budget

Up to 2,500 ₪

Project status

Closed to further bids

Bid range

250 ₪ - 300 ₪ per hour

1,000 ₪ - 1,500 ₪ fixed price

Average bid

275 ₪ per hour

1,250 ₪ fixed price


Posted: 05:49, 12 December 2025

Bids accepted until: 16:51, 13 December 2025


I need a Python developer available today, within the next few hours, to build a script that analyzes and parses PDF files.
There are two types of PDFs:

One is an image-based (scanned) PDF

One is text-based and contains Hebrew text (example PDF attached)

Project Requirements

Input: A folder of PDF files

Logic: For each file

Detect whether the PDF is image only or contains a text layer

If the file is an image, output:
"filename is an image and requires manual review"

If the file contains Hebrew text

Extract the text

Translate the extracted text to English
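
The per-file logic above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the requested deliverable: it assumes the third-party pypdf package for reading the text layer, treats a file with no extractable text as image-only, and takes the translation step as a caller-supplied function (`translate` is a hypothetical placeholder, since no translation service is specified here). The function names are illustrative as well.

```python
import re
from pathlib import Path
from typing import Callable

# Hebrew characters live in the Unicode block U+0590 to U+05FF.
HEBREW_RE = re.compile(r"[\u0590-\u05FF]")


def extract_text_layer(pdf_path: Path) -> str:
    """Return the text layer of a PDF, or an empty string if no page yields text.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    """
    from pypdf import PdfReader  # imported lazily so the rest runs without it

    reader = PdfReader(str(pdf_path))
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def process_text(filename: str, text: str, translate: Callable[[str], str]) -> str:
    """Apply the per-file rule to text that has already been extracted."""
    if not text.strip():
        # No usable text layer: treat the file as a scanned image.
        return f"{filename} is an image and requires manual review"
    if HEBREW_RE.search(text):
        # `translate` is a hypothetical callable supplied by the caller.
        return translate(text)
    # Text layer without Hebrew: returned unchanged (an assumption,
    # the posting does not say what to do in this case).
    return text


def process_folder(folder: Path, translate: Callable[[str], str]) -> dict[str, str]:
    """Run the rule over every .pdf file directly inside `folder`."""
    results = {}
    for pdf_path in sorted(folder.glob("*.pdf")):
        text = extract_text_layer(pdf_path)
        results[pdf_path.name] = process_text(pdf_path.name, text, translate)
    return results
```

One caveat with this heuristic: a scanned PDF that has already been run through OCR carries a text layer and would be classified as text-based, so a stricter check (for example, a minimum character count per page) may be needed.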

Environment

Python 3.13 installed on a Windows laptop

Script should run locally, without Docker

Next Steps (Phase 2, after this task)

After this classification and translation step, I will need additional parsing rules based on the extracted data. This will be a follow-on contract.

Ideal Freelancer

Experience with PDF parsing (PyPDF2, pdfminer) and OCR tools such as Tesseract

Familiarity with Hebrew OCR and right-to-left handling

Ability to deliver same day

Clear written communication (chat or text only, no phone calls)

Deliverables

One Python script (.py) that:

Detects image vs text

Extracts text when available

Translates to English

Simple instructions for running the script on Windows with Python 3.13
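
For the run instructions, a command-line entry point might look like the sketch below. It uses only the standard library and just lists the PDFs it finds; the names `build_parser`, `list_pdfs`, and `main` are illustrative, and the real classification step would go where the file name is printed.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface: a single positional folder argument."""
    parser = argparse.ArgumentParser(
        description="Classify PDFs in a folder as image-only or text-based."
    )
    parser.add_argument("folder", type=Path, help="folder containing the PDF files")
    return parser


def list_pdfs(folder: Path) -> list[Path]:
    """Return the PDF files directly inside `folder`, ignoring extension case."""
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def main(argv: Optional[list[str]] = None) -> int:
    """Parse arguments and walk the folder; returns a process exit code."""
    args = build_parser().parse_args(argv)
    if not args.folder.is_dir():
        print(f"Not a folder: {args.folder}")
        return 1
    for pdf in list_pdfs(args.folder):
        # Placeholder: the detection and translation logic would run here.
        print(pdf.name)
    return 0
```

Saved under a hypothetical name such as classify_pdfs.py, with `raise SystemExit(main())` added as the last line, it could be started from a Windows command prompt as `python classify_pdfs.py C:\path\to\pdfs`.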