Need Python script today to detect image vs text PDFs
Project No. 210097
Job Statistics
6 Bids | Budget: up to ₪2,500 | Project status: closed to further bids
Bid range: ₪250 - ₪300 per hour, ₪1,000 - ₪1,500 fixed price
Average bid: ₪275 per hour, ₪1,250 fixed price
Job Info And Actions
Posted: 05:49, 12 December 2025
Bids accepted until: 16:51, 13 December 2025
Need Python script today to detect image vs text PDFs
I need a Python developer who is available today, within the next few hours, to build a script that analyzes and parses PDF files.
There are two types of PDFs:
One is an image-based (scanned) PDF.
One is text-based with Hebrew text (example PDF attached).
Project Requirements
Input: A folder of PDF files
Logic: For each file:
Detect whether the PDF is image-only or contains a text layer.
If the file is an image, output:
"filename is an image and requires manual review"
If the file contains Hebrew text:
Extract the text
Translate the extracted text to English
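The detection step above could be sketched roughly as follows. This is a minimal sketch, not a confirmed approach: it assumes the third-party `pypdf` package (the maintained successor to the PyPDF2 library named in this posting) is installed, and the function names and the `min_chars` threshold are illustrative assumptions that do not come from the posting.

```python
from pathlib import Path


def contains_hebrew(text: str) -> bool:
    """Return True if any character is in the Unicode Hebrew block (U+0590-U+05FF)."""
    return any("\u0590" <= ch <= "\u05ff" for ch in text)


def classify_page_texts(page_texts: list[str], min_chars: int = 20) -> str:
    """Classify a PDF from the text extracted from each of its pages.

    Returns "image" when almost no text was extracted (likely a scan),
    "hebrew_text" when a text layer containing Hebrew is present,
    and "other_text" otherwise. min_chars is an assumed threshold that
    ignores a few stray characters on an otherwise scanned file.
    """
    combined = "".join(page_texts)
    visible = "".join(combined.split())  # drop all whitespace
    if len(visible) < min_chars:
        return "image"
    return "hebrew_text" if contains_hebrew(combined) else "other_text"


def extract_page_texts(pdf_path: Path) -> list[str]:
    """Read the text layer of every page (requires `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so the helpers above work without it

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]


def report(pdf_path: Path) -> str:
    """Build the one-line result for a single file."""
    kind = classify_page_texts(extract_page_texts(pdf_path))
    if kind == "image":
        return f"{pdf_path.name} is an image and requires manual review"
    return f"{pdf_path.name} contains a text layer ({kind})"
```

A scanned PDF that already went through OCR carries a text layer and would be classified as text here, so the threshold should be checked against the real sample files.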
Environment
Python 3.13 installed on a Windows laptop
The script should run locally (no Docker).
Next Steps (Phase 2, after this task)
After this classification and translation step, I will need additional parsing rules based on the extracted data. This will be a follow-on contract.
Ideal Freelancer
Experience with PDF parsing (PyPDF2, pdfminer) and OCR tools such as Tesseract
Familiarity with Hebrew OCR and right-to-left handling
Ability to deliver same day
Clear written communication (chat or text only, no phone calls)
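To illustrate what right-to-left handling can involve: some PDFs store Hebrew in visual (display) order, so extracted lines may come out reversed. Whether that applies to the attached example is not known from the posting. The sketch below uses only the standard library; `rtl_ratio` and `visual_to_logical` are hypothetical helper names, and the reversal is a naive heuristic rather than a full implementation of the Unicode bidirectional algorithm.

```python
import re
import unicodedata


def rtl_ratio(text: str) -> float:
    """Share of strongly directional characters that are right-to-left.

    Uses the Unicode bidi class of each character: "R" and "AL" are
    right-to-left (Hebrew, Arabic), "L" is left-to-right (Latin etc.).
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    rtl = sum(1 for c in classes if c in ("R", "AL"))
    ltr = sum(1 for c in classes if c == "L")
    total = rtl + ltr
    return rtl / total if total else 0.0


# Runs of Latin letters and digits (with inner punctuation such as 12.5 or 1/2).
_LTR_RUN = re.compile(r"[0-9A-Za-z][0-9A-Za-z.,:/\-]*[0-9A-Za-z]|[0-9A-Za-z]")


def visual_to_logical(line: str) -> str:
    """Naively convert one visually ordered line to logical order.

    The whole line is reversed, then runs of Latin letters and digits
    (which the first step also flipped) are reversed back.
    """
    flipped = line[::-1]
    return _LTR_RUN.sub(lambda m: m.group(0)[::-1], flipped)
```

A practical check is to run `visual_to_logical` only on lines where `rtl_ratio` is high and the extracted text reads backwards when compared with the PDF as displayed.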
Deliverables
One Python script (.py) that:
Detects image vs text
Extracts text when available
Translates to English
Simple instructions for running the script on Windows with Python 3.13
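One possible shape for such a script is sketched below. It is an illustration under stated assumptions, not the delivered solution: it assumes the third-party packages `pypdf` and `deep-translator` are installed, that calling an online translation service is acceptable (the posting only rules out Docker), and that a blank text layer means an image-only file. All function names and the 4,000-character chunk size are hypothetical choices.

```python
import sys
from pathlib import Path
from typing import Callable


def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    """Split text on line boundaries into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        while len(line) > max_chars:  # hard-split a single overlong line
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def translate_text(text: str, translate_chunk: Callable[[str], str]) -> str:
    """Translate long text chunk by chunk using the supplied function."""
    parts = ((translate_chunk(c) or "").strip() for c in split_into_chunks(text))
    return "\n".join(parts)


def message_for(filename: str, text: str, translate_chunk: Callable[[str], str]) -> str:
    """Return the review notice for image-only files, else the translation."""
    if not text.strip():
        return f"{filename} is an image and requires manual review"
    return f"{filename}:\n{translate_text(text, translate_chunk)}"


def extract_text(pdf_path: Path) -> str:
    """Read the text layer of all pages (assumes `pip install pypdf`)."""
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def google_translate_chunk(chunk: str) -> str:
    """Translate one chunk to English (assumes `pip install deep-translator` and internet access)."""
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source="auto", target="en").translate(chunk)


def process_folder(folder: Path) -> None:
    """Print one result per PDF in the folder, in name order."""
    pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    for pdf in pdfs:
        print(message_for(pdf.name, extract_text(pdf), google_translate_chunk))


if __name__ == "__main__" and len(sys.argv) > 1:
    process_folder(Path(sys.argv[1]))
```

On Windows with Python 3.13, and assuming the file is saved under the hypothetical name `detect_pdfs.py`, the packages could be installed with `py -3.13 -m pip install pypdf deep-translator` and the script run with `py -3.13 detect_pdfs.py C:\path\to\pdfs`.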