OCR (Optical Character Recognition) is a technology that converts different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.

What file formats are supported?

We support all major image formats, including JPG, PNG, WebP, TIFF, and BMP, as well as PDF documents.

How accurate is the text recognition?

Our OCR engine achieves 99.9% accuracy on clear, high-quality documents. Accuracy may vary based on image quality, handwriting, and document complexity.

Is my data secure?

Yes, all data is encrypted in transit and at rest. We use industry-standard security practices and do not share your data with third parties.

What languages are supported?

We support over 107 languages including English, Spanish, French, German, Chinese, Japanese, Arabic, and many more.

Does it work with Spanish-language documents?

Yes, our OCR supports Spanish and more than 107 other languages. Just select 'Español' before scanning your document to get the best results with Spanish text.

Can I scan Spanish documents?

Yes! Our OCR fully supports Spanish language documents. Select 'Español' as your document language before scanning to get optimized results for Spanish text.

Can I translate scanned text?

Yes! ScanThisText offers AI-powered translation for extracted text. After scanning a document, you can translate it to any of our 107+ supported languages instantly.

How many languages can I translate to?

Our translation service supports 107+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Portuguese, Italian, Russian, Korean, and many more.

Can I translate scanned documents?

Yes! ScanThisText offers AI-powered translation for extracted text. After scanning a document, you can instantly translate it into any of our 107+ supported languages.

Batch Document Processing: 100+ Files at Once

Single-file extraction is fine for ad-hoc use. But when your accounts payable (AP) team receives 200 vendor invoices at month-end, or your medical billing team processes 500 explanation of benefits (EOB) forms every week, you need batch processing that handles volume reliably.

The Batch Processing Pipeline

A well-designed batch pipeline has four stages: intake (file upload and validation), extraction (OCR + AI processing), enrichment (classification, GL coding, compliance checks), and output (save to database, trigger workflows, generate reports).
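The four stages can be sketched as four small functions chained per file. This is a minimal illustration, not ScanThisText's implementation: `intake`, `extract`, `enrich`, `output`, and `run_pipeline` are hypothetical names, the extraction step is a stub rather than a real OCR call, classification is a toy filename rule, and an in-memory dict stands in for the database. The extension set is an assumed mapping of the formats listed in the FAQ.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List

# Assumed extension mapping for the FAQ's formats (JPG, PNG, WEBP, TIFF, BMP, PDF).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff", ".bmp", ".pdf"}


@dataclass
class Document:
    path: Path
    text: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)


def intake(paths: List[str]) -> List[Document]:
    """Stage 1: keep only files with a supported extension."""
    return [Document(Path(p)) for p in paths if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS]


def extract(doc: Document) -> Document:
    """Stage 2: hypothetical stub standing in for the OCR + AI extraction call."""
    doc.text = f"<text extracted from {doc.path.name}>"
    return doc


def enrich(doc: Document) -> Document:
    """Stage 3: toy classification by filename; real systems add GL codes, compliance checks."""
    doc.metadata["doc_type"] = "invoice" if "invoice" in doc.path.name.lower() else "other"
    return doc


def output(doc: Document, store: Dict[str, Document]) -> None:
    """Stage 4: persist the result; an in-memory dict stands in for the database."""
    store[doc.path.name] = doc


def run_pipeline(paths: List[str]) -> Dict[str, Document]:
    store: Dict[str, Document] = {}
    for doc in intake(paths):
        output(enrich(extract(doc)), store)
    return store


if __name__ == "__main__":
    results = run_pipeline(["invoice_001.pdf", "receipt.png", "notes.docx"])
    for name, doc in results.items():
        print(name, doc.metadata["doc_type"])  # notes.docx is rejected at intake
```

Keeping each stage in its own function means the stub in stage 2 can be swapped for a real extraction call without touching intake or output.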

Key Design Principles

Sequential by default: Process files one at a time to respect API rate limits and maintain quality. Parallel processing saves time but risks throttling.
Per-file status tracking: Each file should have its own status (pending, extracting, saving, done, error) so users know exactly what's happening.
Graceful error handling: One failed file shouldn't stop the entire batch. Log the error, skip to the next file, and report failures at the end.
Progress visibility: Show a real-time progress bar with X/N files completed and estimated time remaining.
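The four principles above fit into a single loop. A minimal sketch, assuming `extract` and `save` are caller-supplied callables (hypothetical placeholders, not a ScanThisText or Azure API): files run one at a time, each `FileJob` moves through the statuses listed above, a failure is recorded and the loop continues, and an `on_progress` callback receives the completed count, the total, and a naive time-remaining estimate based on the average time per file so far.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FileJob:
    name: str
    status: str = "pending"  # pending -> extracting -> saving -> done | error
    error: Optional[str] = None


def process_batch(
    names: List[str],
    extract: Callable[[str], str],
    save: Callable[[str, str], None],
    on_progress: Callable[[int, int, float], None] = lambda done, total, eta: None,
) -> List[FileJob]:
    jobs = [FileJob(n) for n in names]
    start = time.monotonic()
    for i, job in enumerate(jobs, start=1):  # sequential: one file at a time
        try:
            job.status = "extracting"
            text = extract(job.name)
            job.status = "saving"
            save(job.name, text)
            job.status = "done"
        except Exception as exc:  # one failed file must not stop the batch
            job.status = "error"
            job.error = str(exc)
        elapsed = time.monotonic() - start
        eta = elapsed / i * (len(jobs) - i)  # average seconds per file x files left
        on_progress(i, len(jobs), eta)
    return jobs


if __name__ == "__main__":
    def fake_extract(name: str) -> str:
        if name.endswith(".bmp"):
            raise ValueError("unreadable image")
        return f"text of {name}"

    saved = {}
    jobs = process_batch(
        ["a.pdf", "b.bmp", "c.png"],
        fake_extract,
        saved.__setitem__,
        on_progress=lambda d, t, eta: print(f"{d}/{t} files, ~{eta:.1f}s left"),
    )
    failed = [j for j in jobs if j.status == "error"]
    print(f"{len(jobs) - len(failed)} succeeded, {len(failed)} failed")
```

Failures are reported only after the loop finishes, so a single corrupt scan costs one file rather than the whole batch.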

Batch Upload in Practice

ScanThisText's batch upload accepts up to 20 files per batch. Each file is extracted sequentially using Azure Document Intelligence and then enriched with AI. Results are saved to your team's organization-scoped database with full audit trails. A progress bar shows the real-time status of each file in the queue.
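With a 20-file limit per batch, a 200-invoice month-end run has to be submitted as several batches. A minimal client-side sketch of that split, assuming the limit stated above; `chunk` and `MAX_FILES_PER_BATCH` are hypothetical names, not part of any ScanThisText API.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_FILES_PER_BATCH = 20  # per-batch limit stated in the text


def chunk(items: Sequence[T], size: int = MAX_FILES_PER_BATCH) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items; the last slice may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    invoices = [f"invoice_{i:03d}.pdf" for i in range(1, 201)]
    batches = list(chunk(invoices))
    print(len(batches), len(batches[0]), batches[-1][-1])  # 10 20 invoice_200.pdf
```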
