Single-file extraction is fine for ad-hoc use. But when your accounts payable (AP) team receives 200 vendor invoices at month-end, or your medical billing team processes 500 explanation of benefits (EOB) statements weekly, you need batch processing that handles volume reliably.
The Batch Processing Pipeline
A well-designed batch pipeline has four stages: intake (file upload and validation), extraction (OCR + AI processing), enrichment (classification, GL coding, compliance checks), and output (save to database, trigger workflows, generate reports).
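To make the four stages concrete, here is a minimal sketch in Python. Every name in it (`Document`, `intake`, `extract`, `enrich`, `output`, `run_pipeline`) is hypothetical and chosen for illustration. The extraction and enrichment bodies are placeholders standing in for a real OCR service and real classification logic, not any product's actual implementation.

```python
from dataclasses import dataclass, field

ALLOWED_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg", ".tiff")  # assumed list


@dataclass
class Document:
    filename: str
    raw_bytes: bytes
    fields: dict = field(default_factory=dict)  # filled in by extraction
    tags: dict = field(default_factory=dict)    # filled in by enrichment


def intake(doc: Document) -> Document:
    """Stage 1: validate the upload before spending any API calls on it."""
    if not doc.raw_bytes:
        raise ValueError(f"{doc.filename}: file is empty")
    if not doc.filename.lower().endswith(ALLOWED_EXTENSIONS):
        raise ValueError(f"{doc.filename}: unsupported file type")
    return doc


def extract(doc: Document) -> Document:
    """Stage 2: placeholder for the OCR + AI call (here: decode the bytes)."""
    doc.fields["text"] = doc.raw_bytes.decode("utf-8", errors="replace")
    return doc


def enrich(doc: Document) -> Document:
    """Stage 3: placeholder classification based on a keyword match."""
    text = doc.fields["text"].lower()
    doc.tags["doc_type"] = "invoice" if "invoice" in text else "unknown"
    return doc


def output(doc: Document, store: list) -> Document:
    """Stage 4: save the result (a list stands in for the database)."""
    store.append({"filename": doc.filename, **doc.fields, **doc.tags})
    return doc


def run_pipeline(doc: Document, store: list) -> Document:
    """Run one document through intake, extraction, enrichment, and output."""
    for stage in (intake, extract, enrich):
        doc = stage(doc)
    return output(doc, store)
```

Keeping each stage as a separate function means a failure can be traced to a specific stage, and a stage can be swapped out (for example, a different OCR provider) without touching the others.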
Key Design Principles
- Sequential by default: Process files one at a time to stay within API rate limits and keep results consistent. Parallel processing finishes sooner but risks throttling, where the API rejects or delays requests that arrive too quickly.
- Per-file status tracking: Each file should have its own status (pending, extracting, saving, done, error) so users know exactly what's happening.
- Graceful error handling: One failed file shouldn't stop the entire batch. Log the error, skip to the next file, and report failures at the end.
- Progress visibility: Show a real-time progress bar with X/N files completed and estimated time remaining.
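The four principles above can be combined in one small loop. The sketch below is illustrative only: `FileJob`, `Status`, and `process_batch` are hypothetical names, and the `extract` and `save` callables are supplied by the caller as stand-ins for whatever extraction service and database you actually use.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List, Optional


class Status(Enum):
    PENDING = "pending"
    EXTRACTING = "extracting"
    SAVING = "saving"
    DONE = "done"
    ERROR = "error"


@dataclass
class FileJob:
    filename: str
    status: Status = Status.PENDING
    error: Optional[str] = None


def process_batch(
    jobs: List[FileJob],
    extract: Callable[[str], Any],
    save: Callable[[str, Any], None],
    on_progress: Optional[Callable[[int, int, float], None]] = None,
) -> List[FileJob]:
    """Process jobs one at a time and return the jobs that failed."""
    started = time.monotonic()
    total = len(jobs)
    for index, job in enumerate(jobs, start=1):
        try:
            job.status = Status.EXTRACTING
            result = extract(job.filename)
            job.status = Status.SAVING
            save(job.filename, result)
            job.status = Status.DONE
        except Exception as exc:
            # One bad file must not stop the batch: record it and move on.
            job.status = Status.ERROR
            job.error = str(exc)
        if on_progress is not None:
            # Estimate remaining time from the average time per file so far.
            elapsed = time.monotonic() - started
            eta_seconds = elapsed / index * (total - index)
            on_progress(index, total, eta_seconds)
    return [job for job in jobs if job.status is Status.ERROR]
```

A caller might pass `on_progress=lambda done, total, eta: print(f"{done}/{total} files, about {eta:.0f}s left")` to drive a progress bar, then show the returned list of failed jobs once the batch finishes. Note that the progress count includes files that ended in an error, since those have also been dealt with.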
Batch Upload in Practice
ScanThisText's batch upload accepts up to 20 files per batch. Files are processed sequentially: each one is extracted with Azure Document Intelligence and then passed through AI enrichment. Results are saved to your team's org-scoped database with full audit trails. A progress bar shows real-time status for each file in the queue.
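If you have more files than a single batch allows, one option is to split them into batch-sized groups before uploading. The helper below is a generic sketch, not part of any ScanThisText API. `split_into_batches` and `MAX_FILES_PER_BATCH` are hypothetical names, and the default of 20 simply mirrors the per-batch limit mentioned above.

```python
from typing import Iterator, List, Sequence

MAX_FILES_PER_BATCH = 20  # mirrors the per-batch limit described above


def split_into_batches(
    files: Sequence[str], batch_size: int = MAX_FILES_PER_BATCH
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size files, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        yield list(files[start:start + batch_size])
```

With this helper, the 200 month-end invoices from the opening example become 10 batches of 20 files each, and a leftover group smaller than 20 is yielded as a final, shorter batch.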