OCR (Optical Character Recognition) is a technology that converts documents such as scanned paper pages, PDF files, and images captured by a digital camera into editable, searchable data.

What file formats are supported?

We support all major image formats including JPG, PNG, WEBP, TIFF, BMP, and PDF documents.
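As a rough illustration, a client could check a file's extension against the formats listed above before uploading. This is a minimal sketch: the function name `is_supported` and the extension set (including the `.jpeg` and `.tif` spelling variants) are assumptions for illustration, not part of any documented ScanThisText API.

```python
from pathlib import Path

# Extensions for the formats listed above. ".jpeg" and ".tif" are assumed
# to be accepted as common spellings of JPG and TIFF.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".tiff", ".tif", ".bmp", ".pdf"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check only filters obvious mismatches; it says nothing about whether the file contents are valid.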

How accurate is the text recognition?

Our OCR engine achieves 99.9% accuracy on clear, high-quality documents. Accuracy may vary based on image quality, handwriting, and document complexity.
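The answer above does not say how accuracy is measured. One common convention is character-level accuracy: one minus the edit distance between the OCR output and a reference transcription, divided by the reference length. The sketch below assumes that definition, which may differ from the one behind the quoted figure; the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy in [0, 1], measured against the reference text."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

Under this definition, 99.9% corresponds to about one character error per 1,000 characters of reference text.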

Is my data secure?

Yes, all data is encrypted in transit and at rest. We use industry-standard security practices and do not share your data with third parties.

What languages are supported?

We support over 107 languages including English, Spanish, French, German, Chinese, Japanese, Arabic, and many more.

Does it work with Spanish-language documents?

Yes, our OCR supports Spanish and more than 107 languages. Simply select 'Español' before scanning your document to get the best results with Spanish text.

Can I scan Spanish documents?

Yes! Our OCR fully supports Spanish-language documents. Select 'Español' as your document language before scanning to get optimized results for Spanish text.

Can I translate scanned text?

Yes! ScanThisText offers AI-powered translation for extracted text. After scanning a document, you can translate it to any of our 107+ supported languages instantly.

How many languages can I translate to?

Our translation service supports 107+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Portuguese, Italian, Russian, Korean, and many more.

Can I translate scanned documents?

Yes! ScanThisText offers AI-powered translation for extracted text. After scanning a document, you can instantly translate it into any of our 107+ supported languages.

Train Custom AI Models for Document Extraction

Prebuilt OCR models are great for standard invoices and receipts. But what about medical superbills with CPT codes and modifiers? Or construction AIA pay applications with retainage percentages? Or your company's unique internal forms? That's where custom model training comes in.

Prebuilt vs. Custom Models

Prebuilt models extract generic fields: vendor name, total amount, date. Custom models extract domain-specific fields that matter to your business: CPT codes, ICD-10 diagnoses, GL accounts, cost centers, project numbers, authorization numbers, and any other structured data unique to your document types.

How Training Works

Collect 10-50 sample documents — Mix different vendors, formats, and variations of the same document type.
Define your fields — Specify exactly what data you want extracted: field names, types (string, number, date, table), and where they typically appear.
Label the training data — Draw bounding boxes around each field in your sample documents and assign labels.
Train the model — Azure Document Intelligence trains a neural model in 10-30 minutes. Neural models handle variable layouts (invoices from different vendors); template models work for fixed-layout forms.
Test and iterate — Upload new documents and verify extraction accuracy. Add more training documents to improve weak areas.
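The field-definition step can be pictured as a small schema. The sketch below is a hypothetical representation, not the ScanThisText or Azure Document Intelligence format: the class names, the `superbill` example, and the field names are assumptions. Only the four field types and the 10-document minimum come from the steps above.

```python
from dataclasses import dataclass, field

FIELD_TYPES = {"string", "number", "date", "table"}  # field types named in the steps above
MIN_SAMPLES = 10                                     # lower end of the suggested sample range


@dataclass
class FieldSpec:
    name: str
    type: str

    def __post_init__(self) -> None:
        if self.type not in FIELD_TYPES:
            raise ValueError(f"unsupported field type: {self.type!r}")


@dataclass
class TrainingSet:
    document_type: str
    fields: list[FieldSpec]
    sample_paths: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List reasons this set is not ready for training (empty means ready)."""
        issues = []
        if not self.fields:
            issues.append("no fields defined")
        if len(self.sample_paths) < MIN_SAMPLES:
            issues.append(f"need at least {MIN_SAMPLES} samples, have {len(self.sample_paths)}")
        return issues


# Hypothetical example: a medical superbill with only 8 samples collected so far.
superbill = TrainingSet(
    document_type="superbill",
    fields=[FieldSpec("cpt_code", "string"), FieldSpec("service_date", "date")],
    sample_paths=[f"samples/superbill_{i}.pdf" for i in range(8)],
)
print(superbill.problems())  # ['need at least 10 samples, have 8']
```

Checking the set before training gives fast feedback on the cheapest problems to fix, before spending the 10-30 minutes a training run takes.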

Accuracy by Training Set Size

10 documents: ~85% field-level accuracy
50 documents: ~93% accuracy
200 documents: ~97% accuracy
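To put these figures in terms of review workload, the sketch below converts each quoted accuracy into the expected number of fields needing correction per 1,000 extracted fields. It assumes all three percentages are field-level accuracy and that every extracted field is equally likely to be wrong; the names are illustrative.

```python
# Approximate accuracies quoted above, keyed by training set size.
ACCURACY_BY_TRAINING_DOCS = {10: 0.85, 50: 0.93, 200: 0.97}


def expected_corrections(accuracy: float, fields_extracted: int = 1000) -> int:
    """Expected number of wrong fields, rounded to the nearest whole field."""
    return round((1.0 - accuracy) * fields_extracted)


for docs, accuracy in ACCURACY_BY_TRAINING_DOCS.items():
    print(f"{docs} training documents: ~{expected_corrections(accuracy)} corrections per 1,000 fields")
```

Going from 10 to 200 training documents cuts expected corrections from about 150 to about 30 per 1,000 fields, a fivefold reduction.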

Continuous Improvement

The best custom models improve over time. When a user corrects an extraction error, that correction feeds back into the training pipeline. After 10 corrections accumulate, the system can automatically retrain the model with the expanded dataset. Each version is tracked with accuracy metrics and can be rolled back if a new version underperforms.
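The feedback loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the product's implementation: `FeedbackLoop`, `ModelVersion`, and the `train` callback are hypothetical names, and a candidate that scores below the active version is simply never activated, which has the same effect as an immediate rollback.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

RETRAIN_THRESHOLD = 10  # corrections to accumulate before retraining, per the text above


@dataclass
class ModelVersion:
    number: int
    accuracy: float


@dataclass
class FeedbackLoop:
    """Collects user corrections and retrains once enough have accumulated."""
    train: Callable[[list], float]  # hypothetical: trains on the dataset, returns measured accuracy
    dataset: list = field(default_factory=list)
    pending: list = field(default_factory=list)
    versions: list = field(default_factory=list)
    active: Optional[ModelVersion] = None

    def record_correction(self, correction) -> None:
        self.pending.append(correction)
        if len(self.pending) >= RETRAIN_THRESHOLD:
            self.retrain()

    def retrain(self) -> ModelVersion:
        # Fold pending corrections into the dataset and train a new version.
        self.dataset.extend(self.pending)
        self.pending.clear()
        candidate = ModelVersion(len(self.versions) + 1, self.train(self.dataset))
        self.versions.append(candidate)  # every version is tracked with its accuracy
        # Activate only if it does not underperform the current version;
        # otherwise the previous version stays active.
        if self.active is None or candidate.accuracy >= self.active.accuracy:
            self.active = candidate
        return candidate


# Simulated training runs: the second version scores worse than the first.
scores = iter([0.93, 0.90])
loop = FeedbackLoop(train=lambda data: next(scores))
for i in range(20):
    loop.record_correction({"field": "cpt_code", "fixed_value": str(i)})
print(loop.active.number, len(loop.versions))  # 1 2
```

In the usage example, twenty corrections trigger two training runs; version 2 is recorded but version 1 stays active because it scored higher.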

Get Started

ScanThisText's Model Training module lets enterprise teams create, train, and manage custom extraction models directly from the dashboard. Upload training documents, trigger training, monitor accuracy trends, and activate models per document type.
