Enterprise | 8 min read | April 1, 2026

Train Custom AI Models for Your Specific Document Types

Prebuilt models extract generic fields. Custom models extract the fields specific to your domain: CPT codes, GL accounts, retainage on AIA pay applications, or any other domain-specific data. Here's how to train one.


Prebuilt OCR models are great for standard invoices and receipts. But what about medical superbills with CPT codes and modifiers? Or construction AIA pay applications with retainage percentages? Or your company's unique internal forms? That's where custom model training comes in.

Prebuilt vs. Custom Models

Prebuilt models extract generic fields: vendor name, total amount, date. Custom models extract domain-specific fields that matter to your business: CPT codes, ICD-10 diagnoses, GL accounts, cost centers, project numbers, authorization numbers, and any other structured data unique to your document types.

How Training Works

  1. Collect 10-50 sample documents: Mix different vendors, formats, and variations of the same document type.
  2. Define your fields: Specify exactly what data you want extracted, including field names, types (string, number, date, table), and where each field typically appears.
  3. Label the training data: Draw bounding boxes around each field in your sample documents and assign labels.
  4. Train the model: Pick a model type, then train. Neural models handle variable layouts (invoices from different vendors); template models work for fixed-layout forms. Azure Document Intelligence typically trains a neural model in 10-30 minutes.
  5. Test and iterate: Upload new documents that were not in the training set and verify extraction accuracy. Add more training documents to improve weak areas.
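
Steps 1, 2, and 4 can be checked before any training run starts. The sketch below is illustrative only: `FieldSpec`, `choose_build_mode`, and `check_training_set` are hypothetical names, not part of Azure Document Intelligence or ScanThisText, and the 10-document minimum is taken from step 1. It assumes labels are stored as a mapping from document name to the fields labeled in that document.

```python
from dataclasses import dataclass

# Field types listed in step 2.
ALLOWED_TYPES = {"string", "number", "date", "table"}


@dataclass(frozen=True)
class FieldSpec:
    """One field to extract (hypothetical helper, not a vendor API)."""
    name: str
    type: str

    def __post_init__(self):
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported field type: {self.type!r}")


def choose_build_mode(fixed_layout: bool) -> str:
    """Template for fixed-layout forms, neural for variable layouts (step 4)."""
    return "template" if fixed_layout else "neural"


def check_training_set(labels_by_doc, schema, min_docs=10):
    """Return a list of problems; an empty list means the set looks ready."""
    problems = []
    if len(labels_by_doc) < min_docs:
        problems.append(
            f"need at least {min_docs} documents, got {len(labels_by_doc)}"
        )
    known = {spec.name for spec in schema}
    labeled = set()
    for doc, labels in labels_by_doc.items():
        unknown = set(labels) - known
        if unknown:
            problems.append(f"{doc}: unknown fields {sorted(unknown)}")
        labeled |= set(labels) & known
    # A field that is never labeled cannot be learned.
    for name in sorted(known - labeled):
        problems.append(f"field {name!r} is never labeled")
    return problems


if __name__ == "__main__":
    schema = [FieldSpec("cpt_code", "string"), FieldSpec("total", "number")]
    labels = {"superbill_01.pdf": {"cpt_code": "99213"}}
    for problem in check_training_set(labels, schema):
        print(problem)
```

Running a check like this before labeling is finished makes gaps visible early, for example a field that appears in the schema but was never boxed in any sample.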

Accuracy by Training Set Size

  • 10 documents: ~85% field-level accuracy
  • 50 documents: ~93% field-level accuracy
  • 200 documents: ~97% field-level accuracy
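
To measure field-level accuracy on your own documents, compare extracted values against hand-verified values for a held-out test set. The following is a minimal sketch under stated assumptions: `field_level_accuracy` is a hypothetical helper, a field counts as correct only on an exact match after trimming whitespace, and a missing field counts as wrong. Real evaluations may need looser matching for dates or amounts.

```python
def field_level_accuracy(predictions, ground_truth):
    """Return (overall accuracy, accuracy per field name).

    Both arguments map document id -> {field name: value}.
    Only fields present in ground_truth are scored.
    """
    correct = 0
    total = 0
    per_field = {}  # field name -> (hits, seen)
    for doc_id, truth in ground_truth.items():
        predicted = predictions.get(doc_id, {})
        for name, expected in truth.items():
            value = predicted.get(name)
            # Missing predictions count as errors.
            match = (
                value is not None
                and str(value).strip() == str(expected).strip()
            )
            hits, seen = per_field.get(name, (0, 0))
            per_field[name] = (hits + int(match), seen + 1)
            correct += int(match)
            total += 1
    overall = correct / total if total else 0.0
    by_field = {name: h / s for name, (h, s) in per_field.items()}
    return overall, by_field
```

The per-field breakdown shows which fields are weak, which is where additional training documents help most.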

Continuous Improvement

The best custom models improve over time. When a user corrects an extraction error, that correction feeds back into the training pipeline. After 10 corrections accumulate, the system can automatically retrain the model with the expanded dataset. Each version is tracked with accuracy metrics and can be rolled back if a new version underperforms.
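
The feedback loop described above can be sketched as a small piece of bookkeeping. This is an illustration, not ScanThisText's implementation: `FeedbackLoop`, `ModelVersion`, and the `train` callback are hypothetical, and the sketch assumes each correction contributes one additional labeled document and that a new version is rolled back when its measured accuracy is lower than the currently active version's.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ModelVersion:
    number: int
    training_docs: int
    accuracy: float


@dataclass
class FeedbackLoop:
    # Hypothetical callback: trains on N documents, returns measured accuracy.
    train: Callable[[int], float]
    training_docs: int
    retrain_threshold: int = 10
    pending_corrections: int = 0
    versions: List[ModelVersion] = field(default_factory=list)
    active: Optional[ModelVersion] = None

    def retrain(self) -> ModelVersion:
        """Train a new version, activate it, and roll back if it is worse."""
        candidate = ModelVersion(
            number=len(self.versions) + 1,
            training_docs=self.training_docs,
            accuracy=self.train(self.training_docs),
        )
        self.versions.append(candidate)
        previous, self.active = self.active, candidate
        if previous is not None and candidate.accuracy < previous.accuracy:
            self.active = previous  # roll back to the better version
        return candidate

    def record_correction(self) -> bool:
        """Count one user correction; return True if it triggered a retrain."""
        self.pending_corrections += 1
        if self.pending_corrections < self.retrain_threshold:
            return False
        # Assumption: each correction adds one labeled document.
        self.training_docs += self.pending_corrections
        self.pending_corrections = 0
        self.retrain()
        return True
```

Every trained version stays in `versions` with its accuracy, so an underperforming model remains available for inspection even when it is not the active one.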

Get Started

ScanThisText's Model Training module lets enterprise teams create, train, and manage custom extraction models directly from the dashboard. Upload training documents, trigger training, monitor accuracy trends, and activate models per document type.
