Healthcare · 7 min read · April 18, 2026

HIPAA-Compliant OCR: What a Real BAA + PHI Redaction Look Like

A signed BAA is table stakes. The real question is what the OCR platform actually does with PHI — where it's stored, how it's logged, and whether redaction runs before the text ever leaves the pipeline.

"HIPAA compliant" has become a marketing checkbox, which is exactly why healthcare compliance officers have stopped trusting it. HHS does not certify products as HIPAA compliant, so the label alone proves nothing. The real question is whether the vendor will sign a Business Associate Agreement (BAA), and whether its architecture actually supports the obligations inside it.

What the BAA Actually Obligates

A Business Associate Agreement commits the vendor to safeguard PHI, report breaches to the covered entity within a defined window, limit use and disclosure to the minimum necessary, bind its own subcontractors to the same terms, and return or destroy PHI when the contract ends. That's not just a signature; it's an architecture requirement. Platforms that weren't built for PHI usually can't meet those terms without rebuilding.
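The breach-reporting window is the most mechanical of these obligations. Under the HIPAA Breach Notification Rule (45 CFR 164.410), a business associate must notify the covered entity without unreasonable delay and no later than 60 calendar days after discovering a breach. A BAA can set a shorter window. The following minimal Python sketch computes the outer deadline. It assumes a hypothetical `contract_window_days` parameter that stands in for the BAA term. Treat the 60 days as a ceiling, not a target.

```python
from datetime import date, timedelta
from typing import Optional

# Regulatory ceiling for a business associate's notice to the covered entity:
# 60 calendar days after discovery (HIPAA Breach Notification Rule).
HIPAA_OUTER_LIMIT_DAYS = 60


def notification_deadline(discovered: date,
                          contract_window_days: Optional[int] = None) -> date:
    """Latest date the vendor may notify the covered entity of a breach.

    contract_window_days is a hypothetical BAA term; when the contract sets
    a shorter window than the regulation, the stricter one applies.
    """
    days = HIPAA_OUTER_LIMIT_DAYS
    if contract_window_days is not None:
        days = min(days, contract_window_days)
    return discovered + timedelta(days=days)


# Breach discovered March 1, 2026; second call assumes a 10-day BAA window.
print(notification_deadline(date(2026, 3, 1)))       # 2026-04-30
print(notification_deadline(date(2026, 3, 1), 10))   # 2026-03-11
```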

Where Most OCR Tools Break HIPAA

  • Training on customer data: Many free OCR tools explicitly reserve the right to use uploaded content to improve their models. That's a non-starter for PHI.
  • Third-party LLM passthrough: Sending extracted text to a downstream AI service whose vendor hasn't signed a BAA breaks the chain of custody.
  • Persistent storage: Holding documents longer than the workflow needs enlarges the breach surface area.
  • Unredacted exports: Exporting a medical record with every identifier intact exposes PHI when the downstream workflow needs only the clinical content.
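Three of these failure modes can be screened mechanically for each job before any text moves: training rights, retention, and identified exports. Here is a minimal Python sketch of that check. The `OcrJob` fields and the retention number are hypothetical. In practice they would come from the vendor's actual terms and your records-retention schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class OcrJob:
    """Hypothetical per-job facts a compliance check would need."""
    vendor_may_train_on_content: bool  # from the vendor's terms of service
    uploaded_at: datetime
    retention_days: int                # from your retention schedule
    export_includes_identifiers: bool
    purpose_needs_identifiers: bool    # False for research or analytics


def policy_violations(job: OcrJob, now: datetime) -> List[str]:
    """Return the reasons this job would mishandle PHI (empty list if none)."""
    problems = []
    if job.vendor_may_train_on_content:
        problems.append("vendor terms allow training on uploaded content")
    if now - job.uploaded_at > timedelta(days=job.retention_days):
        problems.append("document held past its retention window")
    if job.export_includes_identifiers and not job.purpose_needs_identifiers:
        problems.append("identified export where de-identified data would do")
    return problems


now = datetime(2026, 4, 18, tzinfo=timezone.utc)
analytics_job = OcrJob(
    vendor_may_train_on_content=False,
    uploaded_at=now - timedelta(days=45),
    retention_days=30,
    export_includes_identifiers=True,
    purpose_needs_identifiers=False,
)
for reason in policy_violations(analytics_job, now):
    print(reason)
```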

What Real PHI Redaction Looks Like

ScanThisText's medical module recognizes the 18 HIPAA Safe Harbor identifiers and redacts them before the extracted text leaves the processing pipeline. The list covers names, geographic subdivisions smaller than a state (street addresses included), every element of a date except the year, MRNs, SSNs, phone numbers, device identifiers, account numbers, biometric identifiers, and more. You get two versions side by side: the full PHI-inclusive extraction for authorized users, and a de-identified export for research, analytics, or downstream tools that don't need identifiers.
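To make the two-output idea concrete, here is a minimal regex-based sketch in Python. It is not ScanThisText's implementation, and it has several limits. It covers only a few pattern-shaped identifiers, and the `MRN` format is a hypothetical site convention. Names, addresses, and other free-text identifiers generally need a named-entity model rather than regular expressions. It also redacts whole dates, which is stricter than Safe Harbor's allowance for keeping the year.

```python
import re
from typing import Tuple

# Illustrative patterns for a few Safe Harbor identifiers (not the full 18).
# The MRN format is a hypothetical site convention.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "MRN": re.compile(r"\bMRN[:# ]*\d{6,10}\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def extract_two_versions(text: str) -> Tuple[str, str]:
    """Return (full, deidentified). The full text is returned unchanged for
    authorized users; in the copy, each match becomes a type label."""
    deidentified = text
    for label, pattern in PATTERNS.items():
        deidentified = pattern.sub(f"[{label}]", deidentified)
    return text, deidentified


full, deid = extract_two_versions(
    "Pt DOB 03/14/1962, MRN: 00123456, call (555) 123-4567 "
    "or jdoe@example.com. SSN 123-45-6789."
)
print(deid)
# Pt DOB [DATE], [MRN], call [PHONE] or [EMAIL]. SSN [SSN].
```

The sketch keeps the untouched original next to the redacted copy. That lets access control, not the extraction step, decide which version a given user receives.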

A Built-In Audit Trail

Every scan logs who uploaded it, when, from which IP, which user agent, what fields were extracted, and whether the redacted or full version was exported. HIPAA's accounting-of-disclosures requirement becomes a database query instead of a forensic investigation.
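Here is a sketch of what that database query might look like, using Python's built-in `sqlite3`. The schema mirrors the fields listed above plus a hypothetical `patient_id` column. It treats every full (identified) export as a candidate disclosure. Whether a given export legally counts as a disclosure is a compliance judgment; the query only narrows the search.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical audit schema: the fields the article lists, plus patient_id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scan_events (
    id               INTEGER PRIMARY KEY,
    patient_id       TEXT NOT NULL,
    user_id          TEXT NOT NULL,
    occurred_at      TEXT NOT NULL,  -- fixed-format ISO-8601 UTC string
    ip               TEXT NOT NULL,
    user_agent       TEXT NOT NULL,
    fields_extracted TEXT NOT NULL,  -- e.g. a JSON array of field names
    export_version   TEXT NOT NULL
        CHECK (export_version IN ('none', 'redacted', 'full'))
)
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_event(conn: sqlite3.Connection, patient_id: str, user_id: str,
              occurred_at: str, ip: str, user_agent: str,
              fields_extracted: str, export_version: str) -> None:
    """Append one scan/export event; rows are never updated in place."""
    conn.execute(
        "INSERT INTO scan_events (patient_id, user_id, occurred_at, ip, "
        "user_agent, fields_extracted, export_version) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (patient_id, user_id, occurred_at, ip, user_agent,
         fields_extracted, export_version),
    )


def disclosures_for_patient(conn: sqlite3.Connection, patient_id: str,
                            since: str, until: str) -> List[Tuple]:
    """Full (identified) exports for one patient in [since, until).
    String comparison works because timestamps share one ISO format."""
    return conn.execute(
        "SELECT occurred_at, user_id, ip, fields_extracted "
        "FROM scan_events "
        "WHERE patient_id = ? AND export_version = 'full' "
        "AND occurred_at >= ? AND occurred_at < ? "
        "ORDER BY occurred_at",
        (patient_id, since, until),
    ).fetchall()


conn = open_audit_db()
log_event(conn, "p-001", "dr.lee", "2026-01-05T10:00:00Z", "10.0.0.7",
          "Mozilla/5.0", '["name", "dob"]', "full")
print(disclosures_for_patient(conn, "p-001",
                              "2026-01-01T00:00:00Z", "2026-02-01T00:00:00Z"))
```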

Minimum Necessary, Enforced by RBAC

HIPAA's minimum-necessary standard says users should see only the PHI required for their role. In practice, that means the scheduling team shouldn't see clinical notes, billing shouldn't see diagnoses beyond what's needed for coding, and researchers should see de-identified data only. Role-based access control on the medical module enforces those boundaries automatically — no honor system.
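A deny-by-default field filter is one simple way to express minimum necessary in code. The roles and field names below are hypothetical. A real deployment would load this mapping from the platform's RBAC configuration rather than hard-coding it.

```python
from typing import Dict, FrozenSet, Mapping

# Hypothetical role-to-field policy. Anything not listed is hidden.
ROLE_FIELDS: Dict[str, FrozenSet[str]] = {
    "scheduling": frozenset({"patient_name", "phone", "appointment_date"}),
    "billing": frozenset({"patient_name", "account_number",
                          "icd10_codes", "cpt_codes"}),
    "research": frozenset({"icd10_codes"}),  # de-identified fields only
}


def view_for_role(record: Mapping[str, str], role: str) -> Dict[str, str]:
    """Return only the fields this role may see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, frozenset())
    return {field: value for field, value in record.items() if field in allowed}


chart = {
    "patient_name": "Jane Doe",
    "phone": "555-0100",
    "clinical_notes": "Follow-up for type 2 diabetes.",
    "icd10_codes": "E11.9",
    "account_number": "A-1001",
}
print(view_for_role(chart, "scheduling"))
```

Denying by default means a newly added field stays hidden from every role until someone grants it explicitly.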

Signed BAA, Not Implied

Enterprise and Healthcare customers on ScanThisText get a signed BAA as part of onboarding. Sub-processors — including the LLM provider handling enrichment — are covered under back-to-back BAAs, so chain-of-custody stays intact end to end.
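You can check chain of custody the way an auditor would trace it: walk every path PHI can take from ingestion and flag any processor without a BAA on file. The following Python sketch assumes a hypothetical data-flow map in which every edge carries PHI. Flows that carry only de-identified output would be left out of the map.

```python
from collections import deque
from typing import Dict, List, Set


def uncovered_processors(flows: Dict[str, List[str]], covered: Set[str],
                         source: str) -> Set[str]:
    """Breadth-first walk from `source` over PHI-carrying edges; return every
    downstream processor that has no BAA on file. The walk continues past a
    gap so that all gaps are reported, not just the first."""
    seen = {source}
    queue = deque([source])
    gaps: Set[str] = set()
    while queue:
        node = queue.popleft()
        for nxt in flows.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt not in covered:
                gaps.add(nxt)
            queue.append(nxt)
    return gaps


# Hypothetical pipeline: OCR feeds storage and an LLM enrichment step,
# and the LLM provider forwards request logs to its own logging vendor.
flows = {
    "ocr_pipeline": ["object_storage", "llm_enrichment"],
    "llm_enrichment": ["llm_provider"],
    "llm_provider": ["provider_logging"],
}
baas_on_file = {"object_storage", "llm_enrichment", "llm_provider"}
print(uncovered_processors(flows, baas_on_file, "ocr_pipeline"))
# {'provider_logging'}
```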

Bring It to Your Compliance Review

If you're scoping an OCR vendor for medical intake, chart digitization, prior-auth processing, or claims workflows, start with the BAA and architecture docs — not the feature list. Request the Enterprise Healthcare brief to share with your compliance and security teams.

HIPAA-Compliant OCR with BAA and PHI Redaction | ScanThisText.com