How Record Validation Works at Particle Health

The hidden complexity behind record validation and how we built a system precise enough to get it right at scale.

A misattributed record doesn't just create a data problem; it creates a care problem. A care manager builds a care plan around the wrong patient's clinical history. The mistake travels all the way to a care decision because nothing in the pipeline was built to catch it.

So how do you prevent that? And how do you do it at scale, across thousands of facilities where data is inconsistent by design? That's what this piece is about.

Why demographic mismatches happen at all

Health records don't live in one place with one consistent representation of a patient. They're spread across thousands of facilities, each of which captured demographics at intake, each in its own way.

"William" at a 2018 hospital visit is "Bill" at the urgent care clinic he went to four years later. A patient who moved between states has two addresses depending on which record you pull. Suffixes get dropped. Hyphenated names get split. Date of birth formats vary by system.

None of these are errors in the traditional sense. They're the natural output of healthcare data being captured locally, independently, and without coordination. The challenge is building a validation layer precise enough to catch real misattributions without rejecting legitimate records that just look different across systems.

That tension between precision and recall shaped how we built this, and the results show up in our own data. Our analysis shows that Record Validation eliminates between 0.2% and 0.5% of files, and 2% to 4% of queries, that would otherwise have contained mismatched patient data. In a recent evaluation of more than 100 million files, 0.34% had validation errors: 345,000 files in a single week. At that volume, expansive network returns demand a precision gate to remove mismatches and enforce trust.

How validation actually works

Record Validation runs on every clinical document after it's been retrieved from a network. It works through a series of checks, each designed to handle a different category of identity complexity.

Direct demographic matching

The first check compares the demographics submitted at the time of the query against what's extracted from the returned document. We normalize before matching: stripping punctuation, removing suffixes (Jr., Sr., II, III), and collapsing middle initials. We also check against alias names in the document, not just the primary name, since CCDAs may carry multiple name entries for the same person.
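The normalization and alias-aware comparison described above can be sketched roughly as follows. This is an illustrative simplification, not our production code; the field names, suffix list, and exact-equality comparison are all assumptions.

```python
import re

# Generational suffixes to strip (illustrative, not an exhaustive list)
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, drop suffixes, collapse middle initials."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    tokens = [t for t in tokens if t not in SUFFIXES]
    if len(tokens) > 2:
        # Drop single-letter middle tokens ("William J. Smith" -> "william smith")
        tokens = [tokens[0]] + [t for t in tokens[1:-1] if len(t) > 1] + [tokens[-1]]
    return " ".join(tokens)

def direct_match(query: dict, document: dict) -> bool:
    """Compare submitted demographics against the document's primary name
    and any aliases, after normalization. Dates assumed pre-normalized."""
    candidates = [document["name"]] + document.get("aliases", [])
    q = normalize_name(query["name"])
    name_ok = any(normalize_name(c) == q for c in candidates)
    return name_ok and query["dob"] == document["dob"]
```

A real matcher would compare structured name parts and more fields than this, but the shape is the same: normalize both sides, then accept a match against any known name entry, not just the first one.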

Most records clear this stage. The ones that don't are where it gets interesting.

Verato eMPI verification

When a document can't be confirmed through direct matching, we turn to Verato's eMPI, which uses a reference database of 266 million validated identities covering 98% of reported U.S. adults. We submit the document's extracted demographics and check them against the patient's pre-established identity anchor. If they match, the document passes. If they don't, the document is deleted from storage, not just excluded from the response.

This is the backstop for the genuinely hard cases: identity drift that direct matching can't resolve, demographic variation accumulated across years and facilities, the edge cases that slip through any static rule set.
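The fallback behavior reduces to a simple decision: verify against the identity anchor, and on failure delete rather than merely filter. A minimal sketch, where `verato_verify` and `storage` are hypothetical stand-ins (the real Verato API is not shown here):

```python
from typing import Callable

def validate_document(doc: dict, anchor_id: str,
                      verato_verify: Callable[[dict, str], bool],
                      storage) -> bool:
    """Return True if the document is attributable to the patient.

    `verato_verify` stands in for the eMPI check against the patient's
    pre-established identity anchor; `storage` is any object with a
    delete(doc_id) method. Both names are illustrative assumptions.
    """
    if verato_verify(doc["demographics"], anchor_id):
        return True
    # Hard delete: the document is removed from storage entirely,
    # not just excluded from this response.
    storage.delete(doc["id"])
    return False
```

The design point worth noting is the hard delete: a failed document can never leak into a later response, because it no longer exists to be served.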

The part that's easy to miss

We don't only validate against the demographics a customer originally submitted. We first check against an enriched demographic set generated earlier in the pipeline. 

Before retrieval begins, we use Verato's reference model to expand a patient's known identity footprint: historical names, prior addresses, demographic variants already known to belong to this person. At validation time, that enriched set is checked first.

The result: a document returned under a patient's former name or an old address can still pass because we already knew those variants were valid. Enrichment expands what counts as a valid match; validation removes what demonstrably doesn't belong. Most systems pick one. This architecture runs both in parallel.
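The check order described above can be sketched as follows. This is a simplified illustration; `matches` stands in for the full normalized comparison, and the function names are assumptions.

```python
def matches(a: dict, b: dict) -> bool:
    """Minimal stand-in for the normalized demographic comparison."""
    return a["name"] == b["name"] and a["dob"] == b["dob"]

def passes_validation(extracted: dict, enriched_variants: list[dict],
                      submitted: dict) -> bool:
    """Check the enriched identity footprint first, then fall back to
    the demographics the customer originally submitted."""
    # 1. Enriched set: former names, prior addresses, known variants
    #    built before retrieval began
    if any(matches(extracted, v) for v in enriched_variants):
        return True
    # 2. Raw submitted demographics as the fallback
    return matches(extracted, submitted)
```

Because the enriched set is consulted first, a document returned under a former name passes even though it would fail a naive comparison against the submitted demographics alone.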

The four pieces working together

Record Validation doesn't work in isolation. It's the final gate in a pipeline that has to function end-to-end: the Patient Provider Map identifies where a patient has received care, the Record Locator Service finds the documents, and Demographic Enrichment builds out the identity footprint that validation checks against. 

It's worth emphasizing that Enrichment and Validation aren't independent steps. Enrichment feeds validation with the expanded demographic set that improves recall. Validation enforces the precision gate that makes enriched records trustworthy. They're better as a pair.
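Wired together, the four stages form one pipeline with validation as the final gate. The sketch below uses hypothetical stand-in functions for each stage (none of these are Particle Health's actual APIs), passed in as parameters to keep the flow explicit:

```python
from typing import Callable

def retrieve_validated_records(
    patient: dict,
    patient_provider_map: Callable,    # patient -> facilities where care happened
    record_locator_service: Callable,  # (patient, facilities) -> candidate documents
    demographic_enrichment: Callable,  # patient -> expanded identity footprint
    record_validation: Callable,       # (document, footprint) -> bool
) -> list[dict]:
    """Illustrative end-to-end flow: locate, retrieve, enrich, validate."""
    facilities = patient_provider_map(patient)
    enriched = demographic_enrichment(patient)
    documents = record_locator_service(patient, facilities)
    # Record Validation is the final precision gate: only documents
    # that match the enriched footprint survive into the response.
    return [d for d in documents if record_validation(d, enriched)]
```

The pairing shows up directly in the signature: enrichment's output is one of validation's inputs, which is why the two stages only make sense together.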