In the landscape of data-driven decision-making, the journey begins long before analysts delve into complex algorithms or impressive visualizations. It starts at the very foundation: data retrieval. Picture this stage as the excavation site where raw material is unearthed, awaiting refinement into a valuable commodity. But just as miners sift through tons of ore to find precious gems, data retrieval involves meticulous processes to ensure the purity and integrity of the information extracted.
Our mission is to unlock the power of medical records in an intelligent platform that focuses health back on the patient. To achieve this vision and provide high-quality insights, we have made significant investments in our ability to locate, validate, standardize, and transform clinical data at scale.
Locate
Longitudinal clinical records are located and aggregated across all providers and EHRs.
- Geographic and Facility Coverage: We capture data across providers and EHRs by connecting to the three national networks (Carequality, Commonwell, and eHealth Exchange), as well as the New York State HIE (Healthix) and the California State HIE (Manifest MedEx). Not all facilities are accessible via the national networks, which means we're constantly evaluating gaps in our coverage in order to prioritize additional integrations. This wide net allows us to give healthcare providers a more complete view of a patient's medical history, regardless of where they received care.
- Proprietary Record Locator Service (RLS): Data on the networks is indexed, meaning you need to know where to look to find it; you can’t just ping every endpoint, since that would overwhelm the networks and cause significant delays. While Commonwell has its own RLS, implementers on Carequality and eHealth Exchange are responsible for managing their own. That's why our intelligent RLS is a core component of our data quality strategy: it allows us to pinpoint the location of patient records across the networks. Many solutions perform only a basic geographic search (a 50- or 100-mile radius around the patient's zip code), while we use several parameters in our search criteria, including:
  - Geography-based: We perform a 50-mile radius search, plus a state-wide search. We also search previous zip codes.
  - Condition-based: Chronically ill patients often travel out of state for care, meaning traditional solutions powered by a geo-based RLS will miss this data. By parsing known patient diagnoses, we perform additional searches at relevant Centers of Excellence in the US across seven core specialties.
  - Activity-based: We have insight into network traffic and patient activity, so we know where a patient has gone or is likely to go for care.
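To make the three search strategies concrete, here is a minimal sketch of how they might be combined into one list of facilities to query. It is an illustration only, not our production RLS: the `Facility` and `Patient` types, the `candidate_facilities` function, and the specialty tags are hypothetical names invented for this example, and only the 50-mile radius comes from the description above.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Facility:  # hypothetical record for a queryable endpoint
    name: str
    state: str
    lat: float
    lon: float
    specialties: frozenset = frozenset()


@dataclass
class Patient:  # hypothetical view of what the locator knows about a patient
    state: str
    locations: list  # (lat, lon) of the current and previous zip codes
    specialties_needed: set = field(default_factory=set)  # derived from diagnoses
    recent_facilities: set = field(default_factory=set)   # derived from activity


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))


def candidate_facilities(patient, facilities, radius_miles=50.0):
    """Map each facility worth querying to the reasons it was selected."""
    selected = {}
    for f in facilities:
        reasons = []
        # Geography: within the radius of any current or previous zip code.
        if any(miles_between(lat, lon, f.lat, f.lon) <= radius_miles
               for lat, lon in patient.locations):
            reasons.append("radius")
        # Geography: anywhere in the patient's state.
        if f.state == patient.state:
            reasons.append("state")
        # Condition: facility specializes in something the patient needs.
        if f.specialties & patient.specialties_needed:
            reasons.append("condition")
        # Activity: the patient is known to have been seen here.
        if f.name in patient.recent_facilities:
            reasons.append("activity")
        if reasons:
            selected[f.name] = reasons
    return selected
```

Recording the reason for each match, rather than just a yes/no answer, makes it easier to see which strategy is surfacing records that a plain radius search would miss.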
Validate
Have confidence that records are always associated with the right patient.
- Patient Identification via EMPI: The Enterprise Master Patient Index (EMPI) is a key service we use as part of the query process. By accurately identifying patients, the EMPI ensures that all data retrieved is associated with the correct individual. This is essential for both the integrity of the data and the safety of patient care.
- Address Verification: Accurate data starts with accurate inputs. We verify and normalize patients' addresses to ensure that all of the location data we collect and store is precise. This is especially important when tracking patient information across multiple care settings.
- Record Validation: Historically, we’ve observed situations where the networks returned incorrect records. This poses a risk to patient safety and confidentiality. As a result, we run an additional check to verify that the patient demographics supplied by the customer match the patient demographics in the clinical documents obtained from the EMRs.
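The record validation check can be pictured with a small sketch like the one below. It is a simplified illustration, not our actual matching logic: the `demographics_match` function, the dictionary keys, and the rule itself (same normalized first and last name plus the same date of birth) are assumptions made for this example. Real patient matching typically weighs more fields and tolerates more variation.

```python
import re
import unicodedata
from datetime import date


def normalize_name(name: str) -> str:
    """Lowercase a name and strip accents, punctuation, and whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z]", "", without_accents.lower())


def demographics_match(requested: dict, document: dict) -> bool:
    """Return True when the document appears to describe the requested patient.

    Hypothetical rule for illustration: names must agree after normalization
    and dates of birth must be identical.
    """
    return (
        normalize_name(requested["first"]) == normalize_name(document["first"])
        and normalize_name(requested["last"]) == normalize_name(document["last"])
        and requested["dob"] == document["dob"]
    )


# Usage: a document that fails the check would be held back rather than returned.
requested = {"first": "José", "last": "O'Brien", "dob": date(1980, 1, 2)}
document = {"first": "JOSE", "last": "OBrien ", "dob": date(1980, 1, 2)}
is_same_patient = demographics_match(requested, document)
```

Normalizing before comparing matters because the same person is often recorded with different casing, accents, or punctuation in different systems.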
A data quality program involves more than just ensuring data integrity; it also requires addressing concerns quickly and troubleshooting issues to improve service quality. To maximize data capture, we closely monitor network error rates and use internal dashboards and other tools to track performance metrics in real time. We fine-tune our retries and timeouts to gather as much data as possible. Additionally, we participate in network committees and promptly adopt new APIs as they become available.
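As a rough illustration of what tuning retries and timeouts involves, the sketch below retries a network call with exponential backoff and jitter. The function name `call_with_retries`, its parameters, and the default values are hypothetical and chosen for this example; they are not our production settings.

```python
import random
import time


def call_with_retries(fetch, attempts=3, base_delay=0.5, timeout=10.0, sleep=time.sleep):
    """Call fetch(timeout=...) and retry transient failures with backoff.

    `fetch` is any callable that accepts a `timeout` keyword and raises
    TimeoutError or ConnectionError on a transient failure (an assumption
    of this sketch). The last error is re-raised once attempts run out.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(timeout=timeout)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Double the wait each time and add jitter so that many
                # retrying clients do not hit the endpoint in lockstep.
                delay = base_delay * (2 ** attempt)
                sleep(delay + random.uniform(0, delay / 2))
    raise last_error
```

The trade-off is between completeness and latency: more attempts and longer timeouts recover more documents from slow endpoints, but they also lengthen the time before a query can be considered finished.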
While scalability is not directly tied to data quality, it is crucial given the tens of millions of files we process each month. Our system handles various querying patterns, from individual patient queries to large batches, especially when customers need information on upcoming appointments. By using an event-based architecture, we can absorb spikes in query volume without affecting processing speeds.
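One way to picture the event-based pattern is a queue of per-patient query events drained by a fixed pool of workers, so a large batch is handled the same way as a single query. The sketch below uses an in-process `asyncio.Queue` as a stand-in for a real message broker; `process_batch`, `worker`, and the `handle` callback are hypothetical names for this example and do not describe our actual infrastructure.

```python
import asyncio


async def worker(queue, results, handle):
    """Pull patient IDs off the queue and record each outcome."""
    while True:
        patient_id = await queue.get()
        try:
            results[patient_id] = await handle(patient_id)
        except Exception as exc:
            # Keep the worker alive; store the failure for the caller to inspect.
            results[patient_id] = exc
        finally:
            queue.task_done()


async def process_batch(patient_ids, handle, concurrency=4):
    """Turn a batch into individual events and process them concurrently."""
    queue = asyncio.Queue()
    results = {}
    for patient_id in patient_ids:
        queue.put_nowait(patient_id)

    workers = [
        asyncio.create_task(worker(queue, results, handle))
        for _ in range(concurrency)
    ]
    await queue.join()  # wait until every event has been handled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Because the queue buffers incoming work, a spike in volume lengthens the queue rather than overloading the workers, and throughput can be raised by adding workers instead of changing the query logic.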
When discussing data and data quality, it’s always important to note limitations. Here are a few:
- The overwhelming majority of providers are part of health networks, but not all are yet, and not all EHRs support connectivity. This means we can’t always access every piece of data through our HIE connections.
- Some data exists in less accessible formats like PDF or TIFF files, which can be challenging to integrate seamlessly.
Maintaining high-quality data when retrieving records from health information exchanges involves a combination of advanced technology, careful monitoring, and constant adaptation to new challenges.
About the author
Jean Barmash
Jean is the Chief Technology Officer at Particle Health.