Tuesday, July 29, 2008

Notes from 7/29/08 Patient Matching call

OpenMRS Deduplication Process:

1. As a first pass, James will use an SQL query approach to create a data source reader (DSR) that accesses OpenMRS patient data from the Patient, Name, Address, and Attribute tables. To accomplish this, we will need to map Java object properties to SQL database field names. We will evaluate using HQL once the functionality has been demonstrated with SQL.

2. Nyoman will load "test5" synthetic data into OpenMRS. To do this he will need to create the following attributes: SSN, CC Num, CCV, CC expiration date. We may need to ask OPENMRS-DEV whether functions already exist to do this (semi) automatically.

3. Nyoman will add functionality to introspect the attribute table, and will add attribute fields to the management web interface.

4. After the web GUI and DSR are implemented, James and Nyoman will ensure that the standard XML linkage configuration file is created. This file will persist the deduplication configuration data. The XML config file will have defaults for certain parameters. These defaults include the data source (OpenMRS), the number of random samples (100,000), and the string comparator (exact match).

5. James will test the new FormPairs implementation that avoids redundant pairs; he will determine whether the EM analysis and random sampling analysis are using the non-redundant FormPairs when deduplicating.

6. James will test the nascent transitive grouping functionality, which ultimately identifies the groups of potential duplicates.
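
Items 5 and 6 can be illustrated with a minimal sketch. It is not the RecMatch implementation: the class and method names below are hypothetical, records are stood in for by integer indexes, and union-find is simply one common way to compute transitive groups. The first method forms each unordered pair exactly once; the second merges matched pairs so that if A matches B and B matches C, all three land in one group of potential duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from the RecMatch code base.
class DedupGroupingSketch {

    // Form each unordered pair of record indexes exactly once (i < j), so that
    // (a, b) and (b, a) are never both produced and no record pairs with itself.
    static List<int[]> nonRedundantPairs(int recordCount) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
            for (int j = i + 1; j < recordCount; j++) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }

    // Find the representative of x's group, shortening the path as we go.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Merge matched pairs transitively and return only groups with more than
    // one member, i.e. the groups of potential duplicates.
    static List<List<Integer>> transitiveGroups(int recordCount, List<int[]> matchedPairs) {
        int[] parent = new int[recordCount];
        for (int i = 0; i < recordCount; i++) {
            parent[i] = i;
        }
        for (int[] pair : matchedPairs) {
            int a = find(parent, pair[0]);
            int b = find(parent, pair[1]);
            if (a != b) {
                // The smaller index becomes the representative of the merged group.
                parent[Math.max(a, b)] = Math.min(a, b);
            }
        }
        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int i = 0; i < recordCount; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> groups = new ArrayList<>();
        for (List<Integer> group : byRoot.values()) {
            if (group.size() > 1) {
                groups.add(group);
            }
        }
        return groups;
    }
}
```

With n records the first method yields n(n-1)/2 pairs, half of what a naive double loop over all ordered pairs of distinct records would produce.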


User Workflow script for OpenMRS deduplication:

1. User logs in to OpenMRS

2. User selects admin page

3. User selects "Manage Configuration" from Patient Matching Module section

4. User creates 1-to-n "blocking runs" configuration from Patient, Address, Name, and Attribute table fields

5. User clicks generate report
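
To make the idea of a "blocking run" in step 4 concrete, here is a rough sketch under stated assumptions: records are plain maps of field name to value, and the class, method, and field names are hypothetical rather than taken from the Patient Matching Module. A blocking run picks one or more fields, and only records that agree on all of them fall into the same block and are later compared with each other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the module's actual blocking code.
class BlockingRunSketch {

    // Group records by the concatenated values of the chosen blocking fields.
    // A missing field is appended as the text "null", so records that all lack
    // a field share a block; a real implementation may prefer to skip them.
    static Map<String, List<Map<String, String>>> block(
            List<Map<String, String>> records, List<String> blockingFields) {
        Map<String, List<Map<String, String>>> blocks = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            StringBuilder key = new StringBuilder();
            for (String field : blockingFields) {
                key.append(record.get(field)).append('|');
            }
            blocks.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(record);
        }
        return blocks;
    }
}
```

Configuring several blocking runs amounts to calling such a function once per field combination and comparing pairs within each resulting block.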


GUI for Manual Review of Record Pairs:

1. As an initial "low-hanging fruit" strategy, the Manual Review GUI will access MatchResult (MR) objects in memory, rather than in a persistent database. Ultimately we want to persist MRs in a relational data model, but for initial testing and prototyping, we will use in-memory MatchResults.

2. Nyoman will design and prototype the Manual Review GUI features, using the PowerPoint slide to prioritize features.

3. The Manual Review GUI will be incorporated into the larger RecMatch GUI application.
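
One way to keep the in-memory approach from item 1 easy to replace later is to have the GUI talk to a small storage interface. The sketch below is only an assumption about how that could look: `PairResult` is a hypothetical stand-in for MatchResult (whose real fields are not described in these notes), and the interface and class names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; PairResult is a stand-in, not the real MatchResult class.
class ReviewStoreSketch {

    static final class PairResult {
        final int recordA;
        final int recordB;
        final double score;

        PairResult(int recordA, int recordB, double score) {
            this.recordA = recordA;
            this.recordB = recordB;
            this.score = score;
        }
    }

    // The review GUI would depend only on this interface.
    interface PairResultStore {
        void add(PairResult result);
        List<PairResult> all();
    }

    // In-memory implementation for testing and prototyping. A database-backed
    // implementation could replace it later without changing GUI code.
    static final class InMemoryStore implements PairResultStore {
        private final List<PairResult> results = new ArrayList<>();

        public void add(PairResult result) {
            results.add(result);
        }

        public List<PairResult> all() {
            // Read-only view so callers cannot modify the store's list directly.
            return Collections.unmodifiableList(results);
        }
    }
}
```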
