Tuesday, July 29, 2008

Notes from 7/29/08 Patient Matching call

OpenMRS De-duplication Process:

1. As a first pass, James will use an SQL query approach to create a data source reader (DSR) that reads OpenMRS patient data from the Patient, Name, Address, and Attribute tables. To accomplish this we will need to map Java object properties to SQL database field names. Once the SQL approach is shown to work, we will evaluate using HQL.

2. Nyoman will load "test5" synthetic data into OpenMRS. To do this he will need to create the following attributes: SSN, CC Num, CCV, CC expiration date. We may need to ask OPENMRS-DEV whether functions already exist to do this (semi-)automatically.

3. Nyoman will add functionality to introspect the attribute table, and will add attribute fields to the management web interface.

4. After the web GUI and DSR are implemented, James and Nyoman will ensure that the standard XML linkage configuration file is created. This file will persist the deduplication configuration data and will supply defaults for certain parameters: data source (OpenMRS), number of random samples (100,000), and string comparator (exact match).

5. James will test the new FormPairs implementation that avoids redundant pairs; he will determine whether the EM analysis and random sampling analysis are using the non-redundant FormPairs when deduplicating.

6. James will test the nascent transitive grouping functionality, which ultimately identifies the groups of potential duplicates.
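
Item 1 above can be illustrated with a minimal sketch, written in Python for brevity rather than in the project's Java. The table names, column names, and property names below are hypothetical placeholders, not the actual OpenMRS schema. The sketch only shows how a property-to-column map could drive the SELECT that a data source reader issues.

```python
import sqlite3

# Hypothetical mapping from object property names to "table.column" names.
# These are illustrative placeholders, not the real OpenMRS schema.
FIELD_MAP = {
    "patientId": "patient.patient_id",
    "givenName": "name.given_name",
    "familyName": "name.family_name",
    "city": "address.city",
}


def build_query(field_map):
    """Build one SELECT that returns every mapped column under its property name."""
    columns = ", ".join(f'{col} AS "{prop}"' for prop, col in field_map.items())
    return (
        f"SELECT {columns} FROM patient "
        "JOIN name ON name.patient_id = patient.patient_id "
        "JOIN address ON address.patient_id = patient.patient_id "
        "ORDER BY patient.patient_id"
    )


def read_records(conn, field_map):
    """Yield each row as a dict keyed by object property name."""
    cursor = conn.execute(build_query(field_map))
    props = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(props, row))


def demo_connection():
    """Create a tiny in-memory database with made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY);
        CREATE TABLE name (patient_id INTEGER, given_name TEXT, family_name TEXT);
        CREATE TABLE address (patient_id INTEGER, city TEXT);
        INSERT INTO patient VALUES (1), (2);
        INSERT INTO name VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
        INSERT INTO address VALUES (1, 'Springfield'), (2, 'Riverton');
    """)
    return conn
```

Calling `list(read_records(demo_connection(), FIELD_MAP))` returns one dict per patient. Keeping the mapping in a single table like `FIELD_MAP` means a later switch to HQL would only need to replace `build_query`.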

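Items 5 and 6 above can be sketched together. This is a minimal illustration in Python, not the RecMatch code, and the function names are assumptions. When a data source is compared against itself, the pairs (a, b) and (b, a) are redundant and (a, a) is meaningless, so each unordered pair should be formed once. Transitive grouping is shown here with a union-find structure, which is one common way to merge matched pairs into groups; the notes do not say which method RecMatch uses.

```python
from itertools import combinations


def non_redundant_pairs(record_ids):
    """Yield each unordered pair once: (a, b) but never (b, a) or (a, a)."""
    return combinations(sorted(record_ids), 2)


def transitive_groups(matched_pairs):
    """Merge matched pairs into groups: if a~b and b~c, then a, b, c share a group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for record in parent:
        groups.setdefault(find(record), set()).add(record)
    return sorted(sorted(group) for group in groups.values())
```

For n records, `non_redundant_pairs` yields n(n-1)/2 pairs instead of the n² produced by a full cross product. Given the matched pairs (1, 2), (2, 3), and (4, 5), `transitive_groups` places records 1, 2, and 3 in one group of potential duplicates and 4 and 5 in another.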

User Workflow script for OpenMRS deduplication:

1. User logs in to OpenMRS

2. User selects admin page

3. User selects "Manage Configuration" from Patient Matching Module section

4. User creates 1-to-n "blocking runs" configuration from Patient, Address, Name, and Attribute table fields

5. User clicks generate report
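
The configuration built in step 4 is what the XML linkage configuration file described in the process notes would persist, with defaults for data source (OpenMRS), number of random samples (100,000), and string comparator (exact match). The Python sketch below shows one possible shape. The element names, attribute names, and field names are invented for illustration and are not the actual RecMatch configuration schema.

```python
import xml.etree.ElementTree as ET

# Default values named in the notes; the key names themselves are hypothetical.
DEFAULTS = {
    "dataSource": "OpenMRS",
    "randomSamples": "100000",
    "stringComparator": "exact match",
}


def build_config(blocking_runs, overrides=None):
    """Return an XML element holding the defaults plus 1-to-n blocking runs."""
    if not blocking_runs:
        raise ValueError("at least one blocking run is required")
    params = {**DEFAULTS, **(overrides or {})}
    root = ET.Element("linkageConfig", params)
    for order, fields in enumerate(blocking_runs, start=1):
        run = ET.SubElement(root, "blockingRun", {"order": str(order)})
        for field in fields:
            ET.SubElement(run, "field", {"name": field})
    return root


def load_blocking_runs(xml_text):
    """Read the blocking runs back as lists of field names, in saved order."""
    root = ET.fromstring(xml_text)
    runs = sorted(root.findall("blockingRun"), key=lambda r: int(r.get("order")))
    return [[f.get("name") for f in run.findall("field")] for run in runs]
```

A call such as `build_config([["familyName", "birthYear"], ["ssn"]])` describes two blocking runs and leaves all three defaults in place. Storing an explicit `order` attribute is one way to keep the sequence of blocking runs stable across saves.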


GUI for Manual Review of Record Pairs:

1. As an initial "low-hanging fruit" strategy, the Manual Review GUI will access MatchResult (MR) objects in memory, rather than in a persistent database. Ultimately we want to persist MRs in a relational data model, but for initial testing and prototyping, we will use in-memory MatchResults.

2. Nyoman will design and prototype the Manual Review GUI features using the PowerPoint slide to prioritize features.

3. The Manual Review GUI will be incorporated into the larger RecMatch GUI application.
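
The in-memory approach from item 1 might look like the following Python sketch. The `MatchResult` fields shown here (`uid_a`, `uid_b`, `score`, `status`) and the `ReviewQueue` class are assumptions made for illustration; the notes do not describe the actual MatchResult structure.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical stand-in for a scored record pair awaiting review."""
    uid_a: int
    uid_b: int
    score: float
    status: str = "unreviewed"  # later set to "match" or "non-match"


class ReviewQueue:
    """Holds MatchResults in a plain list; nothing is written to a database."""

    def __init__(self, results):
        # Highest-scoring pairs are shown to the reviewer first.
        self._results = sorted(results, key=lambda r: r.score, reverse=True)

    def next_unreviewed(self):
        """Return the best-scoring pair not yet reviewed, or None when done."""
        for result in self._results:
            if result.status == "unreviewed":
                return result
        return None

    def record_decision(self, result, is_match):
        """Store the reviewer's decision on the in-memory object."""
        result.status = "match" if is_match else "non-match"
```

Because decisions live only on the in-memory objects, they are lost when the process exits. That is the trade-off accepted for prototyping until MatchResults are persisted in a relational model.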

Tuesday, July 22, 2008

Notes from 7/22/08 Patient Matching call

1. Ubuntu VM up and running

2. OpenMRS re-installed

3. Ticket 897 reviewed -- no major revisions

4. Minor Issues to address:
  • Lock the global config panel size (not adjustable in the vertical)
  • Ensure the config file history is updated in a timely fashion
  • Review the De-duplicate check box workflow (the "Data source config complete" check box is deactivated when the De-duplicate check box is un-checked)
  • Re-ordering blocking sessions is not saved
5. We briefly reviewed the evolving data model for the RecMatch process. Shaun will review further with Nyoman/James at a later time.

6. Nyoman will review requirements for the manual review GUI.

Still to be done:

7. Generate synthetic data (Shaun)

8. Load synthetic data into OpenMRS (James/Nyoman)

9. James to evaluate (speed/efficiency) using an HQL query approach to accessing patient data in OpenMRS

10. James/Nyoman to integrate UID/de-duplicate workflow; build uid rules into FormPairs

Tuesday, July 15, 2008

Notes from 7/15/08 Patient Matching call

1. Re-instantiate VM (Shaun/James)

2. Reload OpenMRS on the VM (Nyoman)

3. Generate synthetic data (Shaun)

4. Load synthetic data into OpenMRS (James/Nyoman)

5. James to evaluate (speed/efficiency) using an HQL query approach to accessing patient data in OpenMRS

6. James/Nyoman to integrate UID/de-duplicate workflow; build uid rules into FormPairs

7. Shaun to test ticket 897 changes

Tuesday, July 8, 2008

Notes from 7/8/08 Patient Matching call

1. James will complete the code for patient demographic queries using HQL and test it

2. Nyoman to begin work on ticket #897

3. OpenMRS linkage VM set-up for development and testing. Nyoman to provide a public key for SSH access.

4. Nyoman to revise OpenMRS Web Admin screens
  • Create 3-column view for new configuration: field name, "blocking" check box, "include" check box
  • "Create Report" screen will directly invoke the de-duplication report
  • Scheduled report generation will be added later
5. James will implement a FormPairs object that implements rules specific to de-duplication

6. We discussed the concept of 'derived traits'. Any additional data that is added to the raw data source is a 'derived trait'. Examples of potential derived traits include Soundex of name, NYSIIS of name, and unique record ID ("uid"). We'll assume for the time being that the UID is present. We'll soon need to be able to add UIDs to data sources that don't contain them natively, but this requires substantial change to the RecMatch workflow. Shaun will draft an overview of the work required to implement a derived trait framework.

7. Shaun will generate synthetic patient data using DBGen.
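
The derived traits discussed in item 6 can be illustrated with a short Python sketch. It adds two traits to raw records: an American Soundex code of the family name, and a sequential `uid` for records that lack one. The record layout, the `familyName` field, and the `add_derived_traits` helper are hypothetical, and NYSIIS is omitted for brevity.

```python
from itertools import count

# Standard American Soundex letter-to-digit table.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of a name ('' if no letters)."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with the same code
            previous = digit
    return (code + "000")[:4]


def add_derived_traits(records, name_field="familyName"):
    """Return copies of the records with 'soundex' and, if missing, 'uid' added."""
    used = {record["uid"] for record in records if "uid" in record}
    fresh = (n for n in count(1) if n not in used)
    enriched = []
    for record in records:
        derived = dict(record)  # leave the raw record untouched
        derived["soundex"] = soundex(record.get(name_field) or "")
        if "uid" not in derived:
            derived["uid"] = next(fresh)
        enriched.append(derived)
    return enriched
```

Because the raw records are copied rather than modified, the derived traits sit alongside the source data instead of replacing it. Records that already carry a `uid` keep it, and new values skip any that are in use.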