<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1256963867207214096</id><updated>2011-07-28T04:06:43.265-07:00</updated><title type='text'>Science and A Life of Adventure</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>9</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-5923932217155545327</id><published>2008-10-23T11:35:00.000-07:00</published><updated>2008-10-23T11:39:14.386-07:00</updated><title type='text'>Notes from 10/21/2008 Patient Matching Call</title><content type='html'>(A) We're experiencing out of memory errors in Tomcat with 100,000 patients in OpenMRS and 512 MB allocated to Tomcat.  
The first blocking run (blocking on postal code) appears to be completing; the out-of-memory error appears to occur during the analysis phase (either during random sampling or during Expectation Maximization) of the second blocking run (blocking on SSN).&lt;br /&gt;&lt;br /&gt;Memory snapshots from the profiler reveal that a large amount of memory is allocated for the MySQL database connection. Large result sets are not being released and closed.  Even though we are attempting to close the result set, it remains open; we believe this is because unidentified resources are still accessing the result set.&lt;br /&gt;&lt;br /&gt;James has updated the code to address releasing the result set, and Win is testing the revision on his computer.&lt;br /&gt;If this code update doesn't resolve the issue, we'll re-examine the profiler output for other places where memory use could be further optimized.&lt;br /&gt;&lt;br /&gt;Update 10/22/2008: The new code ran successfully, with no out-of-memory errors observed.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(B) In some cases, the duplicate report listing may be large.  So rather than displaying the report in a web browser (which is not designed to display large amounts of line-list data), it may be more practical to output the duplicate report directly to a file.  Consequently, we're in the process of modifying the module to accommodate this.  Win is implementing a more robust reporting work flow using AJAX.  The user will be able to initiate a report, navigate away from the administrator interface, and return to check status; when the report is complete, a link to that report will be displayed.&lt;br /&gt;&lt;br /&gt;We’ve implemented a lock-out feature so that once a report is started, no other reports can be initiated until the current report is complete.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(C)  Currently we have been unable to run the “Yourkit” Java profiler on Linux.  We are using Windows to profile memory usage.  
If this issue further hinders progress, we will need to address the barrier we face to run Yourkit on Linux.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-5923932217155545327?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/5923932217155545327/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=5923932217155545327' title='40 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/5923932217155545327'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/5923932217155545327'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/10/10212008-patient-matching-meeting.html' title='Notes from 10/21/2008 Patient Matching Call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>40</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-4074265634782769655</id><published>2008-09-23T07:44:00.000-07:00</published><updated>2008-09-24T11:15:27.589-07:00</updated><title type='text'>Notes from 9/23/2008 Patient Matching Call</title><content type='html'>We modified our tactics for matching because the matching process was taking longer to complete than we hoped.&lt;br /&gt;&lt;br /&gt;We believe the performance issues relate to the myriad Hibernate queries that are required to create blocks of potential pairs.  
For example, if the patient table contains 4,500 unique SSN's, then 4,500 different Hibernate queries must be executed to create potential pairs where SSN's match.&lt;br /&gt;&lt;br /&gt;To minimize the number of Hibernate queries, the batch de-duplication process first examines the following tables: person, patient, patient_identifiers and person_attributes. A "flat", non-normalized table is then created with all fields from the above 4 tables.&lt;br /&gt;&lt;br /&gt;All further analyses and scoring are performed against the flat, non-normalized table.&lt;br /&gt;&lt;br /&gt;A recent timing test found that 10,000 patients could be extracted and stored in the flat table in about 20 minutes, a rate of about 9 patients extracted from OpenMRS per second.&lt;br /&gt;&lt;br /&gt;Once the data was extracted from OpenMRS, analyzing and scoring the flat table took about 20 seconds. (SSN was the blocking variable, and email and SSN were the include variables.)&lt;br /&gt;&lt;br /&gt;Next Steps Include:&lt;br /&gt;&lt;br /&gt;1. Verify that the module can handle multiple blocking runs, and will join the multiple runs appropriately for the human-readable report.&lt;br /&gt;&lt;br /&gt;2. Verify that the current patient extraction process execution time increases linearly with the number of patients. We need to load 20,000-40,000 patients into OpenMRS and measure how long it takes to extract patients into a flat table. If time is not linear, we will need to consider other optimizations.&lt;br /&gt;&lt;br /&gt;3. Create blocking runs that use &lt;span style="font-weight: bold; font-style: italic;"&gt;neither&lt;/span&gt; the patient_identifiers table nor the person_attributes table and measure how long these take to complete. I suspect that the "stacked" nature of these tables impacts efficiency.&lt;br /&gt;&lt;br /&gt;4. Examine which specific tasks in the data extraction process are taking up time. 
To do so, we discussed the following approach:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Comment out the PatientToRecord method. Comment out the SQL INSERT statement that loads the record into the flat table. Measure how long it takes to iterate through all Patients.&lt;/li&gt;&lt;li&gt;Access only 1 or 2 properties in PatientToRecord.  Comment out the SQL INSERT statement that loads the record into the flat table. Measure how long it takes to iterate through all Patients.&lt;/li&gt;&lt;li&gt;Fully execute the PatientToRecord method.  Comment out the SQL INSERT statement that loads the record into the flat table. Measure how long it takes to iterate through all Patients.&lt;/li&gt;&lt;/ul&gt;5. Because the de-duplication process will need to run when an OpenMRS system is not heavily loaded, it will likely need to be scheduled. We need to explore implementing a scheduling component.&lt;br /&gt;&lt;br /&gt;6. We need to create two separate modules, one for each distinct linkage process: one for the batch de-duplication use case, and one for the real-time matching use case (NBS). 
We envision creating a common package of linkage utilities that can be re-used in both modules.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-4074265634782769655?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/4074265634782769655/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=4074265634782769655' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/4074265634782769655'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/4074265634782769655'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/09/notes-from-92308-patient-matching-call.html' title='Notes from 9/23/2008 Patient Matching Call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-1406521126737724961</id><published>2008-09-02T11:48:00.000-07:00</published><updated>2008-09-02T11:56:43.264-07:00</updated><title type='text'>Notes from 9/2/2008 Patient Matching Call</title><content type='html'>1) Implemented: Display additional matching variables in the web based report so the user  can better evaluate the nature/quality of the match&lt;br /&gt;&lt;br /&gt;2) Implemented: Create a flat file output containing (at least) all blocking and  include variables, if not additional 
variables&lt;br /&gt;&lt;br /&gt;3) Implemented: Because different blocking runs may have different include variables,  we need to be able to include the union of all fields used across all  blocking runs&lt;br /&gt;&lt;br /&gt;4) Testing: We need to test the de-dupe module with different data sets. We need  to test both for A) performance (how long does it take to match larger  numbers of patients), and B) bugs -- what issues will we run into with  different data?&lt;br /&gt;&lt;br /&gt;Results to date: An out of memory error occurs with an OpenMRS patient table containing 10,000 patients (likely a Tomcat memory error)&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;  Had increased Tomcat memory allocation to 1024M, no change&lt;/li&gt;&lt;li&gt;We may need to use a software profiler to see what processes are using memory&lt;/li&gt;&lt;li&gt;  Question: are match results objects (MR's) stored in memory? MR's should not be stored in memory -- to minimize memory use, each MR should be handled and written to a file, etc.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-1406521126737724961?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/1406521126737724961/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=1406521126737724961' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/1406521126737724961'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/1406521126737724961'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/09/922008-patient-matching-notes.html' title='Notes from 9/2/2008 
Patient Matching Call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-2707873502218663251</id><published>2008-08-05T10:01:00.000-07:00</published><updated>2008-08-05T19:29:57.863-07:00</updated><title type='text'>Notes from 8/5/08 Patient Matching Call</title><content type='html'>1. The HQL-based data source reader (DSR) is nearly complete. James needs to add patient_id (== person_id?) to the matching objects. About 1 day is needed to complete the DSR. James will move the DSR to the linkage VM for testing/implementation. He will test the DSR's ability to use multiple blocking columns and multiple data types as blocking columns.&lt;br /&gt;&lt;br /&gt;2. Nyoman completed the data loading utility. It takes approximately 1 second per patient to load. He will post his code to the OpenMRS website and get others' feedback on speed, etc.&lt;br /&gt;&lt;br /&gt;3. Nyoman has implemented introspection of person attributes. We discussed what to do if an OpenMRS implementer adds a new person_attribute after creating blocking runs in an operational system. The decision was that the existing blocking runs would need to be manually deleted by the user and new blocking runs created.&lt;br /&gt;&lt;br /&gt;4. Nyoman to ensure the config.xml file contains the presets for random sampling (="true"); number of samples (100,000); uid_field=??.&lt;br /&gt;&lt;br /&gt;5. James will review how the DSR interacts with, and functions appropriately with, the following processes: 1) random sampling, 2) EM analysis, 3) Scoring pairs. There may be some performance issues with Hibernate/caching that we'll need to address.&lt;br /&gt;&lt;br /&gt;6. 
Transitive grouping function will be tested as part of the end-to-end workflow (Nyoman/James). The transitive grouping report requires a "Group ID". The transitive grouping report will initially generate a flat file.&lt;br /&gt;&lt;br /&gt;7. (New code) Each blocking run typically has a different cut-off score, and that cut-off score has been manually determined in the past. A method is needed to automatically determine the cut-off score. The cut-off score will be calculated based on the total number of EM-estimated true matches.&lt;br /&gt;&lt;br /&gt;8. The following work flow will need to be implemented to deliver the de-duplication functionality:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;User clicks "Generate De-dupe Report" (&lt;span style="font-weight: bold;"&gt;new code&lt;/span&gt;)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;XML matching configuration file is read, matching objects are created (code exists, needs testing, modification)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Potential pairs are formed using the Blocking fields configured by the user. The patient_id (person_id) will be used to avoid redundant pairs being formed (code exists, needs testing, modification)&lt;/li&gt;&lt;li&gt;The pairs are randomly sampled to calculate u-values (code exists, needs testing, modification)&lt;/li&gt;&lt;li&gt;The pairs are evaluated by EM to calculate m-values and estimate the number of true matches (code exists, needs testing, modification)&lt;/li&gt;&lt;li&gt;The match cut-off score is calculated (&lt;span style="font-weight: bold;"&gt;new code&lt;/span&gt;)&lt;/li&gt;&lt;li&gt;True matches (determined by score cut-off) from all 3 blocking runs are "squeezed" based on unique ID's (&lt;span style="font-weight: bold;"&gt;new code&lt;/span&gt;)&lt;/li&gt;&lt;li&gt;The "squeezed" pairs are processed by transitive grouping function. The transitive grouping function should include a "Group ID" to identify records that belong to the same duplicate group. 
(code exists, needs testing, modification)&lt;/li&gt;&lt;li&gt;The grouped pairs are output to a flat file. (&lt;span style="font-weight: bold;"&gt;new code&lt;/span&gt;)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The OpenMRS User reviews the flat file for duplicates&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-2707873502218663251?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/2707873502218663251/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=2707873502218663251' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/2707873502218663251'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/2707873502218663251'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/08/1.html' title='Notes from 8/5/08 Patient Matching Call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-3610109398450195931</id><published>2008-07-29T14:12:00.001-07:00</published><updated>2008-07-29T14:12:58.454-07:00</updated><title type='text'>Notes from 7/29/08 Patient Matching call</title><content type='html'>&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;OpenMRS&lt;/span&gt; De-duplication Process:&lt;br /&gt;&lt;br /&gt;1. 
As a first pass, James will use an &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;SQL&lt;/span&gt; query approach to create a data source reader (&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;DSR&lt;/span&gt;) to access &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;OpenMRS&lt;/span&gt; patient data from the Patient, Name, Address, and Attribute tables. To accomplish this we will need to map java object properties to &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;SQL&lt;/span&gt; database field names. We will eventually evaluate using &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;HQL&lt;/span&gt; after demonstrating functionality using &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;SQL&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;2. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;Nyoman&lt;/span&gt; will load "test5" synthetic data into &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;OpenMRS&lt;/span&gt;. To do this he will need to create the following attributes: &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;SSN&lt;/span&gt;, CC &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_10"&gt;Num&lt;/span&gt;, &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_11"&gt;CCV&lt;/span&gt;, CC expiration date. We may need to ask &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_12"&gt;OPENMRS&lt;/span&gt;-DEV if any existing functions exist to do this (semi) automatically.&lt;br /&gt;&lt;br /&gt;3. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_13"&gt;Nyoman&lt;/span&gt; will add functionality to introspect the attribute table, and will add attribute fields to the management web interface.&lt;br /&gt;&lt;br /&gt;4. 
After the web GUI and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_14"&gt;DSR&lt;/span&gt; are implemented, James and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_15"&gt;Nyoman&lt;/span&gt; will ensure that the standard XML linkage configuration file is created. This will persist the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_16"&gt;deduplication&lt;/span&gt; configuration data. The XML &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_17"&gt;config&lt;/span&gt; file will have defaults for certain parameters. These defaults include data source (&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_18"&gt;OpenMRS&lt;/span&gt;), number of random samples (100,000), and string comparator (exact match).&lt;br /&gt;&lt;br /&gt;5. James will test the new &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_19"&gt;FormPairs&lt;/span&gt; implementation that avoids redundant pairs; he will determine whether the EM analysis and random sampling analysis are using the non-redundant &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_20"&gt;FormPairs&lt;/span&gt; when &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_21"&gt;deduplicating&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;6. James will test the nascent transitive grouping functionality, which &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_22"&gt;ultimately&lt;/span&gt; identifies the groups of potential duplicates.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;User &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_23"&gt;Workflow&lt;/span&gt; script for &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_24"&gt;OpenMRS&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_25"&gt;deduplication&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;1. User logs in to &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_26"&gt;OpenMRS&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;2. User selects admin page&lt;br /&gt;&lt;br /&gt;3. 
User selects "Manage Configuration" from Patient Matching Module section&lt;br /&gt;&lt;br /&gt;4. User creates 1-to-n "blocking runs" configuration from Patient, Address, Name, and Attribute table fields&lt;br /&gt;&lt;br /&gt;5. User clicks generate report&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;GUI for Manual Review of Record Pairs:&lt;br /&gt;&lt;br /&gt;1. As an initial "low-hanging fruit" strategy, the Manual Review GUI will access &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_27"&gt;MatchResult&lt;/span&gt; (MR) objects in memory, rather than in a persistent database. Ultimately we want to persist &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_28"&gt;MR's&lt;/span&gt; in a relational data model, but for initial testing and prototyping, we will use in-memory &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_29"&gt;MatchResults&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;2. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_30"&gt;Nyoman&lt;/span&gt; will design and prototype the Manual Review GUI features using the Power Point slide to prioritize features.&lt;br /&gt;&lt;br /&gt;3. 
The Manual review GUI will be incorporated into the larger &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_31"&gt;RecMatch&lt;/span&gt; GUI application.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-3610109398450195931?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/3610109398450195931/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=3610109398450195931' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/3610109398450195931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/3610109398450195931'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/07/notes-from-72908-patient-matching-call.html' title='Notes from 7/29/08 Patient Matching call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-7263500105658580672</id><published>2008-07-22T08:45:00.000-07:00</published><updated>2008-07-22T14:01:20.146-07:00</updated><title type='text'>Notes from 7/22/08 Patient Matching call</title><content type='html'>1. Ubuntu VM up and running&lt;br /&gt;&lt;br /&gt;2. OpenMRS re-installed&lt;br /&gt;&lt;br /&gt;3. Ticket 897 reviewed -- no major revisions&lt;br /&gt;&lt;br /&gt;4. 
Minor Issues to address:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Lock the global config panel size (not adjustable in the vertical)&lt;/li&gt;&lt;li&gt;Ensure the config file history is updated in a timely fashion&lt;/li&gt;&lt;li&gt;Review De-duplicate check box work flow (Data source config complete check box is inactivated when un-checking de-duplicate check box)&lt;/li&gt;&lt;li&gt;Re-ordering blocking sessions is not saved&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;5. We briefly reviewed the evolving data model for the RecMatch process. Shaun will review further with Nyoman/James at a later time.&lt;br /&gt;&lt;br /&gt;6. Nyoman will review requirements for the manual review GUI.&lt;br /&gt;&lt;br /&gt;Still to be done:&lt;br /&gt;&lt;br /&gt;7. Generate synthetic data (Shaun)&lt;br /&gt;&lt;br /&gt;8. Load synthetic data into OpenMRS (James/Nyoman)&lt;br /&gt;&lt;br /&gt;9. James to evaluate (speed/efficiency) using an HQL query approach to accessing patient data in OpenMRS&lt;br /&gt;&lt;br /&gt;10. 
James/Nyoman to integrate UID/de-duplicate workflow; build uid rules into FormPairs&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-7263500105658580672?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/7263500105658580672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=7263500105658580672' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/7263500105658580672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/7263500105658580672'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/07/notes-from-72208-patient-matching-call.html' title='Notes from 7/22/08 Patient Matching call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-2887573362529568011</id><published>2008-07-22T08:01:00.000-07:00</published><updated>2008-07-22T08:45:06.098-07:00</updated><title type='text'>Notes from 7/15/08 Patient Matching call</title><content type='html'>1. Re-instantiate VM (Shaun/James)&lt;br /&gt;&lt;br /&gt;2. Reload OpenMRS on the VM (Nyoman)&lt;br /&gt;&lt;br /&gt;3. Generate synthetic data (Shaun)&lt;br /&gt;&lt;br /&gt;4. Load synthetic data into OpenMRS (James/Nyoman)&lt;br /&gt;&lt;br /&gt;5. 
James to evaluate (speed/efficiency) using an HQL query approach to accessing patient data in OpenMRS&lt;br /&gt;&lt;br /&gt;6. James/Nyoman to integrate UID/de-duplicate workflow; build uid rules into FormPairs&lt;br /&gt;&lt;br /&gt;7. Shaun to test ticket 897 changes&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-2887573362529568011?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/2887573362529568011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=2887573362529568011' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/2887573362529568011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/2887573362529568011'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/07/notes-from-71508-patient-matching-call.html' title='Notes from 7/15/08 Patient Matching call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-8043539793234496520</id><published>2008-07-08T21:40:00.000-07:00</published><updated>2008-07-08T22:22:09.427-07:00</updated><title type='text'>Notes from 7/8/08 Patient Matching Call</title><content type='html'>1. 
James will complete the code and test patient demographic queries using &lt;a href="http://www.hibernate.org/hib_docs/reference/en/html/queryhql.html"&gt;HQL&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;2. Nyoman to begin work on &lt;a href="http://dev.openmrs.org/ticket/897"&gt;ticket #897&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;3. OpenMRS linkage VM set-up for development and testing. Nyoman to provide a public key for SSH access.&lt;br /&gt;&lt;br /&gt;4. Nyoman to revise OpenMRS Web Admin screens&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Create 3-column view for new configuration: field name, "blocking" check box, "include" check box&lt;/li&gt;&lt;li&gt;"Create Report" screen will directly invoke the de-duplication report&lt;/li&gt;&lt;li&gt;Scheduled report generation will be added later&lt;/li&gt;&lt;/ul&gt;5. James will implement a FormPairs object that implements rules specific to de-duplication&lt;br /&gt;&lt;br /&gt;6. We discussed the concept of 'derived traits'. Any additional data that is added to the raw data source is a 'derived trait'. Examples of potential derived traits include &lt;a href="http://en.wikipedia.org/wiki/Soundex"&gt;Soundex&lt;/a&gt; of name, &lt;a href="http://en.wikipedia.org/wiki/NYSIIS"&gt;NYSIIS&lt;/a&gt; of name, and unique record ID ("uid"). We'll assume for the time being that the UID is present, but we'll soon need to be able to add UID's to data sources that don't contain them natively; this requires substantial change to the RecMatch work flow. Shaun will draft an overview of the work required to implement a derived trait framework.&lt;br /&gt;&lt;br /&gt;7. 
Shaun will generate synthetic patient data using &lt;a href="http://www.cs.utexas.edu/users/ml/riddle/data/dbgen.tar.gz"&gt;DBGen&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-8043539793234496520?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://doctorshaun.blogspot.com/feeds/8043539793234496520/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=8043539793234496520' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/8043539793234496520'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/8043539793234496520'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/07/notes-from-7808-patient-matching-call.html' title='Notes from 7/8/08 Patient Matching Call'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1256963867207214096.post-1071318284162233254</id><published>2008-06-16T08:02:00.000-07:00</published><updated>2008-07-08T21:32:33.874-07:00</updated><title type='text'>OpenMRS Patient Matching Module</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;Hi! 
This is the initial post describing the collective efforts toward implementing an open-source patient identity management framework in &lt;/span&gt;&lt;a style="font-family: trebuchet ms;" href="http://www.openmrs.org/"&gt;OpenMRS&lt;/a&gt;&lt;span style="font-family:trebuchet ms;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:trebuchet ms;font-size:130%;"  &gt;Motivation&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;As you may know, health care information is increasingly distributed&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; across many independent databases and systems, both within and among organizations, as separate islands with different patient identifiers. This is the case for data collected about the same patient at different health care institutions, different pharmacy systems, different payers, and so on.  This situation interferes with the aggregation of information&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; about individuals across such databases as needed for many health care use-cases: public health reporting, clinical research, outcomes management, and administrative reporting.  Aggregation is important not only for determining a patient’s health care status, but also for population-based studies.&lt;/span&gt;&lt;span style="font-weight: bold;font-family:trebuchet ms;" &gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:trebuchet ms;" &gt;&lt;span style="font-size:130%;"&gt;Why is a patient matching module needed for OpenMRS?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;As mentioned above, health care data is scattered across disparate&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; systems, and as OpenMRS implementations grow, they will begin facing multiple instances of the same patient across and within implementations.  
One OpenMRS implementation&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; is known to contain more than 700,000 patients(!). Duplicate patient registrations will also accumulate over time within a single system. Processes to link entities (e.g., patients) &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;font-family:trebuchet ms;" &gt;across&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; and &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;font-family:trebuchet ms;" &gt;within&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; OpenMRS implementations&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; will become increasingly important.&lt;/span&gt;&lt;span style="font-weight: bold;font-family:trebuchet ms;" &gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;What functionality will patient identity management add to OpenMRS?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;The patient matching module will initially provide two core types &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;of functionality. First, it provides a stand-alone application that implements sophisticated probabilistic matching algorithms for both a) identifying duplicates in a single data source, and b) identifying matches between two generic data sources. 
The current output from the stand-alone application is a delimited file containing matches with associated match scores: the higher the score, the more likely&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt; the match.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center; font-family: trebuchet ms;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_gvorhPUXDyE/SHOyL2OpenI/AAAAAAAAAAU/kvO6hH2BZIY/s1600-h/Screenshot-Record+Linker.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 330px; height: 243px;" src="http://2.bp.blogspot.com/_gvorhPUXDyE/SHOyL2OpenI/AAAAAAAAAAU/kvO6hH2BZIY/s320/Screenshot-Record+Linker.png" alt="" id="BLOGGER_PHOTO_ID_5220712309657795186" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt;&lt;span&gt;Screen-shot of Stand-alone Matching Application&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center; font-family: trebuchet ms;"&gt;&lt;div style="text-align: left;"&gt;The second core function addresses the issue of duplicate patient registrations in an instance of OpenMRS.  Many OpenMRS implementations have tens if not hundreds of thousands of patient records. Over time, duplicate patient records will creep in. The patient matching &lt;a href="http://openmrs.org/wiki/Modules"&gt;module&lt;/a&gt; will identify and provide a list of likely duplicates to OpenMRS administrators. 
Because patient identifiers vary across countries and cultures, we've designed the patient matching module to adapt to widely varying patient identifiers.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_gvorhPUXDyE/SHO-xvrsBbI/AAAAAAAAAAc/GFjCbz4PJmg/s1600-h/CreateMatchingConfig.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_gvorhPUXDyE/SHO-xvrsBbI/AAAAAAAAAAc/GFjCbz4PJmg/s320/CreateMatchingConfig.png" alt="" id="BLOGGER_PHOTO_ID_5220726154875110834" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt;Screen-shot of the OpenMRS de-duplication Admin Screen&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:trebuchet ms;"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;When will this functionality be available to the OpenMRS community?&lt;/span&gt;&lt;br /&gt;The source code for the patient matching module is currently available in the &lt;a href="http://svn.openmrs.org/openmrs-modules"&gt;OpenMRS subversion repository&lt;/a&gt; by clicking &lt;a href="http://svn.openmrs.org/openmrs-modules/patientmatching"&gt;here&lt;/a&gt; (http://svn.openmrs.org/openmrs-modules/patientmatching). The stand-alone application's functionality continues to expand. The OpenMRS de-duplication module is currently under active development&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;, with great support provided through the Google Summer of Code 2008 initiative! We anticipate delivering the de-duplication functionality to the community by summer's end.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Join In the Fun!&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;We welcome those with an interest in this area (you know who you are)!  
To become further acquainted and involved, we encourage you to:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:trebuchet ms;"&gt;read the OpenMRS developers "&lt;a href="http://openmrs.org/wiki/Developers"&gt;Where to Get Started&lt;/a&gt;" page&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:trebuchet ms;"&gt;check out the &lt;a href="http://nyomanribeka.wordpress.com/"&gt;blog&lt;/a&gt; of our excellent GSoC intern, Nyoman Ribeka&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:trebuchet ms;"&gt;&lt;a href="http://www.merriam-webster.com/dictionary/peruse"&gt;peruse&lt;/a&gt; the &lt;a href="http://svn.openmrs.org/openmrs-modules/patientmatching/"&gt;source code&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:trebuchet ms;"&gt;review &lt;a href="http://dev.openmrs.org/search?q=sgrannis&amp;amp;noquickjump=1&amp;amp;ticket=on"&gt;outstanding developer ticket&lt;/a&gt;s&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:trebuchet ms;"&gt;find us on the &lt;a href="irc://irc.freenode.org/openmrs"&gt;OpenMRS IRC Channel &lt;/a&gt;(sgrannis, james_regen, nribeka)&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:trebuchet ms;"&gt;email Shaun: s g r a n n i s { a t } r e g e n s t r i e f { d o t } o r g&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:trebuchet ms;"&gt;check back here from time to time&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_gvorhPUXDyE/SHQ7L4NlstI/AAAAAAAAAAk/fWzYpbnV01U/s1600-h/Picture+1.png"&gt;&lt;br /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1256963867207214096-1071318284162233254?l=doctorshaun.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://doctorshaun.blogspot.com/feeds/1071318284162233254/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1256963867207214096&amp;postID=1071318284162233254' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/1071318284162233254'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1256963867207214096/posts/default/1071318284162233254'/><link rel='alternate' type='text/html' href='http://doctorshaun.blogspot.com/2008/06/openmrs-patient-matching-module.html' title='OpenMRS Patient Matching Module'/><author><name>Shaun</name><uri>http://www.blogger.com/profile/14780659859420416390</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://bp3.blogger.com/_gvorhPUXDyE/SHQ_tl7312I/AAAAAAAAAAw/iNNhUyLReu0/S220/Grannis+Shaun.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_gvorhPUXDyE/SHOyL2OpenI/AAAAAAAAAAU/kvO6hH2BZIY/s72-c/Screenshot-Record+Linker.png' height='72' width='72'/><thr:total>1</thr:total></entry></feed>
