2) Implemented: Create a flat-file output containing at least all blocking and include variables, and optionally additional variables
3) Implemented: Because different blocking runs may have different include variables, we need to be able to include the union of all fields used across all blocking runs
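A minimal sketch of how such a union might be computed, assuming each blocking run exposes its blocking and include variables as a list of field names. The class and method names (`FieldUnion`, `unionOfFields`) are hypothetical and are not taken from the module's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the actual de-dupe module.
final class FieldUnion {
    private FieldUnion() {}

    /**
     * Returns every field named in any blocking run, without duplicates,
     * in order of first appearance, so it can serve as the flat-file column list.
     */
    static List<String> unionOfFields(List<List<String>> fieldsPerRun) {
        // LinkedHashSet drops duplicates while keeping insertion order.
        Set<String> union = new LinkedHashSet<>();
        for (List<String> runFields : fieldsPerRun) {
            union.addAll(runFields);
        }
        return new ArrayList<>(union);
    }
}
```

Keeping first-appearance order means the column layout stays stable when a later run only repeats fields that earlier runs already used.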
4) Testing: We need to test the de-dupe module with different data sets, looking at both A) performance (how long does it take to match larger numbers of patients?) and B) bugs (what issues will we run into with different data?)
Results to date: An out-of-memory error occurs with an OpenMRS patient table containing 10,000 patients (likely a Tomcat memory error)
- Increased the Tomcat memory allocation to 1024M; no change
- We may need to use a software profiler to see what processes are using memory
- Question: are match result objects (MRs) stored in memory? MRs should not be held in memory -- to minimize memory use, each MR should be handled and written to a file (or similar output) as soon as it is produced
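The write-as-you-go idea in the last point could look roughly like the sketch below. It is an illustration under assumptions, not the module's actual code: the class name `MatchResultFileWriter`, the representation of a match result as a map of field name to value, and the `|` delimiter are all hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and row format are assumptions, not the module's API.
final class MatchResultFileWriter implements AutoCloseable {
    private final BufferedWriter out;
    private final List<String> columns;

    MatchResultFileWriter(Writer target, List<String> columns) {
        this.out = new BufferedWriter(target);
        this.columns = columns;
        writeLine(String.join("|", columns)); // header row
    }

    /**
     * Writes one match result as a delimited row. Nothing is retained here,
     * so the caller can drop the result as soon as this method returns.
     * Fields missing from the result are written as empty values.
     * Note: values are not escaped, so they must not contain the delimiter.
     */
    void write(Map<String, String> matchResult) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                row.append('|');
            }
            String value = matchResult.get(columns.get(i));
            row.append(value == null ? "" : value);
        }
        writeLine(row.toString());
    }

    private void writeLine(String line) {
        try {
            out.write(line);
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            out.close(); // flushes buffered rows to the underlying writer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, memory use for output is bounded by the writer's buffer rather than growing with the number of matches. Whether that resolves the error above depends on where the memory is actually going, which is what the profiler should show.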