Python script to find the intersection of files using multiple parameters
This script creates the intersection of two files. The intersection of file A and file B contains only those records that appear in both files.
In this particular case, a record is identified by a set of three parameters: an id, an encounter id, and a date. If all three values of a record in file A match those of a record in file B, the combined record is written to file C.
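The matching rule can be thought of as comparing composite keys. The following is a minimal illustration rather than part of the script: `make_key` is a hypothetical helper, the sample records are made up, and dates are assumed to be in day/month/year format.

```python
import datetime

def make_key(record_id, encounter_id, date_text, fmt="%d/%m/%Y"):
    """Build a comparable (id, encounter id, date) key; hypothetical helper."""
    date = datetime.datetime.strptime(date_text, fmt).date()
    return (record_id, encounter_id, date)

# Made-up records for illustration only.
keys_a = {make_key("p1", "e1", "01/02/2015"), make_key("p2", "e5", "03/04/2016")}
keys_b = {make_key("p1", "e1", "1/2/2015"), make_key("p2", "e9", "03/04/2016")}

# Only keys where all three parameters agree survive the intersection.
common = keys_a & keys_b
```

Parsing the date before comparing means that "01/02/2015" and "1/2/2015" are treated as the same day, which a plain string comparison would miss.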
Below is a sample of the Python code used in this script.
    import datetime

    # readImg and readOpt are assumed to be csv.reader objects;
    # infile is assumed to be the open file behind readOpt.
    for row in readImg:
        outputRowData = list(row)
        record_id = row[0]
        encounterid = row[1]
        photoDate = row[3]
        photoDateObj = datetime.datetime.strptime(photoDate, '%d/%m/%Y')
        infile.seek(0)  # reset to start of file
        for ref in readOpt:
            ref_id = ref[0]
            ref_encounterid = ref[7]
            ref_photoDate = ref[9]
            if not ref_photoDate:
                continue  # skip reference rows without a date
            ref_photoDateObj = datetime.datetime.strptime(
                ref_photoDate, '%d/%m/%Y')
            # all three parameters must match
            if (record_id == ref_id and
                    encounterid == ref_encounterid and
                    photoDateObj == ref_photoDateObj):
                outputRowData = outputRowData + ref
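The nested loop re-reads the second file once for every row of the first, which can become slow on large files. One possible alternative, shown here as a sketch and not as the original script, indexes file B by its key once and then streams through file A. The function name `intersect_files`, the file paths, and the absence of header rows are assumptions; the column positions are those used in the sample above.

```python
import csv
import datetime
from collections import defaultdict

DATE_FMT = "%d/%m/%Y"

def parse_date(text):
    """Return a date, or None when the field is empty."""
    text = (text or "").strip()
    return datetime.datetime.strptime(text, DATE_FMT).date() if text else None

def intersect_files(path_a, path_b, path_c):
    """Write rows of A joined with matching rows of B to C; return rows written."""
    # Index file B by (id, encounter id, date): columns 0, 7 and 9.
    index = defaultdict(list)
    with open(path_b, newline="") as f_b:
        for ref in csv.reader(f_b):
            ref_date = parse_date(ref[9])
            if ref_date is not None:  # rows without a date cannot match
                index[(ref[0], ref[7], ref_date)].append(ref)

    written = 0
    with open(path_a, newline="") as f_a, open(path_c, "w", newline="") as f_c:
        writer = csv.writer(f_c)
        for row in csv.reader(f_a):
            # Key for file A: columns 0, 1 and 3.
            key = (row[0], row[1], parse_date(row[3]))
            for ref in index.get(key, []):
                writer.writerow(row + ref)
                written += 1
    return written
```

Unlike the sample, which appends every matching reference row to a single output row, this sketch writes one output row per match. It also holds the indexed rows of file B in memory, so it assumes that file fits in RAM.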
The script was used to filter large patient data files. The first file contained information on a large set of retinal images taken in an eye screening program. The second file contained disease gradings and clinical measurements from various time points, which were then linked to the eye screening event. The resulting data was used for a deep learning system for disease classification and for the analysis of risk factors for disease.