Python script to find the intersection of files using multiple parameters

This script computes the intersection of two files. A record appears in the output only if it is present in both file A and file B.

In this particular case the intersection is defined by a set of parameters rather than by whole records. The defining parameters are an id, an encounter id, and a date. If a record in file A and a record in file B agree on all three parameters, the record is written to file C.
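As a rough illustration of that definition, the three parameters can be treated as one composite key and the keys of the two files intersected as sets. This is only a sketch on made-up in-memory records: the sample data, the column positions, and the helper name `match_key` are assumptions for the example, not part of the script.

```python
# Hypothetical records: (id, encounter id, date, payload).
file_a = [
    ("p1", "e1", "01/02/2015", "image_001.png"),
    ("p1", "e2", "03/04/2015", "image_002.png"),
    ("p2", "e1", "05/06/2015", "image_003.png"),
]
file_b = [
    ("p1", "e1", "01/02/2015", "grade 0"),
    ("p2", "e1", "07/08/2015", "grade 1"),  # same ids, different date
]


def match_key(record):
    """Return the (id, encounter id, date) triple that defines a match."""
    return record[0], record[1], record[2]


# Keys present in both files.
keys_a = {match_key(r) for r in file_a}
keys_b = {match_key(r) for r in file_b}
common_keys = keys_a & keys_b

# File C: the records of file A whose key also occurs in file B.
file_c = [r for r in file_a if match_key(r) in common_keys]
```

Only the first record of `file_a` survives here, because the `p2` records agree on id and encounter id but not on the date.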

Below is a sample of the Python code used in this script.

import datetime

# readImg and readOpt are assumed to be csv readers created earlier in the
# script; infile is assumed to be the open file that readOpt reads from.
for row in readImg:

	# start the output row with a copy of the image row
	outputRowData = list(row)

	img_id = row[0]
	encounterid = row[1]
	photoDate = row[3]

	photoDateObj = datetime.datetime.strptime(photoDate, '%d/%m/%Y')

	infile.seek(0) # reset to start of file

	for ref in readOpt:

		ref_id = ref[0]
		ref_encounterid = ref[7]
		ref_photoDate = ref[9]

		# skip reference rows that have no date
		if ref_photoDate:

			ref_photoDateObj = datetime.datetime.strptime(
				ref_photoDate, '%d/%m/%Y')

			# all three parameters must match
			if (img_id == ref_id and
					encounterid == ref_encounterid and
					photoDateObj == ref_photoDateObj):
				outputRowData = outputRowData + ref
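
The nested loop above re-reads the second file once for every row of the first, which gets slow on large files. A possible alternative, sketched here and not the method the script itself uses, is to index the second file once in a dictionary keyed by the three parameters and then look each image row up directly. The column positions (0, 1, 3 and 0, 7, 9) and the date format are taken from the sample; the names `DATE_FORMAT`, `parse_date`, and `intersect_rows` are hypothetical, and dropping image rows that have no match is an assumption based on the definition of the intersection.

```python
import csv
import datetime
import io

DATE_FORMAT = "%d/%m/%Y"


def parse_date(text):
    """Parse a dd/mm/YYYY string; return None for empty or missing values."""
    if not text:
        return None
    return datetime.datetime.strptime(text, DATE_FORMAT)


def intersect_rows(image_rows, reference_rows):
    """Join image rows to reference rows on (id, encounter id, date)."""
    # Index the reference rows once, keyed by the three matching fields.
    index = {}
    for ref in reference_rows:
        ref_date = parse_date(ref[9])
        if ref_date is None:
            continue  # rows without a date can never match
        index.setdefault((ref[0], ref[7], ref_date), []).append(ref)

    matches = []
    for row in image_rows:
        key = (row[0], row[1], parse_date(row[3]))
        merged = list(row)
        for ref in index.get(key, []):
            merged = merged + ref
        if len(merged) > len(row):  # assumption: keep only matched rows
            matches.append(merged)
    return matches


# Small in-memory demonstration with made-up data.
image_csv = io.StringIO("p1,e1,left,01/02/2015\np2,e1,right,05/06/2015\n")
ref_fields = ["p1", "", "", "", "", "", "", "e1", "", "01/02/2015"]
reference_csv = io.StringIO(",".join(ref_fields) + "\n")

result = intersect_rows(csv.reader(image_csv), csv.reader(reference_csv))
```

Each file is read once, so the work grows with the combined size of the two files instead of with their product, at the cost of holding the indexed reference rows in memory.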

The script was used to filter large patient data files. The first file contained information on a large set of retinal images taken in an eye screening program. The second file contained disease gradings and clinical measurements from various time points, which were then related to the eye screening event. This data was used for a deep learning system for disease classification and for analysis of risk factors for disease.