AWK script for data extraction from large CSV files

This script extracts data from large CSV files. It was developed primarily for processing large CSV files of patient data, where it was needed because:

  • The files are too large to open in standard spreadsheet applications (many millions of rows)
  • The task is repetitive, so it is worth automating

The command below is a small sample showing how the script was used.
It is run under Linux using awk.


awk -F, '$1=="123" { print $1,$2,$10,$45,$46 }' LargeFile.csv > Result.csv

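The command above reads LargeFile.csv using the comma as the field separator ("-F,"). For each row whose first field, the id, equals the specified id (" $1=="123" "), the specified columns of that row ("{ print $1,$2,$10,$45,$46 }") are written to the Result.csv file. Note that print joins the fields with awk's output field separator (OFS), which is a single space by default, so the output is space-separated unless OFS is set to a comma (for example with "-v OFS=,"). Splitting on every comma also assumes that no field contains a quoted comma.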
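
As a minimal sketch of the same technique, the script below passes the id in as a variable and writes comma-separated output. It is not the original production script. The file names sample.csv and result.csv, the five-column layout, and the sample rows are hypothetical stand-ins for the real patient file, and the selected columns (1, 2 and 5) are chosen only to fit that small sample.

```shell
#!/bin/sh
# Build a tiny stand-in for LargeFile.csv (hypothetical data, 5 columns only).
cat > sample.csv <<'EOF'
id,visit_date,ward,age,status
123,2020-01-05,A,54,admitted
456,2020-01-09,B,61,discharged
123,2020-02-11,C,54,discharged
EOF

# -F,        split each input line into fields on commas
# -v OFS=,   join printed fields with commas so the result is still CSV
# -v id=123  pass the id in as an awk variable instead of hard-coding it
# (id "")    concatenating "" forces a string comparison, so 0123 does not match 123
awk -F, -v OFS=, -v id=123 '$1 == (id "") { print $1, $2, $5 }' sample.csv > result.csv

cat result.csv
```

With the sample data above, result.csv contains only the two rows whose first field is 123, and the header row is dropped because its first field is "id". Passing the id with -v means the awk program does not have to be edited for each new lookup, which suits a repetitive task.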