Friday, February 12, 2016

How not to use awk

Today I was working on automating AMI creation for a project and ended up using packer. Packer has a -machine-readable output mode whose lines can be parsed as CSV (at the very least), so I wrote a bash script to parse that output.
I started off with a while loop that calls awk on every line (in that order) and emits the lines that contain the artifact information. As the timing below shows, parsing 1700+ lines took more than 6 seconds.

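#!/bin/bash
# Slow version: spawns one awk process for every line of the log.
PACKER_LOG_FILE="$1"
while read LINE; do
  # Field 2 of each comma-separated line is the builder name.
  BUILDER_NAME=$(echo "$LINE" | awk -F, '{print $2}')
  if [ -n "$BUILDER_NAME" ]; then
    echo "found $BUILDER_NAME on $LINE"
  fi
done < "$PACKER_LOG_FILE"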
brindavan:~ ashwanthkumar$ time ./parse-packer packer.log
found hvm-builder on 1455261963,hvm-builder,artifact-count,1
found hvm-builder on 1455261964,hvm-builder,artifact,0,builder-id,mitchellh.amazonebs
found hvm-builder on 1455261964,hvm-builder,artifact,0,id,us-east-1:ami-abcd
found hvm-builder on 1455261964,hvm-builder,artifact,0,string,AMIs were created:nnus-east-1: ami-abcd
found hvm-builder on 1455261964,hvm-builder,artifact,0,files-count,0
found hvm-builder on 1455261964,hvm-builder,artifact,0,end
found pvm-builder on 1455261964,pvm-builder,artifact-count,1
found pvm-builder on 1455261964,pvm-builder,artifact,0,builder-id,mitchellh.amazonebs
found pvm-builder on 1455261964,pvm-builder,artifact,0,id,us-east-1:ami-efgh
found pvm-builder on 1455261964,pvm-builder,artifact,0,string,AMIs were created:nnus-east-1: ami-efgh
found pvm-builder on 1455261964,pvm-builder,artifact,0,files-count,0
found pvm-builder on 1455261964,pvm-builder,artifact,0,end
real 0m6.265s
user 0m2.829s
sys 0m4.118s


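Afterwards I refactored the code to run awk first and feed its output to the while loop, which ran roughly 100x faster (0.063s versus 6.265s). The first awk pass drops every line with an empty builder field, so the loop only spawns awk for the few lines that remain.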

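#!/bin/bash
# Faster version: awk filters the file once, the loop only sees matching lines.
PACKER_LOG_FILE="$1"
awk -F, '$2 != "" { print $0 }' "$PACKER_LOG_FILE" | while read LINE; do
  # Field 2 of each comma-separated line is the builder name.
  BUILDER_NAME=$(echo "$LINE" | awk -F, '{print $2}')
  if [ -n "$BUILDER_NAME" ]; then
    echo "found $BUILDER_NAME on $LINE"
  fi
done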
brindavan:~ ashwanthkumar$ time ./parse-packer packer.log
found hvm-builder on 1455261963,hvm-builder,artifact-count,1
found hvm-builder on 1455261964,hvm-builder,artifact,0,builder-id,mitchellh.amazonebs
found hvm-builder on 1455261964,hvm-builder,artifact,0,id,us-east-1:ami-abcd
found hvm-builder on 1455261964,hvm-builder,artifact,0,string,AMIs were created:nnus-east-1: ami-abcd
found hvm-builder on 1455261964,hvm-builder,artifact,0,files-count,0
found hvm-builder on 1455261964,hvm-builder,artifact,0,end
found pvm-builder on 1455261964,pvm-builder,artifact-count,1
found pvm-builder on 1455261964,pvm-builder,artifact,0,builder-id,mitchellh.amazonebs
found pvm-builder on 1455261964,pvm-builder,artifact,0,id,us-east-1:ami-efgh
found pvm-builder on 1455261964,pvm-builder,artifact,0,string,AMIs were created:nnus-east-1: ami-efgh
found pvm-builder on 1455261964,pvm-builder,artifact,0,files-count,0
found pvm-builder on 1455261964,pvm-builder,artifact,0,end
real 0m0.063s
user 0m0.036s
sys 0m0.032s

I didn't know AWK was so good at processing things at scale (if I may).
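
A possible next step, as a minimal sketch rather than something I timed: since awk already splits each line on commas, it can also print the "found ..." message itself, which removes the while loop and the per-line awk call entirely. The function name parse_packer_log is my own placeholder for illustration, and the sketch assumes the same comma-separated log format as above.

```shell
#!/bin/bash
# Sketch: filter and format in a single awk process, with no shell loop.
# parse_packer_log is a hypothetical helper name used only for this example.
parse_packer_log() {
  # -F, splits each line on commas; $2 is the builder name field.
  # Lines with an empty builder field are skipped.
  awk -F, '$2 != "" { print "found " $2 " on " $0 }' "$1"
}

# Only run when a log file path is passed on the command line.
if [ "$#" -ge 1 ]; then
  parse_packer_log "$1"
fi
```

One difference to keep in mind: awk prints each line verbatim, so any backslash sequences in the log are kept as-is, whereas `read` without `-r` in the loop versions strips the backslashes before the line is echoed.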
