Friday, February 12, 2016

How not to use awk

Today I was working on automating AMI creation for a project and ended up using packer. Packer has a -machine-readable output mode which can be easily parsed as CSV (at the very least). I ended up writing a bash script to parse the output.
I started off with a while loop + awk for parsing (in that order), emitting the lines that contain the artifact information. Parsing around 1700+ lines this way took > 6 seconds.

Afterwards I refactored the code to run awk first and feed its output to the while loop, which actually ran 100x faster.
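To make the difference concrete, here is a minimal sketch of both shapes. It is not the original script: the sample lines are made up, and it assumes the usual packer machine-readable layout of timestamp,target,type,data... with the AMI on the artifact line whose fifth field is id. The slow version starts one awk process per input line, while the fast version starts a single awk and lets the loop see only the matching lines.

```shell
# Hypothetical sample of packer -machine-readable output (illustrative values only).
log=$(mktemp)
cat > "$log" <<'EOF'
1455267000,,ui,say,==> amazon-ebs: Creating the AMI
1455267001,amazon-ebs,artifact-count,1
1455267002,amazon-ebs,artifact,0,builder-id,mitchellh.amazonebs
1455267003,amazon-ebs,artifact,0,id,us-east-1:ami-12345678
1455267004,amazon-ebs,artifact,0,end
EOF

# Slow: while loop first, so a new awk process is forked for every line.
slow=$(while IFS= read -r line; do
  printf '%s\n' "$line" | awk -F, '$3 == "artifact" && $5 == "id" { print $6 }'
done < "$log")

# Fast: one awk process filters the whole file; the loop only handles matches.
fast=$(awk -F, '$3 == "artifact" && $5 == "id" { print $6 }' "$log" |
  while IFS=: read -r region ami; do
    printf '%s %s\n' "$region" "$ami"
  done)

echo "slow: $slow"
echo "fast: $fast"
rm -f "$log"
```

Most of the time in the slow version goes into process startup rather than parsing, which is why moving awk to the front of the pipeline helps so much.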

I didn't know AWK was so good at processing things at scale (if I may).

Saturday, February 6, 2016

Introducing Matsya

I recently wrote a blog post on Indix's Engineering blog about Matsya. Do check it out. I'm going to summarise the blog post here in a TL;DR version.

We have 100% of our Hadoop workloads running on AWS Spot infrastructure. One of the prominent issues we've faced is "Spot Outages". In order to solve that, we went across AZs. This in turn resulted in huge data transfer costs, especially for systems like Hadoop HDFS. We realised the only way to survive this madness is to be on a single AZ but intelligently move to another AZ when there's a Spot spike in the current one. That worked :) There were cases when we couldn't find any AZ within our bid budget. Typically our bid budget was 100% of the OD (On-Demand) price. During such times, the cluster can fall back to OD instances until the surge ceases.

I wrote a tool to automate the above process, and that's Matsya. Matsya is the name of the first incarnation of Lord Vishnu in this world. The main part of the story is that he takes the form of a Fish (with a horn) and carries the Seven Sages (saptarishis) to safety on the Judgement day. The name was so apt since Matsya is able to carry the cluster from one Spot Market to another (read cheapest) during a Spot surge.
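The decision described above can be sketched in a few lines of shell + awk. This is not Matsya's actual code: choose_market is a hypothetical helper, the prices are invented, and the rule of staying in the current AZ while it is still within budget is my assumption about how to avoid needless moves. It reads "az price" pairs on stdin and takes the OD price (the bid budget) and the current AZ as arguments.

```shell
# choose_market OD_PRICE CURRENT_AZ   (hypothetical helper, reads "az price" lines on stdin)
choose_market() {
  awk -v od="$1" -v cur="$2" '
    { price[$1] = $2 + 0 }
    END {
      # Assumption: stay put while the current AZ is still within the bid budget.
      if ((cur in price) && price[cur] <= od) { print "spot " cur; exit }
      # Otherwise pick the cheapest AZ whose Spot price is within the budget.
      best = ""
      for (az in price)
        if (price[az] <= od && (best == "" || price[az] < price[best])) best = az
      if (best != "") print "spot " best
      else print "on-demand " cur   # no AZ within budget: fall back to OD
    }'
}

# Spike in the current AZ, cheaper AZs available.
moved=$(printf 'us-east-1a 0.90\nus-east-1b 0.12\nus-east-1c 0.30\n' |
  choose_market 0.50 us-east-1a)

# Surge everywhere: nothing is within the bid budget.
fallback=$(printf 'us-east-1a 0.90\nus-east-1b 0.75\n' |
  choose_market 0.50 us-east-1a)

echo "$moved"
echo "$fallback"
```

In the first case the cluster moves to the cheapest market within budget, and in the second it falls back to OD in the AZ it is already in.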

Bonus - Slides from the presentation I gave at the Chennai Devops meetup in January 2016 are also available.

Thursday, February 4, 2016

Building GoLang on SnapCI

I've been a long-time user of SnapCI. I recently started playing around with GoLang. I started writing tests and cutting releases for some of my hobby projects, and it's high time I set up CI for them. Unfortunately SnapCI doesn't support GoLang out of the box and making it work is a bit of a hassle. After fighting with it for more than an hour, I figured out how to compile my go project.

Disclaimer - My project on my local machine is part of my custom $GOPATH so the whole `go` toolchain works. The Makefile also assumes this. If you're used to building your project in a non-standard directory structure, this is not for you.

Thanks to this post, which talks about how to set up GoLang on SnapCI. It doesn't work out of the box anymore, but it does have the fundamentals needed to get started. Final settings that worked for me

Hope this helps.

Apart from the above, we also need to set up the GOPATH in order for go to work. On the stage configuration page there is a section called Environment Variables for this stage. In there, we need to set the name to GOPATH and the value to /var/snap-ci/.