Saturday, September 12, 2015

Find that missing Host in Hadoop Cluster

I've started to manage 200 node hadoop clusters recently at work. All these are running on AWS with latest CDH5. We went from separate HDFS + TT model to co-locating TT and DN daemons together.

These machines are all Spot machines backed by a ASG (Auto Scaling Group). If any of them die because of spot prices, they would come back up in a while. So to manage these machines better we attach our own custom generated DNS names to these machines.

Once in a while, the machines that come up doesn't have either a TT or DN daemons running. They would have failed at startup for variety of reasons. The task was to find that missing hosts (generally 1 or 2) of the lot. So I wrote a script that would help us get the missing hosts which don't run one of the process.

Gist -

No comments:

Post a Comment