I recently started managing 200-node Hadoop clusters at work. All of them run on AWS with the latest CDH5. We moved from a model with separate HDFS and TaskTracker (TT) machines to one that co-locates the TT and DataNode (DN) daemons on the same nodes.
These machines are all Spot instances backed by an ASG (Auto Scaling Group). If any of them die because of spot prices, a replacement comes back up after a while. To make them easier to manage, we attach our own custom-generated DNS names to them.
Once in a while, a machine comes up without a TT or DN daemon running; the daemon may have failed at startup for a variety of reasons. The task was to find those missing hosts (usually one or two) in the lot, so I wrote a script that lists the hosts that aren't running one of the two processes.
Gist - https://gist.github.com/ashwanthkumar/3624a4e69ab26236a746
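The gist above has the actual script. What follows is not that script, just a minimal Python sketch of the core check, assuming you already have three host lists: the DNS names you expect in the cluster, the hosts reporting a live DataNode, and the hosts reporting a live TaskTracker (for example, the live-node lists the NameNode and JobTracker report). The function name `find_missing_daemons` and the example hostnames are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def _normalize(hosts: Iterable[str]) -> Set[str]:
    """Lower-case hostnames and drop whitespace/trailing dots so DNS names compare cleanly."""
    cleaned = (h.strip().rstrip(".").lower() for h in hosts)
    return {h for h in cleaned if h}


def find_missing_daemons(
    expected_hosts: Iterable[str],
    live_datanodes: Iterable[str],
    live_tasktrackers: Iterable[str],
) -> Dict[str, List[str]]:
    """Return, per daemon, the expected hosts that are not reporting as live.

    A host is listed under "datanode" if no live DataNode was reported for it,
    and under "tasktracker" if no live TaskTracker was reported for it.
    A host missing both daemons appears in both lists. Live hosts that are
    not in the expected list are ignored.
    """
    expected = _normalize(expected_hosts)
    return {
        "datanode": sorted(expected - _normalize(live_datanodes)),
        "tasktracker": sorted(expected - _normalize(live_tasktrackers)),
    }


if __name__ == "__main__":
    # Hypothetical host names, purely for illustration.
    expected = [
        "node-01.hadoop.example.com",
        "node-02.hadoop.example.com",
        "node-03.hadoop.example.com",
    ]
    datanodes = ["node-01.hadoop.example.com", "NODE-02.hadoop.example.com."]
    tasktrackers = ["node-01.hadoop.example.com", "node-03.hadoop.example.com"]
    print(find_missing_daemons(expected, datanodes, tasktrackers))
```

Because both daemons are co-located on every node, each check is a plain set difference against the same expected host list; normalizing names first avoids false positives from case or trailing-dot differences in DNS names.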