Saturday, February 25, 2012

Hadoop AutoKill Hack

February's hack! As you might already know, I work on Big Data, and by default we all use the Hadoop ecosystem to get things done. Along those lines, I was digging into the Hadoop Java API the other day to see what I could learn about how things happen under the hood.

More specifically, I was trying to use the JobClient class to see if I could build a custom client or interface to the Hadoop jobs we run on our cluster. While doing that, I wondered: could I add a custom job-timeout feature to Hadoop?

Problem Statement: I want to kill any Hadoop job that runs beyond T time units. How do I do it?
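At its core, the decision is just elapsed-time arithmetic: kill a job when (now - start time) exceeds T. A minimal sketch of that check (the class and method names here are my own, for illustration, not from the actual tool):

```java
// Illustrative sketch of the timeout decision itself; class and
// method names are hypothetical, not taken from hadoop-autokill.
public class JobTimeoutPolicy {

    /**
     * Returns true when a job has been running longer than the
     * allowed threshold T.
     *
     * @param startTimeMillis  job start time (epoch millis)
     * @param nowMillis        current time (epoch millis)
     * @param thresholdMillis  maximum allowed runtime, T
     */
    public static boolean shouldKill(long startTimeMillis,
                                     long nowMillis,
                                     long thresholdMillis) {
        return (nowMillis - startTimeMillis) > thresholdMillis;
    }

    public static void main(String[] args) {
        long oneHour = 60L * 60L * 1000L;
        // A job started an hour-and-a-bit ago, with T = 1 hour:
        System.out.println(shouldKill(0L, oneHour + 1, oneHour)); // true
        // A job that is still within its budget:
        System.out.println(shouldKill(0L, oneHour - 1, oneHour)); // false
    }
}
```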

So I started writing a custom client that talks to the JobTracker, gets the list of running jobs along with how long each has been running, and kills any job that exceeds my threshold. That is the overall concept, and that is what I built. The API is simple and straightforward; it took less than an hour of reading jobdetails.jsp to figure out how to access jobs through the JobTracker and read their start times.
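The loop described above can be sketched against the old (Hadoop 1.x) mapred API. This is a rough illustration of the idea, not the actual hadoop-autokill code; the class name and the hard-coded threshold are my own assumptions:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical sketch, assuming a Hadoop 1.x cluster; needs to be
// launched via `hadoop jar` so JobConf picks up the cluster config.
public class AutoKillSketch {
    // T: maximum allowed runtime (here, 1 hour; an assumption for the sketch)
    private static final long THRESHOLD_MILLIS = 60L * 60L * 1000L;

    public static void main(String[] args) throws Exception {
        // Connects to the JobTracker named in the classpath configuration.
        JobClient client = new JobClient(new JobConf());

        // jobsToComplete() lists jobs that are still running or preparing.
        for (JobStatus status : client.jobsToComplete()) {
            long runtime = System.currentTimeMillis() - status.getStartTime();
            if (runtime > THRESHOLD_MILLIS) {
                RunningJob job = client.getJob(status.getJobID());
                if (job != null) {
                    System.out.println("Killing " + status.getJobID()
                            + " after " + runtime + " ms");
                    job.killJob();
                }
            }
        }
    }
}
```

No test here, since this only does something meaningful against a live JobTracker.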

However, the tricky part was how to run the damn thing. I kept getting an "IOException: Broken Pipe" error. It finally turned out that the way to run it is:

$ hadoop jar JARName.jar

So yeah, I wrote a small hack for this. You can find it on my GitHub (https://github.com/ashwanthkumar/hadoop-autokill).
