Monday, March 11, 2013

HFileInputFormat for Bulk repair of HBase

Today something interesting happened to our HBase cluster: of the roughly 1,600 regions we currently hold, 400-odd regions ended up with an empty STARTKEY (''). We tried to run an Offline Meta Repair, but it kept failing with "Multiple regions have the same startkey:".
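To get a feel for what that error message is complaining about, here is a minimal sketch in plain Java that groups regions by start key and reports every key claimed by more than one region. It is only an illustration, not how the repair tool works internally: the Region class is a hypothetical stand-in (not an HBase class), and the region names in main are made up. Keep in mind that the first region of a table legitimately has an empty start key, so one region per table with '' is normal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateStartKeys {

    /** Hypothetical stand-in for a region descriptor; NOT an HBase class. */
    static final class Region {
        final String encodedName;
        final String startKey;

        Region(String encodedName, String startKey) {
            this.encodedName = encodedName;
            this.startKey = startKey;
        }
    }

    /**
     * Groups region names by start key and keeps only the start keys
     * that are claimed by two or more regions.
     */
    static Map<String, List<String>> findDuplicates(List<Region> regions) {
        Map<String, List<String>> byStartKey = new LinkedHashMap<>();
        for (Region r : regions) {
            byStartKey.computeIfAbsent(r.startKey, k -> new ArrayList<>())
                      .add(r.encodedName);
        }
        // Drop start keys owned by a single region; those are healthy.
        byStartKey.values().removeIf(names -> names.size() < 2);
        return byStartKey;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("region-a", ""));
        regions.add(new Region("region-b", "row-500"));
        regions.add(new Region("region-c", ""));

        for (Map.Entry<String, List<String>> e : findDuplicates(regions).entrySet()) {
            System.out.println("startkey '" + e.getKey() + "' claimed by " + e.getValue());
        }
    }
}
```

In a real cluster the (name, start key) pairs would have to come from the region metadata; how you collect them is up to you and is not shown here.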

Quick (Dirty) Fix
  1. Move those regions out of /hbase/.
  2. Run Offline Meta Repair on the remaining data. This step is optional: we needed it because our .META. table was corrupted. Otherwise, you might need to find and remove the affected entries from .META. manually. I did not try that, so I am not going to dwell on it further.
Now write a simple MR job that processes all of the moved regions one column family (CF) at a time, exports them as SequenceFiles, and imports them back using the regular HBase Import (org.apache.hadoop.hbase.mapreduce.Import).

When writing the MR job, one thing you will need that is not available off the shelf is an HFileInputFormat. I found a Scala version of it, which I ported to Java.

I also made some changes to the Scala version so that it works on HBase 0.94.2 (the version I tested).

DISCLAIMER: The above solution was tested on a setup with HBase 0.94.2 and Hadoop 1.1.1.
