Friday, June 13, 2014

User ID Normalization from Big Data book using Scalding

Thanks to my mentor, I have been following the Big Data book closely in MEAP. In Chapter 6, the author discusses the User Identifier Normalization problem, which is solved with an iterative graph algorithm, and provides a reference implementation in Cascalog.
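To give a feel for the problem, here is a minimal sketch in plain Scala collections, with no Hadoop, Cascalog, or Scalding involved. It assumes the input is a set of "these two IDs belong to the same person" pairs and that every ID should end up mapped to the smallest ID in its group. It uses simple min-label propagation repeated until nothing changes, which is a simplification and not the book's Cascalog code. The names `UserIdNormalization`, `step`, `iterate`, and `normalize` are made up for this illustration.

```scala
object UserIdNormalization {
  type Edge = (String, String)

  // One pass: each ID looks at the labels its neighbours currently hold
  // and keeps the smallest one (an edge is treated as undirected).
  def step(edges: Set[Edge], labels: Map[String, String]): Map[String, String] = {
    val candidates = edges.toSeq.flatMap { case (a, b) =>
      Seq(a -> labels(b), b -> labels(a))
    }
    candidates.foldLeft(labels) { case (acc, (id, label)) =>
      if (label < acc(id)) acc.updated(id, label) else acc
    }
  }

  // Repeat the pass until a fixed point is reached, i.e. no label changed.
  @annotation.tailrec
  def iterate(edges: Set[Edge], labels: Map[String, String]): Map[String, String] = {
    val next = step(edges, labels)
    if (next == labels) labels else iterate(edges, next)
  }

  // Start with every ID labelled as itself, then iterate.
  def normalize(edges: Set[Edge]): Map[String, String] = {
    val ids = edges.flatMap { case (a, b) => Set(a, b) }
    iterate(edges, ids.map(id => id -> id).toMap)
  }
}
```

For example, `UserIdNormalization.normalize(Set("a" -> "b", "b" -> "c", "x" -> "y"))` needs more than one pass, because `c` only learns about `a` after `b` has adopted it. That need for repeated passes is what makes the distributed version an iterative job.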

Since I have been using Scalding for close to a year now, I rewrote the reference implementation in Scalding. This is my first attempt at writing an iterative algorithm in Scalding.

Any kind of feedback is highly appreciated.