Monday, October 5, 2015

Introducing scalding-dataflow

For the last 3 days, I've been working to understand the Google Cloud Dataflow pipeline semantics for batch processing. The result is a ScaldingPipelineRunner for Dataflow pipelines.

NOTICE (You've been warned)
  1. It is still at a very early stage.
  2. Not all translators are implemented yet (as of this writing).
  3. It hasn't been tested on a Hadoop setup yet.
It runs WordCount, though :) Do give it a spin.

It goes well with scalaflow, a Scala DSL for building Dataflow pipelines.

Special thanks to Cloudera's spark-dataflow project. I couldn't have done this without it :)
