For the last 3 days, I've been trying to understand the Google Cloud Dataflow pipeline semantics for batch processing. The result is a ScaldingPipelineRunner for Dataflow pipelines.
NOTICE (You've been warned)
- It is still at a very early stage.
- Not all translators are implemented yet (as of writing).
- It hasn't been tested on a Hadoop setup yet.
It runs WordCount though :) Do give it a spin.
It goes well with scalaflow, a Scala DSL for building Dataflow pipelines.
Special thanks to Cloudera's spark-dataflow project. I couldn't have done it without that :)