Showing posts with label labs. Show all posts

Tuesday, January 26, 2016

Getting started on Go Lang with Slack

It is Republic Day here in India today, January 26th, and a national holiday. During one of our random morning conversations with Manoj, HashiCorp cropped up, and the topic quickly escalated to Go.

Go has been on my radar for quite some time, and I have wanted to get started with the language. It bothers me that lots of new tools and products are being built in Go while I'm still stuck with the JVM. After all, moving to DevOps demands some change in how I operate, right? 😃

Whenever I get started with a new programming language, I write a simple tool in it and open source it. I wrote a live NSE stock information fetcher while learning NodeJS 4 years ago, and a uClassify Scala client while getting started with Scala 3 years ago.

Along similar lines, introducing a Slack webhooks library in Go. Like its Java counterpart, it helps you post messages to Slack using an Incoming Webhook URL.

Github - https://github.com/ashwanthkumar/slack-go-webhook
Usage - https://github.com/ashwanthkumar/slack-go-webhook/blob/master/README.md

This is still my first Go code, so any kind of feedback will be helpful.

Sunday, October 18, 2015

Chrome Tamil TTS Engine powered by SSN Speech Lab

This was the outcome of an attempt at a good first impression that went wrong.

I got to know about SSN's Speech Lab yesterday and built a Chrome extension on top of it as a TTS engine. These folks have built a wonderful system; you should check them out.


Select any Tamil text, right-click, and listen to it in a male or female voice.

You can install the plugin from https://chrome.google.com/webstore/detail/lhalpilfkeekaipkffoocpdfponpojob

The code is available on https://github.com/ashwanthkumar/chrome-tts-tamizh

Details for other extensions that want to use the TTS engine:
- Language - ta-IN
- Gender - male and female
- Voice names - Male is Krishna and Female is Radhae

Monday, October 5, 2015

Introducing scalding-dataflow

For the last 3 days, I've been working on understanding Google Cloud Dataflow's pipeline semantics for batch processing. The result is a ScaldingPipelineRunner for Dataflow pipelines.

NOTICE (You've been warned)
  1. It is still in very, very early stages.
  2. It doesn't have all translators implemented (as of this writing).
  3. It isn't tested on a Hadoop setup yet.
It runs WordCount though :) Do give it a spin.

It goes well with scalaflow, a Scala DSL for building Dataflow pipelines.

Special thanks to Cloudera's spark-dataflow project. I couldn't have done it without it :)

Friday, September 25, 2015

Winning Amaz-ing Hackathon - Meghadūta

Last weekend was fun at the Chennai Amazon office. In association with Venture City, the Kindle team at Chennai organized a hackathon on the theme "Building Scalable Distributed Systems". The theme alone was good enough to attract me to register for the event :) I went with Salaikumar and Vijay Kumar under the team name "Salaikumar".

You can find the problem statements given at the hackathon here.

We won two awards at the event - "Best Voted Award" and "Ultimate Hack Award".

You can find our code at https://github.com/ashwanthkumar/meghaduta.

A few pictures taken during the event

Me doing the presentation of our hack

Salai helping me with the screens

Our prizes: a Kindle Paperwhite each and some certificates :)

Tuesday, June 2, 2015

ClassNotFound inside a Task on Spark >= 1.3.0

Context - Spark 1.3.0, Custom InputFormat and InputSplit.

Problem - At my work, I had a custom InputSplit definition that held an object of another class A, which I then needed to pass to my Key. I have a Spark job that reads using my custom InputFormat, and things were all fine on Spark 1.2.0. When we upgraded to 1.3.0, things started breaking with the following stack trace:

Caused by: java.lang.ClassNotFoundException: x.y.z.A
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
 at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 at org.apache.spark.util.Utils$.deserialize(Utils.scala:80)
 at org.apache.spark.util.Utils.deserialize(Utils.scala)
 at x.y.z.CustomRecordSplit.readFields(CustomRecordSplit.java:91)

Solution - It took me a while to realize that I had been using Spark's Utils object to serialize and deserialize the object (x.y.z.A). The fix was very simple:

objA = Utils.deserialize(buffer, Utils.getContextOrSparkClassLoader());

It looks like in earlier versions the __app__.jar gets added to the Executor and Task classloaders, but not in the latest versions. When I passed the context ClassLoader to the deserialization, it worked perfectly fine.

Lessons
- Don't use Spark's Utils methods. Even though Utils is a private[spark] object, Scala's package access protection doesn't seem to apply when accessing it from a Java class. I never knew that until now.
- Always use Utils.getContextOrSparkClassLoader() when doing Java deserialization in Spark.
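For reference, the classloader-aware deserialization that `Utils.deserialize(bytes, loader)` performs can be sketched in plain Java (a minimal stand-in for illustration, not Spark's actual implementation): an `ObjectInputStream` whose `resolveClass` consults the supplied loader instead of the default one.

```java
import java.io.*;

public class ClassLoaderAwareDeserializer {
    // Deserialize bytes, resolving classes against the given loader.
    // This mirrors what passing a ClassLoader to Utils.deserialize achieves.
    public static Object deserialize(byte[] bytes, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
            @Override
            protected Class<?> resolveClass(ObjectStreamClass desc)
                    throws ClassNotFoundException {
                return Class.forName(desc.getName(), false, loader);
            }
        };
        return in.readObject();
    }

    // Plain Java serialization of a Serializable object to bytes
    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.flush();
        return bos.toByteArray();
    }
}
```

With this shape, passing the context classloader (the one that knows about __app__.jar) instead of the system classloader is what makes the `x.y.z.A` lookup succeed.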

Wednesday, May 20, 2015

Parser Combinator

After fiddling with RegexParser in Scala for a while, I realized how much I missed by not learning automata properly at college. I was migrating a 140-line regex from MySQL to Scala at work and learned a lot of new things in the process. It was during one of those times that one of my mentors, yellowflash, helped me understand some forgotten concepts: left factoring, recursive grammars, etc.

In the process, we discussed how I would go about writing a RegexParser if I had to write it by hand myself. The exercise was meant to help me understand how parser combinators work, which in turn would help me write better grammars. We did some scribbling on paper and decided to implement it in Scala. You can find it at https://github.com/ashwanthkumar/parser-combinator. It is just a start, and there's still a long way to go. Looking forward to it; it should be fun.
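To make the idea concrete, here is a minimal parser-combinator core (sketched in Java here rather than the Scala of the actual repo; all names are illustrative): a parser is a function from input to an optional (value, remaining-input) pair, and combinators compose such functions.

```java
import java.util.Optional;
import java.util.function.Function;

public class Combinators {
    // A successful parse yields a value plus the unconsumed input
    record Result<T>(T value, String rest) {}

    // A parser is just a function; failure is Optional.empty()
    interface Parser<T> extends Function<String, Optional<Result<T>>> {}

    // Matches a literal string at the start of the input
    static Parser<String> literal(String s) {
        return in -> in.startsWith(s)
                ? Optional.of(new Result<>(s, in.substring(s.length())))
                : Optional.empty();
    }

    // Sequencing: run p, then run q on whatever input p left over
    static <A, B> Parser<B> andThen(Parser<A> p, Parser<B> q) {
        return in -> p.apply(in).flatMap(r -> q.apply(r.rest()));
    }

    // Alternation: try p, and fall back to q if p fails
    static <A> Parser<A> or(Parser<A> p, Parser<A> q) {
        return in -> p.apply(in).isPresent() ? p.apply(in) : q.apply(in);
    }
}
```

Sequencing feeds the leftover input of one parser into the next, which is exactly where left-recursive grammars bite: a parser that begins by calling itself never consumes any input, hence the need for left factoring.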

Friday, May 15, 2015

[IDEA] Autoscaling in Hadoop

Everybody today uses Hadoop plus some of its ecosystem tools, be it Hive, HBase, etc. I have been using Hadoop to write production code since 2012, and for experiments much earlier than that. One thing I don't see anywhere is the ability to autoscale a Hadoop cluster elastically. Unlike scaling web servers, having all map and reduce slots full doesn't necessarily translate to CPU / IO metrics spiking on the machines.

Figure - Usage graph observed from a production cluster (hand drawn on a whiteboard)

On this front, only the Qubole folks seem to have done some decent work. You should check out their platform if you haven't; it is really super cool. A lot of the inspiration for this post comes from using it.

This is a hobby-project attempt at building just the autoscaling feature for Hadoop1 clusters, as if I had been part of, say, the Qubole team back in 2012.

In this blog post I talk about the implementation goals and the hows. I plan to build this as part of InMobi's HackDay 2015 (if I get through the selection), or go ahead and build it anyway that weekend.

For every cluster you would need the following configuration settings

  • minNodes - Minimum # of TTs you would always want in the cluster.
  • maxNodes - Maximum # of TTs that your cluster would like to use at any point in time.
  • checkInterval - Interval in seconds at which to check the cluster for compute demand (default - 60)
  • hustlePeriod - Duration in seconds to monitor the demand before we go ahead with upscaling / downscaling the cluster (default - 600)
  • upscaleBurstRate - Rate at which you want to upscale based on the demand (default - 100%)
  • downscaleBurstRate - Rate at which you want to downscale (default - 25%)
  • mapsPerNode - # of map slots per TT  (default - based on the machine type)
  • reducersPerNode - # of reduce slots per TT (default - based on the machine type)
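As a rough sketch of how these knobs could combine (a hypothetical helper of my own, not part of any existing tool): compute the node count demanded by pending tasks, apply the burst rate to the shortfall, and clamp the result to the configured bounds. Downscaling would apply downscaleBurstRate analogously.

```java
public class AutoScaler {
    // Demand-driven target cluster size. Parameter names mirror the
    // configuration listed above; upscaleBurstRate is a fraction
    // (1.0 == 100%). Only the upscale direction is sketched here.
    public static int targetNodes(int pendingMapTasks, int mapsPerNode,
                                  int currentNodes, int minNodes, int maxNodes,
                                  double upscaleBurstRate) {
        // Nodes needed to give every pending map task a slot
        int demanded = (int) Math.ceil((double) pendingMapTasks / mapsPerNode);
        // How many nodes short of the demand we currently are
        int shortfall = Math.max(0, demanded - currentNodes);
        // Burst rate dampens (or matches) how aggressively we react
        int toAdd = (int) Math.ceil(shortfall * upscaleBurstRate);
        // Never scale outside the configured [minNodes, maxNodes] window
        return Math.max(minNodes, Math.min(maxNodes, currentNodes + toAdd));
    }
}
```

In the actual design, this decision would only be acted upon after the demand has persisted for a full hustlePeriod, to avoid thrashing on short-lived spikes.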
Assumptions
- All the nodes in the cluster are of the same type and imageId, which makes upscaling / downscaling easier.
- Every TT also runs a DataNode alongside it.

Broad Goals
- Little / no manual intervention at all - We're talking about hands-free scaling, not one-click scaling.
- Little / no changes to the framework - If we start maintaining forks of Hadoop1 / Hadoop2 to support certain autoscaling features, we'll most likely end up with a version lock, which is not a pretty thing 1-2 years down the line.
- Should be configurable - Users willing to dive deeper into configuring their autoscaling should have options to do that. This roughly translates to all of the above configurations having sensible defaults.

The larger vision is to see if we can make the entire thing modular enough to support any type of scaling.

Please do share your thoughts if you have any on the subject. 

Monday, February 2, 2015

GoCD - Slack Build Notifier

In my last post I wrote about a GoCD plugin I've been working on, and I finally got to complete it this weekend. Check it out: https://github.com/ashwanthkumar/gocd-slack-build-notifier. This is how the final result looks:


There are two features in the plugin that I'm really happy about (apart from pushing messages to Slack):

  1. Pipeline rules. Heavily inspired by the email notification framework available as part of GoCD. Check out the "Pipeline Rules" section in the README.
  2. The notifier is pluggable. A Slack notifier is provided out of the box, and with very little change one can write any type of notification transport using the existing framework.
Overall it was time well spent, and it helped me write (what I believe is) the first notification plugin in the GoCD community.

Saturday, January 24, 2015

Slack Java Webhook

GoCD recently added support for a notification extension point, and I've started building a Slack notification plugin (it's a WIP here). As part of that, I wrote a Java client for Slack webhooks. I did find a Java library that claimed to be published, but I couldn't locate it anywhere on Sonatype / Maven Central, and I couldn't publish it myself either. So I took it as inspiration and wrote my own implementation at https://github.com/ashwanthkumar/slack-java-webhook.

Usage


new Slack(webhookUrl)
    .icon(":smiling_imp:") // Ref - http://www.emoji-cheat-sheet.com/
    .sendToUser("slackbot")
    .displayName("slack-java-client")
    .push(new SlackMessage("Text from my ").bold("Slack-Java-Client"));

It gets posted in the slack channel like below

Dependencies

For Maven,
<dependency>
  <groupId>in.ashwanthkumar</groupId>
  <artifactId>slack-java-webhook</artifactId>
  <version>0.0.3</version>
</dependency>
For SBT,
libraryDependencies += "in.ashwanthkumar" % "slack-java-webhook" % "0.0.3"

Java Utils Library

After a long time, I seem to be writing Java code more often: a bunch of GoCD plugins, code kata sessions with friends, and things like that. Whenever I need things like list transformations or filters, I automatically start searching for Option / Some / None implementations. The simple solution would be to just write it in Scala, right? I know, but there are places where I couldn't, GoCD plugins being one example. The reasons: the Scala standard library is heavy and usually causes OOMs on the agent unless you increase heap sizes, and the final jar is also heavy in terms of size.

Check out https://github.com/ashwanthkumar/my-java-utils. If you find some implementations inefficient, or think they can be done better, please do let me know.


Features

List

  • Lists#map
  • Lists#filter
  • Lists#foldL
  • Lists#find
  • Lists#isEmpty
  • Lists#nonEmpty
  • Lists#mkString

Set

  • Sets#copy
  • Sets#isEmpty
  • Sets#nonEmpty

Iterable

  • Iterables#exists
  • Iterables#forall

Lang

  • Option / Some / None
  • Tuple2 / Tuple3
  • Function
  • Predicate

Dependencies

For Maven,
<dependency>
  <groupId>in.ashwanthkumar</groupId>
  <artifactId>my-java-utils</artifactId>
  <version>0.0.2</version>
</dependency>
For SBT,
libraryDependencies += "in.ashwanthkumar" % "my-java-utils" % "0.0.2"

Monday, August 13, 2012

Physique - Marrying Physics and Social Computing

Consider a world where "living" beings aren't the only "living" entities anymore, where things considered impossible / immovable start freely communicating with each other. I guess "The Matrix" has crossed your mind by now if you are any kind of tech buff like me.

That is exactly what I always wanted to see / build in real life, not just in movies. Then at Yahoo! Open Hack 2012 (5th edition in India), the organizers posed this problem statement under "Digital Communication":

Your challenge is to build a product or feature that solves a problem in the digital communications space. Think about how you can add to the existing communications landscape. Your solution may be mass market, or specific to a particular segment of the marketplace, like development teams for example. Where appropriate, use Yahoo! Mail or Messenger APIs to add value to your hack.  
Source - http://openhackindia2012.hacker.yahoo.net/#discussions/49989

Reading a lot of articles (including one on TechCrunch published less than 24 hours earlier, during the hackathon) and news, and personally watching a lot of changes and trends in the market myself, I am completely convinced that there is a lot of scope in the "Digital Communication" space.

I check my Facebook and Twitter feeds often, but when you are part of a gathering of this size, 1 in every 3 - 5 machines ALWAYS has either Twitter or Facebook open. I realized social networking has become such a big part of our lives.

With all this in mind, allow me to introduce "Physique", something we developed in ~30 hours, hoping to change the way machines communicate and collaborate in the future, for humans.

Physique is all about everyday things sharing their current state information with you in real time. Imagine if all the things you own had a digital identity (say, like a Facebook Profile) and shared their information continuously with you like your social media friends do. That is what Physique aims to be!

We call all physical objects "Thngs" (yes, intentionally) and provide an API to access them. Basically, you build apps on top of our platform for the objects to connect with each other, enabling collaboration among them.

Some high-level example use cases (things that may well be reality within 50 years from now):

  1. Your fridge can order milk cartons for you when you are running out of milk.
  2. Your television will record certain segments of a telecast when you get a call you ought to attend.
  3. Wine glasses at a party can help bring like-minded people together; the same wine glasses can also pre-order condoms for you if need be.

    ... etc., to name a few
Well, I think you get the picture by now regarding what I meant by "communication + collaboration" between machines. For all these things to happen, there have to be established standards and protocols for the devices to communicate and make things happen.

We have a basic version of the hack up and running at http://labs.ashwanthkumar.in/physique/ (we are still working on the API Documentation + SDKs for developers). 

We will come back to you with more updates and features in the coming weeks. In the meantime, if you have any ideas, or if you are working on something similar, we would love to connect with you :-)

Sunday, July 22, 2012

Tweetoem - Discovering Art in Tweets

The title is slightly misleading, but I cannot think of a better way to put it. Let me tell you the story behind the building of "Tweetoem" (Tweets + Poem).

Just in case you stumbled on this post first, Tweetoem is live on http://labs.ashwanthkumar.in/tweetoem/

Tweetoem is heavily inspired by 140verses (http://140verses.com/). One of my friends (one of the developers of 140verses) shared the link on his Facebook timeline. It was love at first sight; my reverse-engineering brain cells got activated. I looked into the poem-generating tweets, loved the idea, and loved the execution. It had been quite some time since I did any hacks (none after I started working; the life of an intern seems a bliss now). So, to lift my spirits and brush up on my PHP skills, I wondered if I could build something like that in under 6 hours of work. "Tweetoem" was the result.

Algorithm (working behind the scenes)
  1. Get the tweets from Twitter
  2. Strip user_mentions, hashtags, and links from the tweets
  3. Get the last word and reverse it
  4. Calculate the Metaphone value of that word
  5. Store the tweet + Metaphone value
I am not sure this is exactly what 140verses uses (from my observation, theirs seems to be something more sophisticated), but it does the trick for me. I got a Bootstrap template, wrote a couple of controller methods (Limonade kicks ass here), and that's it. What you see is the outcome of ~6 hours of hacking.
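The stripping and rhyme-key steps above can be sketched as follows (in Java for illustration; the actual site is PHP, and the final Metaphone step would use a real phonetic library, such as the Metaphone implementation in Apache Commons Codec, which is omitted here):

```java
public class RhymeKey {
    // Step 2: strip @mentions, #hashtags and links from the tweet text
    static String clean(String tweet) {
        return tweet.replaceAll("@\\w+|#\\w+|https?://\\S+", " ").trim();
    }

    // Step 3: take the last word and reverse it, so tweets with rhyming
    // endings share a common key prefix before the phonetic encoding
    static String lastWordReversed(String tweet) {
        String[] words = clean(tweet).split("\\s+");
        String last = words[words.length - 1].replaceAll("\\W", "").toLowerCase();
        return new StringBuilder(last).reverse().toString();
    }
}
```

Tweets whose reversed-last-word keys encode to the same phonetic value can then be grouped into candidate rhyming lines.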


Known Issues
  1. The system is relatively new, so often you might not get poems
  2. Sometimes lines might be repeated in a poem
  3. Occasionally search breaks for no reason; a little refresh or a new keyword should do the trick
  4. No share features like in 140verses (intentionally not implemented)


Disclaimer - The Tweetoem idea was inspired by 140verses, and the original developers deserve high appreciation for their innovative thinking and work. Tweetoem was just a self-ego-satisfying quick hack to replicate the same in less than a day. It by no means tries to compete with 140verses or their scope.

Friday, June 29, 2012

[INFO] Apache Wave First Look

It has been so long, so I thought I would give Apache Wave (http://incubator.apache.org/wave/) a try.

Steps to get up and running
$ git clone git://git.apache.org/wave.git wave
$ cd wave
$ ant compile-gwt dist-server   # compiles the GWT client and builds the server distribution
$ ant -f server-config.xml      # generates the server configuration
$ ./run-server.sh

PS - Watch out: the first ant command will take immense CPU, and some time.



One of the best things is that most of the Google Wave plugins still work :-) #win

Saturday, February 25, 2012

Hadoop AutoKill Hack

February's hack! As you might already know, I work on Big Data, and de facto we all use the Hadoop ecosystem to get things done. On the same page, I was looking into some of the Hadoop Java API the other day to see how things happen under the hood.

More specifically, I was trying to use the JobClient class to see if I could build a custom client or an interface to the Hadoop jobs we run on our cluster. That's when I thought: can I add a custom job timeout feature to Hadoop?

Problem statement: I want to kill any job that runs beyond T time units in Hadoop. How do I do it?

So I started writing a custom client that interacts with the JobTracker to get the list of running jobs and how long they have been running; if any exceeds the given threshold, I kill it. That is the overall concept, and I guess that's what I built. The API is so simple and straightforward that it took less than an hour of looking into jobdetails.jsp to see how to access the jobs from the JobTracker and display their start times.
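The selection logic can be sketched as a pure function (illustrative Java of my own; in the real tool the running-job map would come from `JobClient#jobsToComplete()` and `JobStatus#getStartTime()`, and each victim would be killed via `JobClient#getJob(id)` followed by `RunningJob#killJob()`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AutoKill {
    // Given running jobs (id -> start time in millis), pick the ones
    // that have overrun the threshold. Times are in milliseconds, the
    // unit JobStatus#getStartTime reports.
    public static List<String> jobsToKill(Map<String, Long> runningJobs,
                                          long nowMillis, long thresholdMillis) {
        List<String> victims = new ArrayList<>();
        for (Map.Entry<String, Long> job : runningJobs.entrySet()) {
            if (nowMillis - job.getValue() > thresholdMillis) {
                victims.add(job.getKey());
            }
        }
        return victims;
    }
}
```

Running this in a loop every minute or so gives you the custom timeout feature without touching Hadoop itself.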

However, the tricky part was how to run the damn thing. I always got an "IOException: Broken Pipe" error. I finally found that the way to run it was:

$ hadoop jar JARName.jar

So, yeah I wrote a small hack for this. You can find it on my Git (https://github.com/ashwanthkumar/hadoop-autokill).

Thursday, February 9, 2012

Introducing Scraphp - Web Crawler in PHP


Scraphp (say "Scraph"; the last p is silent) is a web crawling program. It is built as a standalone executable that can crawl websites and extract useful content from them. I created this script for a challenge posted by Indix in January 2012, where I was asked to crawl AGMarket (http://agmarknet.nic.in/) to get the prices of all the products and store them. I also had to version the prices so that they persist across dates.
Scraphp was inspired by a similar project called Scrappy, written in Python. This is not an attempt to port it; I just wanted to see how many of its properties I could build in less than a day.
One of the major features worth calling out: when you crawl a page, you can extract entities from it based on XPath. When we crawl a page, we create a bean whose properties are the values obtained by applying the given XPath expressions to the page. Each XPath is completely independent of the others. Currently Scraphp supports creating only one type of object per page.
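The one-bean-per-page idea can be sketched like this (in Java using the standard `javax.xml.xpath` API for illustration, rather than Scraphp's PHP): each bean property is produced by an independent XPath expression applied to the page.

```java
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathEntityExtractor {
    // Builds one "bean" (property -> value) per page; every property
    // comes from its own XPath expression, independent of the others
    public static Map<String, String> extract(String page, Map<String, String> xpaths)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(page.getBytes("UTF-8")));
        XPath xp = XPathFactory.newInstance().newXPath();
        Map<String, String> bean = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : xpaths.entrySet()) {
            bean.put(e.getKey(), xp.evaluate(e.getValue(), doc));
        }
        return bean;
    }
}
```

Note this sketch assumes well-formed XML input; real-world HTML would first need a lenient parser in front of it.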
Hack into the source code; it's well commented and easy to modify as per your requirements. All the details of the pages to crawl and the XPath queries are provided in configuration.php, or you can supply your own config file (see the Usage).
Code is available on my Git Repo - https://github.com/ashwanthkumar/scraphp 
I have tried my best to document the entire code well. If you feel any improvements can be made, or you have suggestions, please do not hesitate to fork and send me a pull request.

Sunday, January 1, 2012

Live NSE Stock Rates API - NodeJS

OK, I'm 10 minutes away from the close of New Year's Day here in India. I wanted to do a hack today, so here it is. I have been learning Node.js for a couple of days now, hosting on Heroku for testing purposes, and here goes one to mark the occasion.



A simple Hack to provide Live NSE Stock Rates in a simple to use API.


API URI - http://live-nse.herokuapp.com/?symbol=INFY

Replace INFY with any valid symbol from the NSE. The API returns JSON containing all the required data, so the next time you want to create virtual stock market games or apps that need live stock rates, you can use this.

Update - Application Source code is available on GitHub.

Happy Hacks! :-)

PS: This is a demo app that I created while learning Node.js. It runs on a free account, so it operates with only one web worker.

Thursday, December 8, 2011

ARO - Doc Comment Parsing in PHP

Following the work on IoC (Inversion of Control), here I am again to present the second module of my project: ARO (At the Rate Of). ARO is a PHP doc-comment parsing library, useful for parsing annotations, descriptions, etc. from classes, methods, and properties.

ARO uses PHP Reflection to get its job done. This library is also flexible, so I thought I would release it as open source. It's hosted on GitHub (https://github.com/ashwanthkumar/aro-php). Feel free to fork it or report an issue.
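The core parsing step, pulling `@name value` annotations out of a raw doc comment, can be sketched like this (illustrative Java; ARO itself obtains the raw comment string via PHP Reflection's `getDocComment()`):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocCommentParser {
    // Extracts "@name value" annotations from a /** ... */ doc comment;
    // each annotation's value runs to the end of its line
    public static Map<String, String> annotations(String docComment) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile("@(\\w+)\\s+([^\\n\\r*]+)").matcher(docComment);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }
}
```

A fuller parser would also handle multi-line values and repeated annotations, which a flat map can't represent.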

I tried my best to document it on GitHub, so let me skip further details here.

Disclaimer: The reason I created ARO in the first place is to use IoC effectively. There are many more advanced doc-comment parsers in PHP; this module is just what I came up with in around 5 hours.

Wednesday, December 7, 2011

BlueIgnis - Starting Finally

Finally, after almost 8 months of planning, modelling, and designing, BlueIgnis is taking shape. The good part is that my mentor has asked me to do Oracle ThinkQuest as the final year project. I was hunting for a good topic to build; the category I am eligible for is "Application Development", and while wondering about it I decided it was time I spent some time on BlueIgnis (aka Social Heat).

Taking the wonderful experience of building a custom framework using simple design patterns (more on this in a later post) for Webnaplo, and from the previous TGMC project (Back To My Village) with Reformists, the past 2 days went well building the BlueIgnis architecture, with more new features than I ever used to dream of.

I am all excited to work on this, this time with more features packed into its design. Since I am planning to release it as a standalone app and not as SaaS (I wish I could do that), it runs on your own infrastructure.

For people who do not know what BlueIgnis is, you can refer to my earlier blog post, which gives a gentle introduction to its functionality.

Keep watching this space for more information and updates regarding BlueIgnis.

Wednesday, October 26, 2011

MySPARQL - Multimedia Database for Semantic Web

The following is just a specification for one of my hacks that I wrote a long time back and thought I could share with the world. If anyone is interested in this, or already knows of something like it, please feel free to comment about it; I would love to take a look.

PS: The document is very informal, basically a note of what came to my mind. Any help improving it is also much appreciated. Thanks!




Tuesday, October 25, 2011

From SQL to NoSQL - SubhDB

It is so nice to be back home after months, especially since I can get my hands on my computer. God, it feels so great. I have been playing around for the past 2 days with document-oriented datastores like MongoDB, RavenDB, CouchDB, etc. I liked them all, but most of them required me to install an application on my server to do something really useful. Since I only have shared hosting, not a VPS or a cloud box, that was impossible for me.

I wanted to play with one on the existing LAMP stack I was given. So I created a document-oriented datastore myself.

Please say hello to SubhDB: an abstraction of a document-oriented datastore over traditional MySQL, implemented in PHP.

Inspired by the diagram on the home page of RavenDB (http://www.ravendb.net/Uploads/WindowsLiveWriter/RavenDB_C707/image_thumb_1.png), I designed the data model for the datastore. It can't store array attributes yet, though.

You can find the source code and instructions to give it a test drive on GitHub. Please post what you think about it. I searched for existing implementations and found none.
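One way a document-over-MySQL mapping can work is to flatten each document into (path, value) rows, one row per attribute (a hypothetical sketch in Java for illustration, not SubhDB's actual PHP data model):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocFlattener {
    // Flattens a JSON-like nested map into (path -> value) rows, the kind
    // of shape a relational table can hold. Arrays are out of scope here,
    // matching SubhDB's current limitation.
    @SuppressWarnings("unchecked")
    public static Map<String, String> flatten(String prefix, Map<String, Object> doc) {
        Map<String, String> rows = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // Nested objects contribute dotted paths, e.g. "user.name"
                rows.putAll(flatten(path, (Map<String, Object>) e.getValue()));
            } else {
                rows.put(path, String.valueOf(e.getValue()));
            }
        }
        return rows;
    }
}
```

Each row could then be stored as a (doc_id, path, value) tuple in a single MySQL table, with reads reassembling the document by doc_id.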

PS:
  1. A one-day hack, which let me spend some useful time at home during the holidays.
  2. Well, this is not even close to being complete or stable.
  3. No comments regarding the name of the project, please.