Saturday, December 25, 2010

Research Engine Update #11

Wow, today was one hell of a day, if I may say so. Early in the day I was breaking my head over so many concepts, technologies, etc. that I was actually going mad. But now, at the end, it's slightly better.

Let me brief you on what I did today.

  1. Installed DB2, duh.. I never wanted to do this.
  2. Created a SPARQL endpoint for the Knowledge Engine. No more dependence on the internet.
  3. Updating (it's under way) my database with all my 28+ GB of linked data.
I'm too damn tired to type; I'll brief the rest tomorrow.

UPDATE: After 12 hours 40 minutes, 218.8 MB of data has been successfully uploaded to DB2. At this rate..?!@# OMG! When will my 28 GB get uploaded?!
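
To put a number on that question, here is a quick back-of-the-envelope sketch. It assumes the upload rate stays constant and that 1 GB = 1024 MB; both are assumptions, not measurements.

```python
# Rough estimate of the remaining DB2 upload time.
# Figures come from the update above; a constant rate and
# 1 GB = 1024 MB are assumptions made for this sketch.

uploaded_mb = 218.8
elapsed_hours = 12 + 40 / 60      # 12 hours 40 minutes
total_mb = 28 * 1024              # 28 GB of linked data

rate_mb_per_hour = uploaded_mb / elapsed_hours
remaining_mb = total_mb - uploaded_mb
eta_hours = remaining_mb / rate_mb_per_hour
eta_days = eta_hours / 24

print(f"rate: {rate_mb_per_hour:.1f} MB/hour")
print(f"remaining: about {eta_days:.0f} days")
```

At roughly 17 MB an hour, that works out to a little over two months of non-stop uploading.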

- Ashwanth Kumar and Salaikumar @ Saravanan

Wednesday, December 22, 2010

Research Engine Update #10

Update 9.20 AM:
Wow! We're making some serious progress here. The web search engine on Tapestry works, and it's live and running.

Search Index is limited to 5% of Google Directory URLs.


Update 10.10 AM:
Added Cached support to Web search.

Update 10.40 AM:
Added OpenSearch RSS Export.

Update 11.52 AM:
Added General WebHistory management with Cassandra Distributed database.

Research Engine #9

If you have read my SRS, you'll know that Knowledge Engine uses a tool called wikixtractor. It's my project for extracting various contents of Wikipedia into RDF (N-Triples) format.

At last, I committed the latest version into the repo. It's a fork of the DBpedia Extractor.

- Ashwanth Kumar

Monday, December 20, 2010

Research Engine #8

Well, this update is pretty late and simple. I've deployed Cassandra in our cluster. With a little tweaking of the config file, it was a breeze to set up. The requirements are simple (Java 1.6, that's it).

Next up for tomorrow is using Kundera, a JPA-compatible library for Cassandra. It's a beautiful project that helps you get your job done quickly and easily.

Project Website :
Documentation (excellent for understanding Cassandra Datamodel and Kundera usage) -

PS: This is an update post.

- Ashwanth Kumar

Tuesday, December 14, 2010

Social Networking Platform (IndiKonn)

Project Name - IndiKonn
Project Scenario - Social Networking Platform
Team Members - Lakshmi Narayanan B, Divya K, Jayalakshmi S, and Prasanna Kumaor

You can also download it from here -

Sunday, December 12, 2010

Knowledge Engine (Developer Preview)

Hello guys,

At last, a working prototype of the Knowledge Engine module. After a lot of trouble finding an implementation technique that is actually feasible, here it is at last. You can find the Knowledge Engine residing here. The implementation was done in less than 2 hours, so no templates were designed. It's a basic HTML page powered by PHP.

Usage Mode for KBEngine
born 1974-06-22
- To get all the people born on a specified date

dead 1974-06-22
- To get all the people who died on a specified date

starring rajinikanth aishwarya
- To find the films starring the given actors

birthplace kumbakonam
- To get all the people born in a specified place

deathplace kumbakonam
- To get all the people who died in a specified place

list Company (watch the capitalization)
- To get the list of all companies available in the system. This can be substituted with any of the following:
  • Company
  • People
  • Actors
  • Airport
  • Country
  • TelevisionShow
  • Artwork
  • FootballEvent
  • Publisher
  • Animal
  • Subject
  • EducationalInstitution
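
For anyone curious how such queries can be taken apart, here is a minimal sketch in Python. It is purely illustrative: the real prototype is PHP, and `parse_query` and `KNOWN_COMMANDS` are hypothetical names made up for this example. Only the command words come from the usage list above.

```python
# Hypothetical sketch: split a KBEngine-style query into a command and its
# arguments. parse_query and KNOWN_COMMANDS are illustrative names only;
# they are not taken from the actual PHP implementation.

KNOWN_COMMANDS = {"born", "dead", "starring", "birthplace", "deathplace", "list"}


def parse_query(query):
    """Return (command, args) for a supported query, or raise ValueError."""
    parts = query.split()
    if not parts:
        raise ValueError("empty query")
    command, args = parts[0].lower(), parts[1:]
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command}")
    if not args:
        raise ValueError(f"'{command}' needs at least one argument")
    # Arguments keep their original capitalization, since "list" is case-sensitive.
    return command, args


print(parse_query("born 1974-06-22"))
print(parse_query("starring rajinikanth aishwarya"))
print(parse_query("list Company"))
```

Each parsed command would then be mapped to a lookup against the knowledge base; that part is left out here.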

Please provide your valuable comments for improvement.

PS: This is the Research Engine Update #7

- Ashwanth Kumar

Saturday, December 11, 2010

User Trend Graph - RFC

In my web search module, one of the important features when a user logs in is the ability to filter results based on their likes and activities. In this post, I'm drafting a methodology called "User Trend Graph", built using data from the Facebook Graph API (Open Graph Protocol).

About User Trend Graph
User Trend Graph (from now on called UGraph) is based on the time, number, and type of the user's Likes on FB. The following graph depicts the UGraph:

UGraph Sample 3D Graph

Here a sample user's likes are being analyzed; his Product/Service category likes outnumber his Application likes. The data used here is random. The graph line can either increase or decrease, since users can Like and Unlike pages on FB, as depicted above.

Over a period of time, the user's trend can be calculated, projected, and used in any social machine learning algorithm (is there any such thing?). This can act as a representational medium for the algorithm to act upon.

The time is taken from the "created_time" field of FB's JSON response (see here for a sample), which denotes the date and time when the user made a connection with that node in the graph (OpenGraph of FB).

UGraph depicts the user's activities on a social networking platform. If the same idea could be extended to a broader perspective covering all web activities, services such as Google (or they're probably using something similar already? No idea!), Twitter, FB, etc. could all benefit from understanding their users' context and provide a better service to them.

Since many Web 2.0 services are going social, this could be a powerful way to analyze the user's interests and provide a higher degree of relevance in search engines, user suggestions, etc.

Using UGraph in iBlue
As I said before, we're implementing the concept of UGraph in iBlue as a proof-of-concept application.

Once users log in with their FB account, we cache their Likes JSON in our database (as we're too hard pressed on resources to fetch it dynamically) until they log in again, at which point it is updated. Since the response is sorted in reverse chronological order, one can use binary search to locate the previously latest node, delete cached nodes that are no longer present (the user would have unliked them), insert the new ones, and update "created_time" for existing nodes if it has changed in the meantime.
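
The update step can be sketched roughly as below. Note that this sketch uses plain dictionary lookups instead of the binary search mentioned above, simply because it is shorter to show. `sync_likes` is a hypothetical name, and the "id" field is an assumption about the shape of each Like.

```python
# Sketch of syncing the cached Likes with a fresh response.
# Uses dictionary lookups (not binary search). sync_likes is a hypothetical
# helper, and each Like is assumed to carry "id" and "created_time" keys.

def sync_likes(cached, fresh):
    """Return (to_delete, to_insert, to_update) as lists of Like ids."""
    cached_by_id = {like["id"]: like for like in cached}
    fresh_by_id = {like["id"]: like for like in fresh}

    # In the cache but missing from the fresh response: the user unliked it.
    to_delete = [i for i in cached_by_id if i not in fresh_by_id]
    # In the fresh response but not yet cached: a new Like.
    to_insert = [i for i in fresh_by_id if i not in cached_by_id]
    # Present in both, but the timestamp changed in the meantime.
    to_update = [
        i for i in fresh_by_id
        if i in cached_by_id
        and fresh_by_id[i]["created_time"] != cached_by_id[i]["created_time"]
    ]
    return to_delete, to_insert, to_update


# Made-up sample data, for illustration only.
cached = [
    {"id": "1", "created_time": "2010-12-01T10:00:00+0000"},
    {"id": "2", "created_time": "2010-12-02T10:00:00+0000"},
]
fresh = [
    {"id": "3", "created_time": "2010-12-10T10:00:00+0000"},
    {"id": "2", "created_time": "2010-12-09T10:00:00+0000"},
]
print(sync_likes(cached, fresh))
```

For a few hundred Likes per user the dictionary approach should be fine; binary search only starts to matter if the cache cannot be loaded into memory at once.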

We use the "category" property of each Like as the series. We then dynamically compute the graph, identify the user's current taste and trend, and sort the results accordingly for the user to view.

Please provide your feedback on this implementation. If such a method already exists, please help me improve it; if not, let's start using it.
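
A minimal sketch of computing the per-category series is given below. It assumes each Like is a dict with "category" and "created_time" keys, and that the timestamps are ISO 8601 strings in the same timezone so that they sort correctly as plain strings. `category_trend` and `top_categories` are hypothetical names, and the sample Likes are made up.

```python
from collections import Counter

# Sketch only: category_trend and top_categories are hypothetical helpers.
# Timestamps are assumed to be ISO 8601 strings in one timezone, so sorting
# them as strings gives chronological order.


def category_trend(likes):
    """Return (created_time, category, running_count) in chronological order."""
    running = Counter()
    trend = []
    for like in sorted(likes, key=lambda item: item["created_time"]):
        running[like["category"]] += 1
        trend.append((like["created_time"], like["category"], running[like["category"]]))
    return trend


def top_categories(likes):
    """Return the categories ordered from most liked to least liked."""
    counts = Counter(like["category"] for like in likes)
    return [category for category, _ in counts.most_common()]


# Made-up sample data, for illustration only.
likes = [
    {"category": "Application", "created_time": "2010-11-20T09:00:00+0000"},
    {"category": "Product/Service", "created_time": "2010-11-25T09:00:00+0000"},
    {"category": "Product/Service", "created_time": "2010-12-05T09:00:00+0000"},
]
for point in category_trend(likes):
    print(point)
print(top_categories(likes))
```

The ordering from `top_categories` is what would be used to rank search results; handling Unlikes (a decreasing line) would need the deletions from the cache sync as well, which this sketch leaves out.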

Update: This is also Research Engine Update #6.

- Ashwanth Kumar

Friday, December 10, 2010

Research Engine Update #5

# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0xb701077a, pid=10042, tid=3018517360
# JRE version: 6.0_22-b04
# Java VM: Java HotSpot(TM) Client VM (17.1-b03 mixed mode, sharing linux-x86 )
# Problematic frame:
# V []
# If you would like to submit a bug report, please visit:

--------------- T H R E A D ---------------

Current thread (0x0b878c00): JavaThread "FetcherThread" daemon [_thread_in_vm, id=8797, stack(0xb3e5e000,0xb3eaf000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x00000004

EAX=0x92e57044, EBX=0x00000000, ECX=0x0b879474, EDX=0x0aae7750
ESP=0xb3eac958, EBP=0xb3eacafc, ESI=0x00000000, EDI=0x0b879470
EIP=0xb701077a, CR2=0x00000004, EFLAGS=0x00010286

Top of Stack: (sp=0xb3eac958)
0xb3eac958: 0b878c00 0b878c00 b732ef90 b3eacae0
0xb3eac968: b3eaca88 b70cbdb2 0b878d18 51071390
0xb3eac978: 00000000 00000000 51070d84 51070dd4
0xb3eac988: 00000044 b3eac9d0 00000002 0b87946c
0xb3eac998: b3eac9c8 0b879468 b7319c08 0b879470
0xb3eac9a8: 0000005b 00000000 0000005b 00000000
0xb3eac9b8: 00000000 b726fc06 0b87946c 00000023
0xb3eac9c8: b3eaca08 b4e02388 00000000 510710d8

Instructions: (pc=0xb701077a)
0xb701076a: 00 00 8b 7d 08 8b 75 0c 8b 07 83 c0 24 8b 1c b0
0xb701077a: 8b 43 04 8d 48 08 8b 40 08 51 ff 90 8c 00 00 00

Full stack trace here: Full Stack trace

I've never seen such JVM runtime errors! Well, the fact is that the Nutch crawler didn't complete properly. I had to start crawling all over again, and I did so.

Man, this is seriously getting somewhere; I just can't tell where?!

Tuesday, December 7, 2010

Research Engine Update #4

OMG! Today is the last date for SRS Submission! I'm yet to start mine!

Fine, let me just brief you on what's happening with iBlue.

  1. Downloading (46.85% done as of writing this post) the entire Wikipedia in all languages (but we'll be using only the English (en) and German (de) versions) in N-Triples (around 16 GB)! - Thanks a lot to DBpedia for its publicly available data set. It's this dataset that powers the Knowledge Engine component as of now, until we think of a better implementation.
  2. Completed the Code Search (like Google Code Search) in CLI. - We're harvesting the Apache, Google Code, and public SVN repositories (using SVNKit), indexing them, and providing a search layer on top of them. - Lucene is used here extensively for both indexing and searching.
  3. Completed the Web Search on IBM and SASTRA sites. We're using the industry-standard Nutch crawler for crawling the Web, and again Lucene for indexing and searching. Also, clustering plugins like Carrot2 improve result presentation, and the ontology specifications of Open Calais improve query processing.
- Ashwanth Kumar and Salaikumar @ Saravanan

Friday, December 3, 2010

Research Engine - SRS (From Google Docs)

You can also download the PDF from

Research Engine Update #3

Wow! At last, Salai and I have settled on the final draft of Research Engine. Here are the features (trust me, this list is final; it's decided):
  1. Web Search - Combination of WolframAlpha and Google
  2. Code Search - Similar to Google Code Search
Also, users can register using their Facebook account. This has 2 main sub-components:
  1. Synchronize bookmarks (Facebook Likes to quantize the search results)
  2. Maintain Web History, similar to Google's Web History.
Alpha version for closed circle testing will be out soon. Watch this space for more info.

- Ashwanth Kumar and Salaikumar @ Saravanan

Thursday, December 2, 2010

Research Engine - Update #2

Haaaa, a very good morning. My day kicks off with the need to design (re-architect, really; Salai and I must have redesigned the entire RE at least 5 - 6 times now, we lost count) RE for the final time, before the development work starts (by this afternoon).

Yes, there were earlier prototypes built during our Exams (mainly for testing technologies only). Actual product development starts only today.

So, yes, it was decided that RE will not give you search results; instead, it helps you find information (after processing unstructured data on the web, like Google Squared {thanks to @Nivas}, Wolfram, etc.). For that we need a way to store the information (knowledge), or in our case (in pure Semantic Web style), the Resource.

Thus, after breaking my head over it and, in due course, sleeping for 35 min, I've come up with a resource representation technique (a data structure for resources):

Every single piece of information found on the web is said to wander in space (the Web) along no definite path. We pack such information into an infolet. Since the information is unstructured, there is no definite domain or context to the source unless it is interpreted from various sources, as in Wikipedia.

Infolet - An entity of information containing subject, context, and data (actual info)
Eg. Ram was born on Jan. 1, 1990.
Subject - Ram
Context - {year 1990, birthday}
Data - "Ram was born on Jan. 1, 1990"

Infomat - Syndicating many infolets based on the subject forms the Infomat. It tells us what it is, but no information is provided about where, how, etc.
Eg. General_Info - ({Ram was born on Jan. 1, 1990}, {Ram won his first International Physics Olympiad in 2000}, {Ram joined IIT after securing AIR #1 in JEE}, {Ram fell in love with Sita, and married her in 2017}, {Ram and Sita lived happily ever after});

In the above example, the {infolets} are combined into a single (infomat), a collection or set of infolets based on their Subject. Sequence is still a problem unless a year is specified in the text to enhance its context a bit more.

Infonet - Collection of Infomats based on their context (yes, various context information about a particular subject). Infonets are always in proper order, as the context is taken into consideration.

Eg. FB_Wall_Sets - ({Ram joined FB}, {Ram added Stanford to his list of Schools}, {Ram is preparing for his Advanced Operating Systems overnight}, {Ram had a good nice date with the most beautiful girl in the whole galaxy})

From the previous 2 examples, you can find that 2 Infomats named, "General Info" and "FB Wall Sets" are combined into a single Infonet as:

Generally - Ram[General_Info,FB_Wall_Sets,...]
In detail - Ram[General_Info({....},{...}),FB_Wall_Sets({...},{...}),...]
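
To make the nesting concrete, here is one possible way to model the three levels in code. This is only a sketch of the idea above, not the project's actual data structure: the class layout, the `add` and `summary` methods, and the use of Python dataclasses are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the Infolet / Infomat / Infonet model described above.
# The field names follow the post; the methods are hypothetical additions.


@dataclass
class Infolet:
    subject: str   # e.g. "Ram"
    context: set   # e.g. {"year 1990", "birthday"}
    data: str      # the actual piece of information


@dataclass
class Infomat:
    name: str                                     # e.g. "General_Info"
    infolets: list = field(default_factory=list)  # infolets sharing one subject


@dataclass
class Infonet:
    subject: str
    infomats: dict = field(default_factory=dict)  # infomat name -> Infomat

    def add(self, infomat_name, infolet):
        """File an infolet under the named infomat, creating it if needed."""
        if infolet.subject != self.subject:
            raise ValueError("infolet belongs to a different subject")
        infomat = self.infomats.setdefault(infomat_name, Infomat(infomat_name))
        infomat.infolets.append(infolet)

    def summary(self):
        """Render the short form, e.g. Ram[General_Info,FB_Wall_Sets]."""
        return f"{self.subject}[{','.join(self.infomats)}]"


ram = Infonet("Ram")
ram.add("General_Info", Infolet("Ram", {"year 1990", "birthday"}, "Ram was born on Jan. 1, 1990"))
ram.add("FB_Wall_Sets", Infolet("Ram", {"facebook"}, "Ram joined FB"))
print(ram.summary())
```

The `summary` output matches the "Generally" notation above; ordering the infolets inside each infomat by context is not handled in this sketch.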

Thus goes my resource representation model. All the subjects are (tentatively) linked to Facebook (I just love FB for the pure interest and care it takes in social networking; no comments about privacy issues, okay?). Using FB Connect, every single Member, Page, Group, everything, has an RID (resource identification) from which all subjects are derived.

PS: We're not affiliated with Facebook by any means.

Any feedback regarding the same will be highly appreciated.

- Ashwanth Kumar

Wednesday, December 1, 2010

Research Engine - Update #1

After a good evening dinner, Salai and I came back to his room to discuss Research Engine and its implementation methods. The first thing that made us wonder, even on the way to dinner, was: "What exactly does a multi-column database engine do? What is it exactly? How does it differ from the traditional fixed-column schema of writing DBMS apps? And are implementations like HBase, Cassandra, Voldemort, etc. actually needed for our project?"

When we came back, Salai immersed himself in all this (I'm yet to get updates regarding that from him, as of writing this post), while I was busy prepping the system with Ubuntu 10.04 on the Central node (Salai's PC) and updating it. Meanwhile also updating my farm at Farmville ;)

I was so damn tired after clicking through my 350+ plots of land for seeding that I decided to call it a day and went to sleep.

- Ashwanth Kumar

Tuesday, November 23, 2010

I'm back!!

Hi, actually it took me a long time to come out of my personal matters. Anyhow, I'm back. I feel it is the right time to start our work and finish "Research Engine". Hope God will help us give life to our dream...

- Saravanan @ Salaikumar

Sunday, November 14, 2010

Research Engine - Alpha (under way)

Research Engine Alpha is under construction and will be out for testing soon. It's primarily built on the Apache Solr server. Its plugin architecture helps a lot in adding custom components that perform the required job, while it still remains a stable search server.

I thought it would be easy to start with Solr, and go as required.

ETA: 45 days

- Ashwanth Kumar

This is an update post, used as a log for me and my teammates working on the project to record the happenings of our work. This is a personal effort. If you plan to use this intellectual property, please drop a message at:

Wednesday, November 10, 2010

My 5th Semester Practicals - I

I've just come back from finishing my practical examination. The paper today was "Computer Networks"; I got
  1. A Simple Firewall Implementation
  2. Ping this computer, and get its IP Address and its name. Also, ping other systems in the network to detect their IP Address and name. (Seriously this is so easy)
I picked up my paper and found these. I was a bit happy, 'cause both are simple programs. The Firewall is big but easy to code, approx. 150 lines of code, that's it. I wrote the usual cover story on the answer sheet, and it was then that I heard something much worse than "PROFANITY".

"Write the Aim, Procedure, PROGRAM and result. After which you can use the system."

I just don't get it, and if I'm wrong, please correct me. Why on earth, in a computer science practical examination, should we write a program on paper first, and only then type it and show the output? We're in third year CSE; are they testing whether we know how to type on a keyboard and compile a program?

Seriously! When I asked my invigilator, he said, "NO! Write the program and only then you can type!". Why is the educational standard in SASTRA going so low?

Well, actually it's not only here (in SASTRA); it's the case in most of the universities in India!

How do they propose to make better programmers and software professionals out of students who just mug up the program for their practicals? How do they expect us to develop India by 2020 (I find this year fancy, and nothing more special)??

Do you have an answer for this??

Saturday, November 6, 2010

New "Web Page" Concept - Alpha - Work in Progress

Hello. After sleeping for 10 hours straight, and really pissing my gf off, I started working on the website for our college Tech Fest, Theta '11. I browsed many templates online, but was happy with none! :(

So, I thought, why not create my own template (1st time ever!). Concept!? I need a damn concept for the page; I ain't a designer in Photoshop or Flash, not even in Rome! I still need to create a stunning website to attract sponsors.

My understanding of web pages is as follows:
  1. Full multi page websites - In this category, entire website is composed of multiple pages. I found 96% of the websites on WWW fall under this category.
  2. Single portfolio websites - In this category, the entire website is composed in a single page, but contains many sections within it. Each section is loaded on demand via Ajax or using scripts like Nivo jQuery slider, etc. I found that only certain websites, like portfolios, personal resumes, etc., fall under this category. They constitute around 4% of the WWW.
I wanted to create a new web page concept, something that meets the Web 3.0 standards for WWW AI, and also being able to be accepted by Web 2.0 users.

  • Something that will replace conventional Menu bar by something else; Making the full use of CSS3 and HTML5 to its extent.
  • Something that targets modern browsers, and not some ghost browsers (IE 6 -).
  • Something that will replace the single-click information to fully user interactive page.
  • Something that will acknowledge the user and not the webmaster.
  • Something that can sustain for next decade!
  • . . .
I need a damn concept! Any Ideas or suggestions, please post!

- Ashwanth Kumar

Thursday, November 4, 2010

Diwali Eve Logs

I've just now finished my dinner! It's around 10.30 PM here. I don't generally sleep the night before Diwali. So, just wondering what can be done now for the entire night?!

Update: Nothing really happened~ I just slept off! Man, y don't i get ideas like Mark Zuckerberg?

- Ashwanth Kumar

Monday, November 1, 2010

My New Project - After Effect of "The Social Network"

Inspired by the movie "The Social Network" (still 45 min to go as of writing this blog entry), I thought of this idea. I don't know whether such ideas already exist; if you know of any, please do enlighten me. I would really like to use it.

Idea in Abstract:
An operating system, or a system application, that logs all user interactions with the system and learns the user's tastes in life. It monitors everything, starting from scanning the directory structure and analyzing the movies, songs, pictures, etc. available on the system. It lives with the user and learns along with him/her. An artificially intelligent OS or program that can assist the user in all his actions with the system. Sometimes it can also be a PA to him.

In Detail:
I want to create an artificially intelligent program or OS that assists the user in everything, making the OS customizable on the fly (dynamically). It can get the job done instantly, with the help of few or no inputs at all. Yes, it can be scripted or commanded over the mic, message, chat, or email; virtually any means of communication can be understood and decoded by the system, and the corresponding actions executed.
- Also, it can suggest you online movies or news articles or etc. after reading (analyzing) your browser history, file access history, etc.
- Provide updates for your Anti virus programs,
- Tell you if there are any new mails, or your girl friend is online.
- Analyse your phonebook and tell you if there are duplicate contacts.
- Sync all your email accounts, and suggest frnds from FB.
- Suggests new albums for your iPod or MP3 player
- Well the list is endless..

Well, I do know there exist separate programs for doing all these crappy things. They have become a part of our daily life. I just want to integrate all the frequently done jobs into a single application. Bring them all under one roof, make them installable as components, etc. It's like this: it can read a chat or email message, find out the tone of the other person, and tell you about it. It can play background music when it feels you're really pissed off.

I don't know the feasibility of this project, or say, boredom blog post. I just wanted to know if such a thing already exists, or if we're really in need of such an app.

Please feel free to post your comments on this issue.

- Ashwanth Kumar

Sunday, October 17, 2010

Research Engine - A new identity

Well, this is yet another series on my TGMC project this year, "Research Engine". I've been thinking a lot about Research Engine of late, especially after my break-up with my girl a few weeks ago.

"If I made it a search engine, it would still be just one of them. Maybe one with better filtering capability than all the others, but there is no innovation in that. Search is still not a solved problem! I still believe in that, but what can Research Engine possibly achieve?!", I said to myself.

I was pondering over this for a long time, until now. I again came up with a new, improved version of Research Engine, one already, though not completely, visualized by my teammate, Salaikumar @ Saravanan.

We came up with another way of representing information to the users. All search engines give you pages and pages of links to some so-called relevant information for your search query. We thought, why not directly give the users the information they are looking for, instead of giving them links to go and search through.

So we decided to have a user view layer like that. It is information centered (re)search, so that you don't have to re-search again else where.

Well, you must be thinking, "Isn't this what Wikipedia does in the first place?". Actually the answer is, "Yes!". You can also call Research Engine the next encyclopedia, but the potential of RE is more than just displaying information.

The major features or improvements over a wiki are:
  1. All the data collection process is completely automated.
  2. All the information you see is live data (possibly some milliseconds old)
  3. All the information you get is presented after analysis of same content over a variety of authenticated websites.
That's it for this post as of now. If you feel RE can have more features, or if you have a suggestion for the same, please drop your views on the issue as comments. As I said, I want this to be a community-driven Web 3.0 technology that makes people's lives easier.

- Ashwanth Kumar

Wednesday, September 22, 2010

"Unable to scan networks"

Hell Yeah! I've rooted my Android device successfully. Why on earth did I have to do it?!

I had a problem with my WiFi! It had been giving me the "Unable to Scan networks" error message since yesterday evening, and I was fed up with it. While googling the problem, I was -motivated- forced to root my device ;)

How to root the device can be found all over the internet, so I'll skip that part and go directly to the solution for the WiFi problem.

Solutions: There're basically 2 solutions available

  1. Simple, less effective - Soft Reset option. Go to Settings -> Privacy -> Reset to Factory Defaults. The phone reboots automatically and your WiFi problem is solved. The only catch is that you lose all your apps, settings, etc. However, your SDCARD files are not deleted.
  2. Bit complex - Requires a rooted device. Delete the bcm_supp.conf file from /data/wifi and reboot your phone. It'll work again. You'll need to set up all your WiFi connections (with keys) again.
    Note: Backup the existing file just in case.
PS: I've tested the above procedure on my Samsung Galaxy 3 ( I5801), with success. Use this at your own risk. I can't be liable for any problems/loss that may arise to your phone, or your girl friend, or PC, or yourself, or any damn thing you'll charge me with!

Monday, September 6, 2010

Atlast Office Buddy - Alpha Version out!

At last, after 2 months of work, I've completed the alpha version of the Office Buddy project. All the changes are committed to the SVN. If you want to check out the code, please use branch/alpha, as the trunk still has old data.

It's made open-source, under Apache License 2.0. If anyone is using this project, please drop me a mail or a confirmation, and I'll add you to the list of users. Also, if you find any bugs or usability problems, please report them in the issues tab.

Update: At last I gave a demo today. Everything was good, and the AO has proposed some new additions. Will patch accordingly, and let you know.

PS: This is update post.

Friday, September 3, 2010

Fuzzy Speaker

Fuzzy Speaker is a gesture-based project for giving speech to dumb people. It was mainly inspired by Sixth Sense by Pranav Mistry, and Project Un-Mute by Ramaprasanna Chellamuthu of Microsoft.

Project Description:
It's a wearable device that can detect the gesture language of the dumb and speak for them.

Ashwanth Kumar, Gaurav Kumar, Nilesh Kumar of SASTRA SRC, Kumbakonam.

Related Papers: (papers that inspired us)

Update: This project was accepted by our Dean, for sending to CENTRE FOR TECHNOLOGY DEVELOPMENT AND TRANSFER (CTDT) for funding. Keeping the fingers crossed.

Research Engine - Project Proposal Accepted

Hooray... Hooray.. Our project proposal for TGMC 2010, Research Engine, has been accepted by TGMC. Now it's time to kick-start our project officially, in full swing.

Hell Yeah!!

PS: This is an update post.

Saturday, August 28, 2010

Innovative Student Project Fund

I found this on the net, hoping it might be useful for someone like me. Anna University is proposing a fund of Rs. 25,000/- (max) for student projects. The only criterion is that you must be a pre-final year student.

Hope this is useful.

PS: This is just an information post.

Tuesday, August 24, 2010

Our TGMC Group - Tech Buddies

Created our TGMC Group - Tech Buddies, as per the TGMC rules. Except that I forgot to register first! (:0)

All other information can be found there.

PS: This is just an update post.

Friday, August 20, 2010

?!@#$^& :'(

Just when I thought I was an inch from an innovation, Technology looked at me and said: "Aaah! That's old news". Let me brief you on what exactly happened.

My Topic: Universal Device
Description: Change a touchscreen with a pop-out display device into practically any electronic device, using its APIs. Run different apps in a virtualized manner (mobile virtualization).

So, I did some Googling to figure out if such research already exists. And guess what?! I ended up on the following two pages.

TGMC as a Final Year Project?!


I was browsing through the wikis of TGMC '10, and I chanced upon this wiki page titled "TGMC Project as Final Year Project". To all my final year friends: do you think we will be able to implement this in our university or campus?

Please do comment, what do you think about this!

Wednesday, August 18, 2010

Conference Alerts

Hello again. Today one of my profs asked me to write a paper for two international conferences. Well, they're,
Conference? Me? She must have been kidding (never mind her). I've never done a paper in my life before. I hate to sit and think. I just love to work with new technology (I don't create it, at least not until now). Now, she has changed the way I look at things. I'm now planning to submit my G-SPADE (my innovation on the existing SPADE project) to one of the conferences. Let's see how well it goes. I'll keep you guys posted.

PS: This is just an update post.

Components of Research Engine - Developer Preview

Okie, this is going to be the first component proposal for Research Engine, by Ashwanth (myself) and Kirubaharan.

Any comments, and feedback regarding this is welcome.

Ya, I know, the project is becoming more and more community based. I don't mind if there is a fork too. Just let me know there is one! I would be happy to know about it. Once we start coding, the code base is also planned to be publicly available.

TGMC '10 Project Scenario

TGMC 2010 now has some new rules: a custom Project Scenario must be sent to them for approval before development of the same.


Well, Research Engine, does need approval i think?! Anyways, I'm currently working on the same against the TGMC Scenario Template. I'll post the Link, once i'm done with it.

UPDATE on 20th August, 2010: Our TGMC Project Scenario. We've sent this for approval, and are awaiting their response.

PS: This is just an update post. To log the status of the app.

Tuesday, August 17, 2010

Scalable Parallel and Distributed Environment (SPADE)

Today, at around 6.30 PM i went to Nivas's house, to discuss generally about SPADE (his final year project) and sample search engine, built with it. I actually learned a lot from that, and this post is to reflect the same in words. This is basically for my own purpose, just in case. If this was useful to you by any means, please leave a comment, or say your thanks to him.

SPADE -3 Properties:
  1. Object Oriented Support - full support for all data types
  2. Distributed cache vs DB
  3. Scalability on the fly (as opposed to the need to rewrite source, recompile and redeploy in MPI)
  4. Dynamic scheduling - No barrier synchronization - no static binding of code to machine
  5. Asynchronous Communication as opposed to blocking sends and receives
  6. Fault tolerance - no single point of failure
SPADE Abstract can be found here:

I'm currently planning to work with him, on this environment to implement the Research Engine, for ThinkQuest 2011. Again, time'll answer everything. I'll keep this updated, as we progress further.

Sunday, August 15, 2010

TGMC 2010 is Out!

TGMC 2010, is out with a bang. This time, it is expected to be with more fun and learning experiences.

The site was quoted as: "The Great Mind Challenge (TGMC) is back in a new avatar! After a superb year in 2009 where we saw more than 100,000 students participating with some excellent projects, this year we’re proud to announce the launch of TGMC 2010. This year, there is unprecedented focus on the most important aspect of TGMC. You. As a student or a faculty member, TGMC 2010 is the forum for you to come forward, take your destiny by your hands and make it happen. And we put it in the form of this simple mantra.
‘Initiate. Collaborate. Innovate.’

YOU need to initiate the chain of events that will take you to the brink of Success.
YOU need to collaborate, with your peers, your faculty and IBM to ensure you achieve that Success.

And YOU need to innovate to ensure that together, you achieve not only Success, but a lasting place in the Halls of Fame. Technology is the vehicle. YOU are in the Driver’s seat. And it’s going to be a great ride."

Saturday, August 7, 2010

Crawl + Index Module Test - Successful

Today, after a long sleepless night, the crawl and index/search modules of Nutch have been tested, with this blog taken as the test benchmark. Results seem promising.

Tech Info: Test was performed on a single system, running the following config: 2 GB RAM, Ubuntu 10.04 Desktop, 160 GB HDD. Time taken was ~75 secs @ 2Mbps connection.

PS: This is an update post.

Research Engine - Code Name: iBlue

Now it's official (from the team + mentor): the Research Engine has been code-named iBlue.

Logo coming soon.

PS: This is just an update post.

Wednesday, July 28, 2010

Hadoop Cluster Deployment + Step-By-Step Process

I've successfully deployed a small cluster of 3 nodes on the Hadoop platform. I mark this as the 1st success on the long road towards the Research Engine. It took me a while to understand the basics (since this is my 1st time), but it was such a wonderful experience.

The cluster specs are:
  1. Core - Ubuntu 10.04 - 2 GB RAM
  2. Core - Ubuntu 9.04 - 1 GB RAM
  3. Virtual PC (VBox 3.2.4) - Ubuntu 10.04 - 512 MB RAM. (Host is 1 machine)

Tests: Grep for Map/Reduce, and content replication using a 500 MB copy replicated over 2 nodes.

Info: 1 NameNode, 1 DataNode, and 1 JobTracker

Below is the step-by-step procedure to deploy a Hadoop cluster (for learning purposes only; this can't be used as such in a production environment. Please refer to the official docs and the latest release for that). See the disclaimer at the bottom before you even start reading beyond this.
  1. In these steps, I shall assume you've 3-4 systems on a network, each of them running Ubuntu 9.04+ with the sun-java6-jdk and ssh packages installed. It's preferable to use a fresh system installation, though it's not mandatory.
  2. Due to some issues with Hadoop 0.20.* (latest stable as of writing this post), we shall use Hadoop 0.19.2 (stable) for now. You can get a copy of yours from: (53 MB).
  3. Create a new user for Hadoop work. This step is optional. It's recommended, as it keeps the HADOOP_HOME path the same across the cluster.
  4. Extract the Hadoop distribution in your home folder (u can extract it anywhere though). So, your HADOOP_HOME will be like: /home/yourname/hadoop-0.19.2
  5. Now, repeat these steps on all the nodes (make sure the HADOOP_HOME is the same on all the nodes).
  6. We need the IPs of all the 3 nodes. Here, *.1.5 is the NameNode and *.1.6 is the JobTracker; these 2 are the main exclusive servers. You can find more info regarding them here. Node *.1.7 is the DataNode, which is used for both task tracking and storing data.
  7. U'll find a file called: "hadoop-site.xml" under the conf directory of the Hadoop distribution. Copy and paste the following contents between <configuration> </configuration>
    <!-- IP Of the NameNode -->

    <!-- IP of the JobTracker -->
  8. Make sure the same is done for all the nodes in the system.
  9. Now, to create the slaves list the NameNode uses to replicate the data: go to the HADOOP_HOME directory on the NameNode. Under the folder "conf" you should see a file called slaves.
  10. Upon opening slaves, you should see a line with "localhost". Add the IPs of all the DataNodes you wish to connect to the cluster here, one per line. Sample Slaves will be as follows:


  11. Now, it's time to kick-start our cluster.
  12. Open terminal in the NameNode, go to HADOOP_HOME.
  13. Execute the following commands:

    # Format the HDFS in the namenode
    $ bin/hadoop namenode -format

    # Start the Distributed File System service on the NameNode, which will ask you the passwords for itself and All the slaves, to connect via SSH
    $ bin/
  14. Your NameNode should start and be running. To check the nodes connected to your cluster, go to step 19 and come back.
  15. Now, it's the JobTracker node's turn. Execute the following commands:

    # Start the Map/Reduce service on the JobTracker
    $ bin/
  16. The same process follows for the JobTracker. It asks for the passwords for itself and all its slaves (did I tell u, u can also add slaves to the JobTracker? It's the same process as for the NameNode: just add the IPs to the slaves file of the JobTracker node's Hadoop distribution) to start the Map/Reduce service.
  17. Now that we're done starting the cluster, it's time to check it out!
  18. In the NameNode execute the following command:

    # Copy a folder (conf) to HDFS - For sample purpose
    $ bin/hadoop fs -put conf input
  19. If you go to, on your browser. U should see the Hadoop HDFS Admin interface. It's a simple interface created to meet the purpose. It shows you the Cluster Summary, Live and Dead Nodes etc.
  20. U can browse the HDFS using Browse the filesystem link on the top-left corner.
  21. Go, to, to view the Hadoop Map/Reduce Admin Interface. It displays the current jobs, finished jobs etc.
  22. Now, it's time to check the Map/Reduce process. Execute the following:

    # Default example code comes along with the distro.
    $ bin/hadoop jar hadoop-*-examples.jar grep conf output 'dfs[a-z.]+'
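
To get a feel for what the grep example in step 22 computes, here is a minimal single-machine sketch in Python. It is not Hadoop's code: the function names and the sample lines are made up for illustration, and it only mimics the idea (a map step that emits every regex match, a reduce step that sums the counts, and a final sort by count).

```python
import re
from collections import Counter

def map_matches(line, pattern):
    """Map step: emit (match, 1) for every regex match in one input line."""
    return [(m.group(0), 1) for m in pattern.finditer(line)]

def reduce_counts(pairs):
    """Reduce step: sum the counts for each distinct match."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

def grep_job(lines, regex):
    """Run map and reduce in-process, then sort by descending count."""
    pattern = re.compile(regex)
    pairs = [pair for line in lines for pair in map_matches(line, pattern)]
    totals = reduce_counts(pairs)
    # Ties are broken by the match text so the order is deterministic
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical input lines standing in for the files under conf/
sample = [
    "dfs.replication is set in the site file",
    "dfs.name.dir and dfs.data.dir hold HDFS state",
    "dfs.replication again",
]

for match, count in grep_job(sample, r"dfs[a-z.]+"):
    print(count, match)
```

On a real cluster the map calls run on whichever nodes hold the input blocks, which is the whole point of the exercise; the sketch just keeps everything in one process.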

Disclaimer: This is for my future reference. I don't take any responsibility for physical/mental/any other type of damage that may arise from following the above process.
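
A note on step 7: in Hadoop 0.19.x the NameNode and JobTracker addresses normally go into hadoop-site.xml as the `fs.default.name` and `mapred.job.tracker` properties. The sketch below renders such a fragment with Python. The host names and ports in it are placeholders I picked for illustration (assumptions, not the values used on this cluster), so swap in your own.

```python
import xml.etree.ElementTree as ET

def site_properties(namenode, jobtracker):
    """Build a <configuration> element with the two cluster addresses.

    namenode and jobtracker are "host:port" strings; the values passed
    below are placeholders, not the addresses from the post.
    """
    settings = {
        "fs.default.name": f"hdfs://{namenode}",  # where HDFS clients find the NameNode
        "mapred.job.tracker": jobtracker,         # where TaskTrackers find the JobTracker
    }
    config = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return config

# Hypothetical hosts and ports, for illustration only
config = site_properties("namenode.example:9000", "jobtracker.example:9001")
print(ET.tostring(config, encoding="unicode"))
```

The printed `<property>` elements are what would sit between `<configuration>` and `</configuration>` on every node.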

Tuesday, July 27, 2010

Opensource Project Hosting @ SRC - Part II


This is just an update blog post. The "Opensource Project Hosting" at SRC has been approved by our System Admin. So, if anyone is interested in joining this project, please let me know!

Contact Details:

Requirements: U should be a student of SRC.

Sunday, July 25, 2010

Project Gym Buddy

My first idea for a robotic project. Inspired by Ramaprasanna Chellamuthu's presentation at Microsoft Community Tech Days, I've actually started thinking robotically ;) Now, let's start with the project description.

A short history
Today we had the usual power cut at 0800 hours, and I was doing my workout on my exercise cycle, when this struck me.

Use my exercise cycle for playing games, and do all (well, almost all) my gym activities right on it. Why would anyone want to play games with their exercise cycle, when u've high-end Wii remotes and stuff?? Well, it's because I want to multi-task! ;)


Crap it Ash, take a look here.

Saturday, July 24, 2010

User is the King!

While browsing through Yahoo! Labs, I came upon 2 projects which actually inspired me a lot. So, this is just a brief note on what I took from them, and how I could use similar features in my Research Engine project.
The thing that really caught my eye is this: "The biggest scientific challenge in contextual advertising is that compared to sponsored search or Web search, user intent is not very clear".

My Question: How on the earth are we to find the context of the user who we don't know or can't see?
Their Answer: The Keystone system works by first extracting "essence" from opportunity - understanding what the content is about and who is viewing it.

More Info: A key difference between Keystone and other contextual advertising systems is that Keystone tries to predict and model user response based on all user context, including page content, user attributes like behavioral and geographical data, referrals to the page (how the user got there), and information about the publisher page.

Read the rest here.

Another project is Motif, from Search technologies group of Yahoo! Research.
Project Motif is very similar to Keystone (which is focused on advertising) in its use of context. The thing is, Motif is more concerned with Query Context than User Context. Try out the demo here, you'll know what I mean. This is relatively easy to implement and maintain.

I really like Motif for its search relevance, and would like to add a similar feature to my Research Engine Search Module. Also, the Keystone methodology helps me understand user context, based on which I can search the query context to further refine my results.

Got any similar kinda stuff? Please share!

Open Source Project Management in SRC

The following post is for all my SRC, SASTRA frnds only. Never mind if u're an old or present student; if u've any connection to SASTRA SRC, Kumbakonam, then it's for you!

I would like to propose a new project to implement using the H/W power of our college server. It's open source project hosting, like Google Code Project Hosting, Git, etc.

Features of this proposal
  • Students given an account and allowed to create multiple projects on the server.
  • Server supports Subversion, Mercurial, and if possible Git also.
  • They're given an Issue Tracking system (like Trac).
  • Others can also join them to code an application.

The general focus is on final year students, who can use this repository for their final year projects, so that once they're out, juniors can work on the same topic to improve the existing system. The actual scope of this project is very small, but the main reason behind it is to make it more effective alongside the SRC-FOSS and make students contribute to it (not only by using/sitting in the class but) by actually working on wat they use as OSS.

Please post your like or dislike of this proposal in the comments.

PS: This is just my own proposal, i'ven't yet talk'd about this to any FOSS (pre/post/current) member(s). Will they approve this? OMG! That's a $1,000,000,000 question!

Update 1: This is a series of project proposals for SRC. If you've any ideas or proposals, leave it as a comment.

Friday, July 23, 2010

SRC Office Automation - On 2nd (3rd) Year

It's been 2 years now since I first set foot on my SASTRA SRC campus as a student. It's been so great that I never noticed the days flying by. I would like to add a feather to this beautiful day with a short note on something that happened just yesterday.

"Our Office, u know what? Maintains all the student info in an Excel Workbook!! OMG! Is SASTRA that bad?"

"Every time, i need to pay the fees and get a receipt, that guy takes eternity!!"

"SRC needs to grow a lot more!"

These were some of the usual comments we students make on seeing the condition of our management (at least in our campus). Has all that now vanished into thin air? (Y??) I've been called by our AO (or is it EO, or an office staff?? I'm still not sure about it actually, but all I know is that it's official) and have been asked to build an automation application for our campus.

I saw the Excel sheets with my own eyes and couldn't believe the complexity involved! That poor guy maintains 5 *.xls files, with 6 - 10 sheets each (most of them were reports). The speed at which he moves in Excel was outstanding. To be really honest, I never thought our campus office guy could work at this speed.

Who would've thought? Me getting the work, which i've been commenting about for the past 2 years. I feel very proud (dont know y though, probably cauz its for our office) and happy :)

Today, i completed the Data Model for their application and got the approval from the client (:wink:).

TO My Fellow SRC Dudes_&_Dudities: If you're interested, please let me know. Cauz i'm working on this project and many others (for our campus) alone. If you would like to join hands with me please do so!

Thursday, July 22, 2010

Which Data Warehouse Infrastructure to use?

OMG! The problem of selecting components has finally started again. The question is: which data warehouse infrastructure to use for the Research Engine?

Available choices are:
  1. Apache Hive
  2. IBM InfoSphere Warehouse
  3. Mike 2
  4. MySQL (Really?)
If you've got any experience in implementing any one (or more) from the above list, please do let me know. I can surely use your help. Got anything more? Better suggestions? Please do comment.

Update: Since it's TGMC, I'm sticking with InfoSphere.

Research Engine - Work Flow

A proposal for the event work flow of the Research Engine. If u've suggestions or improvements, please do post them as a comment.

  1. Get the user input search terms or Query
  2. Find the Model (or Domain) to which the Query belongs, using keywords. The purpose of this step is to identify as many related models of the query as possible, for the Query processing based on Language Processing (LP) techniques.
  3. Once the list of related models is identified, the query goes through Language Processing (LP). This step ensures the evolution of the Research Engine over time. It does the following work: understand the Query, identify the exact related models (if any), or create new Models (if none).
  4. Once we identify the Models regarding the query, query the Model Data Store (DS) to fetch the related information about the model (subset of the model).
  5. The output of the previous step gives all the related information, the user wants. Now, all that is left is to output the processed info in any format of choice (depending upon the application).
Well, this is the initial setup of the RE, so this event flow is subject to change at any time without notice. If you've suggestions or improvements over the existing design please do let us know.
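
The five steps above can be sketched as a tiny pipeline. This is only a toy to show how the stages hand data to each other: the dictionaries, function names and sample facts are all hypothetical, plain keyword overlap stands in for model detection, and the "Language Processing" step is reduced to creating an empty model when nothing matches.

```python
# Hypothetical in-memory stand-ins for the model index and the Model Data Store
MODEL_KEYWORDS = {
    "hadoop": {"hdfs", "mapreduce", "namenode"},
    "search": {"crawl", "index", "nutch"},
}
MODEL_STORE = {
    "hadoop": ["Hadoop is a distributed computing platform."],
    "search": ["Nutch is an open source web crawler and search engine."],
}

def find_candidate_models(query):
    """Step 2: pick every model that shares at least one keyword with the query."""
    terms = set(query.lower().split())
    return [name for name, keywords in MODEL_KEYWORDS.items() if terms & keywords]

def process_language(query, candidates):
    """Step 3: placeholder for LP; create a new empty model when none matched."""
    if candidates:
        return candidates
    new_model = query.lower().replace(" ", "_")
    MODEL_KEYWORDS[new_model] = set(query.lower().split())
    MODEL_STORE[new_model] = []
    return [new_model]

def fetch_model_data(models):
    """Step 4: query the data store for each identified model."""
    return {name: MODEL_STORE.get(name, []) for name in models}

def render(results):
    """Step 5: output in a format of choice (plain text here)."""
    return "\n".join(
        f"{name}: {'; '.join(facts) or '(no data yet)'}"
        for name, facts in results.items()
    )

def research(query):
    """Step 1 is the query itself; the rest chains steps 2 to 5."""
    candidates = find_candidate_models(query)
    models = process_language(query, candidates)
    return render(fetch_model_data(models))

print(research("how does nutch crawl pages"))
print(research("quantum computing"))
```

The second call shows the "evolution" part of step 3: a query that matches nothing leaves a new, empty model behind for later queries to find.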

Wednesday, July 21, 2010

Research Engine - Interactive Search Engine

At last our mentor accepted our proposal, "Research Engine". It's basically an enhanced version of a Semantic Search Engine. It's being planned to be built from the bottom up in a purely interactive way. Since too much interaction makes the user lazy or dislike the concept, we're planning to extract metadata from the users' social networking profiles (like Facebook, Orkut, Twitter, MySpace, etc.) to automate the process of interaction and improve it dynamically.

Also, we're planning to build this project on top of Nutch. Many modifications are required to make it a semantic search engine. DB2 9.5 Enterprise, Jena, WASCE, and Hadoop are some of the major components to be included in the project.

PS: I'll try to make updates like this regularly, but I'm not sure about it either.

Tuesday, July 20, 2010

My TGMC teammates this year

At last, after a long struggle to find team members with vibrant interest and enthu to match my frequency, I got myself the best pieces of SRC. Below are the names in alphabetical order:
  • Ashwanth Kumar - III CSE
  • Kirubaharan A - III CSE
  • Saravana Kumar - II CSE
  • Swetha S - II CSE
Now that the team is set, we're all ready for the launch of TGMC 2010.

Tuesday, July 6, 2010

Hello Blog

Hello, I'm Ashwanth. I never actually sit and blog, but I just thought I'll give it a try! Let's see how well this actually goes.