Thursday, December 2, 2010

Research Engine - Update #2

Haaaa, a very good morning. My day kicks off with the need to design RE for the final time before development work starts this afternoon. (Re-architect, really: Salai and I must have redesigned the entire RE about 5 or 6 times by now; we lost count.)

Yes, there were earlier prototypes built during our exams, mainly to test technologies. Actual product development starts only today.

So, yes, it was decided that RE will not give you search results; instead, it helps you find information after processing unstructured data on the web, like Google Squared (thanks to @Nivas), Wolfram, etc. For that, we need a way to store the information (knowledge) or, in our case (in pure Semantic Web style), the Resource.

So, after racking my brain (and sleeping on it for 35 minutes), I've come up with a resource representation technique (a data structure for a resource):

Every piece of information found on the web is said to wander in space (the Web) along no definite path. We pack such information into an infolet. Since the information is unstructured, there is no definite domain or context to the source, unless it is interpreted from various sources, as in Wikipedia.

Infolet - An entity of information containing subject, context, and data (actual info)
Eg. Ram was born on Jan. 1, 1990.
Subject - Ram
Context - {year 1990, birthday}
Data - "Ram was born on Jan. 1, 1990"
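The infolet above can be sketched as a small record type. This is only a rough illustration, assuming Python and a frozen dataclass; the field names follow the definition, but the class itself and the use of a set for the context are my assumptions, not the actual RE code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Infolet:
    """Hypothetical record for one infolet (not the real RE implementation)."""
    subject: str        # who or what the information is about
    context: frozenset  # tags that situate the data, e.g. a year or an event
    data: str           # the actual information, kept as the original text


ram_birth = Infolet(
    subject="Ram",
    context=frozenset({"year 1990", "birthday"}),
    data="Ram was born on Jan. 1, 1990",
)
```

Making the record immutable is a design guess: an infolet is a captured fact, so it seems safer to build a new one than to edit an old one.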

Infomat - Syndicating many infolets based on their subject forms the Infomat. It tells us what the subject is, but provides no information about where, how, etc.
Eg. General_Info - ({Ram was born on Jan. 1, 1990}, {Ram won his first International Physics Olympiad in 2000}, {Ram joined IIT after securing AIR #1 in JEE}, {Ram fell in love with Sita, and married her in 2017}, {Ram and Sita lived happily ever after});

In the above example, each {infolet} is combined into a single (infomat), a collection or set of infolets grouped by their Subject. Sequence is still a problem, unless a year is specified in the text to enhance its context a bit more.
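Here is a minimal sketch of that grouping, along with one possible way around the sequence problem: pull a four-digit year out of the text when there is one. The function names and the regex are assumptions made up for illustration; RE may well order infolets differently.

```python
import re
from collections import defaultdict


def build_infomats(infolets):
    """Group (subject, data) pairs into one infomat (a list) per subject."""
    infomats = defaultdict(list)
    for subject, data in infolets:
        infomats[subject].append(data)
    return dict(infomats)


def year_of(data):
    """Return the first four-digit year found in the text, or None."""
    match = re.search(r"\b([12]\d{3})\b", data)
    return int(match.group(1)) if match else None


def order_by_year(infomat):
    """Dated infolets first, sorted by year; undated ones keep their order at the end."""
    dated = sorted((d for d in infomat if year_of(d) is not None), key=year_of)
    undated = [d for d in infomat if year_of(d) is None]
    return dated + undated


infolets = [
    ("Ram", "Ram fell in love with Sita, and married her in 2017"),
    ("Ram", "Ram joined IIT after securing AIR #1 in JEE"),
    ("Ram", "Ram won his first International Physics Olympiad in 2000"),
    ("Ram", "Ram was born on Jan. 1, 1990"),
]
general_info = order_by_year(build_infomats(infolets)["Ram"])
```

Infolets without a year (like the IIT one) cannot be placed, so this sketch simply leaves them at the end, which is exactly the gap described above.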

Infonet - A collection of Infomats based on their context (yes, the various kinds of context information about a particular subject). Infonets are always in proper order, as the context is taken into consideration.

Eg. FB_Wall_Sets - ({Ram joined FB}, {Ram added Stanford to his list of Schools}, {Ram is preparing for his Advanced Operating Systems overnight}, {Ram had a nice date with the most beautiful girl in the whole galaxy})

From the previous 2 examples, you can see that the 2 Infomats, named "General_Info" and "FB_Wall_Sets", are combined into a single Infonet as:

Generally - Ram[General_Info,FB_Wall_Sets,...]
In detail - Ram[General_Info({....},{...}),FB_Wall_Sets({...},{...}),...]
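To make the notation concrete, here is a small sketch that treats an Infonet as a mapping from Infomat names to lists of infolet texts and prints it in both forms. The `render_infonet` helper and the dictionary layout are assumptions for illustration only.

```python
def render_infonet(subject, infomats, detailed=False):
    """Render an infonet as Subject[Infomat,...] or Subject[Infomat({...}),...]."""
    parts = []
    for name, infolets in infomats.items():
        if detailed:
            body = ",".join("{" + text + "}" for text in infolets)
            parts.append(f"{name}({body})")
        else:
            parts.append(name)
    return f"{subject}[{','.join(parts)}]"


# Infomat name -> infolet texts, in the order they were added
ram = {
    "General_Info": ["Ram was born on Jan. 1, 1990"],
    "FB_Wall_Sets": ["Ram joined FB", "Ram added Stanford to his list of Schools"],
}

print(render_infonet("Ram", ram))
print(render_infonet("Ram", ram, detailed=True))
```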

Thus goes my resource representation model. All the subjects are (tentatively) linked to Facebook (I just love FB for the pure interest and care it takes in social networking; no comments about privacy issues, okay?). Using FB Connect, every single Member, Page, Group, everything has an RID (resource identification), from which all subjects are derived.

PS: We're not affiliated with Facebook by any means.

Any feedback regarding the same will be highly appreciated.

- Ashwanth Kumar
