Tuesday, December 7, 2010

Research Engine Update #4

OMG! Today is the last date for SRS Submission! I'm yet to start mine!

Fine, let me just brief you on the current status of iBlue.

  1. Downloading the entire Wikipedia in all languages in N-Triples format (around 16GB) is 46.85% complete as of writing this post, although we'll be using only the English (en) and German (de) versions. Thanks a lot to DBpedia for its publicly available dataset. This dataset powers the Knowledge Engine component for now, until we come up with a better implementation.
  2. Completed the Code Search (like Google Code Search) as a CLI. We're harvesting the public SVN repositories of Apache, Google Code, and SF.net (using SVNKit), indexing them, and providing a search layer on top. Lucene is used extensively here for both indexing and searching.
  3. Completed the Web Search on the IBM and SASTRA sites. We're using the industry-standard Nutch crawler to crawl the Web, and again Lucene for indexing and searching. Result presentation is improved using clustering plugins like Carrot2, and query processing is improved using the ontology specifications of Open Calais.
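
For the curious, here is a rough idea of what "using only en and de" could look like on N-Triples data. This is just a minimal sketch in Python, not our actual code: it assumes we filter on the language tag of literal values (like `"Berlin"@en`), and the names `keep_wanted`, `WANTED_LANGUAGES`, and the sample lines are made up for illustration.

```python
import re

# One N-Triples statement whose object is a language-tagged literal,
# for example: <subject> <predicate> "some text"@en .
TRIPLE_WITH_LANG = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'"((?:[^"\\]|\\.)*)"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$'
)

# Assumption: the languages are selected by literal language tag.
WANTED_LANGUAGES = {"en", "de"}


def keep_wanted(lines, wanted=WANTED_LANGUAGES):
    """Yield (subject, predicate, text, lang) for literals in a wanted language."""
    for line in lines:
        match = TRIPLE_WITH_LANG.match(line)
        if match is None:
            # Comments, blank lines, URI objects, and untagged literals are skipped.
            continue
        subject, predicate, text, lang = match.groups()
        # Compare only the primary subtag, so "en-GB" counts as "en".
        if lang.lower().split("-")[0] in wanted:
            yield subject, predicate, text, lang


# Hypothetical sample lines, written by hand for this sketch.
SAMPLE = [
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin (Stadt)"@de .',
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin (ville)"@fr .',
    '<http://dbpedia.org/resource/Berlin> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Germany> .',
]

if __name__ == "__main__":
    for subject, predicate, text, lang in keep_wanted(SAMPLE):
        print(lang, text)
```

Since `keep_wanted` is a generator, it can be fed an open file object and will go through a multi-gigabyte dump one line at a time without loading it all into memory.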
- Ashwanth Kumar and Salaikumar @ Saravanan
