For the past two days, I have been working alongside a friend of mine on the topic for her final-year project. Her guide restricted the domain to "Image Processing", I guess because she has something to do with it herself. Anyway, after browsing and downloading 40+ papers on image processing, I really liked this one.
Yesterday we were preparing the abstract and, after a lot of local misunderstandings, the guide finally signed it. This morning, while I was preparing for my semester exam, this thought struck me, and I decided to document it before I forget. So here it goes.
Image Processing + Semantic Web => My Kind of Vision
If you took the time to read the abstract in the above link, you would know how wonderfully they have come up with a new data structure for image annotation. Their motive is to build an online collaborative image annotation tool, something like LabelMe. The main features are a modular design and the ability to import other online (LabelMe, Flickr) and offline (Caltech101, Lotus Hill) datasets into the system. Apart from the text-based human annotation, it can also embed low-level image details like color histograms. (I'm relatively new to image processing, so let me skip the rest of the details rather than confuse you.)
Well, this already exists, and she has decided to implement such a tool with added features like Web Services to search through the annotations, and more. The rest of the blah.. blah.. content goes here.
While I was browsing through the LabelMe dataset, I figured something out. It's like the usual temptation to study well only outside the exam hall. This is what I concluded before I ran out of time for my E-Commerce exam.
I have a collection of images that are annotated very well (at least decently well), which makes them more human-friendly and more semantic. That does not make them machine-friendly, does it? (Or am I getting carried away?)
Another MIT Media Lab project is ConceptNet5. It provides general-purpose common-sense knowledge to computers in my favorite format - JSON. It contains 15+ million entries.
So, this is what I'm going to build (hoping my guide will approve it).
- Take the existing annotated image dataset and apply ConceptNet common-sense knowledge to the image annotations, making the images more semantic and machine-friendly. At this point the machine can learn from the annotated images - let us call it TEACH mode
- Build an index of properties over all the available images to enable image or object recognition via a search interface - let us call it SEARCH mode
- Build a reasoner on top of the ConceptNet relations and the annotations currently being processed. This helps draw conclusions from the image annotations - let us call it PROCESS mode
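To make the three modes a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the image names, the labels, and the triples are hand-written toy data (only the relation names `IsA`, `UsedFor`, and `AtLocation` follow ConceptNet's naming), and the function names `teach`, `is_a_closure`, and `build_index` are hypothetical. It does not call LabelMe or ConceptNet; it only shows how annotations, relations, and a small reasoner could fit together.

```python
from collections import defaultdict

# Hypothetical toy data: hand-written, not taken from LabelMe or ConceptNet.
ANNOTATIONS = {
    "park.jpg": ["dog", "ball"],
    "kitchen.jpg": ["cat", "table"],
}
# (start, relation, end) triples, using ConceptNet-style relation names.
RELATIONS = [
    ("dog", "IsA", "pet"),
    ("cat", "IsA", "pet"),
    ("pet", "IsA", "animal"),
    ("ball", "UsedFor", "play"),
    ("table", "AtLocation", "kitchen"),
]

def teach(annotations, relations):
    """TEACH: attach the known relations to each annotated label."""
    by_start = defaultdict(list)
    for start, rel, end in relations:
        by_start[start].append((rel, end))
    return {
        image: {label: by_start.get(label, []) for label in labels}
        for image, labels in annotations.items()
    }

def is_a_closure(concept, relations):
    """PROCESS: follow IsA edges transitively to infer broader categories."""
    seen, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for start, rel, end in relations:
            if start == current and rel == "IsA" and end not in seen:
                seen.add(end)
                frontier.append(end)
    return seen

def build_index(annotations, relations):
    """SEARCH: inverted index from labels and inferred categories to images."""
    index = defaultdict(set)
    for image, labels in annotations.items():
        for label in labels:
            index[label].add(image)
            for broader in is_a_closure(label, relations):
                index[broader].add(image)
    return index

knowledge = teach(ANNOTATIONS, RELATIONS)
index = build_index(ANNOTATIONS, RELATIONS)
print(sorted(index["animal"]))  # images whose labels are inferred to be animals
```

Nobody annotated either image with "animal", yet a search for it finds both, because the reasoner walks `dog -> pet -> animal` and `cat -> pet -> animal`. A real system would replace the toy triples with actual ConceptNet data and would need to handle relation weights and noisy labels, which this sketch ignores.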
Possible Applications
- With the combined power of the relations and the reasoner, I would be able to fetch dynamic content (from Google and Wikipedia) about a particular annotated object within an image when queried.
- With the image index thus created, I could recognize objects and so automate the annotation process, which in turn makes the previous application more automated.
Things I have to learn and random notes
- Basic image properties that I would be indexing with the images
- Maybe use SVG or another custom data structure to store the index
- Bridge the relations and concepts to Predicate logic reasoning
- Modify a machine learning algorithm (something like naive Bayes, or something more sophisticated) to make the image learning possible
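As a first guess at what a "basic image property" could look like in the index, here is a small Python sketch of a coarse color histogram and a histogram-intersection similarity score. This is an assumption on my part, not something taken from the paper: the function names `color_histogram` and `intersection` are hypothetical, the pixels are hand-written `(r, g, b)` tuples rather than pixels loaded from a real image file, and histogram intersection is just one common way to compare two histograms.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram over quantized (r, g, b) values.

    Each channel (0-255) is split into `bins_per_channel` buckets, so the
    histogram has bins_per_channel ** 3 entries that sum to 1.0.
    """
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bucket number.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        counts[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else counts

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical toy "images": flat lists of pixels, not real image files.
red_patch = [(250, 10, 10)] * 4
mostly_red = [(200, 30, 20)] * 3 + [(10, 10, 240)]
blue_patch = [(10, 10, 240)] * 4

red_hist = color_histogram(red_patch)
print(intersection(red_hist, color_histogram(mostly_red)))
print(intersection(red_hist, color_histogram(blue_patch)))
```

The mostly red image scores higher against the red patch than the blue one does, which is the kind of signal a SEARCH-mode index could rank on. Color alone will not recognize objects, so this would only be one property among several.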
If any of you can think of more features worth adding to the system, please feel free to post them as a comment. This idea is probably not new at all; I didn't take the time to Google it. Let me know if it already exists, and perhaps we can build something even more useful on top of it.