Tuesday, April 26, 2011

Sentiment Analysis - iBlue Component Preview

iBlue is fast evolving from a dream into a reality. After working on the Semantic Web in depth for more than a week, I have learnt many principles, theorems (I didn't even bother to study theorems for my Maths papers), standards, and much more.

You might have had a look at my Spam Detection plugin for Elgg, here. It was while working on it that I came to know about Bayesian filters and their use in spam detection, text classification, etc. It is one of the fundamental machine learning techniques. Blah.. Blah.. You can find more details on Wikipedia - http://en.wikipedia.org/wiki/Recursive_Bayesian_estimation

That's it for the introduction, and now I welcome you all to test drive my very first Bayesian filter implementation - Sentiment Analysis. It is very similar to the working of the Semantic Extractor, except that it gives only one piece of information about the text.

Is the given text positive feedback or negative feedback?

It returns the final percentage for each of the two cases. So, just go ahead and give it a try.

Technical Specs for Nerds - The filter was trained using data from the Multi-Domain Sentiment Dataset provided by Mark Dredze and his co-authors. I took around 25,000 Amazon reviews from the dataset, across multiple product categories, to train the filter.
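
For readers who want a feel for how such a filter can work, here is a minimal sketch of a naive Bayes sentiment classifier. It is not the code behind the demo: the class name `NaiveBayesSentiment`, the `tokenize` helper, the Laplace smoothing, and the four toy training reviews are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only word-like tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Toy two-class naive Bayes filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.word_counts = {"positive": Counter(), "negative": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labelled review."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        """Return the percentage for each class; assumes both classes were trained."""
        vocab = set(self.word_counts["positive"]) | set(self.word_counts["negative"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the class prior, then add one log-likelihood per token.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert log scores into percentages that sum to roughly 100.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        total = sum(weights.values())
        return {label: round(100 * w / total, 2) for label, w in weights.items()}


if __name__ == "__main__":
    nb = NaiveBayesSentiment()
    nb.train("great phone, love the screen", "positive")
    nb.train("excellent battery and great camera", "positive")
    nb.train("terrible battery, poor screen", "negative")
    nb.train("bad camera, awful phone", "negative")
    print(nb.classify("great camera"))
```

With only product reviews as training data, a filter like this knows nothing about words it has never seen, so text from outside that domain can get surprising scores.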

Demo: http://ashwanthkumar.in/labs/sentiment/sentiment.php - It uses the content of Mashable's review of the Samsung Galaxy S Android smartphone (Link: http://mashable.com/2010/07/26/galaxy-s-review/)

Any feedback is highly appreciated.


Lakshmi Narayanan G said...

You're awesome bro. Now can you please check this one out?

Sentiment Analysis Component Preview
Ashwanth Kumar rocks!!
Sentiment Results

Positive: 100%

Negative: 0%

Sentiment Analysis Component Preview
Ashwanth Kumar rocks!!!
Sentiment Results

Positive: 17.03%

Negative: 82.97%

The two comments are no different except for one extra "!". So how does a single "!" make the comment 82.97% negative?

Unknown said...

Hi, it is working, but I want to know how the analysis is actually done and on what basis the results are computed.

Ashwanth Kumar said...

@Lakshmi Narayanan - Thanks for trying it out. I'm not exactly sure da, because I'm not saving the filter scores yet.

Well, the extra ! probably signalled more excitement to the filter, which might have made the text look more like a negative one, and "Awesome" accounted for the positive score.
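
One possible cause, offered purely as an assumption because the filter's tokenizer is not shown anywhere in this post: if the text is split on whitespace only, "rocks!!" and "rocks!!!" become two unrelated tokens, and each gets whatever score the training data happened to give it. The sketch below compares such a split with a hypothetical normalizing tokenizer; both function names are made up for illustration.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only, so punctuation stays glued to the words."""
    return text.split()


def normalized_tokens(text):
    """Lowercase, collapse any run of '!' into a single '!', then split words from '!'."""
    collapsed = re.sub(r"!+", "!", text.lower())
    return re.findall(r"[a-z']+|!", collapsed)


first = "Ashwanth Kumar rocks!!"
second = "Ashwanth Kumar rocks!!!"

# With a whitespace split the last token differs between the two inputs.
print(whitespace_tokens(first)[-1], whitespace_tokens(second)[-1])

# After normalization both inputs produce the same token list.
print(normalized_tokens(first) == normalized_tokens(second))
```

If the filter behaves like the first function, normalizing punctuation before scoring would make the two inputs score identically.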

I'm in the process of improving the knowledge base, so it should be able to learn more soon.

Ashwanth Kumar said...

@Kriti Hey, thanks for trying it out yar. As I've already mentioned in the post, it uses Bayesian filter analysis to estimate what percentage of the text is positive and what percentage is negative. Check out the wiki for more details.

Venkatraman said...

It's cool. I tried out some articles from my old blog and some other reviews, and the results were cool and acceptable.
Then I tried out this...

Sentiment Analysis Component Preview

Sometimes I am an idiot but actually I am a genius

Sentiment Results

Positive: 19.35%

Negative: 80.65%

Sentiment Analysis Component Preview

Actually I am a genius but sometimes I am an idiot

Sentiment Results

Positive: 23.77%

Negative: 76.23%

1. Both sentences have the same meaning, but the results are different (though close)...

2. The result should be high in positive and low in negative, but the actual result was quite the opposite. I gave the input with the keyword "sometimes": sometimes I am an idiot, so most of the remaining time I am a genius, right?
Can you explain why?

Saanu said...

Looks interesting! Good job Ashwanth!!!

Ashwanth Kumar said...

@Venkatraman Anna, I'm not performing natural language processing, so the system can't understand whether two meanings are close or not. Moreover, if you've read the above comments, the dataset is limited to product reviews, so if you try something abstract or logical, the system can't process it yet. I just finished it today, so my baby still has a lot to learn. I'm planning to add a teach interface so that any user can teach the system how things are to be done.

Thanks for trying it out!

Ashwanth Kumar said...

@Saanu Thanks akka :)

Bhagi said...

Great job dude :) But like Venky anna and GLN, I too tried a few sentences that are similar in meaning but different in vocabulary, and it showed different results. That's fine, though: it is a just-born baby with a lot to learn, and as far as I know you are the best teacher, so we will be waiting to see your fully completed app :)

Great job friend :)

Unknown said...

Nice work... Keep testing it. It will be a complete component one day.

Ashwanth Kumar said...

@Bhagi Thanks dude.

Ashwanth Kumar said...

@CS Yeah sure man!

Unknown said...

Hi Ashwanth,
This is really cool! Great stuff. After the great success of Web 2.0, sentiment analysis has become an in-demand and commercially supported research field. It sounds like you're off to an amazing start. Keep going. I have a few questions, though.

Have you tried feeding your scores into other types of classifiers yet, such as a neural network or a support vector machine like LibSVM? Is it automated sentiment analysis?
You've mentioned the Multi-Domain dataset (which version is it? Name it). Also try it out with the Multi Domain Reuters Dataset (this dataset has uneven positive and negative samples in each subcategory).
We generally use sentiment labels (are you familiar with these?) in schemes such as: (1) Very Negative to Very Positive: Very Negative, Negative, Neutral, Positive, Very Positive; (2) Negative to Positive; (3) Negative to Neutral; (4) Positive to Neutral, etc. In most of the papers I have read, I found that they use this type of shift in classification. Such shifting should have some effect, and accounting for it may give better results... Check out these issues if time permits...
But again, great work! Eagerly waiting to see your complete results!!!

Ashwanth Kumar said...

@ice Akka, thanks for trying this out.
"Have you tried inputting your scores into other types of classifiers yet..."
No akka, I've not tried that yet. I only learnt about Bayesian filters when I was developing a spam detection plugin for a client of mine, so I thought, why not use the same for sentiment analysis? While googling I stumbled upon the multi-domain dataset and tried implementing the same. I have since learnt about SVMs, and will try to implement one this summer.

"We generally use sentiment labels...."
Yes akka, that is very, very true. The dataset I've mentioned here also has the same labels, but I haven't made use of them to the fullest extent yet. As I said, I need to explore and learn more this summer. Will do so akka. Thanks again :-)
