Is It Time To Reconsider Sentiment Scoring?

In a recent blog post, The Problem with Automated Sentiment Analysis, the social media agency Fresh Networks evaluated a few sentiment tools. Their results are quite similar to what we’ve found in a number of our own experiments:

– About 80% of posts are neither positive nor negative.

– Sentiment tools’ “accuracy” of 70% to 80% is largely driven by their ability to correctly label those neutral posts.

– “In our tests when comparing with a human analyst, the tools were typically about 30% accurate at deciding if a statement was positive or negative.”

From the blog comments, it’s clear that the companies in this space are doing their best to obfuscate the truth. To their credit, some do state that sentiment alone is not enough information to draw any conclusions.

However, it’s NOT better than nothing; it’s actually worse than doing nothing, because you are getting INCORRECT information.

With sentiment there is no such thing as accuracy; there is only agreement.  The technology can’t become more accurate; it can only agree with people more often.  And “sentiment” does not mean the same thing to all people in all situations.  You can’t get more “accurate” at “sentiment” because what you are actually talking about is trying to solve hundreds or thousands of slightly different problems with one tool.  Until we can map the human brain into a program or electronic circuits, I just don’t think that is going to happen.
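
To make that concrete, here is a minimal sketch with invented labels (not data from any of our experiments): the “accuracy” a vendor quotes is really just the rate at which the tool’s labels agree with one particular person’s labels, and the people themselves do not fully agree with each other.

    # Invented labels for illustration: "accuracy" is just agreement with one
    # particular set of human judgments, and the humans disagree with each other too.

    tool    = ["pos", "neu", "neu", "neg", "neu", "pos", "neu", "neg", "neu", "neu"]
    human_a = ["neu", "neu", "neu", "neg", "pos", "pos", "neu", "neu", "neu", "pos"]
    human_b = ["neu", "neu", "pos", "neg", "pos", "neg", "neu", "neu", "neg", "neu"]

    def agreement(a, b):
        """Fraction of posts on which two sets of labels match."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    print("tool vs. human A:    %.0f%%" % (100 * agreement(tool, human_a)))
    print("tool vs. human B:    %.0f%%" % (100 * agreement(tool, human_b)))
    print("human A vs. human B: %.0f%%" % (100 * agreement(human_a, human_b)))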

I completely believe that having inaccurate sentiment is worse than having nothing.  Here is a good example.  In posts about “Blackberry” that have been classified 3 different times by hand, about 32% of posts are positive (with a majority vote).  When we take that same data set and have each post classified 10 times, now about 10% of posts are positive (with a majority vote).  And, if we only consider the posts we are confident in, only about 3% of posts are positive.
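
Here is a rough sketch of the kind of aggregation behind those numbers. The judgments and the 80% confidence cutoff below are invented for illustration (this is not our actual Blackberry data set): with only three judgments, a 2-out-of-3 vote is enough to call a post positive, while more judgments and a confidence threshold push the positive share down.

    from collections import Counter

    # Invented judgments: a majority vote decides each post's label, and a post
    # only counts as a "confident" call when most of the judges agree.
    posts = [
        ["pos", "pos", "neu"],                    # 3 judgments, weak majority
        ["pos", "neu", "neu", "neg", "neu",
         "pos", "neu", "neu", "neu", "neu"],      # 10 judgments, leans neutral
        ["pos", "pos", "pos", "pos", "pos",
         "pos", "pos", "pos", "pos", "neu"],      # 10 judgments, near-unanimous
    ]

    def majority_label(judgments):
        """Label chosen by the largest number of judges."""
        return Counter(judgments).most_common(1)[0][0]

    def is_confident(judgments, label, threshold=0.8):
        """True when at least `threshold` of the judges picked `label`."""
        return judgments.count(label) / len(judgments) >= threshold

    for judgments in posts:
        label = majority_label(judgments)
        note = "confident" if is_confident(judgments, label) else "low confidence"
        print(label, "(%s)" % note)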

So, which is it: do 30% of people like “Blackberry”, or only 3%?  That’s a BIG difference.  Of course, the answer is probably neither, because we aren’t actually measuring how many people like “Blackberry”.  Unfortunately, that’s how the number gets interpreted.  Hence, bad information can be worse than no information.

Marketers need to be aware that a lot of these companies say they do monitoring and provide analytics like sentiment, but in reality they are keyword-focused listening platforms with limited analysis capability. If you really want to go beyond sentiment analysis, you need to use semantic analysis. With semantic analysis, marketers can better understand the conversations about their brand or product category. Here is a white paper that compares Semantic vs. Sentiment Analysis and can help you make a more informed decision about when and how to use sentiment.

8 comments to Is It Time To Reconsider Sentiment Scoring?

  • Definitely there’s a lot of very weak sentiment scoring going on out there, and when it’s not weak, it’s simplistic, reducing to a few summary numbers an analysis that should happen at the “feature” rather than the document level. Good sentiment analysis applies sophisticated natural language processing in order to resolve sentiment, including emotion, at that more granular level: Not, Did person X have a favorable opinion in a hotel review, but rather, What did X think of the cleanliness, service, location, prices, and so on, each separately. Call this semantic analysis if you wish: Many better tools do it as part of their sentiment analysis tools or services.

    Seth

    (Chair, Sentiment Analysis Symposium, http://sentimentsymposium.com)

  • Hey Seth, thanks for commenting!

    It seems that we agree very much on the important points: the meaning or semantics of a post is more important than a measurement of its sentiment. Many customers still request the measurement and we are simply trying to demonstrate the value of an alternative approach!

  • In the sentiment reporting/tracking I do with my clients, we manually audit the reports each month, adjusting the sentiment and deleting the irrelevant posts. And we do separate analysis for review sites vs. Twitter and other platforms and track both the sentiment and issue of each consumer review. It can be a lot of work so we try and do a little each week to keep on top of it. It’s an inexact science, but it helps keep a finger on the pulse of what’s being said out there.

  • Hey Jack!
    Wow, it sounds like you really do a lot of manual work to get the most out of your sentiment score!
    Thanks for commenting – how’s the purple goldfish project coming?

  • You’re right. Scoring a post for sentiment does not work in general. But that doesn’t mean we should abandon semantic analysis of what people are writing online; it just means we have to do it in a way that provides useful information. One way to do this is using strict rule-based analysis for specific products or categories. It takes a lot of customization work up front but it means you can achieve highly accurate (yep, I used that word) results for a narrow industry or product category. For instance, looking at the auto industry, you can find out how many people like or dislike a specific element, option or feature on a specific car. That sort of business intelligence is useful for marketing, customer service, product planning and a number of other areas.

  • @Erik – that’s a fair point – and I agree, good-quality data like you were suggesting takes hard work!

  • Hey Paul, your link to The Problem with Automated Sentiment Analysis is dead. Is this the story you meant? http://www.freshnetworks.com/blog/2010/05/the-problem-with-automated-sentiment-analysis/

    Doesn’t the fact that the majority of Social Media Monitoring services included in that post use Keyword-based as opposed to NLP-based tech cloud the water a bit? I’d be more interested in seeing their reaction to some of the smaller players in the space like General Sentiment or Crimson Hexagon (spoiler alert: I used to work at GS). There’s no question that automated sentiment scoring doesn’t work when you’re using keyword based systems, but, given my own personal experience, I think some of the smaller NLP outfits out there are doing a pretty good job. My two cents, in any case.

  • YES, you are right, Aubrey – that is the correct link – they must have changed it, so thank you for that.

    Sure, NLP is a better mousetrap than keyword-based sentiment scoring, but it’s still not perfect (spoiler alert – nothing will be perfect, since people can’t always agree!). Machine learning also shows a lot of promise for classifying text – it’s still a developing space.

    Thanks for commenting!
