Showing posts with label Sentiment Analysis. Show all posts

Sunday, November 18, 2012

Jab Tak Hai Jaan - Twitter Metrics


Movies are one of the topics that attract the most Twitter traffic in India. If the movie is a much-awaited Diwali release from a superstar, a good Twitter buzz is only to be expected, and on its release day Jab Tak Hai Jaan was a trending topic. I collected the related tweets, and in this post I will present some metrics. Some might call this analysis, but I prefer "metrics" or "report" because no statistical or NLP algorithms were involved in deriving these results.

The data covers 11/12/2012 18:33 to 11/14/2012 12:15. The Twitter Streaming API was used to collect it, and the tracked keywords were "JTHJ", "Jab Tak Hai Jaan" and "SRK". Little data cleaning was applied, as these keywords are not very common. In total 104,654 tweets were collected, of which 31,177 (30%) are retweets.
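
As a rough illustration of the keyword filter and the retweet count, here is a minimal Python sketch. It assumes each collected tweet is a dict with a "text" field (a simplification of the Streaming API's JSON), uses a plain case-insensitive substring match rather than the Streaming API's own track matching, and treats a tweet as a retweet if it has a "retweeted_status" field or its text starts with "RT @". The function names are hypothetical.

```python
# Assumption: tweets are simplified dicts with a "text" key, not raw API JSON.
TRACKED_KEYWORDS = ["JTHJ", "Jab Tak Hai Jaan", "SRK"]


def matches_keywords(text, keywords=TRACKED_KEYWORDS):
    """Case-insensitive substring check (a simplification of API-side tracking)."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


def is_retweet(tweet):
    """Assumed rule: a 'retweeted_status' field or the classic 'RT @' prefix."""
    return "retweeted_status" in tweet or tweet["text"].startswith("RT @")


def retweet_share(tweets):
    """Return (total, retweets, retweet %) over keyword-matching tweets."""
    relevant = [t for t in tweets if matches_keywords(t["text"])]
    total = len(relevant)
    retweets = sum(1 for t in relevant if is_retweet(t))
    pct = 100.0 * retweets / total if total else 0.0
    return total, retweets, pct
```

With the counts above, the retweet share is 31177 / 104654 ≈ 29.8%, which rounds to the 30% quoted.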
First, let's look at a simple report: the tweet timeline. The graph shows how tweet volume trended over time; the values on the x-axis are in dd/yy HH format. The peak is around 3 o'clock in the afternoon of 13th Nov.
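
The timeline is essentially a count of tweets per hour. A minimal Python sketch, assuming the raw timestamps are Twitter-style created_at strings in UTC (for example "Tue Nov 13 09:45:12 +0000 2012") and that the plot uses Indian Standard Time; the helper names are hypothetical:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: the timeline is plotted in Indian Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))


def parse_created_at(created_at):
    """Parse a Twitter-style created_at string and convert it to IST."""
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y").astimezone(IST)


def hourly_counts(created_ats):
    """Count tweets per hour bucket (minutes and seconds truncated)."""
    return Counter(
        parse_created_at(c).replace(minute=0, second=0, microsecond=0)
        for c in created_ats
    )


def peak_hour(created_ats):
    """Return the (hour, count) pair with the most tweets."""
    return hourly_counts(created_ats).most_common(1)[0]
```

Plotting the sorted keys of hourly_counts against their values gives a timeline like the one described; peak_hour picks out its highest point.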

The table below shows the top traffic sources, split into original tweets and retweets. I expected the percentage of retweets to be higher for mobile devices, but the data below suggests otherwise. Surprisingly, Twitter for BlackBerry is more popular than Twitter for iPhone, though only by a small margin. The data also shows the dominance of Android.


Source | Total tweets | Original | Original % | Retweets | Retweets %
web | 38043 | 24635 | 64.76 | 13408 | 35.24
mobile web | 14813 | 10382 | 70.09 | 4431 | 29.91
twitter for android | 14359 | 9285 | 64.66 | 5074 | 35.34
twitter for blackberry® | 8828 | 6698 | 75.87 | 2130 | 24.13
twitter for iphone | 8035 | 5287 | 65.80 | 2748 | 34.20
facebook | 4682 | 4658 | 99.49 | 24 | 0.51
tweetdeck | 1490 | 1069 | 71.74 | 421 | 28.26
tweet button | 1429 | 1398 | 97.83 | 31 | 2.17
twitter for ipad | 1276 | 807 | 63.24 | 469 | 36.76
ubersocial for blackberry | 1114 | 935 | 83.93 | 179 | 16.07
tweetcaster for android | 1103 | 774 | 70.17 | 329 | 29.83
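
The percentage columns follow directly from the counts: each share is the count divided by that source's total. A small Python sketch (the function name is hypothetical) that recomputes them for three rows copied from the table:

```python
def split_percentages(total, original, retweets):
    """Return (original %, retweet %) rounded to two decimals, as in the table."""
    if original + retweets != total:
        raise ValueError("original + retweets must equal total")
    return round(100 * original / total, 2), round(100 * retweets / total, 2)


# Rows copied from the table above: (total, original, retweets).
rows = {
    "web": (38043, 24635, 13408),
    "twitter for blackberry": (8828, 6698, 2130),
    "twitter for iphone": (8035, 5287, 2748),
}

for source, (total, orig, rts) in rows.items():
    print(source, *split_percentages(total, orig, rts))
```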

Most of the tweets with triple-figure retweet counts are from celebrities. Of the top 10, three are from Shah Rukh Khan (@iamsrk), three are from reviewers (such as Taran Adarsh) and one is from Yash Raj Films. It looks like the movie team engaged well on social media, which I feel is very important for successful social media marketing. Below are the top 5 tweets by retweet count.


Tweet | Retweet count | Author
RT @iamsrk: Very touchng how all of u r buyng tckets for jthj and excted about t. It makes me feel very wanted & postvely pro ... | 1221 | iamsrk
RT @bipsluvurself: Jab Tak Hai Jaan mkes u wnt 2 find ur soulmate ur true love.Infuses romance in2 ur life.A must see!Saluting the King ... | 641 | bipsluvurself
RT @priyankachopra: As everyone comes together to Celebrate Yashji tonight..I sit far away and thank him for touching all our lives..wis ... | 567 | priyankachopra
RT @iamsrk: En route to Jalandhar for JTHJ...havent slept a wnk all nght...eyelds r feelng lke a mattress. Hope ts a long slow flt... | 561 | iamsrk
RT @taran_adarsh: Those asking, yes, all three Khans - SRK, Aamir & Salman - were under one roof for #JTHJ premiere. Premiere and po ... | 493 | taran_adarsh
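
One way to produce such a ranking from the collected data is to count how many times each "RT @author: ..." text appears. This is only a sketch: it counts the retweets captured in this sample, which may differ from the retweet count Twitter itself reports, and the function name is hypothetical.

```python
from collections import Counter


def top_retweeted(texts, n=5):
    """Count identical 'RT @author: ...' texts in the sample and return the
    n most frequent as (author, text, count) tuples."""
    counts = Counter(t for t in texts if t.startswith("RT @"))
    result = []
    for text, count in counts.most_common(n):
        # The author handle sits between 'RT @' and the first colon.
        author = text[len("RT @"):].split(":", 1)[0]
        result.append((author, text, count))
    return result
```

Grouping by exact text is a design shortcut; clients that truncate or re-edit retweets would split one original tweet across several keys.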

In the next post, I will try to interpret the actual reviews of the film from a tag cloud generated from these tweets.
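
A tag cloud is usually driven by simple word frequencies. A minimal sketch, assuming a tiny illustrative stopword list (hypothetical; a real cloud would use a fuller one) and excluding the tracked keywords so that they do not dominate the cloud:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "for", "it", "rt"}


def tag_frequencies(texts, top_n=20, exclude=()):
    """Count word frequencies across tweets, ignoring case, URLs, @mentions,
    stopwords, words of two letters or fewer, and any words in `exclude`."""
    excluded = STOPWORDS | {w.lower() for w in exclude}
    counts = Counter()
    for text in texts:
        # Strip URLs and @mentions before tokenising.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        for word in re.findall(r"[a-z']+", cleaned):
            if word not in excluded and len(word) > 2:
                counts[word] += 1
    return counts.most_common(top_n)
```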








Sunday, August 12, 2012

Satyameva Jayate - Twitter (Sentiment) Analysis


As we all know, social media is changing, or at least influencing, many traditional communication channels. It provides a wider platform for people to share thoughts, for companies to advertise products and for analysts to gauge users' feelings. For those close to technology, social media monitoring, social analytics, social engineering, etc. are buzzwords heard more and more often. It's hard for anybody to ignore social media now. In my experience so far, FMCG and entertainment media are among the sectors with the strongest presence on social media. It's very common to see references to Twitter in print and electronic media these days.

Aamir Khan has completed the first season of his show Satyameva Jayate (SMJ). Setting aside the sensitivity of its topics, the accuracy of its facts and its impact on society, SMJ is certainly one of the most talked-about shows on social media. I thought I would use my experience in social analytics to present some analysis of SMJ based on Twitter feeds. I will start with a basic analysis in this post.

First, let's look at the tweet density: the number of related tweets over the 13-week period. As expected, the peaks fall on the days the show was telecast. The overall trend line shows a slight decline in tweet volume. A few episodes in between were also less popular; these were the ones on topics such as domestic violence and the use of fertilizers. The episodes on alcohol abuse, aging parents and water conservation received increased attention on Twitter. Of course, this may reflect only the immediate reaction to the show. I will try to present a more detailed analysis, with average-week and topic-wise breakdowns, in future posts. Next, let's look at the sentiment analysis.
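
Both the weekly density and the trend line can be reproduced with a few lines of Python. This sketch assumes tweet dates are available as datetime.date values, groups them by ISO week and fits an ordinary least-squares line over the week index. The post does not say how its trend line was fitted, so treat this as one plausible choice; the function names are hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_counts(tweet_dates):
    """Count tweets per ISO (year, week) pair."""
    return Counter(tuple(d.isocalendar()[:2]) for d in tweet_dates)


def trend_slope(values):
    """Least-squares slope of values against their index 0..n-1.
    A negative slope indicates a declining trend."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Usage: order the weeks, then fit the trend over the weekly totals.
counts = weekly_counts([date(2012, 5, 6), date(2012, 5, 7), date(2012, 5, 13)])
series = [counts[k] for k in sorted(counts)]
```

Weeks with no tweets at all are absent from the Counter, so they should be filled with zeros before fitting.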
Let's first look at the positive trend, that is, the percentage of total tweets with positive sentiment over time, as depicted in the adjacent graph. The overall trend line shows a slight decline in positive sentiment. On initial observation, tweets during the episode on medicine appear to carry more positive sentiment than those during other episodes. Again, this graph reflects only the general mood of users. A deeper analysis, such as feature-based sentiment analysis, is required to capture the exact mood of the audience. It would answer questions such as: is the positive sentiment during the healthcare episode directed towards Aamir's views or towards the healthcare industry?
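
The positive trend is the share of each week's tweets labelled positive. A minimal sketch, assuming each tweet has already been reduced to a (week, label) pair by the sentiment step; passing label="negative" gives the negative trend in the same way. The function name and label strings are hypothetical.

```python
from collections import defaultdict


def sentiment_share(labelled, label="positive"):
    """Given (week, sentiment_label) pairs, return {week: percentage of that
    week's tweets carrying `label`}, ordered by week."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for week, sentiment in labelled:
        totals[week] += 1
        if sentiment == label:
            hits[week] += 1
    return {w: 100.0 * hits[w] / totals[w] for w in sorted(totals)}
```

Using a share rather than a raw count keeps weeks with very different tweet volumes comparable.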

Finally, let's look at the negative trend, shown in the graph on the left-hand side. The overall trend line shows an increase in negative sentiment. As mentioned earlier, I will provide a deeper analysis in future posts. The main intention of this post is not to do a postmortem of SMJ but to show some applications of social analytics through an interesting and relevant case study.

Before closing, a word about the data collection and analysis techniques. The data was collected with the Twitter Search API using keywords. SMJ, Satyameva Jayate, Amir Khan, SatyamevJayate and smjindia were common keywords used for all episodes, along with episode-specific keywords such as AlcoholAbuse, dignity4all, etc. I won't rule out the possibility that some tweets were missed, partly because of the timing of the searches and partly because of the accuracy of the Search API itself. I generally ran the search program a couple of times on Sunday and once during the week.

For sentiment analysis, I used the SentiStrength algorithm (http://sentistrength.wlv.ac.uk/) with a modified lexicon. Again, the accuracy of this sentiment analysis might not be high; I expect it to be around 65%. In fact, from our experience so far, this is reasonable accuracy, especially for social media data, which often doesn't follow grammatical rules. For more information on sentiment analysis of Twitter feeds, see this white paper I co-authored: http://t.co/aXAv7aly.
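
SentiStrength reports two scores per text, a positive strength from 1 to 5 and a negative strength from -1 to -5. The sketch below is a heavily simplified lexicon lookup in that spirit, not SentiStrength itself (the real tool adds rules for negation, booster words, emoticons and more) and not the modified lexicon used here. The mini-lexicon, the rule for collapsing the two scores into one label, and the accuracy check against hand-labelled tweets are all hypothetical illustrations.

```python
import re

# Hypothetical mini-lexicon: positive words use 2..5, negative words -2..-5,
# mirroring SentiStrength's 1..5 / -1..-5 scales.
LEXICON = {"good": 3, "love": 3, "great": 4, "touching": 3,
           "bad": -3, "boring": -3, "hate": -4, "worst": -4}


def dual_score(text, lexicon=LEXICON):
    """Return (positive, negative): the strongest positive and strongest
    negative term found, defaulting to 1 and -1 when none is present."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = max([lexicon[w] for w in words if lexicon.get(w, 0) > 0], default=1)
    neg = min([lexicon[w] for w in words if lexicon.get(w, 0) < 0], default=-1)
    return pos, neg


def classify(text, lexicon=LEXICON):
    """Assumed rule: the stronger of the two scores decides the label."""
    pos, neg = dual_score(text, lexicon)
    if pos > -neg:
        return "positive"
    if -neg > pos:
        return "negative"
    return "neutral"


def accuracy(predicted, gold):
    """Fraction of predictions matching hand-assigned labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

An accuracy figure like the 65% estimate would come from comparing such predictions with a manually labelled sample of tweets.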