Sunday, November 18, 2012

Jab Tak Hai Jaan - Twitter Metrics


Movies are one of the topics that attract good Twitter traffic in India. When the movie is a much-awaited Diwali release from a superstar, it is reasonable to expect a good Twitter buzz, and on its release day Jab Tak Hai Jaan was indeed a trending topic. I collected the related tweets and will present some metrics in this post. Some might call this analysis, but I prefer metrics/report because no statistical or NLP algorithms were involved in deriving these results.

The data covers 11/12/2012 18:33 to 11/14/2012 12:15. The Twitter Streaming API was used to collect it, and the tracked keywords were "JTHJ", "Jab Tak Hai Jaan" and "SRK". Not much data cleaning was applied, as these keywords are not very common. In total 104,654 tweets were collected, of which 31,177 (30%) are retweets.
First, let's look at a simple report, the tweet timeline. The graph shows the volume of tweets over time. The value on the x-axis is dd/yy HH. The peak is around 3 o'clock in the afternoon of 13th Nov.
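
A timeline like this can be produced by bucketing tweet timestamps by hour. The following is only a minimal sketch, assuming the timestamps are already parsed into `datetime` objects; the function names and the sample values are made up for illustration and are not the collected data.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Count tweets in each clock hour; returns (hour_start, count) pairs sorted by time."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return sorted(buckets.items())


def peak_hour(timestamps):
    """Return the (hour_start, count) pair with the highest count."""
    return max(tweets_per_hour(timestamps), key=lambda pair: pair[1])


# Made-up timestamps for illustration only.
sample = [
    datetime(2012, 11, 13, 14, 5),
    datetime(2012, 11, 13, 15, 10),
    datetime(2012, 11, 13, 15, 42),
    datetime(2012, 11, 13, 16, 1),
]

for hour, count in tweets_per_hour(sample):
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

Plotting the resulting pairs with any charting tool gives the timeline, and `peak_hour` picks out the busiest hour.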

The table below shows the top sources of traffic, with original tweets and retweets separated. I expected the percentage of retweets to be higher for mobile devices, but the data below seems to prove that assumption wrong. Surprisingly, Twitter for BlackBerry is more popular than Twitter for iPhone, though the margin is small. The data also shows the dominance, or abundance, of Android.


Source                    | Total tweets | Original | Original % | Retweets | Retweets %
web                       | 38043        | 24635    | 64.76      | 13408    | 35.24
mobile web                | 14813        | 10382    | 70.09      | 4431     | 29.91
twitter for android       | 14359        | 9285     | 64.66      | 5074     | 35.34
twitter for blackberry®   | 8828         | 6698     | 75.87      | 2130     | 24.13
twitter for iphone        | 8035         | 5287     | 65.80      | 2748     | 34.20
facebook                  | 4682         | 4658     | 99.49      | 24       | 0.51
tweetdeck                 | 1490         | 1069     | 71.74      | 421      | 28.26
tweet button              | 1429         | 1398     | 97.83      | 31       | 2.17
twitter for ipad          | 1276         | 807      | 63.24      | 469      | 36.76
ubersocial for blackberry | 1114         | 935      | 83.93      | 179      | 16.07
tweetcaster for android   | 1103         | 774      | 70.17      | 329      | 29.83
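
A breakdown like the one above can be computed with a simple group-and-count. This is a rough sketch, not the code used for the post: it assumes each tweet has already been reduced to a dict with a `source` string and an `is_retweet` flag, and both field names are hypothetical.

```python
from collections import defaultdict


def source_breakdown(tweets):
    """Return (source, total, original, original %, retweets, retweet %) rows, largest source first."""
    counts = defaultdict(lambda: [0, 0])  # source -> [original, retweets]
    for tweet in tweets:
        source = tweet["source"].strip().lower()
        counts[source][1 if tweet["is_retweet"] else 0] += 1

    rows = []
    for source, (original, retweets) in counts.items():
        total = original + retweets
        rows.append((
            source,
            total,
            original,
            round(100 * original / total, 2),
            retweets,
            round(100 * retweets / total, 2),
        ))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Tiny made-up sample, not the collected data.
sample = [
    {"source": "web", "is_retweet": False},
    {"source": "web", "is_retweet": True},
    {"source": "Web", "is_retweet": False},
    {"source": "facebook", "is_retweet": False},
]

for row in source_breakdown(sample):
    print(row)
```

Source names are lower-cased before counting so that variants such as "Web" and "web" fall into the same row.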

Most of the tweets with a triple-figure retweet count are from celebrities. Of the top 10, three are from Shah Rukh Khan and three are from reviewers (like Taran Adarsh). One from Yash Raj Films is also in the top 10. It looks like the movie team engaged well on social media, which I feel is very important for successful social media marketing. Below are the top 5 tweets by retweet count.


Tweet | Retweet Count | Author
RT @iamsrk: Very touchng how all of u r buyng tckets for jthj and excted about t. It makes me feel very wanted & postvely pro ... | 1221 | iamsrk
RT @bipsluvurself: Jab Tak Hai Jaan mkes u wnt 2 find ur soulmate ur true love.Infuses romance in2 ur life.A must see!Saluting the King  ... | 641 | bipsluvurself
RT @priyankachopra: As everyone comes together to Celebrate Yashji tonight..I sit far away and thank him for touching all our lives..wis ... | 567 | priyankachopra
RT @iamsrk: En route to Jalandhar for JTHJ...havent slept a wnk all nght...eyelds r feelng lke a mattress. Hope ts a long slow flt... | 561 | iamsrk
RT @taran_adarsh: Those asking, yes, all three Khans - SRK, Aamir & Salman - were under one roof for #JTHJ premiere. Premiere and po ... | 493 | taran_adarsh
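
The post does not say how the retweet counts were derived. One possible way, sketched below purely as an assumption, is to count how often an identical "RT @user:" text appears in the collected stream. The regular expression, function name and sample texts are all illustrative.

```python
import re
from collections import Counter

# Old-style retweets start with "RT @username:".
RT_PREFIX = re.compile(r"^RT @(\w+):")


def top_retweets(texts, n=5):
    """Return up to n (text, count, author) tuples for the most repeated retweet texts."""
    counts = Counter(text for text in texts if RT_PREFIX.match(text))
    return [
        (text, count, RT_PREFIX.match(text).group(1))
        for text, count in counts.most_common(n)
    ]


# Made-up sample texts for illustration only.
sample = [
    "RT @alice: first day first show!",
    "RT @alice: first day first show!",
    "RT @bob: booked my tickets",
    "just watched the movie",
]

for text, count, author in top_retweets(sample):
    print(count, author, text)
```

A count obtained this way only covers retweets seen during the collection window, so it can be lower than the retweet count Twitter itself reports.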

In the next post I will try to infer the actual review of the film from a tag cloud generated from these tweets.








Tuesday, November 13, 2012

otter4java - Java wrapper for Topsy's Otter API

Today I finally uploaded the first version of the code for otter4java, an open source project I started a long time ago. It is a Java wrapper for Topsy's Otter API. This version includes only the search call; I will add other endpoints in future versions.

This version also includes integration with a crawler. The links returned by a search can be crawled to fetch their contents. The output is stored as XML files (one per result) with the URL and content as fields. I used crawler4j for this.
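
To illustrate the output format only, here is a minimal sketch of writing one XML file per result with a URL field and a content field. It is written in Python for brevity and is not the otter4java code (which is Java); the element names `result`, `url` and `content` and the file naming are assumptions, not the project's actual schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def result_to_xml(url, content):
    """Build an XML string holding the URL and the crawled content of one result."""
    root = ET.Element("result")  # hypothetical element names
    ET.SubElement(root, "url").text = url
    ET.SubElement(root, "content").text = content
    return ET.tostring(root, encoding="unicode")


def save_results(results, out_dir):
    """Write one XML file per (url, content) pair into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for index, (url, content) in enumerate(results):
        path = out / f"result_{index}.xml"
        path.write_text(result_to_xml(url, content), encoding="utf-8")


print(result_to_xml("http://example.com/page", "Some crawled text"))
```

ElementTree escapes special characters in the content, so the files stay well-formed even when a page contains markup-like text.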

Link to the code : http://code.google.com/p/otter4java/

Sunday, November 11, 2012

Social Influence Ranking - My take

I read somewhere that a celebrity is paid good money for tweeting about a product. The concept of "social influence" is becoming prominent thanks to social media. There are quite a few sites that calculate influence scores.
Most of these methods assign either a score or a level to the social ID. But how effective is this for social marketing?
Let's consider the ID of a popular sports star. He might have millions of fans, and his tweets might be retweeted numerous times. He might also retweet or reply to tweets. This will result in a good influence score for him. Now if a pesticide company wants to use the social channel, it might well be swayed by the influence score and get this sports star to tweet about its product. Will this be of any help? In most cases no, because most of the people in his circle will be interested in sports, with little or no interest in pesticides.
Leaving marketing aside, will this score help me find whom to follow in my area of interest? I doubt it, because these sites give a high-level score but don't say in which field the person is influential. I remember that the Klout API does give some description of the topics a person generally talks about, but it still does not assign a score per topic. I feel the scoring should ideally be against topics and not just against the ID. For example, the sports star's ID above would have a high score for sports and a low one for politics and agriculture. I am not sure whether any tool or site provides this information; so far I haven't come across one. There are sites that can extract topics from feeds/data. This needs to be incorporated into social influence ranking algorithms to make them more useful.
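
To make the idea of per-topic scoring concrete, here is a toy sketch. It assumes each tweet has already been tagged with a topic by some topic-extraction step, and it uses the sum of retweets as a stand-in for a real influence measure. The field names, the scoring rule and the sample accounts are all made up for illustration.

```python
from collections import defaultdict


def topic_scores(tweets):
    """Sum retweets per (author, topic) as a toy stand-in for an influence score."""
    scores = defaultdict(lambda: defaultdict(int))
    for tweet in tweets:
        scores[tweet["author"]][tweet["topic"]] += tweet["retweets"]
    return {author: dict(by_topic) for author, by_topic in scores.items()}


def best_for_topic(scores, topic):
    """Return the author with the highest score for the topic, or None if nobody has one."""
    candidates = {
        author: by_topic[topic]
        for author, by_topic in scores.items()
        if topic in by_topic
    }
    return max(candidates, key=candidates.get) if candidates else None


# Made-up accounts and numbers for illustration only.
sample = [
    {"author": "sports_star", "topic": "sports", "retweets": 900},
    {"author": "sports_star", "topic": "agriculture", "retweets": 5},
    {"author": "farm_expert", "topic": "agriculture", "retweets": 60},
]

scores = topic_scores(sample)
print(scores)
print(best_for_topic(scores, "agriculture"))
```

With a single overall score the sports star would win every comparison; with scores kept per topic, the pesticide company in the example would be pointed to the account that is actually influential on agriculture.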

Sunday, August 12, 2012

Satyameva Jayate - Twitter (Sentiment) Analysis


As we all know, social media is changing or influencing a lot of the traditional communication mechanisms. It provides a wider platform for people to share thoughts, advertise products and analyze users' feelings. For those close to technology, social media monitoring, social analytics, social engineering etc. are buzzwords heard more and more often these days. It is hard for anybody to ignore social media now. From my experience so far, FMCG and entertainment media are among the areas with the strongest presence on social media. It is very common to see references to Twitter in print and electronic media these days.

Aamir Khan has completed the first season of Satyameva Jayate (SMJ). Keeping aside the sensitivity of the topics, the accuracy of the facts, its impact on society etc., SMJ is certainly one of the most talked-about shows on social media. I thought of using my experience in social analytics to present some analysis of SMJ from Twitter feeds. I will start with basic analysis in this post.

First, let's look at the tweet density, the number of related tweets over the 13-week period. The peaks you observe are, as expected, on the days the show was telecast. The overall trend line shows a small decline in tweets. A few episodes in between were also less popular; these are the ones that discussed topics like domestic violence and the use of fertilizers. Topics like alcohol abuse, aging parents and water conservation got increased attention on Twitter. Of course, this might reflect only the immediate reaction to the show. I will try to present a more detailed analysis, with average-week and topic-wise breakdowns, in future posts. Next, let's look at the sentiment analysis.
Let's first look at the positive trend, that is, the percentage of total tweets with positive sentiment over time. The adjacent graph depicts the positive sentiment. The overall trend line shows a small decline in positive sentiment. From initial observation, tweets during the episode on medicine appear to carry more positive sentiment than those during other episodes. Again, this graph reflects the overall mood of users in general. Deeper analysis, like feature-based sentiment analysis, is required to capture the exact mood of the audience. It would answer questions like: is the positive sentiment during the healthcare episode directed towards Aamir's views or towards the healthcare industry?
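
The positive trend is simply the share of positive tweets per time bucket. A minimal sketch, assuming every tweet has already been given a date string and a sentiment label (the label names and the sample rows are made up for illustration):

```python
from collections import defaultdict


def positive_share_by_day(labelled):
    """labelled: iterable of (day, label) pairs. Returns {day: percent of tweets labelled positive}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for day, label in labelled:
        totals[day] += 1
        if label == "positive":
            positives[day] += 1
    return {
        day: round(100 * positives[day] / totals[day], 2)
        for day in sorted(totals)
    }


# Made-up labels for illustration only.
sample = [
    ("2012-05-06", "positive"),
    ("2012-05-06", "negative"),
    ("2012-05-06", "neutral"),
    ("2012-05-06", "positive"),
    ("2012-05-13", "negative"),
    ("2012-05-13", "positive"),
    ("2012-05-13", "neutral"),
    ("2012-05-13", "neutral"),
]

print(positive_share_by_day(sample))
```

Replacing `"positive"` with `"negative"` in the comparison gives the corresponding negative trend.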

Finally, let's look at the negative trend graph, shown on the left-hand side. As observed, the overall trend line shows an increase in negative sentiment. As mentioned earlier, I will provide deeper analysis in future posts. The main intention of this post is not to do a postmortem of SMJ but to show some applications of social analytics with an interesting and relevant case study.

Before closing, a word about the data collection and analysis techniques. Data was collected with the Twitter Search API using keywords. SMJ, Satyameva Jayate, Amir Khan, SatyamevJayate and smjindia were the common keywords used for all episodes, along with episode-specific keywords like AlcoholAbuse, dignity4all etc. I won't rule out the possibility of missing some tweets, partly because of the timing of the searches and partly because of the accuracy of the Search API itself. I generally ran the search program a couple of times on Sunday and once during the week. For sentiment analysis I used the SentiStrength algorithm (http://sentistrength.wlv.ac.uk/) with a modified lexicon. Again, the accuracy of this sentiment analysis might not be high; I expect it to be around 65%. In fact, from our experience so far, this is reasonable accuracy, especially for social media data, which doesn't follow the grammar of the language. For more information on sentiment analysis of Twitter feeds, look at this white paper I co-authored - http://t.co/aXAv7aly.
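
SentiStrength reports two strengths for each text, a positive one from 1 to 5 and a negative one from -1 to -5. The post does not say how these were turned into positive and negative labels, so the sketch below is only one assumed rule: compare the two strengths and call the tweet neutral when they cancel out. The function name is hypothetical, and the code does not call SentiStrength itself.

```python
def label_from_strengths(positive, negative):
    """Map a SentiStrength-style score pair (positive 1..5, negative -1..-5) to a coarse label."""
    if not 1 <= positive <= 5 or not -5 <= negative <= -1:
        raise ValueError("expected positive in 1..5 and negative in -5..-1")
    balance = positive + negative  # assumed rule: the stronger side wins
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


# Made-up score pairs for illustration only.
for pos, neg in [(4, -1), (1, -3), (2, -2)]:
    print(pos, neg, label_from_strengths(pos, neg))
```

Other rules are possible, for example treating any tweet with a strong score on both sides as mixed rather than neutral.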