Maruti Gollapudi's Blog

Friday, February 7, 2014

Issue with Facebook Graph API

We are facing an issue with the Facebook Graph API and are looking for help to understand the cause and resolve it.
We are trying to get public wall messages containing a particular keyword. We created an application with permissions including read_stream and manage_posts. The application's visibility is public and most other settings are left at their defaults. The API was working and we were getting the feeds. Last week, however, we noticed that the Search API is not picking up messages from certain users. We picked a couple of such IDs and did some testing. Their privacy settings all seem to be fine: post visibility is public at both the feed level and the account level, Platform is turned on, and visibility to search engines is on. For these selected IDs we also tried to get the wall feeds directly by ID. Below are the observations.

1. For one ID we could get all messages through both search and wall retrieval, with both our application key and the default Graph API Explorer access key.
2. For one ID we got messages using Graph API Explorer but not with our application. With our application, the only feed items in the response are friend-request acceptances and likes on other public posts.
3. For another ID, Graph API Explorer gives empty results and our application key returns only likes etc., as in the case above.
4. We asked one of the users, for whom we had been getting messages, to turn off the Platform. When it was turned off we stopped getting messages. When it was turned back on we did not get all messages, only likes etc.
5. When we use Social Mention (socialmention.com) and search for the keyword, we can see all messages.

We feel it has something to do with the application and access key permissions, but we couldn't find much detail. Any help in cracking this is welcome.

Update (May 05, 2014):
The issue was with the way the access token was generated.

While generating the access key we can select permissions to match the application's requirements. Anything beyond the basic permissions requires approval from the user. For example, if we choose the user_location permission, the user needs to grant permission before we can get that information. If no additional permissions are selected, the application does not require any approval and can access all public data. But if any additional permissions are selected and the user has not granted them, we cannot get even the public data of that user. With the default permissions I generated an access key and could get all public data as expected.
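As a rough illustration of the fix, the sketch below builds a keyword search request with a token that carries no extra permissions. It is only a sketch under assumptions: it assumes the 2014-era Graph API public post search endpoint (`/search` with `type=post`), which may no longer be available, and the helper names `build_scope` and `build_search_url` and the token value are hypothetical placeholders, not part of any Facebook SDK.

```python
from urllib.parse import urlencode, urlparse, parse_qs

GRAPH_BASE = "https://graph.facebook.com"  # Graph API host


def build_scope(extra_permissions=()):
    """Return the comma-separated scope string to request for a token.

    An empty string means only the default (basic) permissions are
    requested, which is the setup that worked in the update above.
    """
    return ",".join(extra_permissions)


def build_search_url(keyword, access_token):
    """Build a public post search URL (assumed 2014-era endpoint)."""
    params = {"q": keyword, "type": "post", "access_token": access_token}
    return f"{GRAPH_BASE}/search?{urlencode(params)}"


# Hypothetical placeholder token generated with default permissions only.
url = build_search_url("big data", "PLACEHOLDER_TOKEN")
print(build_scope())                   # empty scope: nothing beyond basic
print(build_scope(["user_location"]))  # would require user approval
print(url)
```

The point of keeping the scope empty is the one made above: once any additional permission is requested and a user has not granted it, even that user's public data stops showing up.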

Monday, January 6, 2014

"Apache Flume: Distributed Log Collection for Hadoop" - Book review

We have been working with Apache Flume for quite some time now. We used it to load data from social networks into MongoDB and also for log collection. Recently I read a book on Apache Flume titled "Apache Flume: Distributed Log Collection for Hadoop".

This is good starter material for a serious Flume developer. The 2nd chapter provides a good step-by-step guide to setting up and getting Flume running. I liked the way the flow is presented and the mention of the important arguments that can be passed on the Flume command line. Other useful information in this book covers the monitoring tools. Though the description is brief, it is good to have an introduction to some of them. Overall the book gives good detail, with examples, on Flume flow and architecture, including channels, agents, sinks, interceptors etc. It would have been more helpful if the last two chapters had been more elaborate.

Tuesday, July 9, 2013

Modes of Big Data Analysis

We can look at analysis in three modes, based on what triggers the analysis.

Offline/Batch Mode

Analytics are performed in advance and the results are made available for applications to use.
Ex: Clinical Trials, Voice of Customer

Real Time – OnDemand

Analysis is done and results are presented when requested.
Ex: Up-sell/Cross-sell

Real Time – Stream based

Monitor streaming data (Twitter messages, transaction logs, data from sensors) and trigger analysis based on an event or the data.
Ex: Monitor and analyze online transactions for fraud; monitor social media messages for serious incidents.
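To make the stream-based mode concrete, here is a minimal sketch of "trigger analysis based on event/data": each incoming event is checked against a trigger condition, and analysis runs only for events that match. The function name `monitor`, the transaction records and the threshold are all hypothetical and only for illustration; a real system would read from a live stream rather than a list.

```python
def monitor(stream, is_trigger, analyze):
    """Yield an analysis result for every event that satisfies the trigger."""
    for event in stream:
        if is_trigger(event):
            yield analyze(event)


# Hypothetical transaction events standing in for a live stream.
transactions = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": 9800.0},
    {"id": 3, "amount": 45.5},
]

THRESHOLD = 5000.0  # assumed cut-off for a "suspicious" amount

alerts = list(
    monitor(
        transactions,
        is_trigger=lambda t: t["amount"] > THRESHOLD,
        analyze=lambda t: f"review transaction {t['id']}",
    )
)
print(alerts)
```

In the batch and on-demand modes the same `analyze` step would instead be run on a schedule or when a user asks for it; only the trigger differs.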

And below are the implementation approaches:

Massively Parallel Programming (databases and programming) - Hadoop MapReduce
Scalable Database - NoSQL databases and databases with the ability to store huge volumes of data (e.g. Oracle Exadata) and to perform operations on that data.
In-memory Analytics - an approach to querying data when it resides in random access memory (RAM), as opposed to querying data that is stored on physical disks.
Big Data Appliance - combination hardware and software products designed specifically for analytical processing.
Processing in Memory (PIM) - a chip architecture in which the processor is integrated into a memory chip to reduce latency.
In-Database Analytics - a technology that allows data processing to be conducted within the database by building analytic logic into the database itself.
Real-time Stream Processing & CEP

A combination of the above approaches is needed to implement analytic applications.

Almost 2 years back, for a couple of months, I had my first stint with Big Data and Hadoop before moving on to Social Analytics. As I resumed my interest in Big Data I was looking at my old work, and the above is from one of my early presentations.

Sunday, January 6, 2013

Klout4Java : Java wrapper for Klout partner API

One more contribution to open source. I uploaded klout4java, a Java wrapper for the Klout Partner API v2. I had calls to the Klout API in our Social Analytics platform. As v1 of the Klout API was being deprecated, I had to upgrade, and I couldn't find any Java wrapper for v2. This led me to write a wrapper for myself, and I uploaded it for community use.

Link to klout4java: http://code.google.com/p/klout4java/

Sunday, November 18, 2012

Jab Tak Hai Jaan - Twitter Metrics

One of the topics that attracts good Twitter traffic in India is movies. And if the movie is a much-awaited Diwali release from a superstar, it is reasonable to expect good Twitter buzz. On its release day, Jab Tak Hai Jaan was a trending topic. I collected the related tweets and in this post will present some metrics. Some might call this analysis, but I prefer metrics/report because no statistical or NLP algorithms were involved in deriving these results.

Data was collected from 11/12/2012 18:33 to 11/14/2012 12:15. The Twitter Streaming API was used to collect the data, and the keywords tracked were "JTHJ", "Jab Tak Hai Jaan" and "SRK". Not much data cleaning was applied, as these keywords are not very common. A total of 104,654 tweets were collected, of which 31,177 (30%) were retweets.

First, let's look at a simple report: the tweet timeline. The graph shows the tweet trend. The value on the x-axis is dd/yy HH. The peak is around 3 o'clock in the afternoon of 13th Nov.

The table below shows the top traffic sources, with retweets separated out. I expected the percentage of retweets to be higher for mobile devices, but the data below suggests my assumption was wrong. Surprisingly, Twitter for BlackBerry is more popular than Twitter for iPhone, though the margin is small. The data also shows the dominance, or abundance, of Android.

Source	Total tweets	Original	Original%	Retweets	Retweets%
web	38043	24635	64.76	13408	35.24
mobile web	14813	10382	70.09	4431	29.91
twitter for android	14359	9285	64.66	5074	35.34
twitter for blackberry®	8828	6698	75.87	2130	24.13
twitter for iphone	8035	5287	65.80	2748	34.20
facebook	4682	4658	99.49	24	0.51
tweetdeck	1490	1069	71.74	421	28.26
tweet button	1429	1398	97.83	31	2.17
twitter for ipad	1276	807	63.24	469	36.76
ubersocial for blackberry	1114	935	83.93	179	16.07
tweetcaster for android	1103	774	70.17	329	29.83
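The percentage columns above follow directly from the original and retweet counts. The short sketch below recomputes them for a few rows taken from the table; the function name `split_percentages` is just an illustrative helper, not part of the tooling used for the original report.

```python
def split_percentages(original, retweets):
    """Return (original %, retweet %) of a source's total, to 2 decimals."""
    total = original + retweets
    if total == 0:
        return (0.0, 0.0)
    return (round(100 * original / total, 2), round(100 * retweets / total, 2))


# (original, retweets) counts copied from the table above.
counts = {
    "web": (24635, 13408),
    "twitter for blackberry": (6698, 2130),
    "facebook": (4658, 24),
}

for source, (original, retweets) in counts.items():
    orig_pct, rt_pct = split_percentages(original, retweets)
    print(source, original + retweets, orig_pct, rt_pct)

# Overall retweet share across all collected tweets.
overall_rt_pct = round(100 * 31177 / 104654, 2)
print(overall_rt_pct)
```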

Most of the tweets with a triple-figure retweet count are from celebrities. Of the top 10, 3 are from Shah Rukh Khan and 3 are from reviewers (like Taran Adarsh). One from Yash Raj Films is also in the top 10. It looks like the movie team engaged well on social media, which I feel is very important for successful social media marketing. Below are the top 5 tweets by retweet count.

Tweet	Retweet Count	Author
RT @iamsrk: Very touchng how all of u r buyng tckets for jthj and excted about t. It makes me feel very wanted & postvely pro ...	1221	iamsrk
RT @bipsluvurself: Jab Tak Hai Jaan mkes u wnt 2 find ur soulmate ur true love.Infuses romance in2 ur life.A must see!Saluting the King ...	641	bipsluvurself
RT @priyankachopra: As everyone comes together to Celebrate Yashji tonight..I sit far away and thank him for touching all our lives..wis ...	567	priyankachopra
RT @iamsrk: En route to Jalandhar for JTHJ...havent slept a wnk all nght...eyelds r feelng lke a mattress. Hope ts a long slow flt...	561	iamsrk
RT @taran_adarsh: Those asking, yes, all three Khans - SRK, Aamir & Salman - were under one roof for #JTHJ premiere. Premiere and po ...	493	taran_adarsh

In the next post I will try to interpret the actual review of the film from a tag cloud generated from these tweets.

Tuesday, November 13, 2012

otter4java - Java wrapper for Topsy's Otter API

Finally, today I uploaded the first version of the code for an open source project I started long back: otter4java, a Java wrapper for Topsy's Otter API. In this version I included only the search call. I will include other endpoints in future versions.

Also included in this version is integration with a crawler. We can crawl the links in the search results to get their contents. The output is stored as XML files (one per result) with the URL and content as fields. I used crawler4j for this.

Link to the code : http://code.google.com/p/otter4java/

Sunday, November 11, 2012

Social Influence Ranking - My take

I read somewhere that a celebrity is paid good money for tweeting about a product. The concept of "Social Influence" is becoming prominent thanks to social media. There are quite a few sites that calculate influence scores.
Most of these methods assign either a score or a level to the social ID. But how effective is this for social marketing?
Let's consider the ID of a popular sports star. He might have millions of fans and his tweets might be retweeted numerous times. He might also be retweeting or replying to tweets. This will result in a good influence score for him. Now if a pesticide company wants to use the social channel, they might well be swayed by the influence score and get this sports star to tweet about their product. Will this be of any help? In most cases, no, because most of the people in his circle will be interested in sports and have little or no interest in pesticides.
Leaving marketing aside, will this score help me find whom to follow in my area of interest? I doubt it, because these sites give a high-level score but don't say in which field the person is influential. I remember the Klout API does give some description of the topics the person generally talks about, but it still won't assign a score per topic. I feel the scoring should ideally be against topics and not just for the ID. For example, the sports star's ID above would have a high score for sports and a low one for politics and agriculture. I am not sure if there is any tool or site that provides this information. So far I haven't come across any. There are sites that can extract topics from feeds/data. We need this to be incorporated into social influence ranking algorithms for better use.
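To illustrate what scoring against topics could look like, here is a minimal sketch. It assumes each post has already been tagged with topics (by one of the topic-extraction tools mentioned above) and uses retweets plus replies as a stand-in for engagement. The function `topic_scores`, the field names and the sample numbers are all hypothetical; this is not how Klout or any other site computes its score.

```python
from collections import defaultdict


def topic_scores(posts):
    """Sum engagement (retweets + replies) per topic for one account."""
    totals = defaultdict(int)
    for post in posts:
        engagement = post["retweets"] + post["replies"]
        for topic in post["topics"]:
            totals[topic] += engagement
    return dict(totals)


# Hypothetical, pre-tagged posts from a sports star's account.
posts = [
    {"topics": ["sports"], "retweets": 900, "replies": 120},
    {"topics": ["sports"], "retweets": 450, "replies": 60},
    {"topics": ["agriculture"], "retweets": 3, "replies": 1},
]

scores = topic_scores(posts)
for topic, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(topic, score)
```

With a breakdown like this, the pesticide company in the example would see at a glance that the account's influence sits almost entirely in sports.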