MIT Sloan
 

Research


Predicting the Popularity of Tweets

We developed a probabilistic model for the spread of an individual tweet in Twitter. By observing the times of the retweets of a tweet, we are able to predict the total number of retweets the tweet will receive.

This work is being done in collaboration with Emily Fox and Eric Bradlow.

The figure on the right shows predictions and the number of observed retweets versus time for the tweet "Standing in the field trying to see a 100 mill.. #MMG" by user rickyrozay, which received 730 retweets. The green squares and error bars are the medians and 90% credible intervals of the predicted total number of retweets at each observation time.


Superstar Model

We developed the Superstar Model to describe the growth of the retweet network of a topic in Twitter. The model has two main features. First, it has a superstar vertex whose degree grows linearly in the network size. Second, the non-superstar vertices have degrees that grow sub-linearly in the network size, and their degree distribution follows a power law. Our model predicts a relationship between the superstar degree and the degree distribution of the non-superstar vertices. These predictions match real retweet networks in Twitter well, and the model fits the degree distribution better than better-known models such as preferential attachment.
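
The two features described above can be illustrated with a small simulation. This is only a sketch under an assumed attachment rule that the text above does not spell out: each new vertex links to the superstar with probability p, and otherwise links to an existing non-superstar vertex chosen with probability proportional to its degree. The function name simulate_superstar and the parameters n, p, and seed are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def simulate_superstar(n, p, seed=None):
    """Grow a tree with vertices 0..n, where vertex 0 is the superstar.

    Assumed rule (not stated in the text): a new vertex attaches to the
    superstar with probability p; otherwise it attaches to a non-superstar
    vertex chosen with probability proportional to its current degree.
    Returns the list of vertex degrees.
    """
    rng = random.Random(seed)
    # Start from a single edge between the superstar (0) and vertex 1.
    degree = [1, 1]
    # One entry per unit of degree held by a non-superstar vertex, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = [1]
    for v in range(2, n + 1):
        degree.append(1)  # the new vertex always arrives with one edge
        if rng.random() < p:
            degree[0] += 1
        else:
            target = rng.choice(endpoints)
            degree[target] += 1
            endpoints.append(target)
        endpoints.append(v)
    return degree


if __name__ == "__main__":
    n, p = 20000, 0.3
    deg = simulate_superstar(n, p, seed=1)
    print("superstar degree / n:", deg[0] / n)
    # Empirical degree counts of the non-superstar vertices.
    counts = Counter(deg[1:])
    for d in sorted(counts)[:5]:
        print(d, counts[d])
```

Under this assumed rule the superstar's share of the edges stays near p as the network grows, while every other vertex keeps a much smaller degree, which is the qualitative behavior the model is meant to capture.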

This work is being done in collaboration with Mike Steele and Shankar Bhamidi.

The figure below shows the retweet graph for the topic "BET Awards" along with plots of its empirical degree distribution and predictions of the superstar model and preferential attachment.


Finding Rumor Sources

Imagine a rumor spreads in a network and all we know is who has heard it, not where it started. Can we find the source using only the network structure? It turns out that we can: we created a network centrality called rumor centrality that does this. For regular tree networks under a certain rumor spreading model, it is an exact maximum likelihood estimator for the source. We have proven that it performs well on essentially any sparse network, showing that rumor centrality is a universally good source estimator. Rumor centrality is also an interesting mathematical object in its own right, with connections to linear extensions of partial orders, random walks, Markov chains, and Nash equilibria of certain network games.
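
To make the idea concrete, here is a minimal sketch of rumor centrality on a tree. It assumes the closed form R(v) = n! / (product over all nodes u of T_u^v), where T_u^v is the size of the subtree rooted at u when the tree is rooted at v. That formula is not stated in the text above and is taken here as an assumption; it counts the orderings in which the rumor could have reached every node starting from v. The function name rumor_centrality and the adjacency-dictionary input format are hypothetical.

```python
from math import factorial


def rumor_centrality(adj):
    """Return {node: rumor centrality} for a tree.

    adj maps each node to a list of its neighbours and must describe a
    tree (connected, no cycles). Uses the assumed closed form
    R(v) = n! / prod_u T_u^v, recomputing subtree sizes for every root,
    so the running time is quadratic in the number of nodes.
    """
    n = len(adj)
    scores = {}
    for root in adj:
        # Breadth-first pass: every parent appears in `order` before its children.
        parent = {root: None}
        order = [root]
        for u in order:
            for w in adj[u]:
                if w != parent[u]:
                    parent[w] = u
                    order.append(w)
        # Accumulate subtree sizes from the leaves up to the root.
        size = {u: 1 for u in adj}
        for u in reversed(order):
            if parent[u] is not None:
                size[parent[u]] += size[u]
        prod = 1
        for s in size.values():
            prod *= s
        # The quotient is an integer: it counts valid spreading orders.
        scores[root] = factorial(n) // prod
    return scores


if __name__ == "__main__":
    # A path 0 - 1 - 2: the middle node is the most likely source.
    path = {0: [1], 1: [0, 2], 2: [1]}
    scores = rumor_centrality(path)
    print(scores)
    print("estimated source:", max(scores, key=scores.get))
```

This version favors readability over speed. The source estimate is simply the node with the largest score.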

Related publications can be found here.

This work was done in collaboration with Devavrat Shah.

The figure on the right shows how rumor centrality finds the source. A simulated rumor was spread on this network, and each node's normalized rumor centrality was calculated, with red = 1 and blue = 0. As can be seen, rumor centrality narrows down the set of likely rumor sources. In this figure, the true source is the red node in the center.


Measuring Influence

Does the structure of a social network tell us how much influence people have? In Twitter, it does. To spread information in Twitter, people retweet. These retweets form topic-specific, growing networks. To model these networks, we developed a new network growth model called topological network growth, which grows a network using rumor centrality as an influence function. It turns out that this model accurately captures several important properties of the retweet networks in Twitter, such as a power-law degree distribution and the existence of a single very high-degree node, which we call a superstar.

Motivated by this empirical evidence, we developed a dynamic influence tracking engine for Twitter called Trumor which is based upon rumor centrality. Trumor has received media coverage, with stories about it appearing in the MIT Tech Review and CBS Smart Planet.

This work was done in collaboration with Devavrat Shah and Ammar Ammar.

The figure on the right shows a screenshot of Trumor. One enters a query, a start date, and a stop date, and clicks the Trumor Search button. Trumor then returns all matching tweets in our database, with the tweets ranked by the Trumor score of their authors on the retweet network for the topic.


Learning Community Structure

What defines a community in a social network? Is it the community's leaders? Or is it the less important followers in a community? We have found that these followers actually provide the identifiable signal for a community in a network. These followers have no friends outside of their community, and in this way provide the community with an identity. Using this observation, we created the leader-follower algorithm (LFA), which finds communities in networks by searching for these followers. The LFA has linear run time and so can be applied to very large networks. It can find overlapping communities and learns the number of communities naturally from the network structure. We have also proven that the LFA has good performance on a wide class of networks. These strengths give the LFA an advantage over other community detection methods such as spectral clustering or inference-based methods.

Related publications can be found here.

This work is done in collaboration with Devavrat Shah.

The figure on the right shows the communities found by the LFA in my own Facebook network, with appropriate social labels for each community found.