
Are Twitter predictions a result of researchers' expectations?

In recent years, several researchers have shown that Twitter data can be used to predict real-world events such as earthquakes [1], the development of stock-market indicators [2], the outcome of political elections [3], the spread of diseases [4], or movie box-office sales [5]. These studies provide some promising results suggesting that Twitter data can be used successfully for predictions. Recently, however, several researchers have questioned both the predictive power of Twitter and the research methods applied [6, 7].

There seem to be several challenges that make it hard to verify whether, and how well, the proposed methods actually work:

  • Historic Twitter data is expensive to obtain, so experiments cannot be repeated under the same conditions.
  • Many decisions have to be made during data collection (Which API is used? Which keywords or filtering criteria are applied? Which time period is captured?). These decisions are often not documented sufficiently, which makes it hard to repeat experiments and to apply a method in different settings.
  • Many of the proposed methods require a predefined list of keywords to filter tweets (e.g. “flu”, “cough”, “H1N1” … if you want to track a disease). However, it is not clear how such lists should be compiled, so the methods depend on the researcher's ability to define them, and they are difficult to apply in a different context, e.g. in countries with a different language.
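
To make the keyword problem concrete, here is a minimal sketch in Python. It is not taken from any of the studies cited above: the tweets are invented for illustration, and `filter_tweets` is a hypothetical helper that simply keeps tweets containing at least one keyword as a whole word. It only shows how strongly the size of the resulting sample depends on the list the researcher happens to choose.

```python
import re

# Hypothetical example tweets, invented purely for illustration.
TWEETS = [
    "Stuck in bed with the flu again",
    "This cough will not go away",
    "H1N1 vaccine available at the clinic today",
    "Feeling feverish and achy, staying home",
    "Ich habe Grippe und bleibe zu Hause",
]


def filter_tweets(tweets, keywords):
    """Return tweets containing at least one keyword as a whole word (case-insensitive)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [t for t in tweets if pattern.search(t)]


# Two keyword lists that two researchers might plausibly pick for the same task.
narrow = filter_tweets(TWEETS, ["flu"])
broad = filter_tweets(TWEETS, ["flu", "cough", "H1N1"])

print(len(narrow), "tweets matched by the narrow list")
print(len(broad), "tweets matched by the broad list")
```

In this toy data, both lists miss the tweet that describes symptoms without naming the disease, and both miss the German tweet, which illustrates why a list compiled for one language or context does not transfer easily to another.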

Given the many decisions and the amount of prior knowledge required to conduct these experiments, combined with the difficulty other researchers face in repeating them, Twitter prediction research seems to be at risk of being influenced by the observer-expectancy effect, in which the researcher subconsciously affects the research result.

Or, as David Hand put it:

“It is quite possible that the most interesting patterns we discover during a data mining exercise will have resulted from measurement inaccuracies, distorted samples or some other unsuspected difference between the reality of the data and our perception of it.” [8]

My colleague Amal Almansour from King's College London and I were particularly interested in the decisions made during Twitter prediction research. We have just finished a literature survey in which we critically analyzed 24 existing Twitter prediction studies. In this study, we identified the different actors involved in the typical Twitter research process and their potential impact on the prediction method and, in turn, on the prediction result.

This study is currently under peer review; the results will be posted here soon.
