
How can big data and data science help policy-making? This question has recently gained increasing attention. Both the European Commission and the White House have endorsed the use of data for evidence-based policy making.

Still, a gap remains between theory and practice. In this blog post, I make a number of recommendations for systematic development paths.

Research trends shaping Data for Policy

‘Data for policy’ as an academic field is still in its infancy. A typology of the field’s foci and research areas is summarised in the figure below.




Besides the ‘data for policy’ community, there are two important research trends shaping the field: 1) computational social science; and 2) the emergence of politicised social bots.

Computational social science (CSS) is a new interdisciplinary research trend in social science, which tries to transform advances in big data and data science into research methodologies for understanding, explaining and predicting underlying social phenomena.

Social science has a long tradition of using computational and agent-based modelling approaches (e.g. Schelling’s Model of Segregation), but the new challenge is to feed real-life, and sometimes even real-time, information into those systems to gain rapid insights into the validity of research hypotheses.
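To illustrate the agent-based tradition mentioned above, here is a minimal sketch of a Schelling-style segregation model. The grid size, the share of empty cells, the 50% tolerance threshold and all function names are assumptions made for this example; they are not taken from Schelling's work or from any study cited here.

```python
import random


def make_grid(size, empty_share, rng):
    """Fill a size x size grid with agents 'A' or 'B', or None for an empty cell."""
    cells = []
    for _ in range(size * size):
        if rng.random() < empty_share:
            cells.append(None)
        else:
            cells.append(rng.choice("AB"))
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def like_share(grid, r, c):
    """Share of occupied neighbouring cells that hold the same type as cell (r, c)."""
    size = len(grid)
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size and grid[rr][cc] is not None:
                total += 1
                same += grid[rr][cc] == grid[r][c]
    # An agent with no neighbours at all is treated as content.
    return same / total if total else 1.0


def mean_like_share(grid):
    """Average like-neighbour share over all agents: a crude segregation index."""
    size = len(grid)
    shares = [like_share(grid, r, c)
              for r in range(size) for c in range(size)
              if grid[r][c] is not None]
    return sum(shares) / len(shares)


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the step to a random empty cell."""
    size = len(grid)
    unhappy = [(r, c) for r in range(size) for c in range(size)
               if grid[r][c] is not None and like_share(grid, r, c) < threshold]
    empties = [(r, c) for r in range(size) for c in range(size)
               if grid[r][c] is None]
    rng.shuffle(unhappy)
    moves = 0
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec] = grid[r][c]
        grid[r][c] = None
        empties[i] = (r, c)  # the vacated cell becomes available
        moves += 1
    return moves


def run_schelling(size=20, empty_share=0.2, threshold=0.5, steps=30, seed=42):
    """Run the model and return the segregation index before and after, plus the grid."""
    rng = random.Random(seed)
    grid = make_grid(size, empty_share, rng)
    before = mean_like_share(grid)
    for _ in range(steps):
        if step(grid, threshold, rng) == 0:
            break
    return before, mean_like_share(grid), grid


if __name__ == "__main__":
    before, after, _ = run_schelling()
    print(f"mean like-neighbour share: {before:.2f} before, {after:.2f} after")
```

In a computational social science setting, the random initial grid would be replaced by observed data, which is exactly the step that raises the methodological and ethical questions discussed in this post.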

For example, one could use mobile phone call records to assess the acculturation processes of different communities. Such a project would involve translating different acculturation theories into computational models, researching the ethical and legal issues inherent in using mobile phone data, and developing a vision for generating policy recommendations and new research hypotheses from the analysis.

Politicised social bots are also beginning to make their mark. In 2011, DARPA solicited research proposals dealing with social media in strategic communication. The term ‘political bot’ was not used, but the expected results left no doubt about the goals:

The general goal of the Social Media in Strategic Communication program is to develop a new science of social networks built on an emerging technology base.  In particular, it will develop automated and semi‐automated operator support tools and techniques for the systematic and methodical use of social media at data scale and in a timely fashion to accomplish four specific program goals:

  1. Detect, classify, measure and track the (a) formation, development and spread of ideas and concepts (memes), and (b) purposeful or deceptive messaging and misinformation.
  2. Recognize persuasion campaign structures and influence operations across social media sites and communities.
  3. Identify participants and intent, and measure effects of persuasion campaigns.
  4. Counter messaging of detected adversary influence operations.

The request for proposals might have been ahead of its time, but five years later, the project politicalbots.org emerged. Just this year, it held its first workshop on ‘Algorithms, Automation and Politics’ in Fukuoka, Japan.

The Gap between data-driven policy making in theory and action

Regardless of these important developments, a gap remains between theory and practice. I propose three approaches for addressing that divide:

  • Using examples from different public sector fields;
  • Leveraging frameworks of thinking from the information visualisation research community;
  • Adopting the Capgemini/MIT roadmap of digital transformation for billion-dollar organisations.

Nanos gigantum humeris insidentes

Analysing successful applications of data-for-policy in various governmental sectors offers insights into possible new data sources, quantitative methods for analysing them, and their connection with policy design. Data4policy.org is an evolving web collection of articles and research papers about using data science and big data in the public sector. Its contents are categorised by governmental sector.

With every paper, it is important to understand the exact application, the connection to policy, the stakeholders, the type of data (see Coombs’ classic Theory of Data) and the method, i.e. which quantitative method is used and whether it is a supervised or unsupervised learning methodology.

Information visualisation and visual analytics

The information visualisation research community has published several very useful studies involving social media analytics, data mining, machine learning and data science which may help social scientists implement these approaches in their work (e.g. Thomas and Cook 2005, Amar et al. 2005).

An interested reader can gain a lot of information from visualised data (Amar et al. 2005). Visualisation facilitates many ‘atomic tasks’, such as retrieving values, filtering information, finding extrema, sorting, determining ranges and finding anomalies.
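To make the atomic tasks concrete, the sketch below expresses each of them as a small Python function. The dataset (average permit processing times for six fictional offices), the function names and the two-standard-deviation anomaly rule are all assumptions chosen for illustration; they do not come from Amar et al.

```python
import statistics

# Hypothetical data: average permit processing time in days per office.
PROCESSING_DAYS = {
    "Office A": 12, "Office B": 15, "Office C": 14,
    "Office D": 13, "Office E": 48, "Office F": 16,
}


def retrieve_value(data, key):
    """Retrieve value: look up a single item."""
    return data[key]


def filter_items(data, predicate):
    """Filter: keep only the items whose value satisfies the predicate."""
    return {k: v for k, v in data.items() if predicate(v)}


def find_extremum(data, largest=True):
    """Find extremum: the item with the largest (or smallest) value."""
    pick = max if largest else min
    return pick(data.items(), key=lambda item: item[1])


def sort_items(data):
    """Sort: order the items by ascending value."""
    return sorted(data.items(), key=lambda item: item[1])


def determine_range(data):
    """Determine range: the smallest and largest value."""
    return min(data.values()), max(data.values())


def find_anomalies(data, z_limit=2.0):
    """Find anomalies: items more than z_limit standard deviations from the mean.

    This is one simple, assumed rule; real analyses often need something more robust.
    """
    mean = statistics.mean(data.values())
    spread = statistics.pstdev(data.values())
    if spread == 0:
        return []
    return [k for k, v in data.items() if abs(v - mean) / spread > z_limit]


if __name__ == "__main__":
    print(retrieve_value(PROCESSING_DAYS, "Office C"))
    print(filter_items(PROCESSING_DAYS, lambda days: days > 14))
    print(find_extremum(PROCESSING_DAYS))
    print(sort_items(PROCESSING_DAYS))
    print(determine_range(PROCESSING_DAYS))
    print(find_anomalies(PROCESSING_DAYS))
```

A good visualisation lets a reader carry out the same operations at a glance, without writing any code.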

Such visualisations can be made even more effective when evaluated against four key elements (Stasko 2014):

  • Ability to minimise the total time needed to answer a wide variety of questions about the data;
  • Ability to spur and discover insights and/or insightful questions about the data;
  • Ability to convey an overall essence or take-away sense of the data;
  • Ability to generate confidence, knowledge, and trust about the data, its domain and context.

If we use this framework, we can better understand what is missing and discern how to make our data tell a better story.

Digital Transformation: A roadmap for billion-dollar organisations

The MIT Center for Digital Business and Capgemini Consulting outlined three main building blocks for digital transformation innovation (Westerman et al. 2011):

  • Customer experience and customer touch points;
  • Operational process and performance improvement;
  • New digital business or digitally-modified business.

With some small modifications, the MIT/Capgemini digital transformation framework can be easily applied to data for policy. Doing so could offer guidance on how best to pursue novel data collection, storage, analysis and visualisation in public sector organisations.

Business process engineering, management and discovery are frequently deployed in large corporations, but rarely used in the public sector. Manually exploring and mapping current “as is” processes or workflows can be tedious. By logging every action, or even every click, in an IT system, business process mining and discovery tools offer automatic “as is” process and workflow discovery. They give an overview of cases and documents flowing through an organisation and enable the detection of bottlenecks and, if necessary, the re-engineering of procedures and workflows.
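The core idea behind “as is” process discovery can be sketched in a few lines. The example below assumes a hypothetical event log of (case, activity, hour) records for a permit workflow; the activity names, timestamps and function names are invented for illustration. It counts how often one activity directly follows another and flags the transition with the longest average wait as a candidate bottleneck. Dedicated process mining tools do far more than this.

```python
from collections import defaultdict

# Hypothetical event log: (case id, activity, hours since the log started).
EVENT_LOG = [
    ("case-1", "submit", 0), ("case-1", "review", 4), ("case-1", "approve", 30),
    ("case-2", "submit", 1), ("case-2", "review", 6),
    ("case-2", "request_info", 10), ("case-2", "review", 20),
    ("case-2", "approve", 50),
    ("case-3", "submit", 2), ("case-3", "review", 5), ("case-3", "approve", 35),
]


def discover_transitions(event_log):
    """Return {(from_activity, to_activity): (count, mean wait in hours)}."""
    by_case = defaultdict(list)
    for case_id, activity, hour in event_log:
        by_case[case_id].append((hour, activity))

    waits = defaultdict(list)
    for events in by_case.values():
        events.sort()  # order each case's events by time
        for (t0, a0), (t1, a1) in zip(events, events[1:]):
            waits[(a0, a1)].append(t1 - t0)

    return {edge: (len(w), sum(w) / len(w)) for edge, w in waits.items()}


def find_bottleneck(transitions):
    """Return the transition with the largest mean waiting time."""
    return max(transitions, key=lambda edge: transitions[edge][1])


if __name__ == "__main__":
    transitions = discover_transitions(EVENT_LOG)
    for (src, dst), (count, mean_wait) in sorted(transitions.items()):
        print(f"{src} -> {dst}: {count} times, mean wait {mean_wait:.1f} h")
    print("candidate bottleneck:", find_bottleneck(transitions))
```

In this invented log, the step from review to approval takes longest on average, which is the kind of finding that would prompt a closer look at the approval procedure.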

The following visualisation of interdependencies between business sectors has been constructed using Wassily Leontief’s classic input-output tables. One could imagine the value of the economic insights generated if such dynamic graphs could be created from near real-time data, without requiring classical data collection, processing and presentation by national statistics offices (see Diane Coyle’s (2015) book on the history and future of GDP calculation for a longer description of this transformation and trend).
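For readers unfamiliar with input-output analysis, the standard Leontief relation is x = (I - A)^-1 d, where A holds the technical coefficients (the input from sector i needed per unit of output of sector j), d is final demand and x is the total output each sector must produce. The sketch below solves this for a two-sector toy economy. The sector names, coefficients and demand figures are invented, and the small Gauss-Jordan solver is included only to keep the example dependency-free.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def total_output(coefficients, final_demand):
    """Total output x satisfying x = A x + d, i.e. (I - A) x = d."""
    n = len(final_demand)
    leontief = [[(1.0 if i == j else 0.0) - coefficients[i][j]
                 for j in range(n)] for i in range(n)]
    return solve_linear(leontief, final_demand)


# Hypothetical two-sector economy.
SECTORS = ["agriculture", "manufacturing"]
A = [[0.2, 0.3],   # agricultural input per unit of each sector's output
     [0.4, 0.1]]   # manufacturing input per unit of each sector's output
DEMAND = [100.0, 50.0]

if __name__ == "__main__":
    baseline = total_output(A, DEMAND)
    # Raise final demand for agriculture by 10 units and see how it propagates.
    shocked = total_output(A, [DEMAND[0] + 10.0, DEMAND[1]])
    for name, x0, x1 in zip(SECTORS, baseline, shocked):
        print(f"{name}: output {x0:.1f} -> {x1:.1f} (change {x1 - x0:+.1f})")
```

The point of the exercise is that a change in demand for one sector raises required output in every sector linked to it. With near real-time data, the coefficients themselves could be re-estimated far more often than traditional statistical releases allow.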

Seeing a near real-time visualisation of the flow of the economy and the relations between sectors could offer a wide range of benefits for policy-makers. Selecting the right level of detail and granularity is a difficult ethical question for society. If we can see that one sector has an increased risk of suffering financially, could we target funding and financial help towards it? Could we do the same at the level of individual companies, when we know that the network effect (the propagation and diffusion of newly created value) would be optimised through such targeted support? There are no straightforward answers to such questions, but the potential benefits will fuel future discussions, which could hopefully lead to societal agreement about data for policy.

The state must continue to embrace the digital age. Such an approach would be a path towards more efficient public services and constant monitoring of quality. It also enables the automatic detection of new ways in which citizens are using services when the legal or business environment is changing, giving an early warning when a software system starts to become a legacy system amid changes required by society.

Construction permits, social benefits applications and divorce filings are only a few examples of cases where citizens need to visit different government agencies for something that is ‘one and the same thing’ in their minds. Government one-stop-shop initiatives are one approach to this problem, but the data infrastructure has until now been inadequate to support them effectively.

The next wave of e-government innovation

The next wave of e-government innovation will be about analytics and predictive models. Taking advantage of their potential for social impact will require a solid foundation of e-government infrastructure.

The most important questions going forward are as follows:

  • What are the relevant new data sources?
  • How can we use them?
  • What should we do with the information? Who cares? Which political decisions need faster information from novel sources? Do we need faster information? Does it come with unanticipated risks?

These questions barely scratch the surface, because the complex interplay between general advances in computational social science and hovering satellite topics like political bots will have an enormous impact on research and on the use of data for policy. But it is an important start.


Amar, R., Eagan, J., and Stasko, J. “Low-level components of analytic activity in information visualization.” IEEE Symposium on Information Visualization (INFOVIS 2005). IEEE, 2005.

Conte, R., Gilbert, N., Bonelli, G., Cioffi-Revilla, C., Deffuant, G., Kertesz, J., Loreto, V., Moat, S., Nadal, J.P., Sanchez, A., and Nowak, A. “Manifesto of computational social science.” The European Physical Journal Special Topics, 214(1), pp. 325-346, 2012.

Coyle, D. GDP: A brief but affectionate history. Princeton University Press, 2015.

Letouzé, E. “Big Data for Development: Challenges and Opportunities.” UN Global Pulse, 2012.

Letouzé, E., and Vinck, P. “The law, politics, and ethics of cell phone data analytics.” Data-Pop Alliance, The World Bank Group, 2015.

Poel, M., Schroeder, R., and Blackman, C. “Data for Policy: A study of big data and other innovative data-driven approaches for evidence-informed policymaking.” 2015.

Stasko, J. “Value-driven evaluation of visualizations.” Proceedings of the Fifth Workshop on Beyond Time and Errors: Novel Evaluation Methods for Visualization. ACM, 2014.

Thomas, J. J., and Cook, K. A. “Illuminating the Path: The Research and Development Agenda for Visual Analytics.” Los Alamitos, CA, USA: IEEE, 2005.

Westerman, G., Calméjane, C., Bonnet, D., Ferraris, P., and McAfee, A. “Digital Transformation: A Roadmap for Billion-Dollar Organizations.” MIT Center for Digital Business and Capgemini Consulting, 2011.


