The Data Revolution is Only Beginning

13 August 2021

Steve MacFeely, Director of Data and Analytics, World Health Organization

What distinguishes revolution from evolution? What events or movements in the world of data have been sufficiently disruptive or transformational to deserve being called revolutionary? Although they might seem like abstract questions, when I hear everyone talking about a “Data Revolution”, they are the ones I find myself asking.

Why is everyone talking about a Data Revolution?

Probably because in 2013 the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda called for a Data Revolution to exploit the opportunities presented by the new data landscape. Saying the idea caught the world’s imagination would be an understatement; the idea of a data revolution seemed to capture the zeitgeist perfectly. In retrospect, it seems as if the term ‘Data Revolution’ had just been waiting for someone to say it aloud. Almost immediately after the words had been uttered, the term was adopted in public discourse and diplomatic declarations.

What is a Data Revolution?

I will try to tackle this question from the perspective of a statistician, although it is not an easy question to answer. For an event or movement to be revolutionary, as opposed to evolutionary, it must be in some way disruptive or transformational. So, the question is: What events or movements in the world of official statistics have been sufficiently disruptive or transformational to deserve being called revolutionary?

Is the emergence of big data sufficiently transformational to qualify as a revolution? Or what about the Open Data movement – is that disruptive enough? Is the data revolution about new partnerships between national statistical offices and civil society or citizen science? Or is something more profound, like a deepening of human capital and statistical literacy, required to justify the term?

The data revolution, if indeed there has been one, is a curious revolution. Notwithstanding the chatter about big data and data analytics, there has been no obvious coup d'état, no shouting or marches on the streets, no Viva la Revolución, no data riots. But that does not mean there hasn’t been one! Perhaps, like many scientific revolutions before it, the data revolution has been a silent one, and one where the implications may not be fully understood until it is too late.

The research question

I set out to investigate the questions above in my paper ‘In Search of the Data Revolution: Has the Official Statistics Paradigm Shifted?’. Because there are so many different data ecosystems, and the data revolution is such a huge topic, I limited my investigation to official statistics. Even so, the canvas was still enormous. Potential data revolutions were identified using definitions provided by the Independent Expert Advisory Group on a Data Revolution for Sustainable Development in its report ‘A World that Counts’, and were then evaluated using a framework derived from Thomas Kuhn’s work ‘The Structure of Scientific Revolutions’.

The paper looks at multiple data revolutions: the data privacy revolution, the open data revolution, the big data revolution and the social data revolution, to name a few. A surprise coming from this investigation is that the term data revolution, fittingly, can be traced all the way back to the 1960s. Not only is the term Data Revolution not new, the meaning of the term hasn’t evolved that much, if at all. It is remarkable, in fact, how little has changed, raising the question of whether it is possible for a revolution to continue for 60 years.

Antecedents of the Data Revolution

The paper also examines the origins of the data revolution in the digital revolution, and how digitalisation has utterly changed the concept of data, from a narrow numeric viewpoint to a much broader concept that now comprises audio, visual and text information. This rescoping has led to an explosion of data, including the ‘paradigm destroying phenomena’ of big data, which has facilitated ‘correlation is enough’ algorithmic-based decisions. This Copernican shift in discovery and decision-making poses profound questions for us all.

The impact of secularization, the emergence of risk, and the two great historical tidal waves of industrialisation and empire on the growth in demand for data are examined, as are the ‘statistical revolutions’ that emerged from the Great Depression and World War II, which led to some of our most enduring statistical concepts and indicators, notably the development of national income, labour force and trade statistics. The connections between Taylorist performance metrics, New Public Management and evidence-informed policy making are also discussed.[1]

Are data a public good?

Another dimension examined in the paper is the journey of official statistics from serving only the state to that of a public good, serving democracy and accountability. This progressive view is quite a recent development and was only formalised in 1992 when the Fundamental Principles of Official Statistics were first adopted at the Economic Commission for Europe Ministerial Conference. These Principles were subsequently endorsed by the United Nations General Assembly in 2014. In doing so, heads of state from around the world were explicitly saying that official statistics were a public good.

Almost 30 years later, that view is being challenged, as a new cold war for the ownership of our data is underway. Some see Open Data as the solution, and indeed it might be. But there are risks here, too, as most Open Data initiatives are drives to open government data only and this may inadvertently contribute to asymmetries in the treatment of public and private sector data. There are already imbalances in this regard: public data are classified as public goods whereas corporate data are classified as marketable and proprietary assets.

Is general confidentiality sustainable?

The data deluge has created a challenge for both privacy and confidentiality. So much so that one can’t help but question whether privacy as an ‘ideal’ is alive and well, and whether privacy in ‘practice’ is on life-support. One of the biggest challenges for official statistics is how to protect the confidentiality of super large multinational enterprises. We are likely to hear a lot more about differential privacy and differential confidentiality in the coming years. But perhaps it is already too late; an unsuspecting public makes Faustian bargains every day under the illusion that by signing away their rights they have secured valuable discounts or better services.

Is the Data Revolution finished?

The short answer is no. We are only at the dawn of the data revolution. In keeping with Kuhn’s theory, new crises will emerge and some of these will spark new data revolutions. Who knows what these will be?[2] One can anticipate that the loss of national data sovereignty will trigger a crisis as governments come to the realization that many data and data holders are beyond the reach of their national legal systems, with the implication that governments cannot enforce national laws to protect their citizens. Nor does it take much imagination to see how data property rights will soon be another lightning rod, as this will be critical to future commercial and national security.

For official statistics, a more existential crisis may accompany the growing public animosity to evidence-informed policy making, the weaponization of data and the growing willingness to believe mendacity and ‘fake news’. As noted above, protecting privacy is already technically challenging, but now the concept itself is being undermined by the captains of industry 4.0, who claim that privacy is extinct.

Some concluding thoughts

I argue that there has not been a single data revolution, but many. The Data Revolution is in fact a series of revolutions. Those revolutions are a function or consequence of other revolutions: digital, informational, cultural, and social. I also speculate that future crises will trigger new data revolutions.

The data revolution(s) as we now understand the term is inextricably linked to the SDGs. It began as an aspiration, a plea for better data, but quickly transformed into a fact. Diplomats refer to it, not as a future state, but as the solution. But to realize the aspirations, I (and other chief statisticians) argue that a Global Data Convention or Bill of Rights is needed to safely access and use data (and by extension statistics) while protecting the rights of citizens.

Such a convention will need to be global to address sovereignty issues. It must re-establish some social contract that strikes a balance between community dataveillance and individual and human rights, between security and privacy, between commerce and public good, between asymmetries in private and public openness, between data ownership and reward.

In an era of faltering multilateralism, it may be convenient to turn a blind eye, but given the importance of data to all our futures, the United Nations cannot ignore this challenge. Governments and the private sector cannot abdicate their responsibility either.

Note: The views expressed herein are those of the author and do not necessarily reflect the views of the United Nations.

[1] Taylorism is the system of scientific management advocated by Frederick Taylor, in which factory tasks are monitored and measured to determine the most efficient routines and improve performance. New Public Management is an approach to running public service organizations. This school of thinking emerged in the 1980s in an effort to make the public service more "businesslike" and to improve its efficiency by using private sector management models.

[2] Kuhn coined the phrase ‘paradigm shift’, arguing that scientific progress is nonlinear and usually arises in response to crises.
