Data Privacy and Protection Assessments in Radio Mining

12 April 2021

By Stefan Lemm, former Head, Military Operations Unit, OICT

Data Privacy and Protection is one of the key aspects of the Secretary-General’s Data Strategy. Protecting people from indiscriminate data collection is important because such collection can lead to data misuse, for instance by businesses. A potentially more dangerous scenario is that data collected in good faith ends up in the hands of ill-meaning actors who intend to intimidate individuals or even threaten their personal safety.

When we speak about Data Privacy and Protection, the challenge is to understand how to assess the potential risks and harms associated with data collection and, subsequently, how to actively implement protection measures so that data can still be collected for the right cause while mitigating the risks.

My intention is to describe an approach taken by OICT and UN Global Pulse in a specific project for the United Nations Multidimensional Integrated Stabilization Mission in Mali (MINUSMA).
 

Mining Radio Data in MINUSMA

Team of data scientists in Pulse Lab Kampala developing speech-to-text technologies for indigenous languages. This technology is used to collect data for the Big Data Radio Mining and Analysis project in MINUSMA. Credit: UN Global Pulse.

The MINUSMA Big Data Radio Mining and Analysis project is based on a technology developed by UN Global Pulse. This technology collects and records public radio broadcasts, applies filters to remove music and commercials, and analyzes the content for specific keywords. If a specific keyword is found, the audio clip is provided to MINUSMA for further analysis.
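
The keyword step described above can be illustrated with a minimal Python sketch. It assumes the audio has already been segmented, labelled, and transcribed by a speech-to-text system. The `Segment` class, its field names, the `kind` labels, and the `flag_segments` function are hypothetical names chosen for this illustration and do not reflect the actual UN Global Pulse implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcribed stretch of a broadcast (hypothetical structure)."""
    station: str
    start_s: float    # offset from the start of the recording, in seconds
    kind: str         # assumed labels: "speech", "music", "commercial"
    transcript: str


def flag_segments(segments, keywords):
    """Return (segment, matched_keywords) pairs for speech that mentions a keyword.

    Music and commercials are skipped, mirroring the filtering step.
    Matching is case-insensitive and works on whole single words only.
    """
    wanted = {k.lower() for k in keywords}
    flagged = []
    for seg in segments:
        if seg.kind != "speech":
            continue
        words = set(re.findall(r"\w+", seg.transcript.lower()))
        hits = sorted(wanted & words)
        if hits:
            flagged.append((seg, hits))
    return flagged


if __name__ == "__main__":
    demo = [
        Segment("Station A", 0.0, "music", "water water water"),
        Segment("Station A", 30.0, "speech", "The village has no clean water."),
    ]
    for seg, hits in flag_segments(demo, ["water"]):
        print(seg.station, seg.start_s, hits)
```

In this sketch only the flagged segments would be handed on for human analysis, which also limits how much material leaves the collection system.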

So why should we care about radio data? Public talk radio is estimated to be the main source of information for approximately 80 percent of the Malian population; a similar percentage is found in many other countries in which peace operations are conducted. Radio data can therefore be a valuable source of information for understanding public sentiment and concerns. For example, imagine a discussion taking place on a local radio station about the lack of clean water in a village. If the development unit of MINUSMA were aware of this conversation, it could investigate further and potentially help the population find a solution.

So how did we approach the challenge of collecting, storing, and cross-checking data, and making that data available for analysis in a way that respects data privacy? My partner in this project, LtCol Stefan Sander, and I are both German military staff officers who worked temporarily for OICT in the UN Secretariat. Neither of us had previous experience in Data Privacy and Protection assessments. Fortunately, we were able to work with UN Global Pulse and use their Risks, Harms and Benefits Assessment Tool to ensure that the big data radio mining project took data privacy and protection into consideration.

UN Global Pulse's assessment tool (or checklist) outlines a set of minimum points to consider when embarking on a data innovation project. The tool is intended to help minimize risks and harms of a data project, and to maximize its positive impacts. The assessment includes six core sections:

  • Type of Data
  • Data Access and Data Use
  • Communication About the Project
  • Data Transfers
  • Risks and Harms
  • Final Assessment and Rationale for Decision.

For an estimated 80 percent of the Malian population, public talk radio functions as the main source of information. Credit: UN Global Pulse.

In the section Type of Data, we need to determine whether there is an intention to use (for example, collect, store, transmit, or analyze) data that directly identifies individuals. Personal data directly relating to an identified or identifiable individual may include, for example, name, date of birth, gender, age, location, user name, phone number, email address, ID/social security number, IP address, device identifiers, and account numbers.

After defining the type of data at hand, the section Data Access and Data Use focuses on questions of legitimacy and fairness of data access and use, including the proportionality and necessity of data use, data retention, and data accuracy requirements.

The section Communication About the Project highlights transparency, which is generally encouraged, as a key factor in helping to ensure accountability. Personal data should be used transparently with respect to the data subjects, as appropriate and whenever possible.

Often, data-related initiatives require collaboration with third parties: data providers (to obtain data); data analytics companies (to assist with data analysis); and cloud or hosting companies (for computing and storage). In cases where collaboration is required, personal data should only be transferred to a third party that will afford appropriate protection for that data. It is therefore important that potential collaborators are carefully chosen through a proper due diligence vetting process that also includes minimum checkpoints for data protection compliance, the presence of privacy policies, and fair and transparent data-related activities. The section Data Transfers asks whether the partners, if any, comply with standards and basic principles for data privacy and data protection that are at least as strict as those outlined in this checklist.

In the section Risks and Harms, the key questions are: Does your use of data pose any risks of harm to individuals or groups of individuals, whether or not they can be directly identified, visible, or known? Are there any steps that can be taken to mitigate these risks? Is the project likely to cause harm to individuals or groups of individuals, whether or not the individuals can be identified or known?

Based on the answers provided for the first five sections, the section Final Assessment and Rationale for Decision looks at whether the risks and resulting harms are disproportionately high compared to the expected positive impacts of the project. Based on the assessment conducted, the project will implement technical and procedural measures to ensure data protection. These will include restricting access to the database to authorized users only, anonymizing names when individuals do not need to be identified, deleting data as quickly as possible, and securing the data at every step of the process.
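
To make these measures more concrete, here is a minimal Python sketch of three of them: replacing names with stable pseudonyms, checking whether a record has passed its retention period, and checking whether a user is authorized. Everything in it is an assumption made for illustration: the function names, the 30-day retention period, the list of known names, and the example name are hypothetical and do not describe the measures actually implemented in the project. Note also that keyed hashing is pseudonymization rather than full anonymization, because anyone holding the secret key can recompute the mapping.

```python
import hashlib
import hmac
import re
from datetime import datetime, timedelta, timezone


def pseudonymize(name: str, secret: bytes) -> str:
    """Map a name to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    normalized = name.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return "person-" + digest[:12]


def redact_names(text: str, names, secret: bytes) -> str:
    """Replace each known name in the text with its pseudonym.

    Assumes the names to redact are supplied by the caller; this sketch
    does not detect names on its own.
    """
    for name in names:
        pseudonym = pseudonymize(name, secret)
        text = re.sub(re.escape(name), lambda _m, p=pseudonym: p, text,
                      flags=re.IGNORECASE)
    return text


def is_expired(recorded_at: datetime, now: datetime,
               retention: timedelta = timedelta(days=30)) -> bool:
    """True if a record is older than the (hypothetical) retention period."""
    return now - recorded_at > retention


def can_read(user: str, authorized_users) -> bool:
    """Allow access only to users on an explicit allow-list."""
    return user in authorized_users


if __name__ == "__main__":
    key = b"replace-with-a-real-secret"   # placeholder, not a real key
    print(redact_names("Caller Jane Doe asked about water.", ["Jane Doe"], key))
    now = datetime.now(timezone.utc)
    print(is_expired(now - timedelta(days=45), now))
```

Using a keyed hash rather than a plain hash means that the same person receives the same pseudonym across clips, so analysts can still follow a thread of discussion, while someone without the key cannot simply hash candidate names to reverse the mapping.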

In conclusion, when collecting information from public sources such as radio, it is crucial to determine potential risks and harms through a guided assessment. UN Global Pulse has developed a tool that supports this task, and our team found it very helpful when collecting, storing, and analyzing data in a peacekeeping setting.

Note: The views expressed herein are those of the author and do not necessarily reflect the views of the United Nations.