How Should Government Agencies Regulate Data Science?


In May of this year, the United Kingdom’s Government Digital Service released a paper titled the “Data Science Ethical Framework.” Although it is not a legally binding document, the paper was designed as a guide for government employees and as a template for the appropriate and ethical use of data science. It aims to give government employees the confidence to use innovative data science methods, and the tools to avoid ethically questionable projects.

In the paper’s introduction, the United Kingdom’s Minister of State for Digital and Culture, Matt Hancock, said that “Data science carries both huge opportunities and a duty of care.” He continued, “This document is bringing together the relevant law in the context of new technology, and prompting consideration of public reaction so that government data scientists and policymakers can be confident to innovate appropriately with data.”

The paper outlined six guiding principles for the use of data science in the public sector:

  1. Start with clear user need and public benefit
  2. Use data and tools which have the minimum intrusion necessary
  3. Create robust data science models
  4. Be alert to public perceptions
  5. Be as open and accountable as possible
  6. Keep data secure

The paper then gave examples of cases where these principles were and were not followed. For example, to demonstrate the importance of creating robust data science models, it presented an analysis of Hurricane Sandy’s impact on the states of New York and New Jersey. If one were to analyze only the data collected via social media, one would conclude that the epicenter of the damage was in the borough of Manhattan. However, the damage in neighborhoods such as Coney Island, Breezy Point, and Far Rockaway was much more severe. This was not reflected on social media, because there was less social media chatter about these areas.

The United States is facing a similar question about how data sets can and should be used. There is presently a debate within the criminal justice system over whether data should be used to help determine sentencing guidelines for convicted criminals. Similar concerns about the robustness of data sets are at play in this situation as well.

In 2014, Attorney General Eric Holder said of these systems, “Although these measures were crafted with the best of intentions, I am concerned that they inadvertently undermine our efforts to ensure individualized and equal justice. They may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”

The “Data Science Ethical Framework” was written specifically to address laws already in place in the United Kingdom, and so the paper cannot be used as a universal rulebook for data scientists. But the paper does set a precedent that government agencies should have some role in mitigating the risks of advancements in data science. In an interview with Science Friday, DJ Patil, the Chief Data Scientist at the White House’s Office of Science and Technology Policy, called for every data science training program to include data ethics as a core tenet of data science.

Patil continued, “When we do work with data, you have incredible opportunities to do great things with it, and you also have the ability to do something that could be very problematic. We’re seeing where people have used data in ways that we think are fundamentally not okay. People have started to talk about this and what we should do about it. I think we have to have much stronger conversation. Privacy components are equally important.”