[Cross-posted on http://thegovlab.org/data-prizes-and-challenges-as-data-collaboratives-terms-and-conditions/]
Jos Berens (Centre for Innovation, Leiden University) and Stefaan G. Verhulst (GovLab)
Over the last few months we have noticed increased discussion and activity around “data collaboratives” in which participants from different sectors — including private companies, research institutions, and government agencies — exchange data to help solve public problems. Efforts such as the Orange Data for Development Challenge, where private sector actors are exploring new ways to make data available to address societal challenges, are encouraging. In parallel with rising interest in such initiatives comes an increased need to consider how to share corporate data while mitigating internal and external risks.
There are several methods for sharing corporate data, including open APIs, data enclaves, and grand data challenges and prize-induced contests. The latter are unique as they allow a variety of actors to find new solutions using a shared dataset. Leveraging the power of the crowd, this method of open problem solving is particularly fitting for the big data space as it helps give back power to data subjects and their peers. It expands upon other efforts to leverage open innovation tools including prizes and challenges.
While the practice of prizes and challenges is becoming more common, little research and guidance exist on how to design and govern such efforts. For instance, there is no coherent legal framework informing how data prizes and challenges should be designed. Further, there is only a limited amount of research and best practice available that can be used as a foundation.
One way to deconstruct the governance framework of current data sharing practices through prizes and challenges is to analyze their terms and conditions. The value of dissecting the issues addressed in those terms and conditions goes beyond prizes and challenges, as the same issues also emerge in other data sharing arrangements.
In general, the purpose of terms and conditions is to set limitations on the use of goods and services of all sorts. The way they are formulated reflects the key concerns both parties have when engaging in a transaction. The core focus of most negotiations around terms and conditions tends to relate to finding a balance between openness and security. Analyzing how this balance is met in grand data challenges is important to understand how to accelerate corporate data sharing for public good.
Within the context of the Data Governance Project (‘DGP’, a collaboration between The GovLab, Leiden University’s Peace Informatics Lab, and the World Economic Forum Data-Driven Development initiative) we analyzed a series of terms and conditions for both public and private data challenges (see the list of those analyzed below). The analysis was conducted in coordination with UN Global Pulse, as an input for the upcoming Data for Climate Action Challenge that is scheduled to launch in December 2015 at COP21.
We identified the following 11 topics and provisions that frequently occurred across the terms and conditions analyzed, and that may guide the development of future terms and conditions:
We also identified three important omissions:
In what follows we describe in more detail the different components of the existing terms and conditions:
All terms and conditions analyzed contain the following generic provisions in one form or another:
1. Defining actors and key terms
In all the terms and conditions analyzed, defining the stakeholders to the data collaborative is the first recurring step. As in any contract, the parties to the engagement must be clearly stated. Further, it must be clear what is meant by terms such as ‘data’, ‘analysis’ and ‘outcomes’, as different interpretations of these concepts offer different potential benefits and entail different risks.
Yelp, for round six of its dataset challenge, refers to the definitions used in its general terms of service.
2. Stating eligibility requirements
To be eligible for participation in data challenges, applicants are usually subject to a number of varying requirements. For Nesta’s Open Data Challenge Series, ODI and Nesta employees are not allowed to enter the contest (art. 1.2) and the organizers reserve the right to bar anyone from entering at their ‘sole discretion’ (art. 1.7).
3. Determining intellectual property provisions
Besides the data ownership rights discussed further below, there is also the issue of intellectual property rights over the outcomes of corporate data used by the requesting party. Usually, these rights are reserved for the party that processes the data, but often a data providing company will require that it may use the results for its own purposes.
BBVA’s Open Innova Challenge, for example, required that participants grant BBVA a year-long non-exclusive license for internal use of the results, and broad further usage and adaptation rights. The New South Wales Government states explicitly that participants in its challenge retain intellectual property rights over material they produce for entry into the competition.
4. Explaining liability
Corporations sharing data tend, to the extent possible, to absolve themselves from liability resulting from any damage or harm caused by the data they share.
The US Department of Transportation Data Challenge terms and conditions contain a section, for example, in which responsibility is excluded for various errors that might occur during the challenge.
Data-specific provisions and actions
In addition to generic provisions, the terms and conditions we analyzed contain provisions and actions that relate specifically to the data shared in the challenges.
5. Determining data ownership
Currently, most terms and conditions state that data ownership does not pass to the research teams accessing data under the agreement, implying that the corporate data provider holds ownership rights.
The terms and conditions for the Yelp data challenge contain an article that explicitly states that both the data and the results produced by participants during the challenge will remain the “sole and exclusive property of Yelp”.
6. Guiding data handling
Prescribing the way data should be handled and the purposes for which use is allowed is generally part of the terms and conditions of data challenges.
For the West Nile Virus Prediction Challenge, for example, it was explicitly stated that the data provided should only be used for the purposes of the contest and also that mixing the data provided with outside data was forbidden.
7. Expressing confidentiality
For most of the challenges examined, access to the data concerned is limited, and sharing data with third parties is not allowed.
BBVA specifically addresses “Confidentiality and handling data of a personal nature” in article 8 of the Innova challenge terms and conditions.
For the Orange Data for Development Challenge Senegal, too, the data must be handled confidentially in perpetuity.
Finally, four types of administrative provisions are frequently included in the terms and conditions we analyzed.
8. Covering costs
In most of the challenges we studied, the terms and conditions state that the costs of acquiring the data and of subsequent analysis are borne by the participating research teams.
For example, in the Yelp dataset challenge, Yelp absolves itself from carrying any costs associated with the use of the data it provides, under art. 6.
9. Detailing release of outcomes
For almost all of the terms and conditions, the sharing corporation retains the right to decide whether and how the final outcomes of the challenge will be shared.
In the West Nile Virus Prediction Challenge, the organizer and the platform through which the challenge was hosted (Kaggle) reserve the right to “publicly disseminate any entries or models”, and prescribe open licensing of outcomes.
10. Determining jurisdiction for dispute settlement
Parties concluding a data sharing agreement will always assign a jurisdiction within which potential disputes will be settled.
For Nesta’s Open Data Challenge Series, it is stated explicitly that the terms and conditions shall be governed by the laws of England and Wales and fall under the exclusive jurisdiction of the English Courts.
In the Orange Data for Development Challenge Senegal terms and conditions, Senegalese law is applicable, with the ‘Tribunal Régional de Dakar’ as the court that has jurisdiction over dispute settlement.
11. Consenting to acknowledgment
Consenting to public acknowledgment of the participants upon presentation of the results by the research team, and vice versa, is generally required.
In the Transportation.gov Data Innovation Challenge, participants were required to consent to the use, by the U.S. Department of Transportation (the organizer of the challenge) and its agents, of certain personal information submitted by the participant.
Some key issues that are central to the data sharing space were not addressed in the terms and conditions we analyzed. Publicly including these topics in the outcomes of dialogue between data provider and recipient would lead to a more holistic and legitimate approach to data sharing.
12. Expressing the value proposition of sharing
In our work with the Data Governance Project, one of the key notions is that data-driven work should be guided by a clearly stated value proposition, especially when there are risks involved in using the data. Although the general aims of a challenge are usually formulated in the terms and conditions, the terms and conditions we analyzed did not typically require participants to state the intent of the specific project taking part in the challenge.
13. Untangling complex issues regarding data rights
Where the ownership rights to (personal) data reside has become a contentious issue. Corporations collecting data sometimes argue that since they spend time and money doing so, the data they collect belongs to them. Individuals often feel that when data is collected about their (online) life, it is they who should decide how that data flows and what constitutes appropriate use. That said, in general, there are notices and disclosures or T&Cs about data collection and use overall, which data subjects sign on to when a digital service provider collects their data. Examples of those notices were not reviewed as part of this analysis.
Engaging in a dialogue with ‘data subjects’ and civil society groups to inform the design of terms and conditions for Data Collaboratives or Challenges might remedy the lack of depth in the current consideration. An interesting avenue in this regard is looking at different sub-rights and responsibilities – i.e. use, holdership, stewardship, etc. – rather than focusing on full ownership by one party or another.
14. Acknowledging third party stakeholders
When a data collaborative is concluded, third party stakeholder groups, such as the intended ultimate beneficiaries of the effort, are not always mentioned in the terms and conditions. Making clear who the other potential stakeholders might be and allowing these parties to weigh in on the agreement deserves to become more commonplace.
Repository of Terms and Conditions for Data Challenges
Nesta Open Data Challenge Series
BBVA Innova Challenge
New South Wales Government Data Access
US Department of Transportation – Data Innovation Challenge Rules
Yelp Dataset Challenge
Orange Data for Development Challenge – Senegal
West Nile Virus Prediction
The GovLab Selected Readings on Data Governance
[Cross-posted from http://thegovlab.org/the-govlab-selected-readings-on-data-governance/]
Jos Berens (Centre for Innovation, Leiden University) and Stefaan G. Verhulst (GovLab)
Our work on Data Collaboratives starts from the assumption that sharing and opening up private sector datasets has great, and as yet untapped, potential for promoting social good (see, for instance, the GovLab selected readings on data collaboratives). At the same time, the potential of data collaboratives depends on the level of societal trust in the exchange, analysis and use of the data. Strong data governance frameworks are essential to ensure responsible data use. Without such governance regimes, the emergent data ecosystem will be hampered and the (perceived) risks will dominate the (perceived) benefits. Further, without adopting a human-centered approach to the design of data governance frameworks, including iterative prototyping and careful consideration of the experience, the responses may fail to be flexible and targeted to real needs.
To help develop new approaches to sharing corporate data assets for social good, GovLab is working with Leiden University (The Netherlands) and the World Economic Forum Data-Driven Development project. Our Data Governance Project aims to design and implement the approaches and tools needed to unleash the datasets that could be used to improve people’s lives. Our work builds upon existing efforts and findings, some of them curated and documented below. For more information about the Data Governance Project please contact Jos Berens or Stefaan Verhulst.
Selected Readings List (in alphabetical order)
Better Place Lab - Privacy, Transparency and Trust – a report looking specifically at the main risks development organizations should focus on to develop a responsible data use practice.
The Brookings Institution - Enabling Humanitarian Use of Mobile Phone Data – this paper explores ways of mitigating privacy harms involved in using call detail records for social good.
Centre for Democracy and Technology - Health Big Data in the Commercial Context – a publication treating some of the risks involved in using new sources of health related data, and how to mitigate those risks.
Centre for Information Policy Leadership - A Risk-based Approach to Privacy: Improving Effectiveness in Practice - a whitepaper on the elements of a risk-based approach to privacy.
Centre for Information Policy Leadership – Data Governance for the Evolving Digital Market Place – a paper describing the necessary organizational reforms to effectively promote accountability within organizational structures.
Crawford and Schultz – Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harm – a paper considering a rigorous ‘procedural data due process’.
DataPop Alliance – The Ethics and Politics of Call Data Analytics – a paper exploring the risks involved in using call detail records for social good, and possible ways of mitigating those risks.
Data for Development External Ethics Panel – Report of the External Review Panel – a report presenting the findings of the external expert panel overseeing the Data for Development Challenge.
Federal Trade Commission - Mobile Privacy Disclosures: Building Trust Through Transparency - a report by the FTC looking at the privacy risks involved in mobile data sharing, and ways to mitigate these risks.
Leo Mirani – How to use mobile phone data for good without invading anyone’s privacy – a paper on the use of data produced by mobile phone use, and the steps that need to be taken to ensure that user privacy is not intruded upon.
Lucy Bernholz – Several Examples of Digital Ethics and Proposed Practices – a literature review listing multiple sources compiled for the Stanford Ethics of Data conference, 2014.
Martin Abrams - A Unified Ethical Frame for Big Data Analysis - a paper from the Information Accountability Foundation on developing a unified ethical frame for data analysis that goes beyond privacy.
NYU Centre for Urban Science and Progress – Privacy, Big Data and the Public Good – a book on the privacy issues surrounding the use of big data for promoting the public good.
Neil M. Richards and Jonathan H. King – Big Data Ethics – a research paper arguing that the growing impact of big data on society calls for a set of ethical principles to guide big data use.
OECD Revised Privacy Guidelines – a set of principles accompanied by explanatory text used globally to inform the governance and policy structures around data handling.
White House Big Data and Privacy Working Group – Big Data: Seizing Opportunities, Preserving Values – a whitepaper documenting the findings of the White House big data and privacy working group.
World Economic Forum – Pathways for Progress – a whitepaper considering the global data ecosystem and the constraints preventing data from flowing to those who need it most. A lack of well-defined and balanced governance mechanisms is considered one of the key obstacles.
Annotated Selected Readings List (in alphabetical order)
Bernholz, Lucy. “Several Examples of Digital Ethics and Proposed Practices” Stanford Ethics of Data conference, 2014, Available from: http://www.scribd.com/doc/237527226/Several-Examples-of-Digital-Ethics-and-Proposed-Practices.
This list of readings prepared for Stanford’s Ethics of Data conference lists some of the leading available literature regarding ethical data use.
Better Place Lab, “Privacy, Transparency and Trust.” Mozilla, 2015. Available from: http://www.betterplace-lab.org/privacy-report.
This report looks specifically at the risks involved in the social sector having access to datasets, and the main risks development organizations should focus on to develop a responsible data use practice.
Focusing on five specific countries (Brazil, China, Germany, India and Indonesia), the report displays specific country profiles, followed by a comparative analysis centering around the topics of privacy, transparency, online behavior and trust.
Some of the key findings mentioned are:
Centre for Democracy and Technology, “Health Big Data in the Commercial Context.” Centre for Democracy and Technology, 2015. Available from:
Focusing particularly on the privacy issues related to using data generated by individuals, this paper explores the overlap in privacy questions this field has with other data uses.
The authors note that although the Health Insurance Portability and Accountability Act (HIPAA) has proven a successful approach in ensuring accountability for health data, most of these standards do not apply to developers of the new technologies used to collect these new data sets.
For customer-facing technologies not covered by HIPAA, the paper proposes an alternative framework for considering privacy issues. The framework is based on the Fair Information Practice Principles and three rounds of stakeholder consultations.
Centre for Information Policy Leadership, “A Risk-based Approach to Privacy: Improving Effectiveness in Practice.” Centre for Information Policy Leadership, Hunton & Williams LLP, 2015. Available from:
This white paper is part of a project aiming to explain what is often referred to as a new, risk-based approach to privacy, and the development of a privacy risk framework and methodology.
With the pace of technological progress often outstripping the capabilities of privacy officers to keep up, this method aims to offer the ability to approach privacy matters in a structured way, assessing privacy implications from the perspective of possible negative impact on individuals.
With the intended outcomes of the project being “materials to help policy-makers and legislators to identify desired outcomes and shape rules for the future which are more effective and less burdensome”, insights from this paper might also feed into the development of innovative governance mechanisms aimed specifically at preventing individual harm.
Centre for Information Policy Leadership, “Data Governance for the Evolving Digital Market Place”, Centre for Information Policy Leadership, Hunton & Williams LLP, 2011. Available from:
This paper argues that as a result of the proliferation of large scale data analytics, new models governing data inferred from society will shift responsibility to the side of organizations deriving and creating value from that data.
Given the challenge corporations face in enabling agile and innovative data use, the paper notes that “In exchange for increased corporate responsibility, accountability [and the governance models it mandates, ed.] allows for more flexible use of data.”
Proposed as a means to shift responsibility to the side of data users, the accountability principle has been researched by a worldwide group of policymakers. Tracing the history of the accountability principle, the paper argues that it “(…) requires that companies implement programs that foster compliance with data protection principles, and be able to describe how those programs provide the required protections for individuals.”
The following essential elements of accountability are listed:
Crawford, Kate; and Schultz, Jason. “Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harm.” NYU School of Law, 2014. Available from:
Considering the privacy implications of large-scale analysis of numerous data sources, this paper proposes the implementation of a ‘procedural data due process’ mechanism to arm data subjects against potential privacy intrusions.
The authors acknowledge that some privacy protection structures already include similar mechanisms. However, due to the “inherent analytical assumptions and methodological biases” of big data systems, the authors argue for a more rigorous framework.
Letouze, Emmanuel; and Vinck, Patrick. “The Ethics and Politics of Call Data Analytics.” DataPop Alliance, 2015. Available from:
Focusing on the use of Call Detail Records (CDRs) for social good in development contexts, this whitepaper explores both the potential of these datasets – in part by detailing recent successful efforts in the space – and political and ethical constraints to their use.
Drawing from the Menlo Report Ethical Principles Guiding ICT Research, the paper explores how these principles might be unpacked to inform an ethics framework for the analysis of CDRs.
Montjoye, Yves Alexandre de; Kendall, Jake; and Kerry, Cameron F. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, 2015. Available from:
Focusing in particular on mobile phone data, this paper explores ways of mitigating privacy harms involved in using call detail records for social good.
Key takeaways are the following recommendations for using data for social good:
Abrams, Martin. “A Unified Ethical Frame for Big Data Analysis.” The Information Accountability Foundation, 2014. Available from:
Going beyond privacy, this paper discusses the following elements as central to developing a broad framework for data analysis:
Mirani, Leo. “How to use mobile phone data for good without invading anyone’s privacy.” Quartz, 2015. Available from:
This paper considers the privacy implications of using call detail records for social good, and ways to mitigate risks of privacy intrusion.
Taking the example of the Orange D4D challenge and the anonymization strategy that was employed there, the paper describes how classic ‘anonymization’ is often not enough. The paper then lists further measures that can be taken to ensure adequate privacy protection.
Lane, Julia; Stodden, Victoria; Bender, Stefan; and Nissenbaum, Helen. “Privacy, Big Data and the Public Good.” Cambridge University Press, 2014. Available from: http://www.dataprivacybook.org.
This book treats the privacy issues surrounding the use of big data for promoting the public good.
The questions being asked include the following:
OECD, “OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data”. Available from:
A globally used set of principles to inform thought about handling personal data, the OECD privacy guidelines serve as one of the leading standards for informing privacy policies and data governance structures.
The basic principles of national application are the following:
Richards, Neil M.; and King, Jonathan H. “Big Data Ethics.” Wake Forest Law Review, 2014. Available from:
This paper describes the growing impact of big data analytics on society, and argues that because of this impact, a set of ethical principles to guide data use is called for.
The four proposed themes are: privacy, confidentiality, transparency and identity.
Finally, the paper discusses how big data can be integrated into society, going into multiple facets of this integration, including the law, roles of institutions and ethical principles.
White House Big Data and Privacy Working Group. “Big Data: Seizing Opportunities, Preserving Values.” White House, 2015. Available from:
Documenting the findings of the White House big data and privacy working group, this report lists, among others, the following key recommendations regarding data governance:
William Hoffman, “Pathways for Progress” World Economic Forum, 2015. Available from:
This paper discusses, among other things, the lack of well-defined and balanced governance mechanisms as one of the key obstacles preventing data, particularly corporate sector data, from being shared in a controlled space.
An approach that balances the benefits against the risks of large scale data usage in a development context, building trust among all stakeholders in the data ecosystem, is viewed as key.
Furthermore, this whitepaper notes that new governance models are required not just by the growing amount of data and analytical capacity, and more refined methods for analysis. The current “super-structure” of information flows between institutions is also seen as one of the key reasons to develop alternatives to the current – outdated – approaches to data governance.
Thank you for visiting our brand-new data governance weblog. This space will be a repository of original and referenced material produced by or used in the Data Governance Project, a collaboration between Leiden University's Peace Informatics lab, NYU's GovLab and the World Economic Forum Data-Driven Development Initiative. The central theme of these posts will be the governance of data use for social good. Materials will include research findings, conference reports, and suggestions for relevant reading materials.
Contact details can be found on the right hand side of this page. Please do not hesitate to reach out with any questions or comments regarding the posts. We hope that you will find the materials in this space useful for informing your work and look forward to seeing your replies!
Jos Berens coordinates the Data Governance Project. Based intermittently in New York, NY (US) and The Hague (NL), Jos facilitates the coordination between NYU's GovLab, the World Economic Forum Data-Driven Development Initiative, and Leiden University's Peace Informatics Lab.