[Cross-posted on http://thegovlab.org/data-prizes-and-challenges-as-data-collaboratives-terms-and-conditions/]
Jos Berens (Centre for Innovation, Leiden University) and Stefaan G. Verhulst (GovLab)
Over the last few months we have noticed increased discussion and activity around “data collaboratives” in which participants from different sectors — including private companies, research institutions, and government agencies — exchange data to help solve public problems. Efforts such as the Orange Data for Development Challenge, where private sector actors are exploring new ways to make data available to address societal challenges, are encouraging. In parallel with rising interest in such initiatives comes an increased need to consider how to share corporate data while mitigating internal and external risks.
There are several methods for sharing corporate data, including open APIs, data enclaves, and grand data challenges and prize-induced contests. The latter are unique as they allow a variety of actors to find new solutions using a shared dataset. Leveraging the power of the crowd, this method of open problem solving is particularly fitting for the big data space as it helps give back power to data subjects and their peers. It expands upon other efforts to leverage open innovation tools including prizes and challenges.
While the practice of prizes and challenges is becoming more common, little research and guidance exist on how to design and govern such efforts. For instance, there is no coherent legal framework informing how data prizes and challenges should be designed. Further, there is only a limited amount of research and best practice available that can be used as a foundation.
One way to deconstruct the governance framework of current data sharing practices through prizes and challenges is to analyze their terms and conditions. The value of dissecting the issues addressed in those terms and conditions goes beyond prizes and challenges, as they emerge as well in other data sharing arrangements.
In general, the purpose of terms and conditions is to set limitations on the use of goods and services of all sorts. The way they are formulated reflects the key concerns both parties have when engaging in a transaction. The core focus of most negotiations around terms and conditions tends to relate to finding a balance between openness and security. Analyzing how this balance is met in grand data challenges is important to understand how to accelerate corporate data sharing for public good.
Within the context of the Data Governance Project (‘DGP’, a collaboration between the GovLab, Leiden University’s Peace Informatics Lab, and the World Economic Forum Data-Driven Development initiative) we analyzed a series of terms and conditions for both public and private data challenges (the full list appears at the end of this post). The analysis was conducted in coordination with UN Global Pulse, as an input for the upcoming Data for Climate Action Challenge that is scheduled to launch in December 2015 at COP21.
We identified 11 topics and provisions that frequently occurred across the terms and conditions analyzed, and that may guide the development of future terms and conditions (numbered 1 to 11 below).
We also identified three important omissions (numbered 12 to 14 below).
In what follows we describe in more detail the different components that make up existing terms and conditions.
All terms and conditions analyzed have the following generic provisions in one or another form:
1. Defining actors and key terms
In all the terms and conditions analyzed, defining the stakeholders in the data collaborative is the first recurring step. As in any contract, the parties to the engagement must be clearly stated. Further, it must be clear what is meant by terms such as ‘data’, ‘analysis’ and ‘outcomes’, as different interpretations of these concepts offer different potential benefits and entail different risks.
Yelp, for round six of its dataset challenge, refers to the definitions used in its general terms of service.
2. Stating eligibility requirements
To be eligible for participation in data challenges, applicants are usually subject to a number of varying requirements. For Nesta’s Open Data Challenge Series, ODI and Nesta employees are not allowed to enter the contest (art. 1.2) and the organizers reserve the right to bar anyone from entering at their ‘sole discretion’ (art. 1.7).
3. Determining intellectual property provisions
Besides the data ownership rights discussed further below, there is also the issue of intellectual property rights over the outcomes that the requesting party produces using corporate data. Usually, these rights are reserved for the party that processes the data, but often a data-providing company will require that it may use the results for its own purposes.
BBVA’s Open Innova Challenge, for example, required that participants grant BBVA a year-long non-exclusive license for internal use of the results, and broad further usage and adaptation rights. The New South Wales Government states explicitly that participants in its challenge retain intellectual property rights over material they produce for entry into the competition.
4. Explaining liability
Corporations sharing data tend, to the extent possible, to absolve themselves from liability resulting from any damage or harm caused by the data they share.
The US Department of Transportation Data Challenge terms and conditions, for example, contain a section that excludes responsibility for various errors that might occur during the challenge.
Data-specific provisions and actions
In addition to generic provisions, the terms and conditions we analyzed contain provisions and actions that relate specifically to the data shared in the challenges.
5. Determining data ownership
Currently, most terms and conditions state that data ownership does not pass to the research teams accessing data under the agreement, implying that the corporate data provider holds ownership rights.
The terms and conditions for the Yelp data challenge contain an article that explicitly states that both the data and the results produced by participants during the challenge will remain the “sole and exclusive property of Yelp”.
6. Guiding data handling
The terms and conditions of data challenges generally prescribe how data should be handled and the purposes for which its use is allowed.
For the West Nile Virus Prediction Challenge, for example, it was explicitly stated that the data provided should only be used for the purposes of the contest and that mixing the data provided with outside data was forbidden.
7. Expressing confidentiality
For most of the challenges examined, access to the data concerned is limited, and sharing data with third parties is not allowed.
BBVA specifically addresses “Confidentiality and handling data of a personal nature” in article 8 of the Innova challenge terms and conditions.
Likewise, under the Orange Data for Development Challenge Senegal terms and conditions, the data should be handled confidentially in perpetuity.
Finally, four types of administrative provisions are frequently included in the terms and conditions we analyzed.
8. Covering costs
In most of the challenges we studied, the terms and conditions state that costs related to acquiring the data and to subsequent analysis will be borne by the participating research teams.
For example, in the Yelp dataset challenge, Yelp absolves itself from carrying any costs associated with the use of the data it provides, under art. 6.
9. Detailing release of outcomes
For almost all of the terms and conditions, the sharing corporation retains the right to decide whether and how the final outcomes of the challenge will be shared.
In the West Nile Virus Prediction Challenge, the organizer and the platform through which the challenge was hosted (Kaggle) reserve the right to “publicly disseminate any entries or models”, and prescribe open licensing of outcomes.
10. Determining jurisdiction for dispute settlement
Parties concluding a data sharing agreement will always assign a jurisdiction within which potential disputes will be settled.
For Nesta’s Open Data Challenge Series, it is stated explicitly that the terms and conditions shall be governed by the laws of England and Wales and fall under the exclusive jurisdiction of the English Courts.
In the Orange Data for Development Challenge Senegal terms and conditions, Senegalese law is applicable, with the ‘Tribunal Régional de Dakar’ as the court that has jurisdiction over dispute settlement.
11. Consenting to acknowledgment
Consenting to public acknowledgment of the participants upon presentation of the results by the research team, and vice versa, is generally required.
In the Transportation.gov Data Innovation Challenge, participants were required to consent to the use of certain personal information they submitted by the U.S. Department of Transportation (the organizer of the challenge) and its agents.
Some key issues that are central to the data sharing space were not addressed in the terms and conditions we analyzed. Publicly including these topics in the outcomes of dialogue between data provider and recipient would lead to a more holistic and legitimate approach to data sharing.
12. Expressing the value proposition of sharing
In our work with the Data Governance Project, one of the key notions is that data-driven work should be guided by a clearly stated value proposition, especially when there are risks involved in using the data. Although the general aims of a challenge are usually formulated in the terms and conditions, the terms and conditions we analyzed did not typically require a specific project taking part in the challenge to state its intent.
13. Untangling complex issues regarding data rights
Where the ownership rights to (personal) data reside has become a contentious issue. Corporations collecting data sometimes argue that since they spend time and money doing so, the data they collect belongs to them. Individuals often feel that when data is collected about their (online) life, it is they who should decide how that data flows and what constitutes appropriate use. That said, digital service providers generally present notices, disclosures or T&Cs about data collection and use, which data subjects sign on to at the time of collection. Examples of those notices were not reviewed as part of this analysis.
Engaging in a dialogue with ‘data subjects’ and civil society groups to inform the design of terms and conditions for Data Collaboratives or Challenges might add depth to the current consideration of these issues. An interesting avenue in this regard is looking at different sub-rights and responsibilities – i.e. use, holdership, stewardship, etc. – rather than focusing on full ownership by one party or another.
14. Acknowledging third party stakeholders
In concluding a data collaborative, third party stakeholder groups, such as the intended ultimate beneficiaries of the effort, are not always mentioned in the terms and conditions. Making clear who the other potential stakeholders might be and allowing these parties to weigh in on the agreement deserves to become more commonplace.
Repository of Terms and Conditions for Data Challenges
Nesta Open Data Challenge Series
BBVA Innova Challenge
New South Wales Government Data Access
US Department of Transportation – Data Innovation Challenge Rules
Yelp Dataset Challenge
Orange Data for Development Challenge – Senegal
West Nile Virus Prediction
Josje Spierings is head of the Secretariat of the International Data Responsibility Group, a collaboration between the Data & Society Research Institute, Data-Pop Alliance, the GovLab at NYU, UN Global Pulse, Signal Program - Harvard Humanitarian Initiative - Harvard University and Leiden University.