An Exploratory Study of Research Data Governance in the U.S.

Abstract
Making decisions regarding data and the overall credibility of research constitutes research data governance. In this paper, we present results of an exploratory study of the stakeholders of research data governance. The study was conducted among individuals who work in academic and research institutions in the US, with the goal of understanding what entities are perceived as making decisions regarding data and who researchers believe should be responsible for governing research data. Our results show that there is considerable diversity and complexity across stakeholders, both in terms of who they are and their ideas about data governance. To account for this diversity, we propose to frame research data governance in the context of polycentric governance of a knowledge commons. We argue that approaching research data from the commons perspective will allow for a governance framework that can balance the goals of science and society, allow us to shift the discussion toward protection from enclosure and knowledge resilience, and help to ensure that multiple voices are included in all levels of decision-making.

in research data governance by entities from outside the traditional academy, including the government and the publishing industry, also points to the urgent need for a long-term and proactive approach to the governance of research data (Harmon, 2017; Lamdan, 2018; Larivière, Haustein, & Mongeon, 2015; Sample, 2012).
Current approaches to research data governance include initiatives that emphasize preservation, infrastructures, and open access (Foster & Deardorff, 2017; Vardigan & Whiteman, 2007; Wilkinson et al., 2016). While the open access movement and repositories go a long way toward facilitating sharing of data and research findings, especially for researchers in developing nations (Chan & Costa, 2005), they remain only one aspect of a data governance ecosystem. To broaden such an ecosystem, researchers and policy makers must grapple with the growing and increasingly diverse landscape of organizations and stakeholders involved in the production and use of research data, seeking to understand the relationships both among these entities and between these entities and individuals who carry out research. Governance strategies will require moving beyond sharing tools and compliance, and approaching data as a collective transdisciplinary object that enables knowledge work in multiple domains and over time.
In this paper, we present results of an exploratory study conducted with individuals from academic and research institutions to understand what entities are perceived as making decisions regarding data, to what degree those entities affect how individuals work with data, and who should be responsible for making key decisions in data governance. Our results show that there is considerable diversity and complexity across stakeholders, both in terms of who they are and their ideas about data governance. To make sense of this diversity and to establish a future research agenda, we propose to frame research data governance in the context of the governance of new commons: a set of resources, such as the Internet or digital culture, that has been identified as sustaining multiple stakeholders and being vulnerable to failure, conflicts, and power imbalances (Hess, 2008). More specifically, we discuss research data as part of a knowledge commons, a shared resource that combines the properties of private, public, and common goods and does not yet have stable rules or institutional arrangements in place. As such, we envision our study to be the starting point of an analysis using the theories and empirical frameworks applied to other commons (Hess & Ostrom, 2007; Ostrom, 2010), with the ultimate goal of making recommendations for governance of research data.

Background
The concept of data governance is rapidly gaining traction. Mostly grounded in the frameworks of information technology and corporate asset management, it has been defined as decision-making about the effective use of information assets (Ladley, 2012; Marco, 2006). Using "data" and "information" interchangeably, the enterprise asset perspective smooths out the differences and uncertainties in definitions and concerns itself with the value of data for business, thus seeking to identify types of data/information, roles and responsibilities, costs, and risks associated with storage and management of data (ECAR Working Group, 2015; Hagmann, 2013). While such interchangeable use of terms may work in a corporate setting, its casualness and practicality are not applicable in the context of a research enterprise, where the differences between data, information, and knowledge are important. The development of more precise frameworks of research data governance is still in its early stages, and many approaches focus more on concepts and definitions than on practices, decision-making, and outcomes (Alhassan, Sammon, & Daly, 2016).
Data governance models also come from the research on organizations, as more and more of them invest in data governance initiatives (Panian, 2010). Approaches range from generalized models that stem from information technology governance and define areas that any company can use in its data governance strategies, such as data quality, metadata, and access, to contingency approaches that argue that each organization requires a specific data governance configuration (Khatri & Brown, 2010; Wende & Otto, 2007). Existing models discuss centralized and decentralized approaches that determine who stores and gathers data assets within the organization, and hierarchical versus cooperative decision-making that assigns responsibilities within organizations (Wende, 2007). Case studies find that models of data governance adopted within organizations help to define roles and responsibilities with regard to data processes and requirements (Cheong & Chang, 2007).
As entities with clearly defined boundaries and stakeholders, individual organizations can adopt existing models and develop their own governance approaches to suit their needs. Those models, however, cannot be simply transferred to governing research data. Several factors pose challenges to adopting organizational or corporate models of access and governance for research data, including the changing nature of data, the complexity of research networks that include both data producers and consumers, and the insufficiency of open access models (Hilgartner & Brandt-Rauf, 1994). The nature of data is shifting toward more heterogeneity as well as fusion with other components of research such as physical samples, laboratory techniques and protocols, algorithms, documents, and many other inputs of scientific work. These inputs converge into research products that allow not only for answering research questions but also for supporting independent verification and future reuse (Bechhofer, De Roure, Gamble, Goble, & Buchan, 2010). At the same time, such research products complicate the issues of "asset" identification and sharing in the governance context.
Research networks in which data are generated and shared are increasingly complex. As discussed above, sharing involves many stakeholders beyond the primary researcher and audience. The decision-making in such networks belongs to many actors who may have differing goals and claims to parts of the data products. Moreover, open access models do not necessarily fit everywhere. For example, data are being exchanged privately, published with embargoes, used in training prior to publications, released with nondisclosure agreements, and so on. Finally, the legal system, particularly in terms of regulations around intellectual property and commercialization, affects decision-making with regard to data. Boundaries between what is public domain or public good and what is private and patentable shift all the time.
Much of the work on research data governance focuses on public access and sharing practices rather than broader issues of decision-making throughout the lifecycle of research (Perrier et al., 2017). For decades the U.S. government and funding agencies have been trying to encourage researchers to share their data (Fienberg, Martin, & Straf, 1985; OSTP (US Office of Science and Technology Policy), 2013; Shelby, 2000). Encouraged and sometimes challenged by the open access movement, journals and professional societies joined in and began to establish guidelines for publishing data, although approaches to publishing and sharing recommendations and compliance vary (Pitt & Tang, 2013; Stodden, Guo, & Ma, 2013; Van Noorden, 2013; Vasilevsky, Minnier, Haendel, & Champieux, 2017). Institutions of higher education have also begun to develop policies to establish control over data produced by their employees, with marked differences in types and contents of policies across universities (Briney, Goben, & Zilinski, 2015). Additionally, libraries and data repositories develop policies and guidelines that target data management inefficiencies, including lack of documentation, proprietary formats, duplications and inconsistencies, and so on (Borer, Seabloom, Jones, & Schildhauer, 2009).
Research data governance currently exists in many forms and covers such efforts as sharing, openness, curation, management, and compliance. Research data management, in particular, has gained visibility as many research organizations face challenges of developing tools, policies, and services in support of working with data and encouraging its dissemination and archiving (Pinfield, Cox, & Smith, 2014). Discussing differences and similarities between the terms "data management" and "data governance" and their varying academic and professional roots is beyond the scope of this paper. As will be shown later, we consider "governance" to be the broader term and a nexus between several traditions of research and practice.
While the sheer amount of governance-related effort is encouraging, policies are often siloed, do not promote consistency, and may even provoke contradictory behaviors. Thus, data sharing behavior continues to vary across disciplines, work areas, and geographic regions (Tenopir et al., 2011). In addition to disciplinary cultures and regional differences, sharing depends on many factors, including individual researcher characteristics, desired degree of control, available resources, and institutional pressures (Fecher, Friesike, & Hebing, 2015; Kim & Adler, 2015; Kim & Stanton, 2016).
Individuals responsible for data end up acting on their own, increasing the risks of non-compliance, data hoarding, and data loss (Gormley & Gormley, 2012). The complexities of research products and networks illustrated above, as well as the variety of stakeholders, access strategies, and legal contexts, raise questions of who should coordinate and regulate such complexities as well as who owns and takes care of data at various stages of its lifecycle. As various groups attempt to make and enact data policies, understanding the settings where they make decisions and the different groups vying for control is crucial for effective governance (Marshall, 1984).

Methodology
The impetus behind this exploratory study is the need to understand how various stakeholders in research data come together and make decisions with regard to data. However, before we can examine the dynamics of decision-making and the roles groups or individuals play in the creation of data-related standards and policies, we need to identify those groups. This exploratory study is the first step in examining norms and behaviors associated with the governance of research data, and as such it aims to identify entities that, according to various stakeholders and research communities, are perceived to contribute to the governance of research data or should be contributing to it. Thus, the study focused on two essential questions: 1. What entities affect one's data use; and 2. Who should be responsible for making decisions with regard to research data.
The study used a structured survey methodology. An anonymous web survey was open for responses from August 1, 2018 to October 31, 2018. The survey contained approximately 25 questions split into three sections: background and demographics, data sharing and governance, and organizations. The background section included questions about respondents' age, gender, education, main responsibilities at work, and disciplinary orientation. The data sharing and governance section contained questions about data sharing experiences, who should be responsible for governing data, and entities that affect researchers' work with data. The section about organizations asked respondents to identify entities that are involved in data governance in their research areas and describe their own involvement with those organizations. We include the text of our survey instrument as Appendix A.
We designed our questions to be broad to avoid leading the respondents toward a specific understanding and to allow them to provide clarifications if needed. At the same time, we provided examples to illustrate what we mean by such terms as compliance and authorization agency, commercial entity, or government entity. Most of the questions were in multiple-choice format and included an option of "Other". The choices were derived from the information science literature and tested in a separate pilot with sixteen researchers. After the feedback from the pilot, we clarified wording, modified response options, and made choice options consistent across the questions about perceived and desired impact.
The study attempted to reach several disciplines with strong histories of data management but with different attitudes to data sharing that range from full and open sharing to embargoes to no sharing, including earth sciences, social sciences, and library and information science. We restricted our region to the United States, asking the survey respondents to acknowledge that they work primarily in the U.S. The survey was disseminated through several academic and professional listservs, including the Federation of Earth Science Information Partners (ESIP), the Research Data Access and Preservation Association (RDAP), the International Association for Social Science Services and Technology (IASSIST), multiple sections of the American Statistical Association, social media groups of the American Sociological Association, and a list of individuals culled by the authors from multiple sources (approximately 1,000 earth scientists and 800 social scientists). Given the overlap in the audiences of many listservs, we estimate that the combined audience that was notified of the survey consisted of at least 3,000 researchers and data professionals. We received 129 responses; therefore our response rate can be estimated to be around 5%.
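The arithmetic behind this estimate can be checked with a minimal sketch. The function and variable names below are illustrative and not part of the study; the only inputs are the two figures reported above. Because 3,000 is a lower bound on the notified audience, the computed value is an upper bound on the response rate, which is in line with the rough figure quoted in the text.

```python
def response_rate(responses: int, audience: int) -> float:
    """Return the share of the notified audience that responded."""
    if audience <= 0:
        raise ValueError("audience must be positive")
    return responses / audience


# Figures reported in the text: 129 responses, audience of at least 3,000.
rate = response_rate(129, 3000)
print(f"Estimated response rate (upper bound): {rate:.1%}")
```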

Demographics
Responses to our survey came from a diverse pool of participants across age, gender, and disciplinary affiliation. In terms of age, the largest proportion of respondents were 31-40 years old (31%; Table 1). With regard to gender, respondents were skewed toward female (53% female, 33% male, 14% prefer not to say or no answer). Respondents also came from a variety of disciplines that define both their areas of degree received and current work (Table 2). The largest proportion of respondents came from the social sciences (49% and 32% for degree taken and work area respectively), followed by the library and information sciences (20% and 19%) and people who work in areas categorized as "other," i.e., biostatistics, public health, and business (12% and 14%). Fewer respondents identified their disciplinary affiliations as earth sciences, computer science, life sciences, and physical sciences. A number of respondents moved out of the traditional sciences, such as geology, chemistry, biology, or statistics, to become computer scientists or professionals in information technology, biostatistics, and other areas. In terms of organizational affiliation, 72% of respondents came from colleges and universities. The remainder were distributed across government (9%), non-profit (6%), and for-profit organizations (7%).
Individuals who selected "Other" as a category added explanations that they were either retired or worked in places that can be considered more than one type, e.g., both government and non-profit. Rather than asking the respondents for their professional title, e.g., professor, librarian, lecturer, and so on, which carries certain assumptions about what people do at work, we asked questions about the types of responsibilities and data-related activities performed at work, encouraging respondents to think about their actual work tasks. The types of responsibilities included administrative work (e.g., office or grant support), research (tenure or non-tenure track), teaching (tenure or non-tenure track), professional (e.g., library, IT managers), and leadership (e.g., chair, upper management, supervisor). Data-related activities included collecting and analyzing original (one's own) data, using data collected by other researchers, using data provided by government agencies (e.g., NASA, Bureau of Labor Statistics), using existing archival and library materials, and assisting others in collecting, managing or analyzing their data.
Many participants divided their responsibilities across most if not all types of data-related responsibilities. We grouped the responses into position orientations based on the percentage majority rule. For example, if a respondent indicated that 50% or more of their time was spent on research, their position would be categorized as "research". If research was less than 50% but more than 40%, the position would be "mostly research". Similar rules were applied to all types of responsibilities. Respondents who did not have at least 40% in any of the five responsibility types were categorized as "distributed".
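The grouping rule can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the procedure actually used in the study: the function name and input format are hypothetical, a share of exactly 40% is treated as "mostly" (the description leaves that boundary open), and a tie between two responsibility types is resolved in favor of the one listed first.

```python
def position_orientation(shares: dict[str, float]) -> str:
    """Classify a respondent from time shares (percent) per responsibility type.

    Assumptions (not specified in the study description):
    - exactly 40% counts as "mostly <type>";
    - ties go to the responsibility type listed first in `shares`.
    """
    # Find the responsibility type with the largest reported share.
    top_type, top_share = max(shares.items(), key=lambda item: item[1])
    if top_share >= 50:
        return top_type
    if top_share >= 40:
        return f"mostly {top_type}"
    return "distributed"


# Hypothetical respondents, for illustration only.
print(position_orientation({"research": 60, "teaching": 30, "leadership": 10}))
print(position_orientation({"research": 45, "teaching": 35, "administrative": 20}))
print(position_orientation({"research": 20, "teaching": 20, "professional": 20,
                            "administrative": 20, "leadership": 20}))
```

Applied to the three hypothetical respondents, the sketch yields "research", "mostly research", and "distributed".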
Out of 95 individuals who responded to the question about responsibility types, 34 were in research-oriented positions, 38 were in professional services positions, six each were in teaching and leadership positions, two stated they do mostly administrative work, and nine had responsibilities distributed across four or five categories. Table 3 provides distributions of data-related activities for each position orientation. Most of our participants, regardless of their position orientation, collect their own data. Most notably, individuals in teaching, professional, leadership, and distributed positions work with the original data they collect. Similarly, many respondents indicated that they assist others in collecting, managing, and analyzing their data. Four respondents in professional services positions, who are not included in Table 3, selected "Other" as their data-related activity and provided the following activities: data dissemination, metadata documentation, data curation, and collection development for data.
Describing their data sharing experiences, fewer than half of respondents indicated that they published their own data or shared it privately with researchers outside of their collaborator circle (41% and 38% respectively; see Table 4). A larger proportion (57%) indicated that they assisted others in sharing their data. Nineteen percent of respondents never shared their data. Respondents who selected "Other" added that data was already publicly available or that they share it through their own website rather than an established publishing venue, such as a data journal or a repository.

Involvement in Decision-Making
To learn about entities that are believed to be involved in data governance, we asked several questions about the types of entities that currently make decisions in research data governance and affect participants' work with data. Respondents were asked to rate the following entities with respect to how much they affect their work: individual researchers (other than oneself), academic institutions, the scientific community as a whole, the US government (e.g., Congress, local municipalities), publishers, funding agencies (e.g., NSF or private foundations), compliance and authorization entities (e.g., Institutional Review Boards), commercial entities involved with data (e.g., Microsoft, Facebook), and government or non-profit entities involved with data (e.g., World Health Organization). The analysis of responses did not show any statistically significant difference between participants with different disciplinary backgrounds; therefore, the results are provided for the whole sample (Figure 1). Most respondents identified individual researchers other than themselves as affecting their work very much or somewhat (70% and 15% respectively). The scientific community as a whole and funders were the next two groups perceived to affect respondents' work, although the split between "somewhat" and "very much" was more even. Similarly, an even split between "somewhat" and "very much" can be seen in the impact evaluation of academic institutions and compliance units. Publishers received an approximately even split between affecting somewhat / very much and very little / not at all. The least impactful entities include commercial entities and nonprofit organizations (69% and 56% affecting respondents very little or not at all). We also asked respondents to name specific entities they know or believe make decisions with regard to data, such as decisions about data collection, analysis, documentation, and sharing.
Respondents were asked to name up to eight specific organizations and then for each named organization identify what types of decisions those organizations make. Several options for decision-making included adding a data management section to the organization's code of conduct, declaring data sharing as a goal for the profession, requesting journals in the domain to require publishing data, and discussing compliance with the emerging or existing regulations. Respondents were also allowed to add their own types of decisions.
Overall, our respondents provided 185 entities that they believe make decisions regarding data, with about half of the respondents providing more than one entity. Even though the question asked about specific entities, many respondents provided general answers, such as "Funders" or "Publishers". We coded all the responses into categories similar to the categories of entities above. Table 7 below provides frequencies of mention of each type of entity, as well as examples of specific entities. Funding agencies were mentioned most often (24% of all organizations mentioned). Among the specific agencies mentioned by the respondents were the National Science Foundation (NSF), the National Institutes of Health (NIH) and its institutes and centers, and the Department of Energy. Data organizations refer to government or non-profit organizations that collect and make available large amounts of data. Such organizations included the US Geological Survey (USGS), the National Oceanic and Atmospheric Administration (NOAA), the Agency for Healthcare Research and Quality (AHRQ), the Census Bureau, the Bureau of Labor Statistics, and some others, for example, the Inter-university Consortium for Political and Social Research (ICPSR), a repository of social science data. Compliance and authorization entities included both administrative units that oversee compliance with the existing rules (e.g., IRB or research administration) and the rules themselves (e.g., HIPAA or FERPA).
Among the publishers, only two were mentioned explicitly: SAGE and PLOS One. The rest included general references to journal policies, journals, publishers, and editors. Academic institutions were mentioned 9% of the time and included several specific universities as well as references such as "my university", "university policies", or "academic institutions". Individual researchers who make decisions about research data included respondents themselves ("myself" or "my team") or others ("peers", "collaborators", "supervisors", etc.). A new type of entity that was not mentioned in responses to the previous question was the professional society. While it was mentioned only 11 times out of 185, it nevertheless adds one more type of organization to the landscape of data governance. The American Geophysical Union was mentioned several times, in addition to the American Meteorological Society and the Society for Political Methodology. Commercial entities included specific companies as well as references to "companies" and "data vendors". The category "other" included such entities as libraries, IT departments, and data managers.
For every entity named, respondents were also asked about types of data-related decisions that those entities have made. The decisions of some of the entities were described through their primary function, for example, compliance and authorization units address compliance with emerging or existing regulations, or publishers require journals to publish data. Figure 2 illustrates decisions for the three selected entities that had the most volume and variety of data-related decisions: funders, academic institutions, and professional societies. As can be seen from the illustration above, funding agencies were perceived to be making many decisions regarding research data, including addressing compliance, establishing data sharing as a goal for research as a profession, adding data management to codes of conduct and requiring journals to publish data. Academic institutions were seen to be playing a smaller role and contributing in smaller proportions to each decision. Interestingly, the role of professional societies is perceived to be mostly in requiring professional journals to publish data and not in, say, establishing data sharing as a goal of the profession.
As discussed above, professional societies were mentioned as making decisions about research data, but respondents named only five specific societies that do so. In contrast to this small number, respondents indicated that they belong to 93 unique organizations that support academic and professional communities across a wide variety of areas, including biology, mathematics, environmental sciences, statistics, information science, education, social sciences, and so on. This contrast suggests that most professional organizations are not perceived as making decisions about research data.

Governing Responsibilities
Finally, respondents were asked to consider which entities should be responsible for making decisions regarding research data and rate them on the 3-point scale (should be primarily responsible, should be involved but not primarily responsible, and should not be involved; Figure 3).
The majority of respondents agreed that individual researchers should be primarily responsible for making decisions about data (65%). Many other entities, including academic institutions, the scientific community, US government, funding agencies, compliance units, and nonprofit organizations should be involved but not primarily responsible, with the scientific community also having a larger share of primary responsibilities assigned to it (37%). Overall, all entities except publishers and commercial entities received strong support from our respondents for being involved and/or primarily responsible.

Ambivalence toward Responsibilities
Our exploratory study shows that many entities are perceived to be involved with research data governance and that respondents believe many of those same entities should have some level of responsibility for governing research data. These entities are an essential part of creating and supporting communities that produce scientific knowledge. Many individuals also belong to multiple communities, identifying themselves with a specific discipline, with interdisciplinary communities, and with communities that are involved in various aspects of the data lifecycle.
The complex and overlapping nature of data, its role in knowledge production, and the associated communities present difficulties in coordinating these various actors. As individuals in these communities carry out a broad range of activities related to data, including data collection, analysis, management, and reuse, they rely upon data produced by or under the control of others, and tend to be exposed to alternative positions, guidance, and influence around data collection and use. For example, government agencies such as NASA provide data for research openly and without restrictions (Murphy, 2019); however, as that data can be used in various domains and combined with other data, researchers may end up working with policies that vary from data sharing as a condition for publication to no guidance about data at all (Vasilevsky et al., 2017). Adding to the complexity of the data sharing environment is the consideration that not all data can be shared and that researchers have to grapple with the economic, political, and ethical implications of their sharing or non-sharing decisions (Simon et al., 2017).
Almost all our findings point to the complexities of research data networks and an increasing overlap in data responsibilities, which, in turn, increases the ambivalence toward assigning specific responsibilities to specific agents. Our survey respondents did not report clear demarcations of their professional and data orientations, and instead indicated that they are involved in many aspects of data work, including data collection, analysis, management, and assistance to others. They were also ambivalent about who should be primarily responsible for decision-making regarding research data and distributed the responsibilities across many entities. Wallis and Borgman (2011) discussed similar ambivalences in the context of data ownership and accountability. Their exploration indicated that data does not necessarily fit with researchers' interpretations of authorship and responsibility for the products of research. We suggest that as the digital ecosystem of data grows, the complexity of the causes and responses to this ambivalence will have an even greater impact on the resilience and sustainability of research data.
Individuals in our study believed that research data governance is the responsibility of the entire data ecosystem (Table 7, Figure 3). However, the extent to which parties should be involved varied. For example, most of our respondents agreed that individual researchers should be primarily responsible for making decisions about research data. Half of respondents felt that commercial entities should not be involved at all, and close to half felt the same way about publishers. Many concerns have already been raised about publishers and other well-funded entities behaving as rule and norm creators in a data ecosystem that is heavily reliant on external funding and increasingly implicates research data in innovation, collaboration, and return on investment in science (Janicke Hinchliffe, 2018; Maxson Jones, Ankeny, & Cook-Deegan, 2018; McCain, 1995). In addition to economic influence and decision-making models, some academics were equally concerned with the growing influence of politically motivated institutions within the research data ecosystem (Edwards, 1999; Rosen, 2017; Ruppert, Isin, & Bigo, 2017). These questions of who should have a seat at the table, and what the associated costs might be, will be difficult to reconcile as individuals and organizations continue discussing and developing research data governance.

Public and Market Forces
The reluctance of our respondents to include publishers and commercial entities in decision-making with regard to research data points to the tensions between science as a public institution and commercialization of many aspects of knowledge production, including overreliance on market solutions in information technology, academic administration, and even dissemination of research. Parallels can be drawn between the governance of research data and the governance of the Internet, another complex and diffuse community that faced many sociotechnical dilemmas and can serve as an example of a fragile equilibrium among powerful players (Mueller, 2012). Historically, the governance of the Internet belonged to the Internet Engineering Task Force (IETF) and an international community of network designers, researchers, operators, and vendors, which alleviated the fears of one player becoming dominant and preventing open exchange of information and services. Over time, that governance structure has been challenged as private actors such as network and content providers started performing governance functions (Raymond, 2013). The debates over net neutrality, languages on the Internet, and the domain name system (DNS) illustrate how the increasing authority of market-based and technocratic forces and tensions among competing national interests may necessitate stronger regulation and enforcement of rules against preferential treatment based on payments and favoring private or government interests rather than the interests of communities (Abbate, 2000; Electronic Frontier Foundation, n.d.). This is not intended to suggest that the data ecosystem should not include economic and political considerations or relevant entities.
We do, however, suggest that a thoughtfully developed collective action community, one that takes on board the concerns and comments of all community members, is a necessity in situations when economically or politically motivated individuals have the potential to become the primary influencers in the creation of community standards. The results of our exploratory study suggest that those who work with data in academia are already aware of the growth and potential influence of non-academic entities. Half of our respondents were against publishers and commercial entities making decisions about data. This raises a question about forms of governance that would enable collective action and balance the influence of public and market forces in deciding whether research data should be open or closed, where it should reside, and who gets to access and control it.

Data as Part of the Knowledge Commons
Our study further confirms that the complexity of the data ecosystem is characterized by the following: (1) the varied and substantial number of organizations and institutions involved; (2) individual actors that act both on behalf of their employing organizations and the larger collective communities; (3) the presence of organizations with economic interests; (4) the belief that everyone is responsible for data governance; and (5) the absence of a shared vision and of collective action organizations that balance the influences of the various actors. We suggest placing research data within the knowledge commons as one of the approaches that can advance research and action in data governance.
Knowledge commons refers to making cultural and intellectual resources accessible to all members of society and treating those resources as common pool, i.e., available to all without exclusion (Frischmann, Madison, & Strandburg, 2014;Ostrom, 1994). The proposal to treat data as commons is not new; many commons frameworks have emerged that focus on intangible resources (Hess, 2008;Jimenez, 2019). The most common approach in the context of data, however, is to call data repositories "data commons" and view them as the main governing mechanism that supplies rules for managing data and helps negotiate ownership between individual and institutional actors (Eschenfelder & Johnson, 2014). Many repositories focus on developing the mechanisms of access with a particular emphasis on technological capabilities and policy-making (Anderson, 2017;Grossman, Heath, Murphy, Patterson, & Wells, 2016;Kindling et al., 2017).
The researchers and data professionals in our study believe that researchers, or more broadly, data producers, should be responsible for making decisions regarding data, and yet few professional organizations, or even academic institutions, are visibly involved in data governing activities. The discussions about what to do with research data are often framed in the context of funder-compliant data management and repository-driven data curation (Corti, Van den Eynden, Bishop, & Woollard, 2014). With such a strong emphasis on compliance and mandated sharing, and without deeper disciplinary and institutional guidance on navigating data production and consumption, the individual researcher and the community are going to be left with hundreds of decontextualized data sets that become digital graveyards at best, or lead to an inaccurate scientific record at worst.
While the development of a governance framework is out of the scope of this study, we would like to emphasize that making decisions about storage and licensing is not enough for governing research data. Moreover, according to our study, funding agencies, academic institutions, and repositories are not the main stakeholders in research data decision-making. Therefore, governing initiatives should engage the research communities in wider conversations and discuss the role of data in knowledge production and consumption, its relevance to societal concerns, and the misalignment of incentives that leads to the prioritization of individual private interests over the collective and public interests.
Understanding of knowledge commons governance, and of research data governance as part of it, is in its nascent state. Given the central role of data in the production of scientific knowledge, ensuring its availability requires explicit thinking about the nature of data as a hybrid collective resource that combines the public, private, and common-pool models of goods. Such thinking therefore necessitates further understanding of research data's boundaries, actors, and outcomes. Strandburg, Frischmann and Madison (2017) proposed the Governing Knowledge Commons (GKC) framework that provides the tools for empirical studies of knowledge resources and their governance. Following the GKC (and other previous research) terminology, we propose to consider research data as an "action arena": a space in which actors interact with one another and deal with the dilemmas of sharing and sustaining the resource. We have identified many of the actors in this arena, including researchers and data practitioners, academic institutions, professional organizations, the US government, federal agencies, and other organizations.
The GKC framework also calls for defining the goals and objectives of the commons under examination. Research data is not only evidence in support of scientific claims or an object that is imbued with certain value and demarcates the boundaries of scientific communities (Baker & Millerand, 2012); it is also a resource that contributes to the sustainability of humankind and the increased solidarity for the common good that is built on trust between academia and the public (Fitzpatrick, 2019). As such, the goals and the challenges of building and sharing research data and the collective effort to govern it will have to go beyond the development of repositories and formal policies and cross other dimensions, such as a collective mission and action, cultural principles and social norms, design of platforms of participation, self-management of contributions, and conflict resolution systems (Fuster Morell, 2014). Key questions for future research include the complex interplays of various actors and resources involved in research data action arenas, the dilemmas and dependencies that are being created by non-profit, university, and commercial infrastructure and policy provisions in research communities, and how governance plays out in practice at the levels of an individual, an institution, a professional society, and networks of stakeholders. Larger data collection efforts can also shed light on how various dimensions of research, such as field of study, regional context, types of data, and individuals' rank and position, affect data governance models and frameworks.

Conclusion
We began this project with a simple goal: to learn which specific entities are involved in research data governance and who, according to individual researchers, should be involved. What we learned is that the responsibility for decision-making in research data is perceived to be distributed. Furthermore, this distribution appears to be uneven, and researchers' perceptions of decision-making across entities are not consistent. Acknowledging the obvious and inevitable limitations of an exploratory study and a convenience sample, we combine our findings with the existing literature on data governance and management and posit that these perceptions point to gaps in research data governance that need to be filled. Namely, those gaps concern issues that go beyond tools and compliance, i.e., current research data governance models do not address the goals and missions of research, social and cultural norms, forms of conflict resolution, and dilemmas in sharing and use.
To begin to address these gaps and move toward a model of research data governance, we propose to use the conceptual frameworks of the commons and view data through the lens of the knowledge commons. This will allow us to expand the conception of data commons beyond the tools for storage, sharing, and compliance to include discussions of the normative aspects of governance in a way that a) ties together all professional practices around research data and associated means and objects of production and b) articulates ethical commitments and responsibilities of multiple stakeholders in research data, including individuals, research communities and government and commercial entities. Without deeper understanding of the norms, rules, subjects, and objects of research data it will be increasingly difficult to create and sustain just and socially relevant environments of knowledge production.