The increasing centrality of persistent identifiers (PIDs) to scholarly ecosystems, and the contribution they can make to the burgeoning “PID graph”, has the potential to transform scholarship. Despite researchers’ importance as originators of PID data, little is known about their awareness and understanding of PIDs, or their efficacy in using them. In this article, we report on the results of an online interactive test designed to elicit exploratory data about researcher awareness and understanding of PIDs. This instrument was designed to explore recognition of PIDs (e.g. Digital Object Identifiers [DOIs], Open Researcher and Contributor IDs [ORCIDs], etc.) and the extent to which researchers correctly apply PIDs within digital scholarly ecosystems, as well as to measure researchers’ perceptions of PIDs. Our results reveal irregular patterns of PID understanding and certainty across all participants, though statistically significant differences by discipline and academic job role were observed in some instances. Uncertainty and confusion were found in relation to dominant schemes such as ORCID and DOIs, even when these were contextualized within real-world examples. We also show that researchers’ perceptions of PIDs are generally positive, although disciplinary differences can be noted, as can higher levels of aversion to PIDs in specific use cases and negative perceptions where PIDs are measured on an “activity” semantic dimension. This work therefore contributes to our understanding of scholars’ “PID literacy” and should alert those designing PID-centric scholarly infrastructures to a continuing need for training and outreach to active researchers.
Persistent identifiers (PIDs), such as Digital Object Identifiers (DOIs) or the Open Researcher and Contributor ID (ORCID), are becoming central to the operation of scholarly systems, especially open scholarly infrastructure. PIDs enable persistent reference to nodes on the web – people, places, and things – facilitating the generation of vast scholarly “PID graphs” capable of mapping connections between different entities within scholarly landscapes (Cousijn et al., 2021). CERN note that PIDs make these entities “discoverable,” by identifying them uniquely and reliably; “accessible,” by successfully resolving them even if the entity changes its location; “useable,” by referencing a specific version or state of an entity; “interoperable,” by expressing trust through transparency and provenance; and, finally, “assessable,” by contributing to an interconnected network of entities (CERN, 2020). These properties present opportunities for a variety of innovations in the scholarly communications space, not least in the identification, discovery, and retrieval of scholarly entities (Ananthakrishnan et al., 2020). The transformational potential of PIDs in future open research ecosystems is such that a growing number of research organizations now insist upon their use wherever possible (Bosman, Frantsvåg, Kramer, Langlais, & Proudman, 2021).
The adoption and use of PIDs by scholarly systems is predominantly a concern for the operational administrators of these systems, and PIDs are typically features with which many scholarly users interact only passively. However, many users are also important contributors of the PID data comprising the PID graph and are frequently the origin of these data (Dappert, Farquhar, Kotarski, & Hewlett, 2017). Such contributions are made through the creation, supply, and interlinking of heterogeneous textual objects, datasets, software, research instruments, equipment, and the related PIDs these items may generate, such as for people, organizations, or other abstract entities. If the PID graph is to demonstrate reliable growth and adequate relational depth, scholarly contributors will need to participate meaningfully with PID-centric systems and to demonstrate a level of “PID literacy” in their (re)use and creation of PIDs. Despite their importance as originators of PID data, little is known about scholars’ awareness and understanding of PIDs, or their efficacy in using them. A knowledge gap also exists in our understanding of academics’ perceptions of PIDs and the extent to which their introduction to scholarly infrastructure is perceived positively.
In this article, we report on the results of an online interactive test designed to elicit exploratory data about the awareness and understanding of PIDs among scholars. This test instrument was designed to explore scholars’ recognition of PIDs and the extent to which scholars correctly apply PIDs within digital scholarly ecosystems, thereby measuring their “PID literacy” – a concept we introduce in this article. The instrument also sought to measure scholars’ perceptions of PIDs. Data elicitation in the study of perception in information science is often qualitative in nature (Cox & Abbott, 2021; Purvis et al., 2017). However, using aspects of Osgood’s original measurement approaches (Osgood, 1957), the instrument instead deployed semantic differential measurement techniques to better capture metrics on the “semantic dimensions” of scholars’ perceptions of PIDs. The relative novelty of our approach is the measurement of scholars’ perceptions using these semantic dimensions and the measurement of perceptual distances across participant groups.
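Comparing groups in this way rests on the idea of a perceptual distance between semantic differential profiles. The distance computation is not specified at this point in the article, so the following Python sketch assumes Osgood's generalized distance D: the square root of the summed squared differences between two profiles' scores on the same adjective scales. The group profiles below are invented numbers, not study data.

```python
import math
from typing import Sequence


def osgood_d(profile_a: Sequence[float], profile_b: Sequence[float]) -> float:
    """Osgood's generalized distance D between two semantic profiles.

    Each profile holds one score per bipolar adjective scale, with both
    profiles listing the scales in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same adjective scales")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Invented mean scores (-4..+4) for two participant groups on three scales.
group_x = [2.0, 1.0, -1.0]
group_y = [0.0, 1.0, 1.0]
print(round(osgood_d(group_x, group_y), 3))  # 2.828, i.e. sqrt(4 + 0 + 4)
```

A D of zero means two groups responded identically on every scale; because D aggregates across scales, it says how far apart two profiles are but not on which adjectives they diverge.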
While this exploratory work is motivated from a data perspective and by its implications for the PID graph, it also provides a useful snapshot of scholarly users’ digital scholarship competencies and therefore contributes to the wider body of information literacy knowledge. It also provides useful perceptual insights into scholars’ thinking around the increased use of PIDs in scholarly ecosystems. Taken together, these insights can inform not only future technical approaches to the implementation of PIDs but also the education and training required to support scholars in their (re)use of PIDs.
The article will first provide explanatory background on persistent identification, PIDs, and the PID graph before going on to review the related research and theory. A formal statement of the research motivation and research questions to be addressed then follows. We continue by describing the research instrument design, followed by results and a discussion of those results.
1.1 Persistent Identification and the PID Graph
A PID provides a unique and persistent reference to an entity which is normally accessible over the Internet (McMurry et al., 2017). Such PIDs not only provide long-term identification for these entities but also make them actionable by being encoded as a Uniform Resource Identifier (URI) (e.g. Weigel, Kindermann, & Lautenschlager, 2014). A PID registry service, usually designed to assign PID URI prefixes, register new PIDs, and provide the necessary technical infrastructure to manage PIDs, will also typically provide metadata describing the entity, contextualizing its potential (re)use. One common form of PID which has come to typify the identification of scholarly resources on the web is the DOI. The notion underpinning a DOI is that it remains static (“persistent”) over time and, when dereferenced on the web, will always resolve to the identified resource, irrespective of whether its digital location may have changed in the intervening period (International DOI Foundation, 2017). This general PID approach has since been adopted within the digital scholarly infrastructure to persistently identify a wide variety of people, places, and things on the web, from researchers (e.g. ORCID) to research projects (e.g. Research Activity Identifier [RAiD]) to cell lines (e.g. Research Resource Identifier [RRID]), some reusing existing Handle.net infrastructure and others devising different but similar service mechanisms to enable the assignment, registration, and resolution of PIDs (Cousijn et al., 2021). Figure 1 provides an example of a Research Organization Registry (ROR) PID. ROR is used to persistently identify research-related organizational entities; in this case, it identifies the European Organization for Nuclear Research (CERN): https://ror.org/01ggx4157.
Just as DOI registries gather metadata about the entities to which they assign DOIs, the ROR registry gathers structured data for this PID, available via the ROR API: https://api.ror.org/organizations/01ggx4157.
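The ROR identifier above has internal structure that a PID-literate user, or a system, can check before dereferencing it. The Python sketch below assumes the form we understand ROR to use: a leading 0, six lowercase Crockford base32 characters, and a two-digit checksum computed as 98 - ((n * 100) mod 97) over the decoded base32 value n. The helper names are hypothetical, and the API URL simply follows the pattern quoted above; ROR's API may also expose other, versioned paths.

```python
# Crockford base32 alphabet (lowercase, without i, l, o, u).
CROCKFORD = "0123456789abcdefghjkmnpqrstvwxyz"


def ror_checksum(body: str) -> str:
    """Two-digit checksum for the six base32 characters of a ROR ID.

    Assumption: n is the base32 value of `body` and the checksum is
    98 - ((n * 100) % 97), zero-padded to two digits.
    """
    n = 0
    for ch in body:
        n = n * 32 + CROCKFORD.index(ch)
    return str(98 - (n * 100) % 97).zfill(2)


def parse_ror(pid: str) -> str:
    """Return the bare ROR ID from a ROR URL or bare ID, or raise ValueError."""
    ror_id = pid.strip().lower().removeprefix("https://ror.org/")
    if len(ror_id) != 9 or ror_id[0] != "0":
        raise ValueError(f"not a ROR ID: {pid!r}")
    body, check = ror_id[1:7], ror_id[7:]
    if any(ch not in CROCKFORD for ch in body) or ror_checksum(body) != check:
        raise ValueError(f"ROR ID fails validation: {pid!r}")
    return ror_id


def ror_api_url(pid: str) -> str:
    """Build the API URL for a ROR PID, following the pattern quoted in the text."""
    return f"https://api.ror.org/organizations/{parse_ror(pid)}"


print(ror_api_url("https://ror.org/01ggx4157"))
# https://api.ror.org/organizations/01ggx4157
```

The checksum means that a mistyped ROR ID is usually rejected locally rather than silently resolving to nothing, or to the wrong organization.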
Although the concept of source identification in scholarship is not new (Kaplan, 1965), the PID approaches under discussion in this work have emerged within the context of digital object management, where “link rot” and persistence in the scholarly record have become important considerations in maintaining a healthy scholarly publication and research ecosystem (Jones et al., 2016). PIDs enable the improved citation and tracking of scholarly entities because their identifiers, whether of people, places, or things, remain persistent over time. This mitigates problems arising from link or reference rot and enables continued reuse of whatever is identified by the PID, whether by people or by machines. For this reason, PIDs are increasingly considered an important insurance mechanism within scholarship, assuring that scholarly verification, reproducibility, and replicability remain possible within an age of evolving digital scholarly infrastructure (Ivie & Thain, 2018; Meadows, Haak, & Brown, 2019). Additionally, the improvements in tracking made possible by PIDs offer scientometric potential, with the metadata held by services such as Crossref becoming valuable datasets for measuring research impact, knowledge growth, and scholarly communication trends (Hendricks, Tkaczyk, Lin, & Feeney, 2020).
It should be noted, however, that PIDs in themselves do not and cannot guarantee persistence. For example, investigations by Klein and Balakireva (2022) of DOIs, arguably the most ubiquitous PID type, suggest widespread DOI request failures and inconsistent machine responses from organizations using them. Members of the same research team have also proposed their Memento “Robust Links” approach as a means of improving the reliability of Uniform Resource Locator (URL) and URI-based referencing on the web, including with respect to PIDs (Klein, Shankar, & Van de Sompel, 2018). PIDs are therefore only persistent insofar as a PID registration service commits to resolving them, or insofar as a publisher commits to updating a PID registry with the current location of a web resource. Meanwhile, alternative PID models are being explored for services such as open repositories, where the use of repository Open Archives Initiative (OAI) identifiers (unique identifiers for metadata records held within a repository or OAI-compliant software platform) offers a decentralized mechanism for contributing to scholarly data graphs (Knoth, Budko, Pavlenko, & Cancellieri, 2022). Despite these caveats and developments, PIDs can and do foster greater persistence of scholarly entities and help to unambiguously identify them, thereby encouraging resource discovery and reuse.
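The Robust Links approach keeps a reference usable when its target changes by recording an archived snapshot alongside the original link. The sketch below assumes the HTML attribute convention we understand Robust Links to use (data-versionurl and data-versiondate on an anchor whose href is the original URL). The archive URL and date are hypothetical placeholders, and the helper function is ours, not part of any Memento library.

```python
from html import escape


def robust_link(original_url: str, version_url: str, version_date: str, text: str) -> str:
    """Return an <a> element decorated with Robust Links-style attributes.

    href keeps the original (e.g. PID-based) URL; data-versionurl points to
    an archived snapshot and data-versiondate records its date (YYYY-MM-DD).
    """
    return (
        f'<a href="{escape(original_url)}" '
        f'data-versionurl="{escape(version_url)}" '
        f'data-versiondate="{escape(version_date)}">{escape(text)}</a>'
    )


# Hypothetical snapshot URL and date, for illustration only.
link = robust_link(
    "https://ror.org/01ggx4157",
    "https://web.archive.org/web/20220101000000/https://ror.org/01ggx4157",
    "2022-01-01",
    "CERN (ROR record)",
)
print(link)
```

Keeping the PID as the href preserves its role as the primary reference, while the snapshot attributes give readers and crawlers a fallback if resolution fails.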
The ability to unambiguously identify entities, combined with greater certainty of persistence, has enabled the encoding of relational associations between PIDs and their associated metadata, in turn generating complex but valuable scholarly graph-based networks (Um, Choi, Kim, & Lee, 2020). By extending many of the conventions established by the Resource Description Framework (RDF) (Macgregor, 2009) and its use within Linked Open Data (LOD) in particular (Treloar, 2011), the so-called “PID graph” presents opportunities for improved resource discovery, inferencing/reasoning, research aggregation, and so forth by establishing connections between different entities within the research landscape. RAiD, for instance, can persistently identify research activities such as an ongoing research project. This RAiD can then be referenced within scholarly article metadata, thereby creating a relational association and colocating articles arising from the same research project. But the RAiD itself can also be used to reference associated entities within its “metadata envelope,” as in Figure 2. In this diagrammatic example, the RAiD PID references DOIs of research datasets associated with the research activity, identifiers from the Global Research Identifier Database (GRID) and the International Standard Name Identifier (ISNI) scheme, a group identifier, and even another related RAiD, referenced as a sub-project (Janke, McCafferty, & Duncan, 2017).
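The RAiD “metadata envelope” described above can be pictured as a small directed graph whose nodes are PIDs and whose edges are typed relations. The Python sketch below uses an invented adjacency list (all identifiers are placeholders, and the relation names are hypothetical rather than RAiD's actual schema) and a breadth-first traversal to show how an article that references a RAiD becomes connected to the project's datasets, organizations, and sub-projects.

```python
from collections import deque

# Invented PID graph: each PID maps to (relation, PID) pairs found in its
# metadata. Identifiers and relation names are placeholders, not RAiD schema.
PID_GRAPH: dict[str, list[tuple[str, str]]] = {
    "raid:example-A": [
        ("hasDataset", "doi:example-dataset-1"),
        ("hasOrganization", "isni:example-org"),
        ("hasSubProject", "raid:example-B"),
    ],
    "raid:example-B": [("hasDataset", "doi:example-dataset-2")],
    "doi:example-article": [("isOutputOf", "raid:example-A")],
}


def reachable(start: str) -> list[str]:
    """Breadth-first list of every PID reachable from `start`.

    The `seen` set stops the traversal from looping if two PIDs refer to
    each other (e.g. a sub-project pointing back to its parent).
    """
    seen, queue, order = {start}, deque([start]), []
    while queue:
        pid = queue.popleft()
        for _relation, target in PID_GRAPH.get(pid, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(reachable("doi:example-article"))
```

Starting from the article, the traversal reaches the sub-project's dataset even though the article's own metadata never mentions it; this transitive reach is what makes rich RAiD envelopes valuable.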
Numerous scholarly graphs have emerged in recent years, but not all generate their entity relations entirely from PIDs; some instead use knowledge extraction and inferencing techniques, or combinations thereof (Atzori, Manghi, & Bardi, 2018; Manghi, Atzori, De Bonis, & Bardi, 2020; Schirrwagen et al., 2020). Though the PID graph depends on highly expressive metadata, it is simpler because graph construction, maintenance, and relational associations are based entirely on the PIDs themselves. This aids graph scaling, as the computational overhead associated with knowledge extraction, inferencing, and de-duplication is avoided (Cousijn et al., 2021). Suffice it to say that important contributors to the PID graph include, but are not limited to, DataCite, Crossref, ORCID, ROR, ISNI, FundRef, RRID, and RAiD. A fuller treatment of the PID graph exceeds the scope of this article; however, useful technical background is provided by many others in the literature (Aquino et al., 2017; Cousijn et al., 2021; Dappert et al., 2017; Klump & Huber, 2017; Meadows et al., 2019).
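The scaling argument above, that a graph built entirely from PIDs avoids de-duplication work, can be illustrated by merging relation sets from two sources. This Python sketch assumes that each registry exports (subject, relation, object) triples keyed by PID; the sources and identifiers are invented. The same assertion arriving from both sources simply collapses under set union, with no name matching required.

```python
Edge = tuple[str, str, str]  # (subject PID, relation, object PID)


def merge_edge_sets(*sources: set[Edge]) -> set[Edge]:
    """Union the edges contributed by several registries.

    Because every node is named by its PID, an entity reported by two
    sources collapses to one node without fuzzy matching.
    """
    merged: set[Edge] = set()
    for edges in sources:
        merged |= edges
    return merged


# Invented example: two registries both assert that one article has an author.
registry_1 = {("doi:example-article", "creator", "orcid:example-person")}
registry_2 = {
    ("doi:example-article", "creator", "orcid:example-person"),
    ("orcid:example-person", "affiliation", "ror:example-org"),
}
graph = merge_edge_sets(registry_1, registry_2)
nodes = {n for s, _, o in graph for n in (s, o)}
print(len(graph), len(nodes))  # 2 3
```

This only holds once PIDs are normalized to one canonical form (DOIs, for instance, are case-insensitive), so a real pipeline would still need that step; it is, however, far cheaper than reconciling free-text names.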
It is also apposite to note that emerging models of scientific communication entail even greater levels of PID specificity. The so-called “nanopublication” models (Bucur, Kuhn, & Ceolin, 2020) and emerging platforms such as Octopus.ac seek to disaggregate the components of research into verifiable, citable chunks. Such models disaggregate research into persistently identifiable components, such as research problem, hypothesis, method, analysis, and so forth (e.g. Freeman, 2022); however, within other models, disaggregation can extend to specific paragraphs within a scientific paper (Bucur et al., 2020). These models are beyond the indicative cases to be noted later in this article but nevertheless point to an emerging trend which is seeking to expose scholars to the “PID-ification” of almost all aspects of their research.
1.2 User Research and Introducing “PID Literacy”
An extensive body of PID literature has emerged since the early 2000s, especially within digital library, digital repository, and scholarly infrastructure research, but focuses almost exclusively on PID infrastructure, technical governance, PID types, policies, metadata profiles, and so forth (e.g. Chandrakar, 2006; Foulonneau & André, 2008; Koehler, 1999; Nelson & Allen, 2002; Simons & Richardson, 2013; Weigel, Lautenschlager, Toussaint, & Kindermann, 2013). This literature has been critical to the technical evolution of PID-based technologies but does not address the socio-technical side of PID growth. For example, little is known about scholars’ understanding and perception of PIDs, or the extent of users’ “PID literacy” – a concept which we will introduce in this section.
PIDs are largely concerned with citing entities. Their broader efficacy is dependent upon their accurate reproduction by others. Since our direct understanding of scholars’ interaction with PIDs is minimal, it is useful to consider tangential areas of research, especially work exploring scholars’ capacity for identifying scholarly entities, such as their citation habits and competency. Although the competencies required to use and understand PIDs are greater, the routine of citing sources within academic work is somewhat cognate since it requires that sources (i.e. entities) be accurately identified (Kaplan, 1965). Such work most notably investigates scholars’ citation habits and can inform our understanding of the present research context (Cano, 1989). For example, observations by Garfield (1974, 1990) in the early phases of bibliometrics found frequent and numerous citation errors in scholarly articles, introduced by authors during drafting. Misidentification of journals, misspelling of author names, and name mis-orderings were found to be common. Some errors have been found to be associated with a lack of citation verification (e.g. authors copying faulty citations from a faulty source) (Broadus, 1983), a phenomenon which persists and has been termed “referencing misbehaviour” (Liang, Zhong, & Rousseau, 2014). There are also indications of variation in the citing quality and behaviour of scholars across subjects, with patterns noted as being peculiar to specific subjects, compounding the problem (dos Santos, Peroni, & Mucheroni, 2022).
Early studies also found inaccuracies in 40% of all sampled citations (Key & Roland, 1977), with subsequent work by Asano et al. finding that up to 48% of articles contained one or more errors, and that author names and article titles accounted for 70% of these errors (Asano, Mikawa, Nishina, Maekawa, & Obara, 1995). Interestingly, a recent longitudinal study investigating citation accuracy in articles published at 10-year intervals in a well-known journal title (between 1991 and 2019) reported an overall citation error rate of 40% (Logan, 2022), with little fluctuation in this rate between intervals. This body of work suggests that, in many cases, scholars’ ability to accurately identify and cite scholarly sources remains weak, despite a perceived improvement in digital scholarship ability among scholars.
Citation is as old as scholarship itself – but fluency with PIDs includes an additional understanding of web technology and the notion that URIs can reference abstract entities on the web. The information science sub-domain of information literacy has for some time noted the general challenges arising from the growth in digital academia (Behrens, 1994; Bruce, 1995; Sorapure, Inglesby, & Yatchisin, 1998). Findings from studies produced in previous decades have alerted academia to information literacy deficiencies present within some academic disciplines (Webber, Boon, & Johnston, 2007; Secker, 2004). Influential information literacy conceptual models, most notably the “Framework for Information Literacy for Higher Education” proposed by The Association of College & Research Libraries (ACRL), have subsequently recognized potential deficiencies within the scholarly communities (ACRL, 2016). The ACRL framework presents “Scholarship as Conversation” as a core component of information literacy, within which scholars should develop “familiarity with the sources of evidence, methods, and modes of discourse” used to engage in “scholarly conversations.”
More recently, conceptions of information literacy have broadened to “digital literacy,” encompassing notions of web literacy and technical skill sets (Alexander, Becker, Cummins, & Giesinger, 2017), or the more expansive overarching concept of “metaliteracy” (Mackey & Jacobson, 2017). These broader digital competencies have been severely tested in recent years as scholars have been forced, some with great difficulty, to engage in online teaching and digital research as a consequence of the Covid-19 pandemic (Heriyanto, Christiani, & Rukiyah, 2022). Suffice it to say that recent work studying digital literacy competencies suggests that such skill deficiencies are common among some academic groups (Ong, 2021) and are found across scholarship more generally. For example, a literature review by Basilotta-Gómez-Pablos et al. analysed 56 articles that studied the digital competencies of university teaching staff, finding that “low or medium–low” digital competence was dominant (Basilotta-Gómez-Pablos, Matarranz, Casado-Aranda, & Otto, 2022).
Although its age means it may no longer reflect current information literacy habits, a study by Wouters and de Vries (2004) found wide variability in the way in which scholars referenced web-hosted scholarly resources using hyperlinks, noting a lack of standardization in hyperlink conventions. Notwithstanding the long history of these problems in academia, it could be speculated that more recent neglect of such skills among doctoral student cohorts stems from an assumption that younger scholars are “digital natives” and therefore more likely to be digitally competent or information literate, a notion that has been widely refuted by recent evidence (Judd, 2018). Indeed, information literacy experiments conducted by Greer and McCann (2018) on the citation behaviour of final year undergraduate students revealed that supposedly “digitally literate” students “do not understand URLs,” with many unable to distinguish between official and unofficial URLs or redirects, or even to correctly identify a digital source on the web. It is therefore reasonable to assume that these behaviours may be carried forward to doctoral level, and perhaps beyond; an inability to understand even URLs raises questions about the likely extent of PID competencies in academia more generally.
The notion of “PID literacy” is not a concept which has been defined in the literature. We introduce it here to help us conceptualize the expected competencies a typical scholar might need in order to interact with PIDs effectively. It is possible to identify some typical scenarios which help us understand how PID literacy might manifest itself in users. Indicative cases where PID literacy might be exercised could include one or more of the following scenarios:
Correctly supplying an ORCID for all contributing authors when submitting an article via a manuscript submission system.
Providing PIDs for related datasets within an article manuscript (e.g. in a data availability statement); the same applies to providing PIDs for related software, research instruments, samples, etc.
Correctly reproducing and referencing funders, grants, and other abstract entities using PIDs.
Curating research-related PIDs (e.g. funders, organizations, collaborators, etc.) within a RAiD metadata “envelope” (a location for descriptive information about research activities to be organized within RAiD).
Correctly identifying and citing (by PID) different expressions of the same scholarly work, as expressed through preprints, accepted author manuscripts, versions of records, etc.
Understanding when the creation, use, or reuse of a PID might be relevant, and knowing which PID types are suitable or applicable.
The above noted examples are merely indicative and any number of alternative PID-related scenarios could be imagined. Nevertheless, on this basis, we could propose that a PID literate scholar might display the following competencies:
An understanding of persistent identification in scholarship, when it should be used, and its importance to the scholarly record and the wider PID graph.
An ability to accurately identify, reproduce, and cite PIDs in scholarship activities.
Cognizance of adjacent PID types relevant to scholars’ community of practice, such as those devised to identify scholarly “things” other than academic papers.
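For some schemes, the competency of accurately reproducing a PID can be partly checked by machine. ORCID iDs end in a check character calculated with ISO 7064 MOD 11-2, so many transcription errors (for example, a single mistyped digit) are detectable. The Python sketch below implements that check; the function names are ours, and 0000-0002-1825-0097 is the sample iD that appears in ORCID's own documentation.

```python
import re

# Four blocks of four characters; only the final character may be "X".
ORCID_PATTERN = re.compile(r"^(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 ORCID digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Accept a bare ORCID iD or its https://orcid.org/ URI form."""
    value = value.strip().removeprefix("https://orcid.org/")
    match = ORCID_PATTERN.match(value)
    if not match:
        return False
    digits = "".join(match.groups())
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("https://orcid.org/0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```

A passing check shows only that the iD is well formed, not that it belongs to the intended co-author, so a check of this kind supports, but cannot replace, the competencies listed above.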
PID literacy, as conceived here, is not a holistic conception of information literacy but instead could be considered an extension or sub-literacy of existing frameworks, most notably ACRL’s “Scholarship as Conversation” frame (ACRL, 2016). This “frame” and its companion document on “Research Competencies in Writing and Literature” (ACRL, 2021) highlight the expected “knowledge practices” and “dispositions” of scholars if they are to meaningfully interact with research and scholarly communication processes. These practices and dispositions clarify the expected competencies of scholars but stop short of capturing PID literacy behaviours.
1.3 Motivation and Research Questions
We noted in earlier sections that, despite being largely ignored in discussions surrounding PID development, scholarly users are frequently an important source of data for the PID graph, whether or not they are aware of it. Even perfunctory contributions greatly enrich the PID graph, furthering its relational and inferential potential, and are essential to the creation and interlinking of heterogeneous entities, including different “expressions” of textual objects, datasets, software, research instruments, equipment, and the related PIDs these items may in turn generate. The need to better understand scholars’ comprehension and perceptions of PIDs therefore motivates our research. This matters not only because scholars’ PID literacy shapes the contributions they do or do not make to the PID graph, but also because scholars will increasingly need to exercise that literacy within wider scholarly infrastructures.
The research questions that emerge from our motivation are therefore as follows:
RQ1 – PID familiarity : To what extent are users familiar with the notion of the persistent identification of scholarly entities, its purpose, and the URIs that underpin PIDs?
RQ2 – Identifying identifiers : To what extent can users correctly identify and distinguish between common PID types and their purpose?
RQ3 – PID perceptions : How are PIDs perceived by users and what levels of disciplinary or job role differences can be observed across academic groups, if any – and can any specific PID literacy gaps be identified?
RQ4 – Habits : What is the current state of users’ PID (re)use, and what are their habits? Do users routinely contribute (knowingly or unknowingly) to the PID graph? Are they habitually (re)using PIDs (e.g. in citations/references, or in the creation and/or linking of scholarly entities)?
2.1 Research Instrument Design
An online research instrument was designed as the principal data collection method. This instrument included aspects of an online interactive test designed to elicit experimental data about scholars’ awareness, perceptions, and understanding of PIDs. For simplicity of administration, the instrument was delivered using online questionnaire technology but designed to test participants’ capacity for PID recognition and PID perception. The research instrument is openly available as a .qsf file or .pdf, along with research data arising from this study (Macgregor, Lancho-Barrantes, & Rasmussen Pennington, 2022).
The instrument was divided into four distinct sections: (1) computer self-efficacy (CSE), (2) PID recognition tests, (3) PID perception measurement, and (4) PID (re)use habits. The details of each section are explained in the sections below.
Participants were asked to complete a total of 33 tasks (including some questions in Section 4), producing a combination of descriptive, nominal, and ordinal data. Owing to the exploratory nature of this research, some tasks were multifaceted, particularly in Section 3, where semantic differential measurement techniques were deployed, resulting in 13 bipolar adjective pairs for each of the four PID concepts. Others, such as those in Section 2, attempted to simulate real-world PID challenges as closely as possible by using screenshots taken from active scholarly infrastructure. A brief additional section at the end of the instrument captured basic demographic data (e.g. country of origin, academic job role, and discipline, as defined by the All Science Journal Classification [ASJC] scheme) (BARTOC, 2021). Participant response data were anonymized.
2.1.2 Instrument Sections
Since our participants form a diverse scholarly population, computer efficacy may differ significantly across some subgroups, thereby influencing participants’ ability to complete the PID-related tests. In other words, a general lack of CSE might be a large determinant of a participant’s ability to complete the PID tests. Section 1 of the instrument therefore sought to benchmark participants’ computer efficacy prior to the PID recognition tests and perception measurement, thereby enabling cross-disciplinary analyses of PID literacy by efficacy metric. The instrument made use of Howard’s CSE measure (Howard, 2014), which was preferred because it demonstrated improvements over popular alternatives, such as Murphy, Coover, and Owen (1989), including a reduction in the number of questions and the incorporation of post-2010 computer language. Howard’s measure also demonstrates improved internal reliability and psychometric properties (Howard, 2014, 2020). It comprises 12 statements relating to computer efficacy, each rated on a 7-point Likert scale (Strongly disagree (1) to Strongly agree (7)).
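As a minimal sketch of how the 12 CSE ratings might be reduced to a single benchmark per participant, the Python function below takes an unweighted mean on the 1 to 7 scale. This scoring rule is an assumption for illustration: it presumes every item is positively keyed, and the scoring recommended by Howard (2014) should take precedence.

```python
N_ITEMS = 12  # Howard's CSE measure has 12 statements


def cse_score(responses: list[int]) -> float:
    """Unweighted mean of one participant's 12 CSE ratings (1-7).

    Assumption for illustration: every item is positively keyed and equally
    weighted; consult Howard (2014) for the recommended scoring.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("ratings must be integers from 1 to 7")
    return sum(responses) / N_ITEMS


# Invented participant: six ratings of 5 and six ratings of 6.
print(cse_score([5] * 6 + [6] * 6))  # 5.5
```

A per-participant score of this kind can then serve as a covariate or grouping variable when comparing PID test performance across disciplines.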
Section 2 was designed to elicit data about participants’ recognition of PIDs and test the extent to which PIDs are understood, thereby contributing to our answering of RQ1 (PID familiarity) and RQ2 (Identifying identifiers). Each of the four tasks included screenshots taken from prominent scholarly publishers, articles, and repositories, each displaying several PIDs in context (Figure 3). Each of the four tasks challenged participants to identify the PIDs in the screenshots and indicate to which entity they pertained (e.g. Publications on a publisher website or platform, Research data or open data, People, etc.). Each challenge had only two correct responses. Participants were challenged on DOIs, ORCIDs, and Handle.net, with the PIDs contextualized differently in each task.
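The scoring of these screenshot tasks is not stated at this point. Because each challenge had exactly two correct responses, one simple hypothetical rule is to count hits, misses, and false alarms separately, which keeps partial understanding visible. The Python sketch below applies that rule. The answer key is invented, although the option labels are taken from the text.

```python
def score_challenge(selected: set[str], correct: set[str]) -> dict[str, int]:
    """Compare a participant's selections with the answer key for one task.

    Hypothetical rule: report hits, misses and false alarms separately rather
    than a single right/wrong flag.
    """
    return {
        "hits": len(selected & correct),
        "misses": len(correct - selected),
        "false_alarms": len(selected - correct),
    }


# Invented answer key; each challenge in the instrument had two correct options.
key = {"Publications on a publisher website or platform", "People"}
answer = {"People", "Research data or open data"}
print(score_challenge(answer, key))
# {'hits': 1, 'misses': 1, 'false_alarms': 1}
```

Separating false alarms from misses distinguishes a participant who over-attributes a PID to unrelated entity types from one who simply fails to recognize it.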
Six additional test challenges were included in this section. Each provided an example of a prominent PID type and asked participants to indicate how strongly they recognized it and, if they did, with which entity types they mostly associated it. For example, a participant might strongly recognize a DOI and mostly associate it with Publications on a publisher website or platform and Research data or open data, and so forth. The response options for the entities were informed by the “landscape analysis” published in the literature (Cousijn et al., 2021). Participants were challenged on their recognition of the following PID types: DOI, ORCID, Handle.net, ROR, ISNI, and Uniform Resource Name (URN).
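The six PID types in these challenges differ in their surface syntax, which is part of what a participant must learn to recognize. The Python sketch below is a heuristic classifier under stated assumptions: the regular expressions are our own approximations of common display forms, not the schemes' formal syntax rules, and the Handle, ISNI, and URN examples are invented. Rule order matters, since DOIs are themselves Handles.

```python
import re

# Heuristic display-form patterns, tried in order (our approximations).
PID_PATTERNS = [
    ("URN", re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.I)),
    ("ROR", re.compile(r"^(https://ror\.org/)?0[a-z0-9]{6}\d{2}$", re.I)),
    ("ORCID", re.compile(r"^(https://orcid\.org/)?\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")),
    # DOI before Handle: every DOI is also a Handle, so order matters.
    ("DOI", re.compile(r"^(doi:|https://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I)),
    ("Handle", re.compile(r"^(hdl:|https?://hdl\.handle\.net/)\S+/\S+$", re.I)),
    ("ISNI", re.compile(r"^(https?://isni\.org/isni/)?\d{4} ?\d{4} ?\d{4} ?\d{3}[\dX]$")),
]


def guess_pid_type(value: str) -> str:
    """Return the first PID type whose pattern matches, else 'unknown'."""
    value = value.strip()
    for name, pattern in PID_PATTERNS:
        if pattern.match(value):
            return name
    return "unknown"


for example in [
    "https://doi.org/10.1000/182",
    "https://orcid.org/0000-0002-1825-0097",
    "https://ror.org/01ggx4157",
    "https://hdl.handle.net/20.500.12345/678",  # invented Handle
    "0000 0004 1234 567X",                      # invented ISNI-shaped string
    "urn:nbn:de:example-123",                   # invented URN:NBN
]:
    print(guess_pid_type(example), example)
```

The overlaps between ORCID and ISNI (both 16-character identifiers) and between DOIs and Handles are a reminder that surface form alone cannot always settle which scheme an identifier belongs to.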
Data to evaluate participants’ perceptions of PIDs were captured and operationalized using the semantic differential measurement technique (Osgood, 1957; Snider & Osgood, 1969). The semantic differential measurement technique is one that has long been deployed to measure attitudes and perceptions towards concepts (or in Osgood’s words, to “measure meaning”). Each semantic differential scale consists of a series of bipolar adjective scales on which a participant responds, in relation to the object or concept being examined. Each adjective also corresponds to one of three “semantic space dimensions”: evaluation, potency, and (oriented) activity. These semantic space dimensions, or “factors,” are hypothesized to be used by humans in their assessment of almost all phenomena and, when combined with appropriate factor analysis, can yield a reliable measure of a participant’s overall reaction to something (Stoutenborough, 2008). Though emerging from psychology, the semantic differential technique was quickly adopted for information science purposes (e.g. Allen & Matheson, 1977; Geus, Mulder, Zuurke, & Levine, 1982; Katzer, 1972) and continues to be deployed and refined to better understand a wide variety of information science research topics. Despite being a commonly used technique, it has been observed that few studies in information science use the approach optimally (Verhagen, Hooff, & Meents, 2015), with many erroneously conflating scale design and interpretation with Likert scales (Schrum, Johnson, Ghuy, & Gombolay, 2020).
In this instrument, PID perceptions were measured against four concepts: the use of PIDs in Scholarly Communications, and the use of PIDs to refer to People, Places, and Things (Huber, Diepenbroek, Brown, Demeranville, & Stocker, 2016; Meadows et al., 2019). Thirteen bipolar adjective pairs were created to measure PID perception against these concepts, each rated on a 9-point semantic differential scale coded from –4 to +4; these values were not displayed to participants. Each adjective pair addressed one of the “semantic space” dimensions of Osgood’s approach: evaluation, potency, or (oriented) activity (Osgood, 1957) (Table 1). Osgood notes that the dimensions differ in relative importance, with evaluation representing a more powerful dimension in human thinking than either potency or activity (Osgood, 1957). This section of the research instrument was therefore designed to ensure that both bipolar adjectives and semantic space dimensions were appropriately addressed. Data from this section address our PID perceptions research question (RQ3).
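| Semantic space dimension | Adjective pair |
|---|---|
| Evaluation | Bad – Good |
| Potency | Unimportant – Important |
| Activity | Complex – Simple |
| Evaluation | Unintuitive – Intuitive |
| Evaluation | Foolish – Wise |
| Potency | Unscientific – Scientific |
| Activity | Laborious – Effortless |
| Evaluation | Useless – Valuable |
| Evaluation | Unintelligible – Intelligible |
| Potency | Abstract – Concrete |
| Activity | Difficult – Easy |
| Evaluation | Negative – Positive |
| Potency | Unnecessary – Necessary |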
Section 4 comprised five simple questions exploring participants’ PID (re)use behaviour, addressing the Habits research question (RQ4). This brief section attempted to elicit data on how PIDs were or were not being created, used, or reused by participants. The extent of PID creation among participants is an important behaviour to measure, but reuse is arguably the more significant characteristic of PIDs (e.g. reusing PIDs that already exist within the PID graph).
2.2 Data Collection
Scholarly participants were recruited via social media (Twitter, Mastodon) and through established email lists. Email lists relating to open science, scholarly communications, and open repositories were targeted (e.g. code4lib, UKCORR). The members of these lists are predominantly individuals who deliver scholarly communications and research publishing support at scholarly organizations, and who would have access to internal academic staff lists to which the invitation to participate could be forwarded. The authors also circulated the invitation on local academic institutional lists. Only scholars outside the orbit of open science, scholarly communications, and open repositories were invited to participate, since participants from these groups would inevitably demonstrate atypical PID literacy relative to other scholarly groupings. Participation was restricted to scholars who were active in the publication or dissemination of research.
The data collection approach generated a convenience sample, a limitation discussed in a later section. Data were collected between 26 May and 24 June 2022 and extracted shortly afterwards as .csv files for cleaning, analysis, and visualization, all performed in standard spreadsheet software. A total of 106 academic users participated in the test; 27 invalid responses were removed during cleaning, leaving 79 for analysis.
Tables 2 and 3 report basic demographic data on the test participants. Participants were drawn primarily from the Social Sciences (ss) (47%) and Physical Sciences (ps) (38%) and, to a lesser extent, the Life Sciences (ls) (15%). There were no participants from the Health Sciences. Participants occupied the full spectrum of job roles available in the instrument, with the greatest proportions reporting “Professor/Reader,” “PhD Research Student,” and “Other.” The clustering of participants in “Other” was unanticipated and suggests that the instrument lacked sufficient specificity in this area. Given that participants were active in research publishing and dissemination, we infer that some of those in this category may have occupied teaching fellow or associate positions, or were research/knowledge exchange associates, adjunct professors, and so forth.
| Origin discipline of academic participant (by ASJC) | N | % |
|---|---|---|
| Physical Sciences (includes Chemical Engineering, Chemistry, Computer Science, Earth and Planetary Sciences, Energy Engineering, Environmental Science, Material Science, Mathematics, Physics and Astronomy, Multidisciplinary) | 30 | 38 |
| Health Sciences (includes Medicine, Nursing, Veterinary, Dentistry, Health Professions, Multidisciplinary) | 0 | 0 |
| Social Sciences (includes Arts and Humanities, Business, Management and Accounting, Decision Sciences, Economics, Econometrics and Finance, Psychology, Social Sciences, Multidisciplinary) | 37 | 47 |
| Life Sciences (includes Agricultural and Biological Sciences, Biochemistry, Genetics and Molecular Biology, Immunology and Microbiology, Neuroscience, Pharmacology, Toxicology and Pharmaceutics, Multidisciplinary) | 12 | 15 |
|Academic participant job role||N||%|
|PhD research student||21||27|
Figure 4 summarizes the geographic origin of test participants, who were largely from the United Kingdom (57%) and the United States (16%), followed by Germany and Canada (5% each) and Italy (3%). A long tail of single participants came from countries including China, Brazil, South Africa, Norway, France, Austria, the Czech Republic, The Bahamas, Croatia, Belgium, and Lithuania.
Recall that Section 1 of the test instrument benchmarked computer self-efficacy (CSE) across participants using the 12-item, 7-point CSE measure (Howard, 2014). Internal consistency of the scale was tested using Cronbach’s alpha and demonstrated strong reliability (α = 0.94). Data from the measure are summarized in Table 4. Across the group, CSE results revealed a moderate level of efficacy (M = 4.98; Mdn = 5) with a high level of variation around the mean (SD = 1.52). The highest possible total score for each participant was 84 (12 items × 7 points). Measures of central tendency and dispersion for this total (M = 59.09; Mdn = 60; SD = 14.60) highlight the large spread of reported computer efficacy across the participant group.
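The reliability statistic above is Cronbach’s alpha. As an illustration of the standard formula only (the authors’ spreadsheet computation is not described), a minimal sketch follows, assuming a complete participants × items matrix with no missing responses; the function name and toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a participants x items matrix (no missing values).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])  # number of items (12 for the CSE measure)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item, taken down the columns.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each participant's summed score (max 12 x 7 = 84 for CSE).
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: three participants answering two items identically.
toy = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(toy))  # perfectly consistent items give alpha = 1.0
```

Alpha rises towards 1 as items covary more strongly relative to their individual variances, which is why the reported α = 0.94 is read as strong internal consistency.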
| CSE statement | All participants M | All participants Mdn | All participants SD | Life Sciences M | Life Sciences Mdn | Life Sciences SD (IQR) | Physical Sciences M | Physical Sciences Mdn | Physical Sciences SD (IQR) | Social Sciences M | Social Sciences Mdn | Social Sciences SD (IQR) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I can always manage to solve difficult computer problems if I try hard enough | 4.90 | 5 | 1.52 | 4.33 | 5 | 1.87 | 5.10 | 5.5 | 1.65 | 4.92 | 5 | 1.26 |
| If my computer is “acting-up,” I can find a way to get what I want | 5.24 | 5.5 | 1.50 | 4.91 | 5 | 1.92 | 5.50 | 6 | 1.43 | 5.14 | 5 | 1.42 |
| It is easy for me to accomplish my computer goals | 5.32 | 6 | 1.30 | 5.17 | 6 | 1.70 | 5.50 | 5.5 | 1.17 | 5.22 | 6 | 1.27 |
| I am confident that I could deal efficiently with unexpected computer events | 4.92 | 5 | 1.53 | 4.42 | 5 | 1.98 | 5.23 | 5 | 1.52 | 4.84 | 5 | 1.36 |
| I can solve most computer problems if I invest the necessary effort | 5.11 | 6 | 1.58 | 4.75 | 5.5 | 2.18 | 5.47 | 6 | 1.33 | 4.95 | 5 | 1.53 |
| I can remain calm when facing computer difficulties because I can rely on my abilities | 4.68 | 5 | 1.66 | 4.42 | 4.5 | 1.78 | 5.10 | 5 | 1.67 | 4.42 | 5 | 1.59 |
| When I am confronted with a computer problem, I can usually find several solutions | 4.86 | 5 | 1.47 | 5.42 | 5.5 | 1.44 | 4.93 | 5 | 1.46 | 4.61 | 5 | 1.48 |
| I can usually handle whatever computer problem comes my way | 5.00 | 5 | 1.40 | 5.17 | 5 | 1.47 | 5.28 | 5 | 1.44 | 4.72 | 5 | 1.32 |
| Failing to do something on the computer makes me try harder | 4.72 | 5 | 1.50 | 4.50 | 5 | 1.78 | 4.93 | 5 | 1.53 | 4.61 | 5 | 1.40 |
| I am a self-reliant person when it comes to doing things on a computer | 5.31 | 5 | 1.56 | 5.42 | 6 | 1.51 | 5.47 | 6 | 1.76 | 5.14 | 5 | 1.42 |
| There are few things that I cannot do on a computer | 4.57 | 5 | 1.82 | 4.00 | 5 | 2.41 | 4.43 | 4.5 | 1.94 | 4.89 | 5 | 1.43 |
| I can persist and complete almost any computer-related task | 5.08 | 5 | 1.46 | 5.58 | 5.5 | 1.16 | 5.30 | 5.5 | 1.51 | 4.72 | 5 | 1.45 |
| CSE measure across participants | 4.98 | 5 | 1.52 | 4.84 | 5 | 1.77 | 5.19 | 5.25 | 1.53 | 4.85 | 5 | 1.41 |
| Total CSE score across participants (max = 84) | 59.09 | 60 | 14.60 | 57.67 | 63.50 | 17.09 (20.50) | 62.07 | 61.50 | 15.19 (15.50) | 58.24 | 60 | 12.59 (15.25) |
Mean, median, and standard deviation are presented for all participants and segmented by ASJC area; interquartile ranges (IQR) for total scores are given in parentheses.
Table 4 also segments CSE data by the ASJC discipline grouping of participants. Life Sciences (M = 4.84; Mdn = 5; SD = 1.77), Physical Sciences (M = 5.19; Mdn = 5.25; SD = 1.53), and Social Sciences (M = 4.85; Mdn = 5; SD = 1.41) all demonstrated considerable dispersion around the mean. Total CSE scores were similarly dispersed, although Physical Sciences participants reported a slightly higher mean total score, suggesting possible differences across discipline groupings (Physical Sciences: M = 62.07, Mdn = 61.50, SD = 15.19, IQR = 15.50; Life Sciences: M = 57.57, Mdn = 63.50, SD = 17.09, IQR = 20.50; Social Sciences: M = 58.24, Mdn = 60, SD = 12.59, IQR = 15.25). A one-way analysis of variance (ANOVA) (α = 0.05) of CSE scores across the discipline groups was performed, with post-hoc comparisons using the Games-Howell procedure. Recent statistical research recommends the Games-Howell test when the homogeneity of variance assumption is not met, particularly with unequal sample sizes and unequal variances (Rusticus & Lovato, 2014; Sauder & DeMars, 2019). The ANOVA found no statistically significant differences.
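The Games-Howell procedure compares every pair of groups with a Welch-type t statistic and Welch–Satterthwaite degrees of freedom, then refers q = |t|·√2 to the studentized range distribution for k groups. The sketch below uses only the Python standard library to compute those quantities; it illustrates the textbook procedure rather than the authors’ implementation, and the group names and data are hypothetical. The final p-value step is left as a comment because it requires a studentized range implementation, such as `scipy.stats.studentized_range` (available from SciPy 1.7).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def games_howell_stats(groups: dict[str, list[float]]) -> list[tuple[str, str, float, float, float]]:
    """Return (group_a, group_b, mean difference, q, df) for every pair of groups.

    q = |t| * sqrt(2), where t is Welch's t. The p-value is the upper tail of the
    studentized range distribution with k = len(groups) groups and df degrees of freedom.
    """
    results = []
    for a, b in combinations(groups, 2):
        xa, xb = groups[a], groups[b]
        va = variance(xa) / len(xa)  # squared standard error of group a's mean
        vb = variance(xb) / len(xb)  # squared standard error of group b's mean
        diff = mean(xa) - mean(xb)
        t = diff / sqrt(va + vb)
        # Welch–Satterthwaite approximation to the degrees of freedom.
        df = (va + vb) ** 2 / (va ** 2 / (len(xa) - 1) + vb ** 2 / (len(xb) - 1))
        results.append((a, b, diff, abs(t) * sqrt(2), df))
        # p-value, assuming SciPy >= 1.7 is installed:
        # scipy.stats.studentized_range.sf(abs(t) * sqrt(2), len(groups), df)
    return results


# Hypothetical total CSE scores for two small groups.
example = {"Group A": [55, 60, 72, 48], "Group B": [61, 70, 66, 75, 59]}
for row in games_howell_stats(example):
    print(row)
```

Because each pair gets its own standard error and degrees of freedom, no pooled variance is assumed, which is what makes the procedure suitable when group variances and sizes differ.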
Participants were further segmented by job role to observe possible CSE differences arising from academic duties, experience, and so on. Summary data are set out in Table 5, which shows that participants in some job roles reported greater efficacy than others. Interestingly, Lecturers (M = 47.67; SD = 19.09; IQR = 27.25) and Professors/Readers (M = 54.30; SD = 13.28; IQR = 16.75) reported the lowest mean levels of efficacy, with greater variation, while Research Assistants (M = 75; SD = 9.85; IQR = 9.5) and Postdocs (M = 67.80; SD = 5.81; IQR = 4) reported the inverse. Despite these observable differences between groups, a one-way ANOVA (α = 0.05) of CSE scores across job role groups found no statistically significant differences.
|Academic job role||CSE score per measure||Total CSE score across participants|
Mean, median, and standard deviation are presented across job roles for scores per CSE measure and for total CSE scores across participants (maximum possible total = 84).
3.2 PID Recognition Challenges
Data arising from the first batch of PID recognition challenges are set out in Table 6. These data relate to the four tasks in Section 2 of the instrument, which challenged participants to identify the PIDs shown in screenshots and to indicate which entities those PIDs identified. Each of the four tasks had exactly two correct responses, which are marked in Table 6 with an asterisk; responses without an asterisk are erroneous.
| Available responses | Task #1 | Task #2 | Task #3 | Task #4 |
|---|---|---|---|---|
| Publications | 84.81 (67)* | 78.48 (62)* | 75.95 (60)* | 86.08 (68)* |
| Publication on repository | 50.63 (40) | 65.82 (52)* | 41.77 (33) | 60.76 (48)* |
| Research data | 8.86 (7) | 6.33 (5) | 74.68 (59)* | 7.59 (6) |
| Research grants | 5.06 (4) | 0 (0) | 1.27 (1) | 0 (0) |
| Organizations | 24.05 (19) | 0 (0) | 8.86 (7) | 3.80 (3) |
| Software | 0 (0) | 0 (0) | 2.53 (2) | 0 (0) |
| People | 84.81 (67)* | 5.06 (4) | 17.72 (14) | 2.53 (2) |
| Instruments | 1.27 (1) | 0 (0) | 3.80 (3) | 1.27 (1) |
| Equipment | 0 (0) | 0 (0) | 1.27 (1) | 0 (0) |
| Projects | 7.59 (6) | 2.53 (2) | 2.53 (2) | 1.27 (1) |
| Audiovisual | 1.27 (1) | 0 (0) | 0 (0) | 0 (0) |
| Metadata | 35.44 (28) | 12.66 (10) | 30.38 (24) | 7.59 (6) |
| None | 0 (0) | 1.27 (1) | 1.27 (1) | 5.06 (4) |
Responses provided as %, with n in parentheses. Correct responses to the challenge are denoted by an asterisk (*).
A large proportion of participants succeeded in the challenges and correctly interpreted the PIDs, with success rates on the individual components of the challenges ranging from 61% to 86%. However, some participants failed the task #1 challenge: for each of its two correct responses, ∼15% of participants (n = 12) failed to select it. The failure rate was higher for task #2, at ∼22% (n = 17) and ∼34% (n = 27) for its two components. Tasks #3 and #4 showed comparable failure rates, at ∼24% (n = 19) and ∼25% (n = 20) for task #3 and ∼14% (n = 11) and ∼39% (n = 31) for task #4.
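These failure rates follow directly from the counts in Table 6 and the 79 valid responses retained after cleaning (for example, 67/79 = 84.81%). A minimal sketch of that arithmetic, assuming the same denominator of 79 participants for every task; the names are illustrative only.

```python
N_VALID = 79  # 106 participants minus 27 invalid responses

# Participants selecting each of the two correct responses (n values from Table 6).
correct_counts = {
    "Task #1": (67, 67),  # Publications, People
    "Task #2": (62, 52),  # Publications, Publication on repository
    "Task #3": (60, 59),  # Publications, Research data
    "Task #4": (68, 48),  # Publications, Publication on repository
}


def failure(n_correct: int, n_total: int = N_VALID) -> tuple[int, float]:
    """Return (number failing, percentage failing) for one correct response."""
    n_failed = n_total - n_correct
    return n_failed, round(100 * n_failed / n_total, 2)


for task, counts in correct_counts.items():
    print(task, [failure(c) for c in counts])
```

For task #4, for instance, this yields (11, 13.92) and (31, 39.24), the complements of the 86.08% and 60.76% success rates in Table 6.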
These results can be segmented by discipline (Table 7) and job role (Table 8) to observe the overall performance of participants in these challenges. Participants from Social Sciences performed least successfully, achieving a score of ∼66%; that is, 34% of their responses were incorrect. The maximum score any single participant could achieve across the four challenges was 8, and Social Sciences participants showed wide dispersion (M = 5.24; SD = 2.89; IQR = 5), exposing considerable variation in their success in the task. This contrasts with Life Sciences participants, whose overall performance yielded a higher score (∼85%) and whose individual scores were more homogeneous (M = 6.83; SD = 1.40; IQR = 1.25). Physical Sciences participants sat between Life Sciences and Social Sciences, with an overall success score of 70%. These differences appeared notable and were therefore tested statistically. A Levene test found that the homogeneity of variance assumption was not satisfied (p ≥ 0.001), so a one-way ANOVA using the Welch F test (α = 0.05) was performed on participants’ scores by discipline. It indicated a statistically significant difference between the performances of discipline groups (F(2, 39.52) = 3.86, p < 0.03). Post-hoc comparisons using the Games-Howell procedure (described previously) confirmed a significant difference between the performance of Life Sciences and Social Sciences participants only (p < 0.04).
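Welch’s F replaces the pooled within-group variance of the classical ANOVA with precision weights (nᵢ / sᵢ²), which is why it tolerates the unequal variances flagged by the Levene test and why its denominator degrees of freedom are non-integer. A minimal standard-library sketch of the statistic follows; the p-value would then come from the F distribution (for example, `scipy.stats.f.sf(F, df1, df2)` in SciPy). Per-participant scores are not reported in this section, so the data below are hypothetical and the sketch does not reproduce the reported F(2, 39.52) = 3.86.

```python
from statistics import mean, variance


def welch_anova(groups: list[list[float]]) -> tuple[float, float, float]:
    """Welch's heteroscedastic one-way ANOVA: return (F, df1, df2).

    Each group needs at least two observations and non-zero variance.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]  # precision weights n_i / s_i^2
    w_sum = sum(w)
    grand = sum(w_i * m_i for w_i, m_i in zip(w, m)) / w_sum  # weighted grand mean
    a = sum(w_i * (m_i - grand) ** 2 for w_i, m_i in zip(w, m)) / (k - 1)
    lam = sum((1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical recognition scores (0-8) for three discipline groups.
print(welch_anova([[8, 7, 6, 8, 5], [2, 8, 4, 7, 1, 6], [6, 5, 8, 3, 7]]))
```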
|Available responses||Physical Sciences||Social Sciences||Life Sciences|
|Task #1||Task #2||Task #3||Task #4||Task #1||Task #2||Task #3||Task #4||Task #1||Task #2||Task #3||Task #4|
|Publication on repository||60.00||56.67*||50.00||53.33*||43.24||67.57*||35.14||56.76*||43.24||83.33*||41.67||91.67*|
Responses provided as %. Correct responses to the challenge are denoted by an asterisk (*).
|Total possible score||Total score attained||Total score as % of possible||Mean (participant score)||Median (participant score)||SD (participant score)||IQR (participant score)|
Summary results by job role are set out in Table 8, alongside the summary results by discipline and measures of central tendency for individual performance on the recognition tests. Participants identifying as Research Assistants achieved the highest mean score in the PID recognition challenges (∼96%), while those identifying as Research Fellows scored the lowest (50%). Some SD figures reveal a wide range of individual scores; for example, some of those identifying as Research Support (M = 64.06; SD = 3.31; IQR = 5.50) got none of the challenges correct. After a Levene test indicated unequal variances (p ≥ 0.02), a one-way ANOVA using the Welch F test (α = 0.05) of participants’ scores by job role suggested a significant difference (F(7, 19.65) = 3.11, p < 0.03), but post-hoc comparisons using the Games-Howell procedure failed to confirm it (p > 0.05). Although many participants identified the PIDs correctly, they often also selected a wide variety of erroneous responses (Table 6). Using the same statistical procedures, a statistically significant difference was found between job roles in the erroneous responses provided (F(7, 21.53) = 5.98, p < 0.001): participants designated as PhD Students (p < 0.02) and Others (p < 0.04) were more likely to offer erroneous responses, and to offer more of them.
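The Levene test checks whether group variances are equal by running a one-way ANOVA on each observation’s absolute deviation from its group centre. The section does not state which variant was used, so the sketch below assumes the original mean-centred form (the median-centred Brown–Forsythe variant differs only in the centre); the resulting W is compared with an F distribution on k − 1 and N − k degrees of freedom. Data and names are hypothetical.

```python
from statistics import mean


def levene_w(groups: list[list[float]]) -> float:
    """Levene's W statistic (mean-centred) for equality of variances across groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(x - g_mean) for x in g])
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    # One-way ANOVA on the deviations: between- and within-group sums of squares.
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_means) for zij in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scores: the second group is far more spread out than the first.
print(levene_w([[6, 7, 7, 8], [0, 3, 8, 8, 1]]))
```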
Recall that six additional test challenges were included in this section of the research instrument. Each presented an example of a prominent PID type and asked participants to indicate the extent to which they recognized it and, if they did, with which entity type they most associated it. It was anticipated that some of these types would be better known to participants than others. Data are summarized in Figure 5.
|Publications (on publisher website or platform)||77.22||12.66||12.66||2.53||5.06||7.59|
|Publications (on repository)||36.71||31.65||13.92||1.27||3.80||5.06|
|Research data or open data||18.99||13.92||5.06||2.53||2.53||6.33|
|People (e.g. authors, editors, PIs, etc.)||0.00||0.00||68.35||0.00||6.33||1.27|
|Research equipment or facilities||1.27||0.00||0.00||0.00||0.00||0.00|
|Projects or research activities||2.53||2.53||1.27||1.27||0.00||0.00|
|Metadata (bibliographic data)||6.33||2.53||6.33||0.00||2.53||0.00|
The proportion of participants strongly recognizing PIDs was, unsurprisingly, highest for DOIs and ORCIDs, at 68.35% and 63.29%, respectively, and lowest for URNs (5.06%) and ISNIs (5.06%). Challenges relating to other PID types generated mixed responses. For example, recognition of Handle as a type of PID was spread across the response options (Do not recognize = 34.18%; Unsure = 12.66%; Somewhat recognize = 16.46%; Strongly recognize = 22.78%). When indicating the entity type with which they most associated each PID, participants who recognized the PIDs gave a spread of responses, summarized in Table 9. Because these data are ordinal and nonparametric, Kruskal–Wallis tests were performed across discipline and job role groupings. By discipline, statistically significant differences were observed in the recognition of Handles (H(2) = 8.14, p < 0.02) and URNs (H(2) = 6.08, p < 0.05) only. By job role, differences were confirmed in the recognition of ORCID (H(2) = 12.77, p < 0.02), ROR (H(2) = 13.96, p < 0.03), and URNs (H(2) = 12.11, p < 0.03).
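Because the recognition responses are ordinal (from “Do not recognize” to “Strongly recognize”), many responses tie, so the tie correction in the Kruskal–Wallis H statistic matters. The sketch below computes H with mid-ranks and the usual tie correction, assuming responses have been coded as ordered integers; that coding is a hypothetical choice, since the instrument’s internal coding is not described here. H is then compared with a chi-squared distribution on k − 1 degrees of freedom.

```python
from collections import Counter


def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Kruskal-Wallis H with the standard correction for tied ranks."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)
    # Average (mid-)rank for every distinct value; ranks are 1-based.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # Ties shrink the variance of the ranks; divide by the correction factor.
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    return h / correction


# Hypothetical ordinal codes (0 = lowest recognition ... 3 = highest) for three groups.
print(kruskal_wallis_h([[0, 1, 3, 3], [2, 3, 3], [0, 0, 1]]))
```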
3.3 PID Perception Measurement
Alongside the PID recognition challenges, the PID perception measurement section formed the other major portion of the research instrument. Table 10 presents the “factor scores” for the ratings of PID concepts across all participants. A factor score is the mean of the scores on the bipolar adjective pairs that belong to a given semantic dimension: the pair scores are summed and divided by the number of pairs representing that dimension. For example, potency is represented by four adjective pairs (Table 1), so its denominator is 4.
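The factor-score calculation can be made concrete using the adjective-pair-to-dimension assignments listed in Table 12 (six evaluation pairs, four potency pairs, and three activity pairs). The sketch below averages pair scores within each dimension. Applied to the Life Sciences item means for the People concept in Table 12, it gives factor scores of roughly 3.61 (evaluation), 3.31 (potency), and 3.08 (activity), all above 3, consistent with the Life Sciences ratings for this concept in Table 11. Averaging item means reproduces a participant-level average only if every participant rated every pair, which is an assumption here; the names are illustrative.

```python
from statistics import mean

# Dimension of each bipolar adjective pair, as listed in Table 12.
DIMENSION = {
    "Bad – Good": "evaluation",
    "Unimportant – Important": "potency",
    "Complex – Simple": "activity",
    "Unintuitive – Intuitive": "evaluation",
    "Foolish – Wise": "evaluation",
    "Unscientific – Scientific": "potency",
    "Laborious – Effortless": "activity",
    "Useless – Valuable": "evaluation",
    "Unintelligible – Intelligible": "evaluation",
    "Abstract – Concrete": "potency",
    "Difficult – Easy": "activity",
    "Negative – Positive": "evaluation",
    "Unnecessary – Necessary": "potency",
}


def factor_scores(pair_scores: dict[str, float]) -> dict[str, float]:
    """Average the (-4..+4) pair scores within each semantic dimension."""
    by_dim: dict[str, list[float]] = {}
    for pair, score in pair_scores.items():
        by_dim.setdefault(DIMENSION[pair], []).append(score)
    return {dim: mean(scores) for dim, scores in by_dim.items()}


# Life Sciences, "People" concept: item means from Table 12, in table order.
ls_people = dict(zip(DIMENSION, [3.40, 3.75, 3.50, 3.75, 3.75, 2.75, 2.75,
                                 3.75, 3.25, 3.25, 3.00, 3.75, 3.50]))
print(factor_scores(ls_people))
```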
|Concept tested||All participants|
Table 10 shows that participants gave their lowest collective ratings on the activity dimension across all PID concepts (M = 0.40). Within this dimension, the use of PIDs to identify People (an allusion to PID types such as ORCID) was perceived most positively (M = 0.72), albeit low in comparison with the scores attained on other dimensions. By contrast, the use of PIDs within Scholarly Communications (M = 0.55) or to identify Places (M = 0.21) or Things (M = 0.13) was perceived less positively on the activity dimension. The most positive perceptions were observed on the potency dimension, which yielded a factor mean of 1.70 and on which the scores for the tested concepts (except Places) were highest. Here, PIDs in Scholarly Communications (M = 2.09), People (M = 1.87), and Things (M = 1.81) were perceived most positively, and in most cases more positively than on the evaluation dimension.
Results for PID perception measures across discipline groupings by semantic dimension are provided in Table 11 and reveal some disciplinary differences. Participants from Life Sciences generally perceived the use of PIDs more favourably, particularly for the People concept, where their ratings on every dimension exceeded 3, considerably higher than in other disciplines. The exception to this positive appraisal was the Places concept, for which Life Sciences participants gave some of the most negative perceptions observed across all concepts, dimensions, and disciplines. Of the three disciplines, Physical Sciences participants held the most negative perceptions of PIDs, with generally lower factor scores across the tested concepts and their semantic dimensions. For example, their combined concept score (CCS) across all concepts, the overall perception measure, was lower than that of either Social Sciences or Life Sciences participants, and it was similarly lower on each dimension, with factor means of 1.27, 1.24, and 0.22 for the evaluation, potency, and activity dimensions, respectively.
|Concept tested||Physical Sciences||Social Sciences||Life Sciences|
* CCS = “combined concept score” by mean across all dimensions (evaluation, potency, activity) for a test concept.
Osgood and others describe how semantic distance can be charted and quantified using the generalized distance statistic D (Osgood, 1957; Rosnow, 2000; Snider & Osgood, 1969; Stoutenborough, 2008). This can help identify the semantic dimensions or bipolar adjectives within the scales that triggered particular responses, and how those responses relate to others. Figure 6 provides four semantic distance charts: one for all participants (a) and three by discipline (b, c, and d). Data supplementing these charts are provided in Table 12. The semantic distance (D) between concepts, derived using the generalized distance formula, is provided in Table 13.
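Osgood’s generalized distance treats each concept’s profile (its mean score on every bipolar adjective pair) as a point in semantic space and takes the Euclidean distance between two points, D = √(Σ d²), where d is the difference on each pair. A minimal sketch follows, assuming both profiles rate the same pairs in the same order (e.g. two columns of Table 12); the function name is hypothetical.

```python
from math import sqrt


def semantic_distance(profile_a: list[float], profile_b: list[float]) -> float:
    """Osgood's generalized distance D = sqrt(sum of squared item differences)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must rate the same adjective pairs")
    return sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Physical Sciences item means from Table 12 (same pair order as the table).
ps_people = [1.75, 1.63, 0.46, 0.42, 1.42, 1.25, 0.13, 1.50, 0.92, 0.71, 0.46, 1.46, 1.17]
ps_places = [1.54, 0.63, 0.29, 0.04, 1.08, 0.92, 0.13, 0.71, 0.54, 0.67, 0.54, 1.04, 0.04]
print(round(semantic_distance(ps_people, ps_places), 2))  # about 1.92
```

Because this uses the rounded item means in Table 12, it need not match Table 13 exactly if D was computed from unrounded or participant-level data.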
| Semantic scale item (bipolar adjective pair) | Semantic space dimension (factor) | Physical Sciences: Scholarly communications | Physical Sciences: People | Physical Sciences: Places | Physical Sciences: Things | Social Sciences: Scholarly communications | Social Sciences: People | Social Sciences: Places | Social Sciences: Things | Life Sciences: Scholarly communications | Life Sciences: People | Life Sciences: Places | Life Sciences: Things |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bad – Good | Evaluation | 2.92 | 1.75 | 1.54 | 2.25 | 2.66 | 2.38 | 1.86 | 2.52 | 3.75 | 3.40 | 2.00 | 3.80 |
| Unimportant – Important | Potency | 2.88 | 1.63 | 0.63 | 2.04 | 2.83 | 2.83 | 1.83 | 2.66 | 3.50 | 3.75 | 0.67 | 3.75 |
| Complex – Simple | Activity | 0.33 | 0.46 | 0.29 | −0.29 | 0.76 | 0.59 | 0.14 | 0.55 | 0.00 | 3.50 | −0.67 | 1.50 |
| Unintuitive – Intuitive | Evaluation | −0.13 | 0.42 | 0.04 | −0.04 | −0.17 | 0.24 | 0.24 | 0.38 | 0.00 | 3.75 | −1.00 | 1.75 |
| Foolish – Wise | Evaluation | 2.42 | 1.42 | 1.08 | 1.21 | 2.45 | 1.97 | 1.48 | 2.03 | 3.50 | 3.75 | 0.67 | 3.00 |
| Unscientific – Scientific | Potency | 2.04 | 1.25 | 0.92 | 1.96 | 2.00 | 1.93 | 1.38 | 1.90 | 1.75 | 2.75 | 1.67 | 2.50 |
| Laborious – Effortless | Activity | 0.13 | 0.13 | 0.13 | −0.42 | 0.59 | 0.45 | −0.03 | −0.21 | 0.25 | 2.75 | −0.33 | 0.25 |
| Useless – Valuable | Evaluation | 2.54 | 1.50 | 0.71 | 2.13 | 2.59 | 2.55 | 1.83 | 2.03 | 3.00 | 3.75 | 1.00 | 3.00 |
| Unintelligible – Intelligible | Evaluation | −0.13 | 0.92 | 0.54 | 0.58 | 0.79 | 1.72 | 1.03 | 0.93 | 0.00 | 3.25 | 1.33 | 0.75 |
| Abstract – Concrete | Potency | 0.04 | 0.71 | 0.67 | 0.50 | 1.00 | 0.90 | 0.72 | 0.79 | 1.25 | 3.25 | 0.67 | 2.00 |
| Difficult – Easy | Activity | 0.71 | 0.46 | 0.54 | 0.13 | 1.17 | 0.76 | 0.31 | 0.28 | 1.00 | 3.00 | −0.67 | 0.50 |
| Negative – Positive | Evaluation | 2.63 | 1.46 | 1.04 | 1.75 | 2.76 | 2.10 | 1.34 | 1.89 | 3.75 | 3.75 | 1.00 | 3.00 |
| Unnecessary – Necessary | Potency | 2.17 | 1.17 | 0.04 | 1.25 | 2.86 | 2.34 | 1.28 | 2.03 | 2.50 | 3.50 | 0.33 | 2.25 |
Data are arranged by semantic scale item and semantic space dimension, and by participant discipline grouping and PID concept. Scores on each scale item are the calculated means (coded from −4 to +4).
|PID concept||Scholarly comms.||People||Places||Things|
The significance of the chart profiles is explored in more detail in the discussion section; suffice it to say that visual inspection of the Life Sciences chart (d) shows a greater propensity for extreme perception differences, both across semantic dimensions and across the tested PID concepts. By contrast, Social Sciences participants showed generally consistent perceptions across the concepts, with fewer extremes, and their perceptions of the different PID concepts tend to track each other. Within Physical Sciences, People and Places track each other closely, while Scholarly Communications and Things deviate from this profile and display comparatively irregular perceptions.
A Wilcoxon signed-rank test (α = 0.05) was used to determine whether the semantic distances (D) reported in Table 13 differed significantly between discipline groupings. Results are summarized in Table 14. Significant differences in D were observed between Scholarly Communications and Things: both Physical Sciences and Social Sciences showed a significant distance from Life Sciences, the former at α = 0.05 (T = 3.58, z = 2.11, p = 0.02) and the latter at α = 0.01 (T = 3.74, z = 2.19, p = 0.01). Significant results were also observed for D between Places and Things, for the Physical Sciences/Social Sciences comparison (T = 2.52, z = 2.56, p = 0.01).
| Participant discipline groupings tested | Semantic distance tested | T | z | p-value |
|---|---|---|---|---|
| Physical Sciences/Life Sciences | Scholarly communications/things | 3.58 | 2.11 | 0.02* |
| Physical Sciences/Social Sciences | Scholarly communications/things | 0.76 | 0.32 | 0.37 |
| Social Sciences/Life Sciences | Scholarly communications/things | 3.74 | 2.19 | 0.01** |
| Physical Sciences/Life Sciences | People/places | 1.46 | 1.36 | 0.09 |
| Physical Sciences/Social Sciences | People/places | 0.82 | 0.51 | 0.31 |
| Social Sciences/Life Sciences | People/places | 1.15 | 1.26 | 0.10 |
| Physical Sciences/Life Sciences | Scholarly communications/places | 0.58 | 0.72 | 0.24 |
| Physical Sciences/Social Sciences | Scholarly communications/places | 0.28 | 0.32 | 0.37 |
| Social Sciences/Life Sciences | Scholarly communications/places | 0.42 | 0.58 | 0.28 |
| Physical Sciences/Life Sciences | People/things | 0.14 | 0.03 | 0.49 |
| Physical Sciences/Social Sciences | People/things | 0.96 | 0.27 | 0.39 |
| Social Sciences/Life Sciences | People/things | 0.45 | 0.19 | 0.42 |
| Physical Sciences/Life Sciences | Places/things | 0.22 | 0.20 | 0.42 |
| Physical Sciences/Social Sciences | Places/things | 2.52 | 2.56 | 0.01** |
| Social Sciences/Life Sciences | Places/things | 1.34 | 1.36 | 0.09 |
| Physical Sciences/Life Sciences | Scholarly communications/people | 0.77 | 0.38 | 0.35 |
| Physical Sciences/Social Sciences | Scholarly communications/people | 0.09 | 0.53 | 0.30 |
| Social Sciences/Life Sciences | Scholarly communications/people | 0.78 | 0.92 | 0.18 |

Note: p-values < 0.05 are indicated by an asterisk (*); p-values at 0.01 by a double asterisk (**).
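For reference, the textbook Wilcoxon signed-rank procedure with a normal approximation is sketched below: zero differences are dropped, absolute differences are ranked (ties share mid-ranks), T is the smaller of the positive and negative rank sums, and z standardizes T. The exact variant used to produce Table 14 is not described, so this sketch is not expected to reproduce those values; with very few pairs, exact critical values are usually preferred to the normal approximation. Names and data are hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (T, z, two-sided p) for paired samples using the normal approximation.

    Zero differences are dropped; tied |differences| receive mid-ranks.
    No tie correction is applied to the variance.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # mid-rank, 1-based
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    t = min(w_plus, w_minus)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return t, z, p


# Hypothetical paired values (e.g. distances for two groupings across matched items).
print(wilcoxon_signed_rank([1.2, 0.8, 2.5, 1.9, 0.4], [0.9, 1.1, 1.0, 1.2, 0.1]))
```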
3.4 PID (re)use Habits
Recall that our research instrument concluded with five simple questions eliciting participants’ PID (re)use behaviour. Asked about their (un)familiarity with using PIDs in scholarly communications, rated on a 9-point bipolar adjective scale (Unfamiliar – Familiar), participants reported themselves to be generally familiar, with the median score indicating considerable confidence (M = 7.68; Mdn = 9.00; SD = 1.81) (Table 15). Asked about their understanding of the purpose of PIDs (Unknowledgeable – Knowledgeable), participants also considered themselves generally knowledgeable (M = 7.29; Mdn = 8.00; SD = 1.89), although with less confidence than for familiarity. This pattern held across all discipline groupings and all job groups except “Others,” which reported equal familiarity and knowledge (Table 16). Physical Sciences participants reported less confidence than those in Social Sciences and Life Sciences (Table 15), as did some specific job groups, most notably Research Assistants (Table 16). Social Sciences participants considered themselves the most familiar with and knowledgeable about PIDs. Dispersion around the mean also increases as job “seniority” declines; compare, for example, Professor/Reader (SD = 0.41 for both familiarity and knowledge) with Others (SD = 3.25 for both).
|All participants||Physical Sciences||Social Sciences||Life Sciences|
|Professor/Reader||Lecturer||Research Fellow||Research assistant||Postdoc||PhD research student||Research support||Other|
Participants’ views on the purpose(s) of PIDs are detailed in Table 17, with >80% of participants noting that they exist to ensure the persistent and unambiguous citation of scholarly entities on the web. The use of PIDs as a way of mitigating “link rot” or “reference rot” in scholarly communications was also noted (>78%). The importance of PIDs in contributing to global scholarly graphs was, however, noted as a consideration for only 49% of participants.
| Purpose of PIDs | Participant response to stated PID purpose (%) |
|---|---|
| To protect against links (URLs or “web addresses”) that may become broken over time (i.e. “link rot”) | 78.26 |
| To ensure the persistent and unambiguous citation of scholarly objects on the web | 81.16 |
| To promote interlinking between scholarly objects on the web | 68.12 |
| To promote the findability of my scholarly work | 56.52 |
| To ensure long-term maintenance and integrity of the published scholarly record on the web (e.g. for the purposes of verification, reanalysis, study reproduction, replication, etc.) | 73.91 |
| To enrich global bibliographic data about scholarly objects on the web and beyond | 49.28 |
| To support more accurate counting and tracking of citations of my work and the work of others | 56.52 |
| To assist in the tracking of the alternative impact of scholarly objects | 39.13 |
Specific questions on the creation and (re)use of PIDs over the past 4 years indicated that close to 74% of participants had created a PID to identify a preprint or accepted author manuscript, with 54% reporting that it had subsequently been reused (Figure 7). Many participants indicated that they had created and reused PIDs for people, at 57% and 49%, respectively. The creation and reuse of PIDs for identifying (open) research data were also noted, at 36% and 35%, respectively, with a longer tail of responses relating to PIDs for software, projects, and so forth. Few participants reported using PIDs to identify research instruments (3%) or research equipment (1%).
Several interesting discussion points emerge from the findings. We use the previously introduced research question labels to structure the discussion: PID familiarity (RQ1), Identifying identifiers (RQ2), PID perceptions (RQ3), and Habits (RQ4).
4.1 PID Familiarity and “Identifying Identifiers”
First, although results from the PID recognition challenges were varied, we can conclude that many participants across all discipline groups failed to satisfy the components of what we defined as PID literacy. The first four PID recognition tests challenged users to identify common PID types in context (e.g. within real-world published articles). Even under a generous interpretation of the data, circa 15% of responses in Life Sciences, where participants performed best, were still incorrect. Error rates were considerably higher in Physical Sciences (30%) and Social Sciences (34%), meaning that almost one third of responses in these discipline groupings failed to correctly identify PIDs as they might commonly be presented within a scholarly journal article or repository. In the case of Social Sciences, the difference from Life Sciences was statistically significant, pointing to a distinct disciplinary divide in aptitude between these two groups.
Closer examination of the results reveals a degree of uncertainty among participants in the four tasks. Although many participants identified the PIDs correctly, they often also selected a wide variety of erroneous responses. For example, in task #1 the correct responses were “Publications (on a publisher website or platform)” (i.e. the DOI) and “People (e.g. authors, editors, PIs, etc.)” (i.e. the ORCIDs), yet some of those who selected both correct responses also selected additional incorrect ones (equivalent to 21.6% of cases). Notably, task #1 featured two of the most widely used PIDs: a DOI identifying a publication and ORCIDs identifying the authors of that publication. That some participants considered these PIDs, contextualized within a real-world academic article, to also identify projects, research grants, a publication on a repository, and so on is a concerning indicator of PID literacy. In other words, many participants who submitted correct responses were also uncertain whether these PIDs identified other entities. This matters because such responses inadvertently reveal that these individuals did not appear to understand PIDs as unique identifiers.
One possible explanation for the Social Sciences result is that social scientists typically publish less often (Hicks, 2013) and are more likely to view long-form publications as a vehicle for research dissemination (De Filippo & Sanz-Casado, 2018), resulting in reduced exposure to the emerging centrality of PIDs to scholarly communications. They are also less likely to receive research funding and less likely to generate research data (Curty, 2016; Jarolimkova & Drobikova, 2019), both of which might ordinarily bring them into contact with adjacent PID concepts, for example, FAIR data (David et al., 2020), research funders (Lammey, 2020), software (Li, Lin, & Greenberg, 2016), research instruments (Stocker et al., 2020), and so forth. Such an explanation may prove unsafe in the long term, as recent evidence suggests that the publication behaviours of social scientists may be evolving in line with a global growth in national research evaluation exercises (Savage & Olejniczak, 2022). Nor does it explain the performance of participants from Physical Sciences. Though their performance was not found to differ significantly from that of Life Sciences, visual inspection of the data indicates that they performed only marginally better than those in Social Sciences.
While disciplinary differences were clearly observable, disentangling job role effects from them is difficult without a larger sample that better represents individual job roles. No statistically significant differences in performance between job roles were detected. We can at least infer, perhaps unsurprisingly, that users’ CSE appears to have little connection to PID literacy. CSE benchmarking found no significant differences between disciplinary or job role groups, suggesting that participants shared a relatively consistent level of computer efficacy. Their level of PID literacy, as measured in this research, therefore appears independent of this efficacy. However, the analysis of erroneous responses (as segmented by job role) shows that PhD students were notably more likely to offer erroneous responses, and that they tended to offer more of them. The embryonic nature of a PhD student’s research career implies relatively little experience with scholarly publishing and scholarly infrastructure, which may explain this pattern (Hatch & Skipper, 2016). It may also reflect the levels of digital illiteracy observed in the reviewed literature. The recent emergence of “researcher development” initiatives at research organizations has highlighted the need to better equip postgraduate researchers for their research careers (Rospigliosi & Bourner, 2019), with useful examples emerging from research library contexts where such training has coalesced with aspects of information literacy education (Fazal & Chakravarty, 2021). Though such initiatives are likely to help researchers navigate many aspects of the publication lifecycle and emphasize digital scholarship competencies, it is conceivable that the emerging centrality of PIDs to scholarly communications has so far been absent from their teaching content.
Results from the additional six test challenges appeared to confirm our findings from the previous four challenges. Low recognition of esoteric PID types (e.g. URNs and ISNIs) was expected. While DOIs and ORCIDs were the most strongly recognized PID types, it is somewhat surprising that PID types of such ubiquity and visual distinctiveness were recognized by only 68% and 63% of participants. This may signify that some researchers are disconnected from two of the longest-standing and most widely used PID types, perhaps because of their publication culture (e.g. their publication venues do not support DOIs or ORCID) or because, as statistically confirmed, their job role gives them little exposure to these types. This, however, is an unsatisfactory explanation. At the time of writing, ORCID penetration within research organizations is high, with more than 10 million ORCIDs added to the ORCID registry (Petro, 2020). Research funders and national research assessment exercises increasingly consider ORCIDs mandatory for research-active staff (Choraś & Jaroszewska-Choraś, 2020). Most participants would therefore be expected to be ORCID literate, to have an ORCID, or at least to recognize one.
Uncertainty persists in this context too. For example, when indicating which entity types the PIDs were associated with, participants most often linked DOIs (which can theoretically be coined for any web entity but tend to have specific applications within scholarship) with publications (77%), followed by publications on a repository (37%), research data (19%), software (3%), and so forth. ORCIDs, by contrast, were associated not only with people (68%) but also with publications (13%), publications on a repository (14%), research data (5%), and other entities. This would indicate that even those individuals who recognize ORCIDs, and perhaps even have an ORCID, are unsure of their ultimate purpose. As we shall see in Section 4.2, PIDs as a mechanism to identify “people” were nevertheless perceived positively. Individuals therefore appear to perceive ORCIDs positively, irrespective of any confusion surrounding how they work or of disciplinary cultural differences.
The low recognition of Handles is notable, especially as it was statistically associated with discipline grouping (as was the recognition of URNs). Like DOIs, Handles (Handle.net) can be coined for virtually any web entity and form the basis of a number of identifier types, such as RAiD (Janke et al., 2017). They are perhaps most commonly used in open repository infrastructure to ensure persistent identification of open research content (e.g. manuscripts of research articles, research data, etc.). The huge volume of open research content now served by repositories (de Castro & CESAER, 2022) and their centrality to the wider open research ecosystem would suggest that users’ familiarity with Handles should be greater. This lack of familiarity may be disciplinary, as statistically inferred. But it may also suggest that some participants tend not to use Handles when citing such content, instead relying on more transient URLs, and have therefore tended to ignore Handles when repositories provide them. This might be because they have not understood their purpose, although it may also reflect the finding in the reviewed literature that some scholars struggle to correctly cite digital entities. The increasing preference for repositories to use DOIs instead (cOAlition S, 2022) is partly because DOIs are considered more recognizable to users than Handles and therefore contribute better to initiatives such as FAIR data (Dunning, De Smaele, & Böhmer, 2017) and Pubfair (Ross-Hellauer, Fecher, Shearer, & Rodrigues, 2019). Their data contribution to the PID graph is also currently superior (Cope, 2021).
4.2 PID Perceptions
Measuring participants’ perceptions of PIDs is important to our research motivation since such perceptions are likely to influence future PID user behaviour. Despite the findings associated with PID familiarity (RQ1) and Identifying the identifiers (RQ2), it is significant that participants perceived the use of PIDs to be generally positive. This is especially clear on the evaluation and potency semantic dimensions across all the tested concepts, with Scholarly communications and People receiving the highest perception ratings. Of the concepts tested, these are likely the most familiar to participants and, owing to their association with the identification of scholarly publications and authors, offer a demonstrable utility that Places and Things do not. Nevertheless, the results show that PID perceptions, in terms of participants’ evaluative attitudes and the potency with which these attitudes are held, are generally high. This is less true of the activity semantic dimension, where factor scores were much lower. Scores on the activity dimension indicate that, while positive perceptions exist in evaluative and potency terms, PIDs are perceived less favourably when action is required. For example, PIDs come closer to being perceived by participants as “laborious” and “complex.” The significance of negative perceptions around activity becomes more obvious at a disciplinary level, particularly in Physical Sciences and Life Sciences, which held some of the most negative perceptions on this semantic dimension.
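To make the distinction between the evaluation, potency, and activity dimensions more concrete, the following minimal sketch groups bipolar-scale ratings by semantic dimension and averages them. It is a simplification under stated assumptions, not the study's procedure: the study reports factor scores, which need not equal a simple unweighted mean; the scale-to-dimension mapping uses the pairs named in this section (Unintuitive–Intuitive for evaluation; Complex–Simple, Laborious–Effortless, and Difficult–Easy for activity) plus a hypothetical Weak–Strong potency pair; and the ratings are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Scale-to-dimension mapping. The evaluation and activity pairs are those named in
# the text; "Weak-Strong" is a hypothetical placeholder for a potency pair.
SCALE_DIMENSION = {
    "Unintuitive-Intuitive": "evaluation",
    "Weak-Strong": "potency",
    "Complex-Simple": "activity",
    "Laborious-Effortless": "activity",
    "Difficult-Easy": "activity",
}


def dimension_scores(ratings: dict[str, float], scale_dimension: dict[str, str]) -> dict[str, float]:
    """Average bipolar-scale ratings within each semantic dimension (unweighted mean)."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for scale, rating in ratings.items():
        grouped[scale_dimension[scale]].append(rating)
    return {dimension: mean(values) for dimension, values in grouped.items()}


# Hypothetical ratings for one concept, centred on zero (negative = left-hand pole).
example = {
    "Unintuitive-Intuitive": 2.0,
    "Weak-Strong": 1.5,
    "Complex-Simple": -1.0,
    "Laborious-Effortless": 0.0,
    "Difficult-Easy": -0.5,
}
print(dimension_scores(example, SCALE_DIMENSION))
# {'evaluation': 2.0, 'potency': 1.5, 'activity': -0.5}
```

A profile of this shape, positive on evaluation and potency but negative on activity, is the pattern described above: PIDs are valued, but the work involved in using them is viewed less favourably.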
Of the PID concepts measured for perception, Places were universally perceived most negatively, irrespective of discipline. The persistent identification of places is a typical PID application, for example, to identify research organizations such as universities or research funders. Interpreting this finding is difficult without additional qualitative data, but we may speculate that the notion of PIDs for places was too abstract for some participants, whereas PIDs for People or Things were more tangible and more relatable to scholarly practice.
Results of PID perception measurements across disciplines revealed compelling differences. Overall, Life Sciences participants demonstrated the most favourable perceptions of PIDs, perceiving the use of PIDs to identify People and Things, and within Scholarly communications, most positively; but they also showed a greater inclination towards extreme perceptions on specific semantic dimensions or concepts. This was clearly observable in the corresponding semantic distance chart (D). For example, the People and Places concepts yielded perception ratings of 3.75 and −1.00, respectively, on the bipolar pair “Unintuitive–Intuitive” (belonging to the evaluation dimension). More extreme differences were observed across bipolar scales belonging to the activity dimension (e.g. Complex–Simple, Laborious–Effortless, Difficult–Easy). In general perception terms, the concepts of People and Places were also far wider apart in Life Sciences than in Physical Sciences, Social Sciences, or all participants taken collectively. Also notable is the extent to which the People and Places profiles in the semantic distance chart covary, with the perception of one tracking the perception of the other but with large distances between them. In other words, participants in Life Sciences were the most likely to perceive PIDs for People positively but simultaneously the most likely to perceive PIDs for Places negatively. Conclusions about Life Sciences were corroborated by the measurement of D, which was considerably higher for all but one concept when compared with other disciplines. That D was found to differ statistically from that of Physical Sciences and Social Sciences for specific concepts is noteworthy and highlights a potential perceptual distinction between the disciplines.
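In the semantic differential literature, the distance measure D is conventionally the Euclidean distance between two concept profiles: the square root of the summed squared differences between the two profiles' ratings on each bipolar scale. The minimal sketch below assumes that standard definition; how the study aggregated D across participants and tested it for significance is not reproduced here, and the function name and profile values are hypothetical, not results from this study.

```python
import math
from collections.abc import Sequence


def semantic_distance(profile_a: Sequence[float], profile_b: Sequence[float]) -> float:
    """Osgood-style D: Euclidean distance between two semantic differential profiles.

    Each profile holds one (mean) rating per bipolar scale, listed in the same scale order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("both profiles must be rated on the same bipolar scales")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical profiles for two concepts on three bipolar scales (not study data).
people = [3.0, 2.0, 1.0]
places = [0.0, -2.0, 1.0]
print(semantic_distance(people, places))  # 5.0
```

A larger D means two concepts are perceived more differently, and a D of zero means identical profiles. Because D is unsigned, it shows how far apart People and Places sit without indicating which is rated more positively, which is why individual scale ratings are reported alongside it.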
An explanation for this finding is difficult without additional qualitative data from Life Sciences participants, but we may suggest that it is fuelled by the rapidly growing prominence of PIDs within Life Sciences. Whether through the recent but rapid growth in preprinting (Johansson, Reich, Meyers, & Lipsitch, 2018; Sarabipour et al., 2019), a phenomenon accelerated by the COVID-19 pandemic (Fraser et al., 2021; Majumder & Mandl, 2020) and accompanied by a growing consciousness of FAIR data sharing approaches (Austin et al., 2021), or through the recent proliferation of PID types for entities such as samples (Bandrowski & Martone, 2016; Lehnert & Klump, 2018), specimens (Hardisty et al., 2021), equipment (Haak, Meadows, & Brown, 2018), materials (Lehnert, Klump, Wyborn, & Ramdeen, 2019), instruments (Plomp, 2020), and so on, it is conceivable that the increased visibility of PIDs has exposed these researchers to their function. This increased exposure may also help to explain why stronger perceptions were found on specific concepts as well as on specific semantic dimensions (i.e. activity).
By contrast, Social Sciences participants demonstrated generally consistent perceptions across the tested concepts, with fewer extremes noted. A cursory visual inspection of the corresponding semantic distance chart (C) shows this consistency relative to both Life Sciences and Physical Sciences, and most of the calculations of D confirm it. We should nevertheless acknowledge that activity was again the dimension attracting the least positive perceptions. The differences in D between Physical Sciences and Social Sciences for Places and Things, though statistically significant, appear to arise from the generally lower perception ratings that Physical Sciences participants gave to PIDs. Excepting an isolated outlier for Places on the activity dimension, Physical Sciences participants displayed a generally more negative perception of PIDs across all the tested concepts and their semantic dimensions. This finding is counterintuitive, since Physical Sciences participants might be expected to be more favourably disposed to PIDs than those in Social Sciences, particularly as the latter group demonstrated lower familiarity with PIDs in the recognition tests.
In interpreting results on PID (re)use habits, we can immediately observe that participants reported high levels of familiarity with and knowledge of PIDs (optimum levels if only median values are considered), but that this confidence was in no way reflected in the results of their corresponding PID recognition challenges, described in the PID familiarity and Identifying the identifiers sections above. In other words, participants considered themselves to be generally familiar with and knowledgeable about PIDs, despite simultaneously demonstrating that they were often unfamiliar with and unknowledgeable about them in practice. This is especially true of Social Sciences participants. On these data, however, interpreting the results by job role may be more meaningful, since far greater variation is observable between roles than between disciplines. Notwithstanding the high levels of familiarity and knowledge reported in the mean and median values for particular job roles (e.g. Postdoc, Research Support, other), the high standard deviations indicate a wide spread of responses, and this spread appears to grow as job seniority declines. The incrementally higher standard deviation as job roles become less senior is again likely indicative of uncertainty among some participants. We should, however, acknowledge that it also suggests that those at Professor/Reader or Lecturer level considered themselves more familiar and knowledgeable, and were more consistent in this perception than, say, a participant from Research Support.
Our definition of PID literacy emphasizes the importance of users’ understanding of when PIDs should be used and when they should be reused. PID reuse is critical to the vitality of the resulting scholarly graph (Dappert et al., 2017): the more PIDs are reused, the more new nodes can be defined by existing PIDs, thereby enriching the graph. Typical enrichment might take place around, say, an individual (via a person PID such as ORCID) or a research dataset (via explicit dataset linking from associated publications using a DOI). The recent rise in preprint publication (Lin, Yu, Zhou, Zhou, & Shi, 2020) presents a common point of PID creation for many researchers, since preprint deposit will often entail PID minting (e.g. a DOI). It was therefore unsurprising to find that this was the most highly reported context for PID creation, even if reuse was some 20% lower. The creation of PIDs for research data may appear low at 36%, but these PIDs showed high levels of subsequent reuse, most probably because research data are likely to form the basis of more than one associated research publication and/or be used repeatedly within a wider research project (Borgman & Wofford, 2019).
Accepting that only a minority of study participants would need to create a PID for research data, software, or instruments, the proportion of participants who reported reusing a PID for people was disappointing. Though several PID types exist for identifying people, our research instrument was clearly alluding to ORCID, with which most participants would be familiar. That fewer than half of the participants reported having reused a PID for people was therefore a noteworthy finding, and one that should be the subject of further research. It is possible that the description of “PIDs for people” within the research instrument was too opaque for some participants to make the connection with ORCID, resulting in under-reporting. However, it could also be a legitimate finding influenced by the disciplinary context of some participants. In the case of preprint PID reuse, one open question is whether these PIDs were superseded by the accepted publication, rendering them defunct for reuse. Understanding participants’ PID reuse behaviour requires an additional, separate piece of qualitative research, preferably using user interviews or protocol analysis approaches.
It is possible that these lower levels of creation and reuse are linked to responses surrounding the purpose of PIDs. PIDs perform a wide range of functions, but no specific “PID purpose” was considered universally relevant by participants (Table 17), suggesting a degree of PID illiteracy among participants. Only circa 80% of participants considered enabling “persistent and unambiguous citation of scholarly objects” to be a core purpose. The contribution of PIDs to global scholarly graphs was considered even less relevant, by only 49% of participants. With no widely shared understanding of why PIDs exist and why they are necessary, it seems likely that unsatisfactory and erratic (re)use of PIDs among scholarly users will persist.
5 Future Research and Limitations
Our research has several limitations, which also point to further research. First, owing to limits of space, not all possible variables were tested or discussed in this work. Second, and more importantly, the exploratory nature of the research topic, and the research instrument consequently designed for this study, could not elicit all the data needed to better understand the research area. Because the instrument was administered remotely, its components, such as the PID challenges, were necessarily artificial and could not replicate the control of laboratory conditions. Nor could it help us understand why participants performed the way they did. The instrument was therefore satisfactory at surfacing perception data and identifying differences between groups, but less satisfactory at explaining why these perceptions or differences existed. There is consequently a need for further research to address this weakness, ideally incorporating mixed methods. Such work might study a smaller cohort of participants within a controlled, task-based setting, using, for example, protocol analysis (or “think aloud”) (Ericsson & Simon, 1993) or stimulated recall user study approaches (Lazar, Feng, & Hochheiser, 2014), thereby generating rich qualitative data that could be mined for insight. The resulting qualitative data would likely improve understanding of user uncertainty surrounding PIDs (especially ORCIDs) and shed light on why users’ levels of PID literacy and perception vary across disciplines and job roles. This would especially assist in differentiating between levels of PID literacy and PID perceptions at a disciplinary and job role level, and in identifying the circumstances under which each becomes a relevant factor in scholars’ PID literacy.
It may also provide a sound basis for proposing a model of PID literacy, capable of specifying the suite of competencies today’s digital scholars require in order to interact meaningfully with PIDs.
The disciplinary differences detected between participants also call for greater benchmarking of PID literacy across disciplines, perhaps generating a maturity model or PID literacy model to guide the PID expectations emerging from research funders and proponents of open scholarly infrastructure. Of course, additional work would be welcome to corroborate the statistically significant findings, as well as to consolidate findings on users’ PID perceptions.
Finally, the recruitment and sampling of participants were designed to accommodate the exploratory, unfunded nature of the work. This approach may, however, have introduced a level of bias into the data, potentially limiting the generalizability of the results (Bornstein, Jager, & Putnick, 2013; Rusticus & Lovato, 2014). Notably, participants from Health Sciences were not included and group sizes were unequal, although steps were taken in the subsequent data analysis to control for this. We nevertheless remind the reader that the sample size was small and that the results of statistical tests should be interpreted with caution. Suffice it to say that future research would benefit from combining additional qualitative evidence with the equal recruitment of participants from all disciplinary areas.
The increased use of PIDs in open scholarly infrastructures is evident to those who interact with them. Trends indicate that a growing number of PID types will soon emerge to better enable the persistence, discovery, citability, traceability, and verification of scholarly research entities (Hardisty et al., 2021). What our exploratory research shows is that many scholars, even allowing for disciplinary or job role differences, may demonstrate inadequate levels of PID literacy. They are either uncertain about the function of PIDs and what they might identify or cannot discriminate between different PID types, even when these are contextualized within real-world examples. While this study exposed participants to lesser-known PID types, such uncertainty and confusion were also found in relation to types that would otherwise be considered dominant or typical, such as ORCIDs, or DOIs for academic objects such as journal articles or research datasets. Indeed, despite self-reporting high levels of PID cognizance, participants showed irregular patterns of PID literacy and certainty across the board, though statistically significant disciplinary differences were observed in some instances, notably between Life Sciences and Social Sciences. This work therefore contributes to our understanding of scholars’ PID literacy and serves as an alert to those pioneering PID-centric scholarly infrastructures that a significant need for training and outreach to active researchers remains.
It may be recalled that our research was motivated from a data perspective. PID-centric scholarly infrastructures are largely predicated on the notion of the PID graph. But without addressing scholars’ levels of PID literacy, it is probable that, at least initially, the resulting graph will not only lack relational depth, and hence inferential potential, but also suffer from empty nodes owing to low PID (re)use by researchers. The education and training offered by learned societies, research funders, and the open research and scholarly communication teams of higher education institutions should therefore also be informed by this work.
Despite the low levels of PID literacy found by this study, our work nevertheless found scholars’ perceptions of PIDs to be generally positive. By applying semantic differential techniques, we discovered that positive perceptions of PIDs within scholarly ecosystems were offset by some pronounced disciplinary differences, as well as by higher levels of aversion to PIDs in specific use cases. Negative perceptions were found when concepts associated with PIDs were measured on the activity semantic dimension. These perceptual insights should inform future technical approaches to the implementation of PIDs: scholars perceive PIDs positively in evaluative and potency terms, as a mechanism to support their work, but view the actions or activities involved in PID (re)use or creation less favourably. While this exploratory work was motivated from a data perspective and its implications for the PID graph, it also offers a valuable snapshot of academic users’ digital scholarship competencies and therefore contributes to the wider literature on information literacy. It also provides useful perceptual insights into academic thinking around the increased use of PIDs in scholarly ecosystems.
This study was approved by the Department of Computer & Information Sciences Ethics Committee, University of Strathclyde (ID: 1804).
Funding information: Authors state no funding involved.
Conflict of interest: The authors state no conflict of interest.
Data availability statement: All data and research instruments underpinning the research documented in this article are available at: https://doi.org/10.17868/strath.00083073.
ACRL. (2016). Framework for information literacy for higher education. ALA American Library Association. https://www.ala.org/acrl/standards/ilframework.
ACRL. (2021). Research competencies in writing and literature [Companion document to the ACRL framework for information literacy for higher education] (p. 16). ALA American Library Association. https://www.ala.org/acrl/standards/ilframework.
Alexander, B., Becker, S. A., Cummins, M., & Giesinger, C. H. (2017). Digital literacy in higher education, Part II: An NMC horizon project strategic brief (pp. 1–37). The New Media Consortium. https://www.learntechlib.org/p/182086/.
Allen, S., & Matheson, J. (1977). Development of a semantic differential to access users’ attitudes towards a batch mode information retrieval system (ERIC). Journal of the American Society for Information Science, 28(5), 268–272. doi: 10.1002/asi.4630280506.
Ananthakrishnan, R., Chard, K., D’Arcy, M., Foster, I., Kesselman, C., McCollam, B., … Wagner, R. (2020). An open ecosystem for pervasive use of persistent identifiers. In Practice and Experience in Advanced Research Computing, 99–105. Association for Computing Machinery. doi: 10.1145/3311790.3396660.
Aquino, J., Allison, J., Rilling, R., Stott, D., Young, K., & Daniels, M. (2017). Motivation and strategies for implementing Digital Object Identifiers (DOIs) at NCAR’s Earth Observing Laboratory – past progress and future collaborations. Data Science Journal, 16. doi: 10.5334/dsj-2017-007.
Asano, M., Mikawa, K., Nishina, K., Maekawa, N., & Obara, H. (1995). Improvement of the accuracy of references in the Canadian Journal of Anaesthesia. Canadian Journal of Anaesthesia, 42(5), 370–372. doi: 10.1007/BF03015478.
Atzori, C., Manghi, P., & Bardi, A. (2018). De-duplicating the OpenAIRE scholarly communication big graph. 2018 IEEE 14th International Conference on E-Science (e-Science), 372–373. doi: 10.1109/eScience.2018.00104.
Austin, C. C., Bernier, A., Bezuidenhout, L., Bicarregui, J., Biro, T., Cambon-Thomsen, A., … Alliance, R. D. (2021). Fostering global data sharing: Highlighting the recommendations of the Research Data Alliance COVID-19 working group. Wellcome Open Research, 5, 267. doi: 10.12688/wellcomeopenres.16378.2.
Bandrowski, A. E., & Martone, M. E. (2016). RRIDs: A simple step toward improving reproducibility through rigor and transparency of experimental methods. Neuron, 90(3), 434–436. doi: 10.1016/j.neuron.2016.04.030.
BARTOC. (2021, March 19). All Science Journal Classification Codes. BARTOC.Org. https://bartoc.org/en/node/20290.
Basilotta-Gómez-Pablos, V., Matarranz, M., Casado-Aranda, L.-A., & Otto, A. (2022). Teachers’ digital competencies in higher education: A systematic literature review. International Journal of Educational Technology in Higher Education, 19(1), 8. doi: 10.1186/s41239-021-00312-8.
Boon, S., Johnston, B., & Webber, S. (2007). A phenomenographic study of English faculty’s conceptions of information literacy. Journal of Documentation, 63(2), 204–228. doi: 10.1108/00220410710737187.
Bornstein, M. H., Jager, J., & Putnick, D. L. (2013). Sampling in developmental science: Situations, shortcomings, solutions, and standards. Developmental Review, 33(4), 357–370. doi: 10.1016/j.dr.2013.08.003.
Broadus, R. N. (1983). An investigation of the validity of bibliographic citations. Journal of the American Society for Information Science, 34(2), 132–135. doi: 10.1002/asi.4630340206.
Bucur, C.-I., Kuhn, T., & Ceolin, D. (2020). A Unified Nanopublication Model for Effective and User-Friendly Access to the Elements of Scientific Publishing. doi: 10.48550/ARXIV.2006.06348.
Cano, V. (1989). Citation behavior: Classification, utility, and location. Journal of the American Society for Information Science, 40(4), 284–290. doi: 10.1002/(SICI)1097-4571(198907)40:4<284::AID-ASI10>3.0.CO;2-Z.
CERN. (2020). Why use persistent identifiers? https://web.archive.org/web/20221001165705/https://sis.web.cern.ch/submit-and-publish/persistent-identifiers/why-pids.
Choraś, M., & Jaroszewska-Choraś, D. (2020). The scrutinizing look on the impending proliferation of mandatory ORCID use from the perspective of data protection, privacy and freedom of science. Interdisciplinary Science Reviews, 45(4), 492–507. doi: 10.1080/03080188.2020.1780773.
cOAlition S. (2022). Plan S: Principles and Implementation. https://www.coalition-s.org/addendum-to-the-coalition-s-guidance-on-the-implementation-of-plan-s/principles-and-implementation/.
Cousijn, H., Braukmann, R., Fenner, M., Ferguson, C., van Horik, R., Lammey, R., … Lambert, S. (2021). Connected research: The potential of the PID graph. Patterns, 2(1), 100180. doi: 10.1016/j.patter.2020.100180.Search in Google Scholar
Cox, A., & Abbott, P. (2021). Librarians’ perceptions of the challenges for researchers in Rwanda and the potential of open scholarship. Libri, 71(2), 93–107. doi: 10.1515/libri-2020-0036.Search in Google Scholar
Curty, R. G. (2016). Factors influencing research data reuse in the social sciences: An exploratory study. International Journal of Digital Curation, 11(1), Article 1. doi: 10.2218/ijdc.v11i1.401.Search in Google Scholar
Dappert, A., Farquhar, A., Kotarski, R., & Hewlett, K. (2017). Connecting the persistent identifier ecosystem: Building the technical and human infrastructure for open research. Data Science Journal, 16. doi: 10.5334/dsj-2017-028.Search in Google Scholar
David, R., Mabile, L., Specht, A., Stryeck, S., Thomsen, M., Yahia, M., … Group, T. R. D. (2020). FAIRness literacy: The Achilles’ heel of applying FAIR principles. Data Science Journal, 19(1), Article 1. doi: 10.5334/dsj-2020-032.Search in Google Scholar
De Filippo, D., & Sanz-Casado, E. (2018). Bibliometric and Altmetric analysis of three social science disciplines. Frontiers in Research Metrics and Analytics, 3, Article 34. doi: 10.3389/frma.2018.00034.Search in Google Scholar
dos Santos, E. A., Peroni, S., & Mucheroni, M. L. (2022). The way we cite: Common metadata used across disciplines for defining bibliographic references. In G. Silvello, O. Corcho, P. Manghi, G. M. Di Nunzio, K. Golub, N. Ferro, & A. Poggi (Eds.), Linking theory and practice of digital libraries (pp. 120–132). Cham: Springer International Publishing. doi: 10.1007/978-3-031-16802-4_10.Search in Google Scholar
Fraser, N., Brierley, L., Dey, G., Polka, J. K., Pálfy, M., Nanni, F., & Coates, J. A. (2021). The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLOS Biology, 19(4), e3000959. doi: 10.1371/journal.pbio.3000959.
Freeman, A. (2022). Investigating the effects of the format that numbers are presented in, on people’s perception of the risk of dying from COVID-19. Octopus.Ac. doi: 10.57874/7TTF-K120.
Garfield, E. (1974). Errors – Theirs, ours and yours. Essays of an Information Scientist, 2(25), 5–6.
Garfield, E. (1990). Journal editors awaken to the impact of citation errors. How we control them at ISI. Essays of an Information Scientist, 13(41), 367–375.
Geus, J. D., Mulder, F., Zuurke, B., & Levine, M. M. (1982). A replication of the Nelson and Mitroff experiment in teaching “bothsides” thinking. Journal of the American Society for Information Science, 33(2), 76–81. doi: 10.1002/asi.4630330204.
Greer, K., & McCann, S. (2018). Everything online is a website: Information format confusion in student citation behaviors. Communications in Information Literacy, 12(2), 150–165. doi: 10.15760/comminfolit.2018.12.2.6.
Haak, L. L., Meadows, A., & Brown, J. (2018). Using ORCID, DOI, and other open identifiers in research evaluation. Frontiers in Research Metrics and Analytics, 3, Article 28. doi: 10.3389/frma.2018.00028.
Hardisty, A., Addink, W., Glöckler, F., Güntsch, A., Islam, S., & Weiland, C. (2021). A choice of persistent identifier schemes for the distributed system of scientific collections (DiSSCo). Research Ideas and Outcomes, 7, e67379. doi: 10.3897/rio.7.e67379.
Hatch, T., & Skipper, A. (2016). How much are PhD students publishing before graduation?: An examination of four social science disciplines. Journal of Scholarly Publishing, 47(2), 171–179. doi: 10.3138/jsp.47.2.171.
Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref: The sustainable source of community-owned scholarly metadata. Quantitative Science Studies, 1(1), 414–427. doi: 10.1162/qss_a_00022.
Heriyanto, Christiani, L., & Rukiyah. (2022). Lecturers’ information literacy experience in remote teaching during the COVID-19 pandemic. PLOS ONE, 17(3), e0259954. doi: 10.1371/journal.pone.0259954.
Hicks, D. (2013). One size doesn’t fit all: On the co-evolution of national evaluation systems and social science publishing. Confero: Essays on Education, Philosophy and Politics, 1(1), Article 1. doi: 10.3384/confero13v1121207b.
Howard, M. C. (2014). Creation of a computer self-efficacy measure: Analysis of internal consistency, psychometric properties, and validity. Cyberpsychology, Behavior, and Social Networking, 17(10), 677–681. doi: 10.1089/cyber.2014.0255.
Howard, M. C. (2020). The effect of training self‐efficacy on computer‐based training outcomes: Empirical analysis of the construct and creation of two scales. Performance Improvement Quarterly, 32(4), 331–368. doi: 10.1002/piq.21301.
Huber, R., Diepenbroek, M., Brown, J., Demeranville, T., & Stocker, M. (2016). THOR: Connecting people, places, and things. Geophysical Research Abstracts, 18, EPSC2016-15330. https://ui.adsabs.harvard.edu/abs/2016EGUGA..1815330H.
International DOI Foundation. (2017). DOI Handbook. doi: 10.1000/182.
Jarolimkova, A., & Drobikova, B. (2019). Data sharing in social sciences: Case study on Charles university. In S. Kurbanoğlu, S. Špiranec, Y. Ünal, J. Boustany, M. L. Huotari, E. Grassian, … L. Roy (Eds.), Information literacy in everyday life (pp. 556–565). Springer International Publishing. doi: 10.1007/978-3-030-13472-3_52.
Johansson, M. A., Reich, N. G., Meyers, L. A., & Lipsitch, M. (2018). Preprints: An underutilized mechanism to accelerate outbreak science. PLOS Medicine, 15(4), e1002549. doi: 10.1371/journal.pmed.1002549.
Jones, S. M., Van de Sompel, H., Shankar, H., Klein, M., Tobin, R., & Grover, C. (2016). Scholarly context adrift: Three out of four URI references lead to changed content. PLOS ONE, 11(12), e0167475. doi: 10.1371/journal.pone.0167475.
Katzer, J. (1972). The development of a semantic differential to assess users’ attitudes towards an on-line interactive reference retrieval system. Journal of the American Society for Information Science, 23(2), 122–128. doi: 10.1002/asi.4630230206.
Key, J. D., & Roland, C. G. (1977). Reference accuracy in articles accepted for publication in the archives of physical medicine and rehabilitation. Archives of Physical Medicine and Rehabilitation, 58(3), 136–137.
Klein, M., & Balakireva, L. (2022). An extended analysis of the persistence of persistent identifiers of the scholarly web. International Journal on Digital Libraries, 23, 5–17. doi: 10.1007/s00799-021-00315-w.
Klein, M., Shankar, H., & Van de Sompel, H. (2018). Robust links in scholarly communication. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 357–358). Association for Computing Machinery. doi: 10.1145/3197026.3203885.
Knoth, P., Budko, V., Pavlenko, V., & Cancellieri, M. (2022, June 7). OAI identifiers: Decentralised PIDs for research outputs in repositories. The 17th International Conference on Open Repositories 2022, Denver, USA. https://www.slideshare.net/petrknoth/oai-identifiers-decentralised-pids-for-research-outputs-in-repositories.
Koehler, W. (1999). Digital libraries and World Wide Web sites and page persistence. Information Research, 4(4). http://informationr.net/ir/4-4/paper60.html.
Koehler, W. (2002). Web page change and persistence – A four-year longitudinal study. Journal of the American Society for Information Science and Technology, 53(2), 162–171. doi: 10.1002/asi.10018.
Lazar, J., Feng, J. H., & Hochheiser, H. (2014). Research methods in human-computer interaction. New York, NY: Wiley Global Education.
Lehnert, K., & Klump, J. F. (2018). IGSN: Toward a mature and generic persistent identifier for samples. 2018, IN21A-01.
Lehnert, K., Klump, J., Wyborn, L., & Ramdeen, S. (2019). Persistent, global, unique: The three key requirements for a trusted identifier system for physical samples. Biodiversity Information Science and Standards, 3, Article e37334. doi: 10.3897/biss.3.37334.
Li, K., Lin, X., & Greenberg, J. (2016). Software citation, reuse and metadata considerations: An exploratory study examining LAMMPS. Proceedings of the Association for Information Science and Technology, 53(1), 1–10. doi: 10.1002/pra2.2016.14505301072.
Liang, L., Zhong, Z., & Rousseau, R. (2014). Scientists’ referencing (mis)behavior revealed by the dissemination network of referencing errors. Scientometrics, 101(3), 1973–1986. doi: 10.1007/s11192-014-1275-x.
Lin, J., Yu, Y., Zhou, Y., Zhou, Z., & Shi, X. (2020). How many preprints have actually been printed and why: A case study of computer science preprints on arXiv. Scientometrics, 124(1), 555–574. doi: 10.1007/s11192-020-03430-8.
Logan, S. W. (2022). Reference accuracy in Research Quarterly for Exercise and Sport: A 30-year follow-up to Stull et al. (1991). Research Quarterly for Exercise and Sport, 93(2), 401–411. doi: 10.1080/02701367.2020.1853019.
Macgregor, G. (2009). E-resource management and the Semantic Web: Applications of RDF for e-resource discovery. In The E-resources management handbook (pp. 1–20). Witney, UK: UKSG. doi: 10.1629/9552448-0-3.20.1.
Macgregor, G., Lancho-Barrantes, B. S., & Rasmussen Pennington, D. (2022). Research instrument and data for exploring the concept of PID literacy: User perceptions and understanding of persistent identifiers in support of open scholarly infrastructure. Glasgow, UK: University of Strathclyde. doi: 10.17868/strath.00083073.
Majumder, M. S., & Mandl, K. D. (2020). Early in the epidemic: Impact of preprints on global discourse about COVID-19 transmissibility. The Lancet Global Health, 8(5), e627–e630. doi: 10.1016/S2214-109X(20)30113-3.
Manghi, P., Atzori, C., De Bonis, M., & Bardi, A. (2020). Entity deduplication in big data graphs for scholarly communication. Data Technologies and Applications, 54(4), 409–435. doi: 10.1108/DTA-09-2019-0163.
McMurry, J. A., Juty, N., Blomberg, N., Burdett, T., Conlin, T., Conte, N., … Parkinson, H. (2017). Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLOS Biology, 15(6), e2001414. doi: 10.1371/journal.pbio.2001414.
Meadows, A., Haak, L. L., & Brown, J. (2019). Persistent identifiers: The building blocks of the research information infrastructure. Insights, 32(1), Article 1. doi: 10.1629/uksg.457.
Murphy, C. A., Coover, D., & Owen, S. V. (1989). Development and validation of the computer self-efficacy scale. Educational and Psychological Measurement, 49(4), 893–899. doi: 10.1177/001316448904900412.
Ong, L.-T. (2021). Information technology literacy: The crucial factor in aged second-career academics’ sustainability. SHS Web of Conferences, 124, 06005. doi: 10.1051/shsconf/202112406005.
Osgood, C. E. (1957). The measurement of meaning. Urbana, IL: University of Illinois Press.
Petro, J. (2020, November 20). 10M ORCID iDs! ORCID. https://info.orcid.org/10m-orcid-ids/.
Purvis, R. S., Abraham, T. H., Long, C. R., Stewart, M. K., Warmack, T. S., & McElfish, P. A. (2017). Qualitative study of participants’ perceptions and preferences regarding research dissemination. AJOB Empirical Bioethics, 8(2), 69–74. doi: 10.1080/23294515.2017.1310146.
Rospigliosi, A., & Bourner, T. (2019). Researcher development in universities: Origins and historical context. London Review of Education, 17(2), 206–222. doi: 10.18546/LRE.17.2.08.
Ross-Hellauer, T., Fecher, B., Shearer, K., & Rodrigues, E. (2019). Pubfair: A distributed framework for open publishing services. Confederation of Open Access Repositories (COAR). https://www.coar-repositories.org/files/Pubfair-version-2-November-27-2019-2.pdf.
Rusticus, S., & Lovato, C. (2014). Impact of sample size and variability on the power and type I error rates of equivalence tests: A simulation study. Practical Assessment, Research, and Evaluation, 19(1), Article 11. doi: 10.7275/4s9m-4e81.
Sarabipour, S., Debat, H. J., Emmott, E., Burgess, S. J., Schwessinger, B., & Hensel, Z. (2019). On the value of preprints: An early career researcher perspective. PLOS Biology, 17(2), e3000151. doi: 10.1371/journal.pbio.3000151.
Sauder, D. C., & DeMars, C. E. (2019). An updated recommendation for multiple comparisons. Advances in Methods and Practices in Psychological Science, 2(1), 26–44. doi: 10.1177/2515245918808784.
Savage, W. E., & Olejniczak, A. J. (2022). More journal articles and fewer books: Publication practices in the social sciences in the 2010’s. PLOS ONE, 17(2), e0263410. doi: 10.1371/journal.pone.0263410.
Schirrwagen, J., Bardi, A., Czerniak, A., Loehden, A., Rettberg, N., Mertens, M., & Manghi, P. (2020). Data sources and persistent identifiers in the open science research graph of OpenAIRE. International Journal of Digital Curation, 15(1), Article 1. doi: 10.2218/ijdc.v15i1.722.
Schrum, M. L., Johnson, M., Ghuy, M., & Gombolay, M. C. (2020). Four years in review: Statistical practices of Likert scales in human-robot interaction studies. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction (pp. 43–52). doi: 10.1145/3371382.3380739.
Secker, J. (2004). Developing the e-literacy of academics: Case studies from LSE and the Institute of Education, University of London. JeLit, 1(2), Article 2.
Snider, J. G., & Osgood, C. E. (1969). Semantic differential technique: A sourcebook. Chicago, IL: Aldine Publishing Company.
Sorapure, M., Inglesby, P., & Yatchisin, G. (1998). Web literacy: Challenges and opportunities for research in a new medium. Computers and Composition, 15(3), 409–424. doi: 10.1016/S8755-4615(98)90009-3.
Stocker, M., Darroch, L., Krahl, R., Habermann, T., Devaraju, A., Schwardmann, U., … Häggström, I. (2020). Persistent identification of instruments. Data Science Journal, 19(1), Article 1. doi: 10.5334/dsj-2020-018.
Stoutenborough, J. W. (2008). Semantic differential technique. In P. Lavrakas (Ed.), Encyclopedia of survey research methods. Thousand Oaks, CA: Sage Publications, Inc. doi: 10.4135/9781412963947.n527.
Um, J.-H., Choi, M., Kim, H., & Lee, S. (2020). Making reproducible research data by utilizing persistent ID graph structure. In 2020 IEEE International Conference on Big Data and Smart Computing (BigComp) (pp. 597–600). doi: 10.1109/BigComp48618.2020.00018.
Verhagen, T., van den Hooff, B., & Meents, S. (2015). Toward a better use of the semantic differential in IS research: An integrative framework of suggested action. Journal of the Association for Information Systems, 16(2), 108–143. doi: 10.17705/1jais.00388.
Webber, S., Boon, S., & Johnston, B. (2005). A comparison of UK academics’ conceptions of information literacy in two disciplines: English and marketing. Library and Information Research, 29(93), 4–15. doi: 10.29173/lirg197.
Weigel, T., Lautenschlager, M., Toussaint, F., & Kindermann, S. (2013). A framework for extended persistent identification of scientific assets. Data Science Journal, 12. doi: 10.2481/dsj.12-036.
Wouters, P., & de Vries, R. (2004). Formally citing the Web. Journal of the American Society for Information Science and Technology, 55(14), 1250–1260. doi: 10.1002/asi.20080.
© 2023 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.