The impact of Big Data on the organization of the European market

Abstract

Purpose

Our work seeks to overcome data quality issues related to incomplete author affiliation data in bibliographic records in order to support accurate and reliable measurement of international research collaboration (IRC).

Design/methodology/approach

We propose, implement, and evaluate a method that leverages the Web-based knowledge graph Wikidata to resolve publication affiliation data to particular countries. The method is tested with general and domain-specific data sets.
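Once affiliations have been resolved to countries, an IRC measure can be computed per paper. The sketch below is a minimal illustration that assumes a simple rule: a paper counts as international when its resolved affiliations span at least two countries. It replaces the Wikidata lookup with a hypothetical in-memory table, `AFFILIATION_COUNTRY`. The function names are illustrative and are not the authors' R implementation. In Wikidata itself, an organization's country is usually recorded with property P17 (country).

```python
from typing import Dict, List, Set

# Hypothetical stand-in for affiliation -> country resolution.
# The described method would obtain this from Wikidata; here it is a plain dict.
AFFILIATION_COUNTRY: Dict[str, str] = {
    "University of Oxford": "GB",
    "Kyoto University": "JP",
    "ETH Zurich": "CH",
}

def resolve_countries(affiliations: List[str]) -> Set[str]:
    """Map affiliation strings to country codes, silently skipping unresolved ones."""
    countries: Set[str] = set()
    for aff in affiliations:
        code = AFFILIATION_COUNTRY.get(aff)
        if code is not None:
            countries.add(code)
    return countries

def is_international(affiliations: List[str]) -> bool:
    """Assumed IRC rule: resolved affiliations span two or more countries."""
    return len(resolve_countries(affiliations)) >= 2

def irc_share(papers: List[List[str]]) -> float:
    """Fraction of papers (each a list of affiliations) counted as international."""
    if not papers:
        return 0.0
    return sum(is_international(p) for p in papers) / len(papers)
```

A paper whose second affiliation cannot be resolved, such as `["ETH Zurich", "Unknown Institute"]`, is counted as domestic. Better affiliation-to-country resolution is meant to reduce exactly this kind of undercounting.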

Findings

Our evaluation covers the magnitude of improvement, accuracy, and consistency. Results suggest the method is beneficial, reliable, and consistent, and thus a viable and improved approach to measuring IRC.

Research limitations

Though our evaluation suggests the method works with both general and domain-specific bibliographic data sets, it may perform differently with data sets not tested here. Further limitations stem from the use of the R programming language and R libraries for country identification as well as imbalanced data coverage and quality in Wikidata that may also change over time.

Practical implications

The new method helps to increase the accuracy in IRC studies and provides a basis for further development into a general tool that enriches bibliographic data using the Wikidata knowledge graph.

Originality

This is the first attempt to enrich bibliographic data using a peer-produced, Web-based knowledge graph like Wikidata.

Abstract

Purpose

This paper describes the definition of data quality procedures for knowledge organizations such as Higher Education Institutions. The main purpose is to present the flexible approach developed for monitoring the data quality of the European Tertiary Education Register (ETER) database, illustrating how it works and highlighting the main challenges that still have to be faced in this domain.

Design/methodology/approach

The proposed data quality methodology is based on two kinds of checks, one to assess the consistency of cross-sectional data and the other to evaluate the stability of multiannual data. This methodology has an operational and empirical orientation. This means that the proposed checks do not assume any theoretical distribution for the determination of the threshold parameters that identify potential outliers, inconsistencies, and errors in the data.
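The sketch below illustrates a multiannual stability check whose thresholds come from the data rather than from a theoretical distribution. It flags year-over-year changes that fall outside Tukey-style fences built from the observed quartiles of all log changes. The fence multiplier `k`, the log-ratio measure, and the input layout (one list of yearly values per institution) are assumptions for illustration. This is not the ETER procedure itself.

```python
import math
import statistics
from typing import Dict, List, Tuple

def multiannual_flags(
    series: Dict[str, List[float]], k: float = 3.0
) -> List[Tuple[str, int]]:
    """Flag (unit, year index) pairs with an unusually large year-over-year change.

    Fences are derived from the empirical quartiles of all observed log changes,
    so no theoretical distribution is assumed.
    """
    changes: List[Tuple[str, int, float]] = []
    for unit, values in series.items():
        for t in range(1, len(values)):
            prev, cur = values[t - 1], values[t]
            if prev > 0 and cur > 0:  # log ratio is undefined for zero counts
                changes.append((unit, t, math.log(cur / prev)))
    if len(changes) < 2:  # too few observations to estimate quartiles
        return []
    q1, _, q3 = statistics.quantiles([c for _, _, c in changes], n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(unit, t) for unit, t, c in changes if c < lower or c > upper]

# Hypothetical yearly student counts for three institutions.
series = {
    "A": [1000, 1020, 1010, 1030],
    "B": [500, 510, 505, 5050],
    "C": [2000, 1980, 2010, 2000],
}
print(multiannual_flags(series))  # [('B', 3)]
```

Because the fences are quartile-based, the single tenfold jump for institution B stands out without any normality assumption. The value of `k` remains a tuning choice.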

Findings

We show that the proposed cross-sectional and multiannual checks are helpful for identifying outliers and extreme observations, and for detecting ontological inconsistencies not described in the available metadata. For this reason, they may be a useful complement to the processing of the available information.

Research limitations

The coverage of the study is limited to European Higher Education Institutions. The cross-sectional and multiannual checks are not yet completely integrated.

Practical implications

Considering the quality of the available data and information is important for enhancing data quality-aware empirical investigations. It highlights problems and shows where to invest to improve the coverage and interoperability of data in future data collection initiatives.

Originality/value

The data-driven quality checks proposed in this paper may be useful as a reference for building and monitoring the data quality of new databases, or of existing databases for other countries or systems characterized by highly heterogeneous and complex units of analysis, without relying on pre-specified theoretical distributions.

Abstract

Rule-based modeling is an approach that permits constructing reaction networks based on the specification of rules for molecular interactions and transformations. These rules can encompass details such as the interacting sub-molecular domains and the states and binding status of the involved components. Conceptually, fine-grained spatial information such as locations can also be provided. Through “wildcards” representing component states, entire families of molecule complexes sharing certain properties can be specified as patterns. This can significantly simplify the definition of models involving species with multiple components, multiple states, and multiple compartments. The Systems Biology Markup Language (SBML) Level 3 Multi Package Version 1 extends the SBML Level 3 Version 1 core with the “type” concept in the Species and Compartment classes. As a result, reaction rules may contain species that are patterns and that can exist in multiple locations. Multiple software tools, such as Simmune and BioNetGen, support this standard, which thus also becomes a medium for exchanging rule-based models. This document provides the specification for Release 2 of Version 1 of the SBML Level 3 Multi package. No design changes have been made to the description of models between Release 1 and Release 2; changes are restricted to the correction of errata and the addition of clarifications.

Abstract

Purpose

This study examines acknowledgments to libraries in the journal literature, as well as the efficacy of using Web of Science (WoS) to locate general acknowledgment text.

Design/methodology/approach

This mixed-methods approach quantifies and characterizes acknowledgments to libraries in the journal literature. Using WoS's Funding Text field, the acknowledgments for six peer universities were identified and then characterized. The efficacy of using WoS to locate library acknowledgments was assessed by comparing the WoS Funding Text search results to the actual acknowledgment text found in the articles.

Findings

Acknowledgments to libraries were found in articles at all six peer universities, though the absolute and relative numbers were quite low (< 0.5%). Most of the library acknowledgments were for resources (collections, funding, etc.), and many were concentrated in natural history (e.g. zoology). Examination of Texas A&M University zoology articles found that 91.7% of the funding information came from “acknowledgments” and not specifically a funding acknowledgment section. The WoS Funding Text search found 56% of the library acknowledgments compared to a search of the actual acknowledgment text in the articles.

Research limitations

The study was limited to journal publications, used a single truncated search term, and included only six research universities in the United States.

Practical implications

This study examined library acknowledgments, but the same approach could be applied to searches of other keywords, institutions/organizations, individuals, etc. While not specifically designed to search general acknowledgments, WoS's Funding Text field can be used as an exploratory tool to search acknowledgments beyond funding.

Originality/value

There are a few studies that have examined library acknowledgments in the scholarly literature, though to date none of those studies have examined the efficacy of using the WoS Funding Text field to locate those library acknowledgments within the journal literature.

Abstract

Purpose

To provide a theoretical framework for measuring the relative impact of bibliometric methodology on the subfields of a scientific discipline, and for showing how that impact depends on the method of evaluation used to credit individual scientists with citations and publications. The authors include a study of the discipline of physics to illustrate the method. Indicators are introduced to measure the proportion of a credit space awarded to a subfield or a set of authors.

Design/methodology/approach

The theoretical methodology introduces the notion of credit spaces for a discipline. These quantify the total citation or publication credit accumulated by the scientists in the discipline. One can then examine how the credit is divided among the subfields. The design of the physics study uses the American Physical Society print journals to assign subdiscipline classifications to articles and gather citation, publication, and author information. Credit spaces for the collection of Physical Review Journal articles are computed as a proxy for physics.
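A toy credit space can show how a subfield's share depends on the crediting scheme. The sketch below assumes that a credit space is the total citation credit summed over all authors. It compares two common schemes: full counting, where each author receives all of a paper's citations, and fractional counting, where citations are divided equally among authors. The `Paper` record and the two subfields in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Paper(NamedTuple):
    subfield: str
    n_authors: int
    citations: int

def subfield_shares(papers: List[Paper], scheme: str) -> Dict[str, float]:
    """Share of the total citation credit space held by each subfield."""
    credit: Dict[str, float] = defaultdict(float)
    for p in papers:
        if scheme == "full":
            # Every author is credited with all of the paper's citations.
            credit[p.subfield] += p.citations * p.n_authors
        elif scheme == "fractional":
            # Citations are split evenly, so the authors' credit sums to the citations.
            credit[p.subfield] += p.citations
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    total = sum(credit.values())
    return {s: c / total for s, c in credit.items()}

papers = [Paper("particles", 100, 50), Paper("condensed", 3, 50)]
print(subfield_shares(papers, "fractional"))  # {'particles': 0.5, 'condensed': 0.5}
```

In this example, both papers are cited 50 times. Fractional counting splits the credit space evenly between the two subfields. Full counting gives the 100-author subfield 5000 of 5150 credit units, or about 97% of the space.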

Findings

There is a substantial difference in the value or impact of a specific subfield depending on the credit system employed to credit individual authors.

Research limitations

Subfield classification information is difficult to obtain. In the illustrative physics study, subfields are treated in groups designated by the Physical Review journals. While this collection of articles represents a broad part of the physics literature, it is neither the entire literature nor a random sample.

Practical implications

The method of crediting individual scientists has consequences beyond the individual and affects the perceived impact of whole subfields and institutions.

Originality/value

The article reveals the consequences of bibliometric methodology for the subfields of a discipline by introducing a systematic theoretical framework for measuring them.

Abstract

Purpose

The use of in vitro cell culture and experimentation is a cornerstone of biomedical research; however, more attention has recently been given to the potential consequences of using such artificial basal media and undefined supplements. As a first step towards better understanding and measuring the impact these systems have on experimental results, we use text mining to capture typical research practices and trends around cell culture.

Design/methodology/approach

To measure the scale of in vitro cell culture use, we analyzed a corpus of 94,695 research articles that appeared in biomedical research journals published in ScienceDirect from 2000 to 2018. Central to our investigation is the observation that studies using cell culture describe conditions with a typical sentence structure: cell line, basal media, and supplemented compounds. We tag our corpus with a curated list of basal media and the Cellosaurus ontology using the Aho-Corasick algorithm. We also processed the corpus with Stanford CoreNLP to find nouns that follow the basal media, in an attempt to identify the supplements used.
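Dictionary tagging with the Aho-Corasick algorithm finds every occurrence of every dictionary term in a single pass over the text. The sketch below is a minimal, case-sensitive Python version. The five-term dictionary is a hypothetical stand-in for the curated basal media list and Cellosaurus names, and the function names are illustrative rather than the authors' code.

```python
from collections import deque
from typing import Dict, List, Tuple

Automaton = Tuple[List[Dict[str, int]], List[int], List[List[str]]]

def build_automaton(patterns: List[str]) -> Automaton:
    """Build the goto (trie), failure, and output tables of an Aho-Corasick automaton."""
    goto: List[Dict[str, int]] = [{}]
    fail: List[int] = [0]
    out: List[List[str]] = [[]]
    # Insert every pattern into the trie; state 0 is the root.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: children of the root keep failure link 0; deeper states
    # link to the longest proper suffix that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit matches ending here
    return goto, fail, out

def find_all(text: str, automaton: Automaton) -> List[Tuple[int, str]]:
    """Return (start index, term) for every dictionary match in text."""
    goto, fail, out = automaton
    state = 0
    matches: List[Tuple[int, str]] = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or the root is reached.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            matches.append((i - len(term) + 1, term))
    return matches

terms = ["DMEM", "RPMI", "HeLa", "HEK293", "HEK293T"]
sentence = "HEK293T and HeLa cells were grown in DMEM"
print(find_all(sentence, build_automaton(terms)))
# [(0, 'HEK293'), (0, 'HEK293T'), (12, 'HeLa'), (37, 'DMEM')]
```

Both `HEK293` and `HEK293T` are reported at position 0. Overlapping dictionary hits like these therefore need a later filtering step, such as keeping the longest match, if only one label per span is wanted.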

Findings

Interestingly, we find that researchers frequently use DMEM even if a cell line's vendor recommends less concentrated media. We see long-tailed distributions for the usage of media and cell lines, with DMEM and RPMI dominating the media, and HEK293, HEK293T, and HeLa dominating cell lines used.

Research limitations

Our analysis was restricted to documents in ScienceDirect, and our text mining method achieved high recall but low precision, which required manual inspection of many tokens.

Practical implications

Our findings document current cell culture practices in the biomedical research community, which can be used as a resource for future experimental design.

Originality/value

No other work has taken a text mining approach to surveying cell culture practices in biomedical research.

Abstract

Purpose

To develop an indicator that incorporates the dynamic aspect of citations into bibliometric indexes.

Design/methodology/approach

A new bibliometric methodology, the f2-index, is applied at the career level and over the most recent 5 years to analyze the dynamic aspect of bibliometrics. As an illustration, the method is applied to the field of corporate governance.

Findings

The compound F2-index, an extension of the f2-index, recognizes past achievements but also values new research work with potential. The method is extended to the h-index and the h2-index. An activity index is defined as the ratio of the recent h’-index to the career h-index.
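The abstract does not define the f2-index or the compound F2-index, so the sketch below covers only the pieces with standard definitions. These are the h-index, the h2-index, and the activity index as the ratio of a recent h’-index to the career h-index. The h2-index is assumed here to be Kosmulski's h(2) variant: the largest h such that h papers each have at least h² citations. The caller decides which citations fall in the recent window, and the recent counts in the example are hypothetical.

```python
from typing import List

def h_index(citations: List[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # Ranks i with ranked[i-1] >= i form a prefix of the sorted list, so counting them gives h.
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def h2_index(citations: List[int]) -> int:
    """Largest h such that h papers have at least h**2 citations each (assumed definition)."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i * i)

def activity_index(recent_citations: List[int], career_citations: List[int]) -> float:
    """Ratio of the recent h'-index to the career h-index (0.0 if the career h is 0)."""
    career_h = h_index(career_citations)
    return h_index(recent_citations) / career_h if career_h else 0.0

career = [25, 18, 10, 9, 4, 2, 1, 0]
recent = [9, 4, 2]  # hypothetical citation counts for the recent 5-year window
print(h_index(career), h2_index(career), activity_index(recent, career))  # 4 3 0.5
```

In this example, an activity index of 0.5 means the researcher's recent h’-index is half of the career h-index.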

Research limitations

The compound F2- and H-indexes are PAC (probably approximately correct) and depend on the selection and the database used.

Practical implications

The compound F2- and H-indexes make it possible to identify the rising stars of a field from a dynamic perspective. The activity ratio highlights the contribution of younger researchers.

Originality/value

The new methodology demonstrates the underestimated dynamic capacity of bibliometric research.

Abstract

Purpose

This work aims to consider the role of the discipline impact factor (DIF) in the evaluation of serial publications, together with part of the indicator's 42-year history. It also presents an original “symmetric” indicator called the “discipline susceptibility factor”.

Design/methodology/approach

In accordance with the purpose of the work, the methods are an analytical interpretation of the scientific literature on this problem, together with speculative explanations. The information base of the research consists of bibliometric publications dealing with impact, the impact factor, the discipline impact factor, and the discipline susceptibility factor.

Findings

Examples of the DIF's application and of modifications of the indicator are given. It is shown why research and university libraries need to use the DIF to evaluate serials when funding for subscriptions to serial publications is scarce, even if open access is available. The role of the DIF in helping authors of scientific papers choose a suitable journal for submitting a paper is also briefly discussed. An original indicator “symmetrical” to the DIF (the “discipline susceptibility factor”) and its differences from the DIF in content and purpose of evaluation are also briefly presented.
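The DIF is commonly described as a variant of the Garfield impact factor that counts only citations coming from a core set of journals in the discipline. The sketch below is a minimal version of that idea. It assumes a two-year citation window, as in the standard impact factor, and a flat list of citation records. The record layout, function name, and example journals are hypothetical. The discipline susceptibility factor is not implemented because the abstract does not define it.

```python
from typing import Dict, List, Set, Tuple

# One citation record: (citing journal, cited journal, citation year, publication year of cited item)
Citation = Tuple[str, str, int, int]

def discipline_impact_factor(
    journal: str,
    year: int,
    citations: List[Citation],
    core_journals: Set[str],
    items_published: Dict[int, int],
    window: int = 2,
) -> float:
    """Citations received in `year` from the discipline's core journals by items that
    `journal` published in the preceding `window` years, divided by the number of those items."""
    cited_years = range(year - window, year)
    hits = sum(
        1
        for citing, cited, cit_year, pub_year in citations
        if cited == journal
        and cit_year == year
        and pub_year in cited_years
        and citing in core_journals
    )
    n_items = sum(items_published.get(y, 0) for y in cited_years)
    return hits / n_items if n_items else 0.0

core = {"Core A", "Core B"}
records = [
    ("Core A", "J", 2020, 2018),
    ("Core B", "J", 2020, 2019),
    ("Other", "J", 2020, 2019),   # not a core journal: ignored
    ("Core A", "J", 2020, 2015),  # outside the two-year window: ignored
]
print(discipline_impact_factor("J", 2020, records, core, {2018: 2, 2019: 2}))  # 0.5
```

Restricting the citing side to core journals is what distinguishes this measure from a plain impact factor. With an empty core set, every citation is ignored.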

Research limitations

The selection of publications for the information base of the research excluded those in which the DIF was only mentioned, was used only partially, or was not used for its original purpose. Restrictions on the length of articles submitted to this special issue of JDIS also led to the exclusion of a number of fully relevant publications. The DIF is not considered in the context of other derivatives of the Garfield impact factor.

Practical implications

An underrated bibliometric indicator, viz. the discipline impact factor, is promoted for practical application. An original indicator “symmetrical” to the DIF is proposed for finding serial publications that represent external research fields. These are fields where results obtained within the specific research field represented by the cited specialized journals might find potential application. Both indicators can help research and university libraries in their efforts to improve scientific information services. Authors of scientific papers can also use both indicators to evaluate journals when choosing where to submit a paper.

Originality/value

The article substantiates the need to evaluate scientific serial publications in library activities, even with access to huge and convenient databases (subscription packages) and open access to a large number of serial publications. It gives a mini-survey of the history of one of the methods of such evaluation and offers an original method for evaluating scientific serial publications.