
Data and Information Management

4 Issues per year

Open Access
Online
ISSN
2543-9251

The Second-order h-type Indicators for Identifying Top Units

Fred Y. Ye (corresponding author)
  • School of Information Management, Nanjing University, Nanjing 210023, China
  • Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Lutz Bornmann
  • Administrative Headquarters of the Max Planck Society, Division for Science and Innovation Studies, Hofgartenstr. 8, D-80539 Munich, Germany
Published Online: 2018-04-24 | DOI: https://doi.org/10.1515/dim-2017-0011

Abstract

Second-order h-type indicators are proposed for identifying top units in scientometrics. Re-ranking a series of h-type indices yields the second-order h-type indicator, which provides a natural method for identifying top units and a fixed h-top. In contrast to the series of artificially defined highly cited percentile classes, the h-top provides a natural, definite top in the series of highly cited classes. Theoretically, the second-order h-index refers to about 3% of the sources (the h-top), whereas the first-order h-index refers to 10% (the h-core), so that the ratio of the second-order to the first-order h-index, hT/h, is about 30%. Empirically, hT/h is below 30%. The calculation of second-order h-type indicators is exemplified using journals in two fields.

Keywords: h-index; h-type indicator; second-order h-index; h-top; top metrics

1 Introduction

Hirsch introduced the h-index in 2005. It has since been compared with other bibliometric indicators (Bornmann et al., 2008, 2011), and its theoretical aspects have been discussed (Egghe and Rousseau, 2006; Glänzel, 2006; Schubert and Glänzel, 2007; Ye, 2009, 2011). Its applications have been expanded from single researchers to various other units (e.g., journals or countries) as well as networks (Korn et al., 2009; Schubert et al., 2009; Schubert and Soos, 2010; Zhao et al., 2011). One of the most important reasons for the success of the h-index is its ability to delimit the core part (in terms of citation impact) of a publication set in a simple way. Furthermore, it is one of the few indicators that combine output and impact in a single number. When the h-index is used within a single subject category (or related subject categories) and with publications from the same time period, it might be an interesting complement to other bibliometric indicators. Many studies on the h-index have been published since its introduction, and various reviews of this literature have appeared (see, e.g., Alonso et al., 2009; Egghe, 2010; Norris & Oppenheim, 2010).

Shortly after the introduction of the h-index, the concepts of first-order h-index and second-order h-index were proposed by Prathap (2006): the first-order h-index h1=h if the unit (e.g., an institution) has published h papers with at least h citations each, and the second-order h-index h2=h if the unit (e.g., an institution) has h individuals each having an individual h-index of at least h. Furthermore, some h-series, such as successive h-indices (Schubert, 2007; Ruane & Tol, 2008), have been discussed. However, Prathap’s “second-order” h-index is not really “of order two,” and the h-series are h-indices of different objects. The h-index of an h-series has not yet been considered further, because real second-order h-indices have never been used. Thus, the real second-order h-index remains an open question, and our study focuses on how a second-order h-index can be defined and identified for the same unit.

Recently, the topic of research excellence has received increasing attention in scientometrics, and many different methods have been proposed for identifying excellent papers (Bornmann, 2013, 2014). The concept of “core documents” (Glänzel, 2012) was introduced, mostly with a focus on “highly cited papers,” “most frequently cited papers,” or “top cited papers,” by the methodology of similarity. According to the review of Bornmann (2014), several different methods have been used in scientometric studies to identify excellent papers. However, most methods set the proportion of excellent papers arbitrarily, for example at the top 1%, top 5%, or top 10%. In this study, we propose a simple and natural method for identifying top units at fixed proportions, based on the h-index concept.

2 Methodology

Suppose that, in a source-item model (Egghe, 2005), S denotes sources (e.g., publications, P), T denotes items (e.g., citations, C), R denotes the rank of a source when the sources are ordered by their number of items, and TR denotes the number of items of the source at rank R. Then, there exist the number series:

$$R = (1, 2, \ldots, R, \ldots, Z), \tag{1}$$

$$T = (T_1, T_2, \ldots, T_R, \ldots, T_Z):\; T_1 \ge T_2 \ge \cdots \ge T_R \ge \cdots \ge T_Z. \tag{2}$$

The h-index is defined as

$$h = \max\{R : R \le T_R\}. \tag{3}$$

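If there is an h-index series {hr (r = 1, 2, …, r, …)} and we re-rank the h-index series from high to low values as

$$r = (1, 2, \ldots, r, \ldots, x), \tag{4}$$

$$h_r = (h_1, h_2, \ldots, h_r, \ldots, h_x):\; h_1 \ge h_2 \ge \cdots \ge h_r \ge \cdots \ge h_x, \tag{5}$$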

then we obtain the second-order h-index of this h-index series as

$$h_T = \max\{r : r \le h_r\}. \tag{6}$$

The procedure is illustrated in Figure 1, in which each Pi–Ci (i=1, 2, … r…s) plane contributes a single h-index. Linking all h-indices in all Pi–Ci planes and projecting them onto the hr–r plane, a distributed curve of h-indices emerges (let us call it the h-curve). The second-order h-index is the h-index of the h-curve.
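The two-step procedure of Eqs. (3) and (6) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the toy citation counts are hypothetical and serve only to show that the second-order h-index is the h-index of the re-ranked h-series.

```python
def h_index(values):
    """Largest rank R such that the R-th largest value is at least R (Eq. 3)."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def second_order_h_index(citation_sets):
    """h-index of the series of first-order h-indices (Eq. 6)."""
    h_series = [h_index(citations) for citations in citation_sets]
    return h_index(h_series)


# Hypothetical toy data: citation counts of the papers of four journals.
journals = [
    [10, 8, 5, 4, 3],  # first-order h = 4
    [6, 6, 6, 1],      # first-order h = 3
    [9, 2, 1],         # first-order h = 2
    [1, 0],            # first-order h = 1
]

h_series = [h_index(citations) for citations in journals]
h_T = second_order_h_index(journals)
print(h_series, h_T)
```

In this toy example, the h-series is (4, 3, 2, 1) and hT = 2, so the h-top consists of the first two journals.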

Figure 1. The projective curve of h-indices and the second-order h-index

In Figure 1, the h-indices {h1, h2, h3, …} are located on different planes {C1O1P1, C2O2P2, C3O3P3, …}. The projective curve of all h-indices defines the h-curve (linking the points h1, h2, h3, …, hT, …). The h-index hT of the h-curve (the crossing point of the h-curve and line OA) is the second-order h-index, which provides a simple method to identify top units. We call the core of the second-order h-index the h-top.

Definition

The second-order h-index in a ranked h-index series {hr (r = 1, 2, …, r, …)} is equal to hT if hT is the largest natural number such that the first hT units of the series each have an h-index of at least hT. The second-order h-core is defined as the h-top, which includes the units with h ≥ hT.

When the first-order h-index is h and the second-order h-index is hT in any system, hT/h can be defined as the ratio of the second-order to the first-order h-index.

A second-order variant can not only be defined for the h-index itself but also for variants of the h-index. For example, one of the most important variants is the g-index which is defined as the largest number n of highly cited publications, for which the mean number of citations is at least n (Egghe, 2006).

If there is a cumulative series corresponding to (1) and (2)

$$CT = (CT_1, CT_2, \ldots, CT_R, \ldots, CT_Z);\quad CT_1 = T_1;\quad CT_R = \sum_{i=1}^{R} T_i, \tag{7}$$

then the g-index is defined as

$$g^2 = \max\{R^2 : R^2 \le CT_R\}. \tag{8}$$

The second-order g-index can be defined on a cumulative series following (4) and (5) as

$$Ch = (Ch_1, Ch_2, \ldots, Ch_r, \ldots, Ch_x);\quad Ch_1 = h_1;\quad Ch_r = \sum_{i=1}^{r} h_i, \tag{9}$$

leading to

$$g_T^2 = \max\{r^2 : r^2 \le Ch_r\}. \tag{10}$$
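The cumulative definitions in Eqs. (8) and (10) can be sketched as follows. The function name and the toy values are hypothetical, and the sketch assumes that the g-index cannot exceed the number of sources (some g-index variants instead pad the series with zero-cited sources).

```python
from itertools import accumulate


def g_index(values):
    """Largest rank R with R**2 <= sum of the R largest values (Eqs. 8 and 10)."""
    ranked = sorted(values, reverse=True)
    g = 0
    for rank, cumulative in enumerate(accumulate(ranked), start=1):
        if rank * rank <= cumulative:
            g = rank
    return g


# Hypothetical first-order h-series of four units, already computed elsewhere.
h_series = [4, 3, 2, 1]

# Applying the same function to an h-series gives the second-order g-index.
g_T = g_index(h_series)
print(g_T)
```

Here the cumulative h-series is (4, 7, 9, 10) and the squared ranks are (1, 4, 9, 16), so the second-order g-index of this toy series is 3.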

In a series of percentile classes focusing on highly cited papers, one can set {top 1%, top 2%, top 3%,…} series constructing an artificially processed series. As h-top is a naturally fixed number, it becomes a natural definite top in the series of highly cited classes. As the processed objects keep concordance, the second-order h-type indices and h-top are unique.

We will illustrate the proposed method with the following empirical cases.

3 Empirical cases

For the empirical examples, we extracted publications from the Web of Science (WoS), including Science Citation Index – Expanded (SCI-E), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI). On May 12, 2016, we downloaded papers of the document types Article, Letter, and Review published between 2001 and 2011. Two fields were selected as examples: one field is Mathematics (Math), covering 425 journals, and the other field is Library and Information Science (LIS), covering 103 journals. The following results are based on publications and their citations. The journal h-indices are used as first-order h-indices, and the second-order h-cores of these journals are listed in the Appendix.

3.1 h-top journals in Mathematics

Ranking the h-indices of math journals, we obtained hT=34 in the h-series. The top 34 journals are shown in Figure 2.

Figure 2. h-top journals in Mathematics.

Figure 2 and the data in the Appendix show the h-top journals in Mathematics, for example, the Journal of Mathematical Analysis and Applications, the Annals of Mathematics, Communications on Pure and Applied Mathematics, and the Journal of the American Mathematical Society.

With 34/425=0.08, the h-top of the math journals refers to the top 8% of journals in the field of Mathematics.

3.2 h-top journals in LIS

Ranking the h-indices of LIS journals, we obtained hT=28. The top 28 journals are shown in Figure 3.

Figure 3. h-top journals in library and information science.

With 28/103=0.27, the h-top of the LIS journals refers to the top 27% of journals in the field. Among these journals are MIS Quarterly, Scientometrics, the Journal of Informetrics, the Journal of Information Science, and JASIST.

As both examples show, h-tops refer to different top percentages, but each h-top is field-specifically fixed. This is a natural way to generate h-top.

Here, we see that a larger h-set produces a smaller h-top in proportion (8% in 425 Math journals), while a smaller h-set produces a larger h-top in proportion (27% in 103 LIS journals).

4 Analysis and Discussion

In this section, we discuss both static and dynamic cases.

4.1 Static case

Let h(r) be the second-order h-curve in the continuous case. Its h-core (the h-top) is

$$R_T^2 = \int_1^{h_T} h(r)\,dr, \tag{11}$$

which is equal to the integral area of the h-curve in the h-top.

If the total number of units in the h-series is x, then the second-order h-tail will be

$$t_T^2 = \int_{h_T}^{x} h(r)\,dr. \tag{12}$$
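For empirical h-series, the integrals in Eqs. (11) and (12) have a natural discrete counterpart: the sum of the h-indices inside the h-top and the sum of those outside it. The following sketch assumes this discrete reading, which is not spelled out in the text; the function name and the toy series are hypothetical.

```python
def h_top_and_tail_areas(h_series):
    """Return (h_T, area of the h-top, area of the h-tail) for an h-series.

    Discrete analogue of Eqs. (11) and (12): the integrals are replaced by sums.
    """
    ranked = sorted(h_series, reverse=True)
    h_T = max(
        (rank for rank, h in enumerate(ranked, start=1) if h >= rank),
        default=0,
    )
    top_area = sum(ranked[:h_T])   # discrete counterpart of R_T squared
    tail_area = sum(ranked[h_T:])  # discrete counterpart of t_T squared
    return h_T, top_area, tail_area


# Hypothetical h-series of four units.
h_T, top_area, tail_area = h_top_and_tail_areas([4, 3, 2, 1])
print(h_T, top_area, tail_area)
```

For the toy series (4, 3, 2, 1), the h-top contains two units with a summed h of 7, and the tail contributes the remaining 3.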

Historically, we have three theoretical models for estimating the h-index.

In Hirsch’s original paper (Hirsch, 2005), the mathematical model for the h-index is given as follows:

$$h = \sqrt{C/a}, \tag{13}$$

where C is the total number of citations and a is a constant ranging between 3 and 5.

Egghe and Rousseau derived the Egghe–Rousseau model (Egghe and Rousseau, 2006) in the framework of the Lotkaian informetrics, which can be re-written as

$$h = P^{1/\alpha}, \tag{14}$$

where P is the total number of publications and α>1 is Lotka’s exponent.

Glänzel and Schubert proposed the Glänzel–Schubert model (Glänzel, 2006; Schubert and Glänzel, 2007) with the formula:

$$h = c\,P^{1/3}\,(C/P)^{2/3}, \tag{15}$$

in which C/P is associated with the Journal Impact Factor (JIF) and c is a constant near 1.

Under Heaps’ law or Herdan’s law (Egghe, 2007), the three models can be unified (Ye, 2011), whereby the h-index is linked to total items (such as citations, C) and total sources (such as publications, P) following the formula:

$$h = c\,P^{1/(\alpha+1)}\,(C/P)^{\alpha/(\alpha+1)}, \tag{16}$$

where α>1 is Lotka’s exponent and c>0 is a constant.
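As a quick check of Eq. (16), the following worked substitution (an illustration added here, not a derivation quoted from Ye, 2011) collects the powers of P and sets α=2, which recovers the Glänzel–Schubert form of Eq. (15):

$$h = c\,P^{1/(\alpha+1)}\left(\frac{C}{P}\right)^{\alpha/(\alpha+1)} = c\,C^{\alpha/(\alpha+1)}\,P^{(1-\alpha)/(\alpha+1)} \;\xrightarrow{\;\alpha=2\;}\; c\,C^{2/3}\,P^{-1/3} = c\,P^{1/3}\left(\frac{C}{P}\right)^{2/3}.$$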

With H2 items and X sources in a second-order h-curve, this results in

$$h_T = c\,X^{1/(\alpha+1)}\,(H2/X)^{\alpha/(\alpha+1)}, \tag{17}$$

where α>1 is the Lotkaian exponent and c>0 is a constant.

In the framework of Lotkaian informetrics, Eq. (17) can be simplified by using the Egghe–Rousseau formula, i.e.,

$$h_T = X^{1/\alpha}. \tag{18}$$

Using the Egghe–Rousseau formula with α=2 and P=100, we estimate h=10 according to Eq. (14). With α=2 and X=10, we estimate hT≈3.2 according to Eq. (18). This means that the first-order h-core refers to 10% and the second-order h-top to about 3% of the sources. The ratio of the second-order h-top to the first-order h-core is thus about 3/10=30%. Since the Egghe–Rousseau formula is highly simplified, these estimates serve only as rough reference values in this study.

Let us denote by h_H, h_E-R, and h_G-S the Hirsch estimate, the Egghe–Rousseau estimate, and the Glänzel–Schubert estimate of the h-index, respectively. With α=2, a=5, and c=1, we obtain the following estimates as theoretical reference values of the h-index (Ye, 2011):

$$h_{H} \approx (C/5)^{1/2}, \tag{19}$$

$$h_{E\text{-}R} \approx P^{1/2}, \tag{20}$$

$$h_{G\text{-}S} \approx P^{1/3}\,(C/P)^{2/3}. \tag{21}$$
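The three reference estimates of Eqs. (19) to (21) are straightforward to compute once P and C are known. The sketch below uses hypothetical function names and a made-up journal with P=100 papers and C=500 citations; it does not reproduce any value from the Appendix.

```python
def hirsch_estimate(C, a=5.0):
    """Eq. (19): h is approximately (C / a) ** (1/2)."""
    return (C / a) ** 0.5


def egghe_rousseau_estimate(P, alpha=2.0):
    """Eq. (20): h is approximately P ** (1 / alpha)."""
    return P ** (1.0 / alpha)


def glanzel_schubert_estimate(P, C, c=1.0):
    """Eq. (21): h is approximately c * P ** (1/3) * (C / P) ** (2/3)."""
    return c * P ** (1.0 / 3.0) * (C / P) ** (2.0 / 3.0)


# Hypothetical journal, not taken from the paper's datasets.
P, C = 100, 500
h_H = hirsch_estimate(C)
h_ER = egghe_rousseau_estimate(P)
h_GS = glanzel_schubert_estimate(P, C)
print(round(h_H, 2), round(h_ER, 2), round(h_GS, 2))
```

The same Egghe–Rousseau function applied to X=10 sources gives the second-order reference value of about 3.2 discussed for Eq. (18).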

Using our empirical cases, we computed the theoretical estimations based on the original data (P and C, cf. Appendix). The results are shown in Figures 4 and 5.

Figure 4. Three estimations of the h-index of Math journals.

Figure 5. Three estimations of the h-index of LIS journals.

Visually, the Glänzel–Schubert estimation and the Hirsch estimation look better than the Egghe–Rousseau estimation. The Egghe–Rousseau formula is strictly limited by α=2 in the fitting. This situation has been discussed by Ye (2011) and can be quantitatively measured by Pearson correlation coefficients. Table 1 shows that the Glänzel–Schubert estimation and the Hirsch estimation correlate more strongly with the real h than the Egghe–Rousseau estimation.

Table 1

Pearson correlation coefficients with p-values.

The analytical results in the table reveal that both the Glänzel–Schubert estimation and the Hirsch estimation can be applied as a theoretical reference for computing the h-index.

However, in the second-order case, only the number of sources X is clearly known, so that it is convenient to apply the Egghe–Rousseau estimation. The comparable results are shown in Table 2, where α=2.

Table 2

Egghe–Rousseau estimation of h-top in two cases.

We see that the Egghe–Rousseau estimates are inaccurate in both cases; both are smaller than the observed values. An important reason for this result is the choice of α=2. Generally, 1<α<3, X ≥ 1, and hT ≥ 1 are linked with α as

α1α<logX(22)

or

11α<loghT.(23)

These are static results.
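The gap between the Egghe–Rousseau estimate and the observed h-top can be recomputed from the figures reported in Section 3 (X=425 with hT=34 for Mathematics, X=103 with hT=28 for LIS). The sketch below is a recomputation under Eq. (18) and is not a copy of Table 2; the "implied α" obtained by inverting hT = X^(1/α) is an illustrative addition, not a quantity defined in the paper.

```python
import math

# (X, observed h_T) as reported in Section 3.
cases = {"Mathematics": (425, 34), "LIS": (103, 28)}

results = {}
for field, (X, h_T_observed) in cases.items():
    estimate = X ** 0.5  # Eq. (18) with alpha = 2
    # Illustrative inversion of h_T = X ** (1 / alpha) for the observed h_T.
    implied_alpha = math.log(X) / math.log(h_T_observed)
    results[field] = (estimate, implied_alpha)
    print(f"{field}: estimate={estimate:.1f}, observed={h_T_observed}, "
          f"implied alpha={implied_alpha:.2f}")
```

With α=2, the estimates are about 20.6 and 10.1, both below the observed values of 34 and 28. The observed values would correspond to exponents of roughly 1.7 and 1.4, which is consistent with the remark that the choice of α=2 drives the underestimation.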

4.2 Dynamic case

Following Egghe (2007), the dynamic h-index is

$$h = \left((1 - a^t)^{\alpha-1}\,P\right)^{1/\alpha}, \tag{24}$$

where t is the time period, a is the aging rate, and α>1 is the Lotkaian exponent (in the Lotkaian informetrics).

Since the calculations are made in a single field, the dynamic second-order h-index is

$$h_T = \left((1 - a^t)^{\alpha-1}\,X\right)^{1/\alpha}. \tag{25}$$

When X is a constant and α>1 is stable in the field, changes over time are

$$h_T'(t) = \frac{dh_T}{dt} = -\frac{\alpha-1}{\alpha}\,(1 - a^t)^{-1/\alpha}\,X^{1/\alpha}\,a^t \ln a. \tag{26}$$

For all t > 0 and an aging rate 0 < a < 1, hT’(t) > 0 and hT’’(t) < 0, and

$$\lim_{t\to\infty} h_T(t) = X^{1/\alpha}, \tag{27}$$

which means that hT(t) is a concave, increasing function for fixed X, α, and a. This is a dynamic reference system. Taken all together, the analyses reveal that the second-order h-index and h-top are relatively unique, simple, and robust, like the first-order h-index and h-core. With h-top, a core can be efficiently extracted from large datasets. Thus, h-top might be especially useful in the analysis of big data. However, as simple concepts, the second-order h-index and h-top take only little information into account, and both can only provide ‘core’ information.
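Returning to the dynamic reference system of Eq. (25), its behavior over time can be checked numerically. The sketch below uses a hypothetical aging rate a=0.9 and α=2 together with X=425 (the number of Math journals); these parameter values are assumptions chosen for illustration, not estimates from the data.

```python
def dynamic_second_order_h(t, X, alpha=2.0, a=0.9):
    """Eq. (25): h_T(t) = ((1 - a**t) ** (alpha - 1) * X) ** (1 / alpha)."""
    return ((1.0 - a ** t) ** (alpha - 1.0) * X) ** (1.0 / alpha)


X = 425  # number of sources; alpha and a above are hypothetical values
values = [dynamic_second_order_h(t, X) for t in range(1, 51)]

# Successive differences: positive for an increasing curve,
# and shrinking for a concave one.
increments = [later - earlier for earlier, later in zip(values, values[1:])]

limit = X ** (1.0 / 2.0)  # Eq. (27) with alpha = 2
print(round(values[0], 2), round(values[-1], 2), round(limit, 2))
```

Under these assumptions the curve rises quickly in the first periods and then flattens toward the limit X^(1/α), in line with the concavely increasing behavior stated for Eq. (27).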

5 Conclusions

The second-order h-index hT can be differentiated from the h-index of highly cited papers by finding a fixed value in the series of highly cited percentile classes. According to a rough estimation on the basis of the Egghe–Rousseau formula, hT approximately indicates the top 3% if h denotes the 10% core. This means that the second-order h-index assigns 30% of the first-order h-core to h-top.

We studied the journals from two fields empirically. The results show a percentage of 8% for h-top in Mathematics and 27% in LIS. These values are smaller than the theoretically expected values. The exploration of reasons for the differences between expectations and empirical results is a question for future research.

In contrast to the series of highly cited percentile classes, which are artificially defined, the h-top is defined as the natural definite top in the series of highly cited classes. The second-order h-index and h-top have unique fixed values, which is an advantage over methods based on arbitrarily set proportions.

Although both the second-order h-index hT and the h-index of highly cited papers can be used as top indicators, they reflect different concepts: whereas the second-order h-index hT measures the h-top, the h-index of highly cited papers represents the h-core in highly cited papers. Both top metrics can be applied to any informetric unit.

Note that the use and comparison of the second-order h-type indicators are only applicable in one and the same field. If one wants to compare citation impact across different fields, field-normalized indicators have to be used.

Acknowledgements

We acknowledge the National Natural Science Foundation of China Grant No. 71673131 and the Jiangsu Key Laboratory Fund for financial support and thank Mr. Eric P. Qi for the data collection.

References

  • Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E., & Herrera, F. (2009). h-Index: a review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics, 3(4), 273-289.

  • Bornmann, L. (2013). How to analyse percentile citation impact data meaningfully in bibliometrics: the statistical analysis of distributions, percentile rank classes and top-cited papers. Journal of the American Society for Information Science and Technology, 64, 587-595.

  • Bornmann, L. (2014). How are excellent (highly cited) papers defined in bibliometrics? A quantitative analysis of the literature. Research Evaluation, 23, 166-173.

  • Bornmann, L., Mutz, R., & Daniel, H.-D. (2008). Are there better indices for evaluation purposes than the h index? A comparison of nine different variants of the h index using data from biomedicine. Journal of the American Society for Information Science and Technology, 59(5), 830-837.

  • Bornmann, L., Mutz, R., Hug, S. E., & Daniel, H.-D. (2011). A multi level meta-analysis of studies reporting correlations between the h index and 37 different h index variants. Journal of Informetrics, 5(3), 346-359.

  • Egghe, L. (2005). Power laws in the information production process: Lotkaian informetrics. Elsevier, Oxford.

  • Egghe, L. (2006). Theory and practice of the g-index. Scientometrics, 69(1), 131-152.

  • Egghe, L. (2007). Dynamic h-index: the Hirsch index in function of time. Journal of the American Society for Information Science and Technology, 58(3), 452-454.

  • Egghe, L. (2008). Examples of simple transformations of the h-index: Qualitative and quantitative conclusions and consequences for other indices. Journal of Informetrics, 2, 136-148.

  • Egghe, L. (2010). The Hirsch index and related impact measures. Annual Review of Information Science and Technology, 44, 65-114.

  • Egghe, L., & Rousseau, R. (2006). An informetric model for the Hirsch-index. Scientometrics, 69(1), 121-129.

  • Egghe, L., & Rousseau, R. (2012). Theory and practice of the shifted Lotka function. Scientometrics, 91(1), 295-301.

  • Glänzel, W. (2006). On the h-index – A mathematical approach to a new measure of publication activity and citation impact. Scientometrics, 67(2), 315-321.

  • Glänzel, W. (2012). The role of core documents in bibliometric network analysis and their relation with h-type indices. Scientometrics, 93(1), 113-123.

  • Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the USA, 102(46), 16569-16572.

  • Jin, B. H., Liang, L. M., Egghe, L., & Rousseau, R. (2007). The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin, 52(6), 855-863.

  • Norris, M., & Oppenheim, C. (2010). The h-index: a broad review of a new bibliometric indicator. Journal of Documentation, 66(5), 681-705.

  • Prathap, G. (2006). Hirsch-type indices for ranking institutions’ scientific research output. Current Science, 91(11), 1439.

  • Ruane, F., & Tol, R. (2008). Rational (successive) h-indices: an application to economics in the Republic of Ireland. Scientometrics, 75(2), 395-405.

  • Schubert, A. (2007). Successive h-indices. Scientometrics, 70(1), 201-205.

  • Schubert, A., & Glänzel, W. (2007). A systematic analysis of Hirsch-type indices for journals. Journal of Informetrics, 1(2), 179-184.

  • Schubert, A., Korn, A., & Telcs, A. (2009). Hirsch-type indices for characterizing networks. Scientometrics, 78(2), 375-382.

  • Ye, F. Y. (2009). An investigation on mathematical models of the h-index. Scientometrics, 81(2), 493-498.

  • Ye, F. Y. (2011). A unification of three models for the h-index. Journal of the American Society for Information Science and Technology, 62(1), 205-207.

  • Ye, F. Y., & Rousseau, R. (2008). The power law model and total career h-index sequences. Journal of Informetrics, 2, 288-297.

  • Ye, F. Y., & Rousseau, R. (2010). Probing the h-core: an investigation of the tail-core ratio for rank distributions. Scientometrics, 84(2), 431-439.

  • Zhao, S. X., Rousseau, R., & Ye, F. Y. (2011). h-Degree as a basic measure in weighted networks. Journal of Informetrics, 5(4), 668-677.

Appendix: Datasets for calculating the h-top examples

A1. The top journal set in the field of mathematics

A2. The top journal set in the field of library and information science


    About the article

    Received: 2017-11-13

    Accepted: 2017-12-18

    Published Online: 2018-04-24


    Citation Information: Data and Information Management, ISSN (Online) 2543-9251, DOI: https://doi.org/10.1515/dim-2017-0011.


    © 2018 Fred Y. Ye, Lutz Bornmann. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. BY-NC-ND 3.0
