Sci Rep. 2014 Sep 29:4:6496.
doi: 10.1038/srep06496.

Network-based statistical comparison of citation topology of bibliographic databases

Lovro Šubelj et al. Sci Rep. 2014.

Abstract

Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, there exist only informal notions on their reliability. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of DBLP Computer Science Bibliography substantially differs from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or a scientific evaluation guideline for governments and research agencies.

Figures

Figure 1. Profile of citation networks extracted from bibliographic databases.
Panels (A–F) show different distributions, plots and profiles of citation networks extracted from bibliographic databases. These are (from left to right): the field bow-tie decompositions, where the arrows illustrate the direction of the links and the areas of components are proportional to the number of nodes contained; the degree, in-degree and out-degree distributions P(k), P(kin) and P(kout), respectively; the corresponding neighbour connectivity plots N(k), N(kin) and N(kout); the clustering profiles of the standard and both unbiased coefficients C(k), B(k) and D(k), respectively; and the hop plots for the standard and undirected diameters δ and δ′, respectively (see Methods).
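The in-degree and out-degree distributions P(kin) and P(kout) named in this caption can be illustrated with a minimal sketch. It assumes the citation network is available as a list of directed (citing, cited) pairs; the function name `degree_distributions` and the toy edge list are hypothetical and are not taken from the paper.

```python
from collections import Counter

def degree_distributions(edges):
    """Return (P_in, P_out): dicts mapping a degree k to the fraction of nodes with that degree."""
    nodes = set()
    k_in, k_out = Counter(), Counter()
    for citing, cited in edges:
        nodes.update((citing, cited))
        k_out[citing] += 1   # out-degree: number of references made
        k_in[cited] += 1     # in-degree: number of citations received
    n = len(nodes)
    # Count how many nodes have each degree (nodes absent from a Counter have degree 0).
    hist_in = Counter(k_in[v] for v in nodes)
    hist_out = Counter(k_out[v] for v in nodes)
    p_in = {k: c / n for k, c in sorted(hist_in.items())}
    p_out = {k: c / n for k, c in sorted(hist_out.items())}
    return p_in, p_out

# Hypothetical toy network: "a" cites "b" and "c", "b" cites "c", "d" cites "c".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
p_in, p_out = degree_distributions(edges)
print(p_in)
print(p_out)
```

On real citation data these distributions are typically plotted on log-log axes, which this sketch leaves out.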
Figure 2. Comparison of bibliographic databases through statistics of citation networks.
Panels (A–F) show studentized statistics residuals of citation networks extracted from bibliographic databases. The residuals are listed in decreasing order, while the shaded regions are the 95% and 99% confidence intervals of independent Student t-tests (labelled with the respective P-values). Panel (G) shows the residuals of only the independent statistics, where the shaded region is the 95% confidence interval. Panel (H) shows pairwise Spearman correlations of the independent statistics, listed in the same order as in panel (G) (left), and the P-values of the corresponding Fisher independence z-tests (right). Panel (I) shows the critical difference diagram of the Nemenyi post-hoc test for the independent statistics. The diagram illustrates the overall ranking of the databases, where those connected by a thick line show no statistically significant inconsistencies at P-value = 0.05 (see Methods).
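The critical difference diagram in panel (I) rests on a simple computation, sketched here in its commonly used textbook form. This is an illustration under stated assumptions, not the authors' code: each database is ranked on each statistic, ranks are averaged, and two databases are taken to differ significantly when their mean ranks differ by more than CD = q_α · sqrt(k(k+1)/(6N)), for k databases and N statistics. The q_α values are the standard tabulated ones for α = 0.05. The score table and all function names are hypothetical.

```python
import math

# Standard tabulated Nemenyi critical values for alpha = 0.05, keyed by the number of groups k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}

def ranks(values):
    """Rank values in ascending order starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result

def mean_ranks(table):
    """table[i][j] is the score of database j on statistic i; returns the mean rank per database."""
    per_row = [ranks(row) for row in table]
    return [sum(r[j] for r in per_row) / len(per_row) for j in range(len(table[0]))]

def critical_difference(k, n):
    """Nemenyi critical difference for k groups compared over n statistics (alpha = 0.05)."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n))

# Hypothetical scores (e.g. absolute residuals): 4 statistics (rows) x 3 databases (columns).
table = [[0.2, 1.5, 0.9], [0.1, 2.0, 0.4], [0.5, 1.1, 0.7], [0.3, 0.9, 1.2]]
avg = mean_ranks(table)
cd = critical_difference(k=3, n=4)
# Pairs of databases whose mean ranks differ by more than the critical difference.
significant = [(a, b) for a in range(3) for b in range(a + 1, 3) if abs(avg[a] - avg[b]) > cd]
print(avg, cd, significant)
```

In a critical difference diagram, databases whose mean ranks lie within CD of each other are the ones joined by a thick line.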
Figure 3. Comparison of bibliographic and online databases through statistics of networks.
Panels (A–D) show studentized statistics residuals of citation networks extracted from bibliographic databases, while panels (E) and (F) show residuals of social and technological networks extracted from online databases. The residuals are listed in decreasing order, while the shaded regions are the 95% and 99% confidence intervals of independent Student t-tests (labelled with the respective P-values). Panel (G) shows the residuals of only the independent statistics, where the shaded region is the 95% confidence interval. Panel (H) shows pairwise Spearman correlations of the independent statistics, listed in the same order as in panel (G) (left), and the P-values of the corresponding Fisher independence z-tests (right). Panel (I) shows the critical difference diagram of the Nemenyi post-hoc test for the independent statistics. The diagram illustrates the overall ranking of the databases, where those connected by a thick line show no statistically significant inconsistencies at P-value = 0.05 (see Methods).
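The pairwise Spearman correlations and Fisher z-tests of panel (H) can be sketched as follows. The exact variant used in the paper is not given on this page, so this assumes one common form: Spearman's ρ computed as the Pearson correlation of the ranks, and a two-sided P-value from the Fisher transformation z = atanh(ρ) · sqrt(n − 3) under a standard normal null. The sample values and function names are hypothetical.

```python
import math

def ranks(values):
    """Rank values in ascending order starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def fisher_z_pvalue(rho, n):
    """Two-sided P-value for H0: no correlation, via the Fisher z-transformation (needs n > 3)."""
    if abs(rho) >= 1.0:
        return 0.0  # atanh is undefined at +/-1; a perfect correlation is treated as P = 0
    z = math.atanh(rho) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical values of two statistics measured over five networks.
x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]
rho = spearman(x, y)
p = fisher_z_pvalue(rho, len(x))
print(rho, p)
```

With so few observations the normal approximation is rough; the sketch is meant only to show the shape of the computation.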
