Network-based statistical comparison of citation topology of bibliographic databases

Sci Rep. 2014 Sep 29;4:6496. doi: 10.1038/srep06496.

Lovro Šubelj¹, Dalibor Fiala², Marko Bajec¹

Affiliations

¹ University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, SI-1000 Ljubljana, Slovenia.
² University of West Bohemia, Faculty of Applied Sciences, Univerzitní 8, CZ-30614 Plzeň, Czech Republic.

PMID: 25263231
PMCID: PMC4178292
DOI: 10.1038/srep06496

Lovro Šubelj et al. Sci Rep. 2014.

Abstract

Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, there exist only informal notions of their reliability. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of DBLP Computer Science Bibliography substantially differs from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or as a scientific evaluation guideline for governments and research agencies.
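The per-statistic comparison asks whether one database's value of a graph statistic lies unusually far from the values of the other databases. Below is a minimal sketch of one standard way to quantify this. It assumes leave-one-out (externally) studentized residuals judged against a Student t distribution. The paper's exact residual definition is given in its Methods and may differ, and the function name `studentized_residuals` and the example values are hypothetical.

```python
import math
import statistics


def studentized_residuals(values):
    """Leave-one-out (externally) studentized residual of each value.

    For value x_i, the mean m and sample standard deviation s are computed
    from the other n - 1 values, and the residual is

        t_i = (x_i - m) / (s * sqrt(1 + 1 / (n - 1)))

    If all values are independent draws from one normal distribution,
    t_i follows a Student t distribution with n - 2 degrees of freedom.
    """
    values = list(values)
    n = len(values)
    if n < 3:
        # stdev of the remaining values needs at least two of them
        raise ValueError("need at least three values")
    residuals = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]
        m = statistics.mean(rest)
        s = statistics.stdev(rest)
        residuals.append((x - m) / (s * math.sqrt(1 + 1 / (n - 1))))
    return residuals


# Hypothetical values of one graph statistic for six databases.
example = [0.31, 0.29, 0.33, 0.30, 0.32, 0.55]
for value, t in zip(example, studentized_residuals(example)):
    print(f"{value:.2f} -> {t:+.2f}")
```

With six databases there are 4 degrees of freedom. Under this sketch's assumptions, a value would be flagged as inconsistent with the rest when |t| exceeds about 2.78 (two-sided P = 0.05) or 4.60 (P = 0.01).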

Figures

**Figure 1. Profile of citation networks extracted from bibliographic databases.**
Panels (A–F) show different distributions, plots and profiles of citation networks extracted from bibliographic databases. These are (from left to right): the field bow-tie decompositions, where the arrows illustrate the direction of the links and the areas of components are proportional to the number of nodes contained; the degree, in-degree and out-degree distributions $P(k)$, $P(k_{\mathrm{in}})$ and $P(k_{\mathrm{out}})$, respectively; the corresponding neighbour connectivity plots $N(k)$, $N(k_{\mathrm{in}})$ and $N(k_{\mathrm{out}})$; the clustering profiles of the standard and both unbiased coefficients $C(k)$, $B(k)$ and $D(k)$, respectively; and the hop plots for the standard and undirected diameters $\delta$ and $\delta'$, respectively (see Methods).
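The first column of Figure 1 shows field bow-tie decompositions, a variant the authors introduce. For orientation, here is a minimal sketch of the classical bow-tie decomposition of a directed graph. It is not the authors' field variant, which is defined in the paper's Methods. In this sketch the largest strongly connected component forms the core, IN holds nodes that can reach the core, OUT holds nodes reachable from it, and everything else (tendrils, tubes, disconnected parts) is lumped together as "other". The (citing, cited) edge-list input and all function names are assumptions for illustration. Because citation networks are nearly acyclic, the classical core of a real citation network can be very small.

```python
from collections import defaultdict, deque


def _reach(adj, sources):
    """All nodes reachable from `sources` (inclusive), found by BFS."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def strongly_connected_components(nodes, adj, radj):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    order, visited = [], set()
    for s in nodes:  # first pass: record DFS finishing order
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:  # all successors done, so u is finished
                stack.pop()
                order.append(u)
    components, assigned = [], set()
    for s in reversed(order):  # second pass on the reversed graph
        if s in assigned:
            continue
        comp, stack = set(), [s]
        assigned.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in radj[u]:
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        components.append(comp)
    return components


def bow_tie(edges):
    """Classical bow-tie decomposition of a directed graph given as (u, v) edges."""
    adj, radj, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
        nodes.update((u, v))
    ordered = sorted(nodes)  # deterministic order; assumes comparable node ids
    core = max(strongly_connected_components(ordered, adj, radj), key=len)
    out_part = _reach(adj, list(core)) - core
    in_part = _reach(radj, list(core)) - core
    other = nodes - core - in_part - out_part
    return {"core": core, "in": in_part, "out": out_part, "other": other}


# Toy graph: 1 -> 2 -> 3 -> 1 is a cycle (core), 0 points into it,
# 4 is reached from it, and 5 -> 6 is a separate component.
parts = bow_tie([(1, 2), (2, 3), (3, 1), (0, 1), (3, 4), (5, 6)])
print({name: sorted(nodes) for name, nodes in parts.items()})
# prints {'core': [1, 2, 3], 'in': [0], 'out': [4], 'other': [5, 6]}
```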

**Figure 2. Comparison of bibliographic databases through statistics of citation networks.**
Panels (A–F) show studentized residuals of the statistics of citation networks extracted from bibliographic databases. The residuals are listed in decreasing order, while the shaded regions are the 95% and 99% confidence intervals of independent Student t-tests (labelled with the respective P-values). Panel (G) shows the residuals of only the independent statistics, where the shaded region is the 95% confidence interval. Panel (H) shows pairwise Spearman correlations of the independent statistics, listed in the same order as in panel (G) (left), and the P-values of the corresponding Fisher independence z-tests (right). Panel (I) shows the critical difference diagram of the Nemenyi post-hoc test for the independent statistics. The diagram illustrates the overall ranking of the databases, where those connected by a thick line show no statistically significant inconsistencies at P-value = 0.05 (see Methods).
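Panel (H) pairs Spearman correlations with Fisher independence z-tests. Below is a minimal standard-library sketch of that check. It assumes average ranks for ties and the usual Fisher z-transformation with standard error 1/sqrt(n - 3), where n is the number of paired observations. Some authors recommend a slightly larger standard error for Spearman's coefficient. The function names are hypothetical, and this is not presented as the paper's exact procedure.

```python
import math
from statistics import NormalDist


def tied_ranks(xs):
    """Ranks 1..n of the values in xs; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        # extend j over the run of values tied with xs[order[i]]
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = tied_ranks(xs), tied_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def fisher_z_pvalue(r, n):
    """Two-sided P-value for H0: rho = 0 via Fisher's z-transformation.

    z = atanh(r) * sqrt(n - 3) is compared with a standard normal
    distribution; for Spearman's r this standard error is an approximation.
    """
    if n < 4:
        raise ValueError("need at least four paired observations")
    if abs(r) >= 1:
        return 0.0  # atanh diverges; the P-value tends to 0 as |r| -> 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values of two statistics over the same eight networks.
a = [0.1, 0.4, 0.35, 0.8, 0.5, 0.9, 0.2, 0.65]
b = [1.0, 2.1, 1.9, 3.5, 2.2, 4.0, 1.2, 2.9]
r = spearman(a, b)
print(r, fisher_z_pvalue(r, len(a)))
```

A small P-value means the two statistics are significantly correlated. Under this reading, only one statistic of such a pair would be kept among the independent statistics.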

**Figure 3. Comparison of bibliographic and online databases through statistics of networks.**
Panels (A–D) show studentized residuals of the statistics of citation networks extracted from bibliographic databases, while panels (E) and (F) show residuals of social and technological networks extracted from online databases. The residuals are listed in decreasing order, while the shaded regions are the 95% and 99% confidence intervals of independent Student t-tests (labelled with the respective P-values). Panel (G) shows the residuals of only the independent statistics, where the shaded region is the 95% confidence interval. Panel (H) shows pairwise Spearman correlations of the independent statistics, listed in the same order as in panel (G) (left), and the P-values of the corresponding Fisher independence z-tests (right). Panel (I) shows the critical difference diagram of the Nemenyi post-hoc test for the independent statistics. The diagram illustrates the overall ranking of the databases, where those connected by a thick line show no statistically significant inconsistencies at P-value = 0.05 (see Methods).
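The critical difference diagrams in Figures 2 and 3 come from the Nemenyi post-hoc test. This test is normally applied after a Friedman test rejects the hypothesis that all compared groups rank equally; that Friedman step is omitted here. Below is a minimal sketch of the core computation:

- average ranks over the statistics;
- the critical difference CD = q_alpha * sqrt(k(k + 1) / (6N)) for k databases and N statistics;
- the pairs whose average ranks differ by at most CD.

Some assumptions are specific to this sketch. Each database is ranked per statistic by the absolute value of its residual, with smaller meaning more consistent. The function names and scores are hypothetical. The q values are the commonly tabulated ones for alpha = 0.05.

```python
import math

# Nemenyi critical values q_alpha for alpha = 0.05 (studentized range
# statistic divided by sqrt(2)), k = 2..10 groups, as tabulated by Demšar (2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(row):
    """Rank one row of scores (1 = lowest score); ties share their average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mean_ranks(scores):
    """scores[s][d] is the score of database d on statistic s (lower is better).
    Returns each database's rank averaged over all statistics."""
    totals = [0.0] * len(scores[0])
    for row in scores:
        for d, r in enumerate(rank_row(row)):
            totals[d] += r
    return [t / len(scores) for t in totals]


def critical_difference(k, n_stats):
    """Nemenyi critical difference at alpha = 0.05 for k groups over n_stats datasets."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n_stats))


def indistinguishable_pairs(ranks, cd):
    """Index pairs whose average ranks differ by at most cd, i.e. pairs with no
    significant difference; the diagram draws a thick line over such groups."""
    k = len(ranks)
    return [(i, j) for i in range(k) for j in range(i + 1, k)
            if abs(ranks[i] - ranks[j]) <= cd]


# Hypothetical |residual| scores: 4 statistics (rows) x 3 databases (columns).
scores = [[0.2, 1.5, 0.9],
          [0.1, 2.0, 1.1],
          [0.4, 0.8, 1.7],
          [0.3, 1.9, 0.6]]
ranks = mean_ranks(scores)
cd = critical_difference(len(ranks), len(scores))
print(ranks, round(cd, 3), indistinguishable_pairs(ranks, cd))
# prints [1.0, 2.75, 2.25] 1.657 [(0, 2), (1, 2)]
```

With only a few statistics the critical difference is large. This is why databases can differ noticeably in average rank and still be joined by a thick line.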

Cited by

- Evaluating the impact of citations of articles based on knowledge flow patterns hidden in the citations. Wang M, Zhang J, Jiao S, Zhang T. PLoS One. 2019 Nov 21;14(11):e0225276. doi: 10.1371/journal.pone.0225276. PMID: 31751395.
- Adherence to reporting guidelines increases the number of citations: the argument for including a methodologist in the editorial process and peer-review. Vilaró M, Cortés J, Selva-O'Callaghan A, Urrutia A, Ribera JM, Cardellach F, Basagaña X, Elmore M, Vilardell M, Altman D, González JA, Cobo E. BMC Med Res Methodol. 2019 May 31;19(1):112. doi: 10.1186/s12874-019-0746-4. PMID: 31151417.
- Exploring Spatio-temporal Dynamics of Cellular Automata for Pattern Recognition in Networks. Miranda GHB, Machicao J, Bruno OM. Sci Rep. 2016 Nov 22;6:37329. doi: 10.1038/srep37329. PMID: 27874024.
- A Unified Framework for Complex Networks with Degree Trichotomy Based on Markov Chains. Hui DSW, Chen YC, Zhang G, Wu W, Chen G, Lui JCS, Li Y. Sci Rep. 2017 Jun 16;7(1):3723. doi: 10.1038/s41598-017-03613-z. PMID: 28623348.
- On entropy research analysis: cross-disciplinary knowledge transfer. Basurto-Flores R, Guzmán-Vargas L, Velasco S, Medina A, Calvo Hernandez A. Scientometrics. 2018;117(1):123-139. doi: 10.1007/s11192-018-2860-1. PMID: 30237641.
