Getting Credit for Data Creation and Curation

New Features at the Arctic Data Center Support Registration of Dataset Citations and Usage.

Journals, funding agencies, and researchers increasingly acknowledge the importance of making data publicly available (see this book by the Make Data Count initiative for a general discussion of the landscape). Benefits of open data practices include greater visibility of research, reproducibility of results, less duplicated effort, and the ability to conduct new, innovative types of high-quality research with aggregate datasets. In such an open science landscape, data citation practices are crucial for giving data creators credit for their work. The Make Data Count initiative additionally encourages researchers to cite data in order to increase research discovery by driving traffic between data and articles, and to generate reliable open data metrics for use by all research stakeholders (Lowenberg et al., 2019). According to the Scholix interoperability initiative, the role of data repositories in this process should be to generate usage and citation metrics for the datasets they host and share them with community ‘hubs’ such as OpenAIRE, CrossRef, and DataCite (Cousijn et al., 2019). Following these recommendations from the larger data citation research community, the Arctic Data Center has taken multiple steps towards producing data citation information for all datasets in our collection, including a new feature enabling dataset owners to directly register citations to their datasets.

Supporting Data Citation at the Arctic Data Center

Using the scythe R package developed by our team, we regularly query journal publishers for citations that include the DOI of any Arctic Data Center dataset and register those connections as dataset citations. We also conducted a programmatic text search for citation mentions over all of our dataset abstracts, since some researchers use the abstracts to refer to publications affiliated with their data.
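To make the idea of a programmatic text search over abstracts more concrete, here is a minimal Python sketch. It is not the scythe package and not the Arctic Data Center's actual code: the function names, the regular expression, and the example dataset identifier (which uses the 10.5555 test prefix) are all assumptions chosen for illustration. It simply looks for DOI-shaped strings in each abstract and drops the dataset's own DOI.

```python
import re

# Heuristic pattern for DOI-shaped strings such as 10.1371/journal.pone.0092590.
# This is an assumption for illustration, not an official DOI grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def find_dois(text):
    """Return the unique DOIs mentioned in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Trim trailing punctuation that usually belongs to the sentence.
        # DOIs may legitimately end in such characters, so this is a heuristic.
        doi = match.group(0).rstrip(".,;:)]}").lower()
        if doi not in found:
            found.append(doi)
    return found


def publication_mentions(abstracts):
    """Map each dataset identifier to the other DOIs found in its abstract.

    `abstracts` is a hypothetical dict of {dataset DOI: abstract text}.
    """
    mentions = {}
    for dataset_id, abstract in abstracts.items():
        dois = [d for d in find_dois(abstract) if d != dataset_id.lower()]
        if dois:
            mentions[dataset_id] = dois
    return mentions
```

A search like this only finds publications that are mentioned by DOI. Abstracts that name a paper by title or author alone would need a different approach.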

Though we’ve made progress with these methods, tracking all dataset use in publications is very difficult to do programmatically, since in many cases data used in a publication isn’t formally cited. According to a paper by Belter (2014), oceanographic datasets were more often mentioned informally in the body of an article than formally cited in the Acknowledgments or Reference sections. Another study by Zhao et al. (2017) found that datasets used in science publications were cited only 6% of the time and referred to by their DOI 9% of the time, with the remaining references using language that is less standardized, traceable, or permanently identifiable. Data use is difficult to track in this landscape, and we know formal data citations aren’t telling the full story of how often data is relied on in scientific publications.

Individual researchers and data owners can help us with this. That is why we recently implemented a “Register Citation” feature that allows researchers to register known citations to their datasets. Researchers may register a citation whenever they know that a publication uses or refers to a dataset, and the citation will be viewable on the dataset profile within 24 hours.

To register a citation, navigate to the dataset using its DOI and click on the citations tab. A dialog box will pop up where you can register the citation with us. Click the ‘register citation’ link and you’ll see a very simple form asking for the DOI of the paper and whether the paper CITES the dataset (that is, the dataset is explicitly identified or linked to somewhere in the text or references) or USES the dataset (that is, uses the dataset but doesn’t formally cite it).

The register citation feature.
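For readers who think in terms of data, the information the form collects can be pictured as a small record: the dataset, the paper's DOI, and one of two relations. The Python sketch below is only an illustration of that shape. The class, field, and function names are hypothetical and do not describe the Arctic Data Center's actual schema or API.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    CITES = "CITES"  # dataset is explicitly identified or linked in the text or references
    USES = "USES"    # dataset is used but not formally cited


# Loose shape check for a DOI (an assumption, not an official grammar).
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw):
    """Strip common prefixes, lowercase, and check the basic DOI shape."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.strip().lower()
    if not DOI_SHAPE.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


@dataclass(frozen=True)
class CitationRegistration:
    """Hypothetical record of one registered citation."""
    dataset_doi: str
    publication_doi: str
    relation: Relation


def build_registration(dataset_doi, publication_doi, relation):
    """Validate the form values and return a registration record."""
    return CitationRegistration(
        dataset_doi=normalize_doi(dataset_doi),
        publication_doi=normalize_doi(publication_doi),
        relation=Relation(relation.upper()),  # raises ValueError if not CITES or USES
    )
```

For example, `build_registration("doi:10.5555/example-dataset", "https://doi.org/10.1371/journal.pone.0092590", "uses")` would yield a record whose relation is `Relation.USES`. The dataset DOI here is a made-up placeholder.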

Moving forward

We plan to integrate our data citation systems with DataCite, which would make Arctic Data Center data citations available through CrossRef and DataCite, two DOI registration systems connected with many major publishers worldwide that enable cross-publisher citation linking. We’re also looking to continue developing our programmatic search for citations, using different text mining techniques that would identify citations in varied contexts, and to expand the pool of publications we search across (currently we query SCOPUS, Elsevier, and PubMed for citations).

We hope that this information is helpful to you. Our goal with this initiative is to foster the growth and improvement of data citation practices in the Arctic science community. You are welcome to reach out to us with any feedback or questions about these new features.


Belter, C. W. (2014). Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets. PLoS ONE, 9(3). doi:10.1371/journal.pone.0092590

Cousijn, H., Feeney, P., Lowenberg, D., Presani, E., & Simons, N. (2019). Bringing Citations and Usage Metrics Together to Make Data Count. Data Science Journal, 18(1), 9. doi:10.5334/dsj-2019-009

Lowenberg, D., Chodacki, J., Fenner, M., Kemp, J., & Jones, M. B. (2019). Open Data Metrics: Lighting the Fire.

Zhao, M., Yan, E., & Li, K. (2017). Data set mentions and citations: A content analysis of full-text publications. Journal of the Association for Information Science and Technology, 69(1), 32-46. doi:10.1002/asi.23919

Written by Maya Samet
Data Science Fellow at the Arctic Data Center