In early October, the Arctic Data Center held its third Data Science Training program of 2019, hosting US-based Arctic researchers whose work spans the pan-Arctic, including Alaska, Canada, and Siberia. Fifteen participants from eleven institutions, with a wide range of experience in data privacy and open source tools, came together to support one another while learning from the Arctic Data Center instructors. It was an intensive and energizing week. Participants worked both to understand each other’s research methodologies and to enhance their own skill sets, with an emphasis on open and reproducible science.
“I found utility in this course far more than its advertised value and that is why I love it.”
– Andreas Muenchow, Physical Oceanographer and Glaciologist in coastal Greenland, Alaska, and Siberia
The data science training introduces researchers to the Arctic Data Center, ensuring they are familiar with NSF Arctic data policies and aware of the data documentation, preservation, and other support services the Center provides. It also gives a comprehensive overview of fundamental topics in data management and reproducible science. Over five days, researchers were introduced to best practices for data and metadata, data documentation and publishing, RStudio and Git/GitHub, R and RMarkdown, version control, collaboration and conflict management, data modeling and tidy data, data cleaning and manipulation, data management planning, data visualization, and reproducibility and provenance. The course also integrated sessions on diversity in thinking preferences, data and authorship policy development, and other aspects of collaboration.
Researchers shared a common learning experience, and each came away with their own appreciation of what the course had to offer, shaped by their wide-ranging backgrounds and research interests. Below, some participants share their experiences from the course.
Michael Sousa is an Alaska researcher and soil science graduate student at the University of Minnesota. He uses R almost exclusively in his lab. Michael was excited that the training was primarily R-focused and that it emphasized open access software, because “it holds one-another accountable”. Michael believes that keeping your data open and using open source software is the best way to ensure your work is reproducible and that others won’t run into the roadblock of restricted access to a particular piece of software.
Jennie DeMarco, a Siberian Arctic researcher and a professor at Western University, is in the midst of writing a grant and found the training to be timely, particularly with regard to using the DMPTool for creating a data management plan.
Timothy Pasch, a dual US-Canadian citizen and an associate professor at the University of North Dakota, has spent time travelling and living with Inuit and Iñupiat people in Nunavut and Alaska, gathering sovereign human subjects data regarding small business development and incorporating traditional beliefs in language with technology. He was eager to be a part of the training because it let him learn how to share data and how to differentiate between open science and sovereign science. He mentioned,
“In terms of human subjects data, we can’t just use the cloud as a repository, we really need secure spaces that cannot be hacked, and the Arctic Data Center provides a place for that.”
For Anna Talucci, a Siberian Arctic researcher and postdoc at Colgate University, a big takeaway from the Arctic Data Center training was that she felt like she had “better tools to work with undergraduate researchers…[and that] being able to give them good habits to start with, is a really great thing.”
Leslie Hartten, a research meteorologist associated with NOAA and CIRES, works with various scientific communities on merged observatory data files. She noted that it can often be difficult to agree on shared semantics, that is, on how variables are to be defined. She commented that the Arctic Data Center team has the “mix of the technical computer oriented meta-data knowledge, the philosophical open data knowledge, and scientific knowledge,” needed to have a clear and collaborative data conversation. She found that she is “getting language that [she] can use in [these scientific community] discussions in the future.”
From the perspective of the Arctic Data Center, these short courses provide an invaluable opportunity to interact with a diverse group of researchers and understand their needs, both as a data center and in terms of facilitating learning around open science. We value their perspectives, their desire to take this course, and their willingness to disseminate this knowledge in their own research communities. As Matt Jones, the Principal Investigator at the Arctic Data Center, puts it,
“We are embedded in a global societal need for reproducible science.”
Having pan-Arctic researchers with unique perspectives and backgrounds in attendance will help us move forward in reproducible science and synthesis for Arctic data. We look forward to meeting and interacting with the future researchers who come our way!
Support was provided by the Arctic Data Center as part of NSF award number 1546024 to cover the costs of participation for eligible Arctic researchers.
Cézanna Semnacher,
Community Engagement and Outreach Coordinator
Personal perspective: As a newer member of the Arctic Data Center team, I was in high spirits getting to take a course on R with a diverse group of Arctic researchers. As a newcomer to R, and to programming in general, I found the course to be gentle in its introduction, yet rewardingly vigorous in what I was learning over the course of five days. Matt, Amber, and Jeanette did an impeccable job guiding our bimodal cohort.