The Arctic Data Center has created a variety of open access curricula that guide individuals through topics including open data, data ethics, software, methodology, and analysis. The course is taught in person, and the content is also available for personal use. Each section in the Fundamentals of Data Management resource covers processes in the research and data life cycle. This hands-on guide, which uses survey and environmental data, is designed for those with research interests in the social sciences and limited R experience. It also includes content on ethical and reproducible research practices.
Open Data
- Open Data and Reproducibility: This section defines and highlights the importance of open data and open science.
- Writing Data Management Plans: Data management plans may be required when submitting research proposals or IRB applications. Tools such as dmptool.org can guide this process.
- Data Portals: Data portals are hosted on the Arctic Data Center’s website and give users a place to find and gather published data.
Data Ethics
- Data Ethics: This section provides insight on data ethics in the context of open science and the Arctic Data Center.
- Human Subjects Research Considerations: This section discusses general Institutional Review Board (IRB) requirements and research best practices for working with people and Indigenous communities.
- Open Data and Ethics Summary: A review of open data and ethics can be found in this section.
Intro to R
- Introduction to R: R is open-source statistical software used to read in data and perform statistical analyses.
- Introduction to RMarkdown: RMarkdown promotes a reproducible workflow by combining statistics and report writing in one environment, and it can be integrated with GitHub for further collaboration.
Intro to Git
- Introduction to Git: This chapter of an online book gives a more in-depth explanation of the mechanics of Git, GitHub, and R. It begins with step-by-step instructions on how to connect Git to RStudio, then covers the reasons people use Git and GitHub, with examples along the way.
- git Collaboration and Conflicts: This guide continues the discussion of Git with guidance on using it in team collaborations.
Data Analysis
- Data Modeling Essentials: Data modeling relies on tidy data so that the computer correctly interprets the intended analysis. One example of tidy data is keeping a separate CSV file for each entity measured, with concise and understandable column names.
- Cleaning and Manipulating Data: Before loading data into a programming language, an important step is making sure the data are readable by the computer.
- Data Visualization: An R package bundles prewritten code, which can help simplify your own code. Data visualization packages can be installed from an R script; for example, ggplot2 can be installed and used to create histograms.
- Geospatial Analysis: Geospatial analysis is an option for data that include location information and can be mapped.
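The tidy data and ggplot2 points in the list above can be sketched in a few lines of R. This is a minimal illustration and not taken from the course materials: the `sites` and `observations` tables, their column names, and all values are hypothetical, invented only to show one CSV file per entity joined on a shared key. The histogram step assumes ggplot2 is already installed and is skipped otherwise.

```r
# Hypothetical example data, invented for illustration only.
# One table per entity measured: sites, and observations taken at those sites.
sites <- data.frame(
  site_id   = c("S1", "S2"),
  site_name = c("North Camp", "River Mouth"),
  latitude  = c(70.2, 68.9)
)

observations <- data.frame(
  obs_id  = 1:5,
  site_id = c("S1", "S1", "S2", "S2", "S2"),
  temp_c  = c(-3.1, -2.4, 1.0, 0.6, 1.8)
)

# Store each entity in its own CSV file (a temporary directory is used here).
out_dir <- tempdir()
write.csv(sites, file.path(out_dir, "sites.csv"), row.names = FALSE)
write.csv(observations, file.path(out_dir, "observations.csv"), row.names = FALSE)

# Read the files back in and join them on the shared key column.
sites_in <- read.csv(file.path(out_dir, "sites.csv"))
obs_in   <- read.csv(file.path(out_dir, "observations.csv"))
joined   <- merge(obs_in, sites_in, by = "site_id")

# Summarise: mean temperature per site.
site_means <- aggregate(temp_c ~ site_name, data = joined, FUN = mean)
print(site_means)

# Build a histogram with ggplot2, only if the package is installed.
# Call print(p) to draw the plot.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(joined, ggplot2::aes(x = temp_c)) +
    ggplot2::geom_histogram(bins = 5)
}
```

Keeping each entity in its own table means site details are recorded once instead of being repeated on every observation row, and the join recombines them only when an analysis needs both.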
Publishing Data
- Data Publishing: Data publishing is often one of the last stages of research. This tutorial highlights how to publish data at the Arctic Data Center.
- Provenance and Reproducibility: This section defines provenance and reproducibility and provides an example of an RMarkdown document.
Team Collaboration
- Thinking Preferences: When working in a team, it can be helpful to understand each member’s “thinking preferences” via the Whole Brain Game.
- Collaboration, Authorship, & Data Policies: Policies on collaboration, authorship, and data can help ensure that all team members are on the same page.
Social Science Methodology