Fundamentals of Data Management

Introduction

The Arctic Data Center has created a variety of open-access curricula that guide individuals through topics including open data, data ethics, software, methodology, and analysis. The Fundamentals of Data Management course is taught in person, and its content is also available for personal use. The sections of this resource follow the research and data life cycle. This hands-on guide, which uses survey and environmental data, is designed for those with research interests in the social sciences and limited R experience. It also covers ethical and reproducible research practices.

Table of Contents

Open Data

Open Data and Reproducibility
This section defines and highlights the importance of open data and open science.

Writing Data Management Plans
Data management plans may be required when submitting research proposals or IRB applications. Tools such as dmptool.org can help guide this process.

Data Portals
Data portals are hosted on the Arctic Data Center’s website and give users a way to gather published data.

Data Ethics

Data Ethics
This section provides insight on data ethics in the context of open science and the Arctic Data Center.

Human Subjects Research Considerations
This section discusses general Institutional Review Board (IRB) requirements and research best practices for working with people and Indigenous communities.

Open Data and Ethics Summary
A review of open data and ethics can be found in this section.

Social Science Methodology

Reproducible Survey Workflows
This guide demonstrates how survey platforms including Qualtrics, SurveyMonkey, and Google Forms can be integrated with R.

Text Analysis
With survey data examples, this tutorial explains how to use text analysis in R when qualitative data is available. 

Intro to R

Introduction to R
R is an open-source statistical programming language used to read in data and perform statistical analyses.

Introduction to RMarkdown
RMarkdown promotes a reproducible workflow by combining statistical analysis and report writing in one environment. It can also be integrated with GitHub for further collaboration.

Intro to git

Introduction to git
This section links to an online book with a more in-depth explanation of the mechanics of git, GitHub, and R. The chapter begins with step-by-step instructions for connecting git to RStudio, then explains why people use git and GitHub, with examples along the way.

git collaboration and conflicts
This guide builds on the introduction to git with guidance on using git for team collaboration and handling conflicts.


Data Analysis

Data Modeling Essentials
Data modeling requires tidy data so that the computer correctly interprets the intended analysis. One example of tidy data is a separate CSV file for each entity measured, with concise and understandable column names.
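
The idea of one file per entity can be sketched as follows. This is a minimal illustration in Python using only the standard library (the course itself works in R), and the survey export, column names, and the helpers `split_entities` and `to_csv` are all hypothetical, invented here for the example. It splits a single wide table, in which site details repeat on every response row, into a sites table and a responses table linked by `site_id`.

```python
import csv
import io

# Hypothetical untidy survey export: site details repeat on every response row.
RAW = """response_id,site_id,site_name,region,score
1,A,North Site,Coastal,4
2,A,North Site,Coastal,5
3,B,South Site,Inland,3
"""


def split_entities(raw_text):
    """Split one wide table into a sites table and a responses table."""
    sites = {}       # keyed by site_id so each site is stored only once
    responses = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        sites[row["site_id"]] = {
            "site_id": row["site_id"],
            "site_name": row["site_name"],
            "region": row["region"],
        }
        responses.append({
            "response_id": row["response_id"],
            "site_id": row["site_id"],   # key linking back to the sites table
            "score": int(row["score"]),
        })
    return list(sites.values()), responses


def to_csv(rows):
    """Serialize a list of dicts as CSV text, one file's worth per entity."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sites, responses = split_entities(RAW)
print(to_csv(sites))
print(to_csv(responses))
```

Each resulting table describes a single kind of entity, and the shared `site_id` column lets the two be joined again when an analysis needs both.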

Cleaning and Manipulating Data
Before loading data into a programming language, it is important to make sure the data is in a form the computer can read.


Data Visualization
R packages help simplify code because each package bundles prewritten code. Data visualization packages can be installed from an R script. For example, ggplot2 can be installed and used to create histograms.

Geospatial Analysis
Geospatial analysis can be applied to data that includes location information and can be mapped.

Publishing Data

Data Publishing
Data publishing is often one of the last stages of research. This tutorial highlights how to publish data at the Arctic Data Center.

Provenance and Reproducibility
This section defines provenance and reproducibility and provides an example of an RMarkdown document.

Team Collaboration

Thinking Preferences
When working in a team, it can be helpful to understand each member’s “thinking preferences” via the Whole Brain Game.

Collaboration, authorship and data policies
Policies on collaboration, authorship, and data can help ensure that all team members are on the same page.

References and Images

Budden, A. E., Clark, S. J., Haycock-Chavez, N., Johnson, N., Jones, M. B. (2022). Fundamentals in Data Management for Qualitative and Quantitative Arctic Research. NCEAS Learning Hub. https://learning.nceas.ucsb.edu/2022-04-arctic/index.html

All of the images used are from the website thenounproject.com.