How can repositories support social scientific data? Inside the Arctic Data Center’s recent workshop

In April, the Arctic Data Center, along with Arctic social scientific researchers Noor Johnson and Timothy Pasch, hosted a workshop to discuss data archival and re-use challenges and opportunities for Arctic social scientists. Like so many other groups in the time of COVID-19, we pivoted our planned in-person meeting to a virtual event in Zoom. Also like many other groups, we saw this as an opportunity to increase participation in ways not previously possible with our in-person planning (e.g. Halpern et al, 2020). We had 29 attendees in total, including four Arctic Data Center staff and four participants from the NSF Office of Polar Programs. The remaining 21 participants were active Arctic social science researchers representing diverse disciplines from anthropology to political science.

We began the workshop together in a plenary session, introducing ourselves and our topic. Noor Johnson and Timothy Pasch set the stage for a productive discussion, welcoming everyone and establishing the purpose of the workshop: to discuss the current state of data archival and re-use in the Arctic social scientific community and to propose concrete action steps to mitigate challenges associated with that data archival and re-use process.

Screenshot of our wonderful participants.

To set the stage and stimulate discussion, we heard from several researchers in lightning talks. This allowed us to immerse ourselves in the research practices of social scientists and explore the diversity of their data. These talks included:

  • Noor Johnson, who showcased how ELOKA works with co-produced data from researchers and residents of the Arctic;
  • Matthew Berman, who discussed the challenges of working with sensitive data (like from the US Census) from the perspective of an economist;
  • Kristal Jones, who spoke about practical limitations to archiving that stem from a dearth of open source workflows for qualitative data and the difficulty of archiving unstructured (non-tabular) data;
  • Merce Crosas, who demonstrated DataVerse, one effective repository that handles social scientific data well using integrations with other networks for searchable capabilities; and
  • Ben Marwick, who summarized well that the issue with sharing data among the social sciences may well be linked to the “prestige economy” of knowledge and the instinct among researchers to protect their data to avoid being scooped.

Before launching into our breakout sessions, Matt Jones, Arctic Data Center lead, provided an overview of the Center’s activities, highlighting the benefits of data sharing. The focus of the first breakout session was to explore and identify current challenges that exist across topic areas. We divided into three groups focused on: data science training for social scientists, the culture of data sharing in the social sciences, and data preservation and sharing infrastructure for the social sciences. After some small group discussion, we regrouped to share our thoughts and learn from the other groups. The biggest challenges brought up by the three groups were:

A screenshot of how essential the members of the training breakout group thought different data training needs are.
  • There is a great need for data science training in the social sciences at all levels, not just incoming graduate students. Training shouldn’t just be a copy and paste of other resources that exist because those other resources are not social science-specific – working with messy social scientific data should be part of any training course that’s developed.
  • There are ethical challenges inherent in data sharing and re-use when that data is about human subjects or protected sites. Preservation is the first step towards creating a culture of data sharing – right now, some social scientists aren’t preserving their data as well as they should due to a number of trust issues with that protected data in an open repository space.

  • Interoperability is low within social scientific disciplines due to the lack of a shared vocabulary and / or metadata standards.

The second day focused on discussing ways forward to address those challenges, both within our three focused groups and as a larger group. Ideas ranged from those small and easily accomplished – such as increasing outreach to encourage social scientists to archive their metadata as a minimum with the Arctic Data Center – to those large and collaboration-rich opportunities – such as developing a consistent vocabulary for social scientists to use when submitting data and metadata for archival and re-use. 

We experienced a convergence of ideas coming from the different breakout groups, suggesting a deep synthesis among the training, culture, and infrastructure needs of the social scientific community. The Arctic Data Center has developed plans to address a number of topics that came up in this brainstorming session, including creating a training specifically for Arctic social scientists, and will work with other aligned organizations to begin tackling the larger objectives. The first step in this process? Publishing a full report on the challenges identified, solutions proposed, and collaborations suggested later this year.