Computational models are growing in their capacity to consume large datasets and create complex, fine-scale outputs that sometimes reach multiple terabytes in size. Although in theory model output can be regenerated by re-running the model, doing so may require access to high-performance supercomputing, making reproduction costly and impractical. Therefore, when the model output data have value for reuse, it can be beneficial to archive them for others to use without needing to re-run the model. However, because the outputs of these models can be extremely large, they can be costly to store. Additionally, as more models are developed and improved, earlier outputs may become obsolete or outdated. For example, model scenarios from the 1990s may be less relevant to climate research in the 2020s.

To manage the cost of storage and maintain access for Arctic researchers and communities, the Arctic Data Center is introducing a new policy to distribute large multi-terabyte model output datasets without committing to archiving them permanently. Note that this policy does not apply to multi-terabyte observational or experimental data; we will always archive these data because, unlike model outputs, they cannot be recreated.

Under this policy, the Arctic Data Center will:

Consider storing large model output datasets (roughly, those larger than 0.5 TB) if they are valuable for analytical reuse to Arctic researchers and the broader Arctic community
Continue to store the model code itself, along with documentation and sample data sufficient to understand and regenerate the output
Re-evaluate the decision to store each large model output dataset once every five years, with input from the Arctic research community on whether the data are sufficiently valuable and sufficiently accessed to justify continued storage and distribution

Small model output datasets that are not burdensome to store may be archived regardless of this policy, which is only meant to apply to large datasets.
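
The size threshold and five-year review cycle described above can be sketched in a few lines of Python. This is an illustration only, not an Arctic Data Center tool: the function names, the use of decimal terabytes, and the treatment of 0.5 TB as a hard cutoff are all assumptions, since the policy gives that figure only as a rough guide.

```python
from datetime import date

# Assumed values taken from the policy text: "roughly larger than 0.5 TB"
# and a review "once every five years". The exact cutoff is a judgment call.
LARGE_OUTPUT_THRESHOLD_TB = 0.5
REVIEW_INTERVAL_YEARS = 5
BYTES_PER_TB = 10**12  # decimal terabyte (assumption)


def is_large_model_output(size_bytes: int) -> bool:
    """Return True if a model output dataset exceeds the assumed size threshold."""
    return size_bytes / BYTES_PER_TB > LARGE_OUTPUT_THRESHOLD_TB


def next_review_date(archived_on: date) -> date:
    """Return the date five years after archiving (hypothetical helper)."""
    target_year = archived_on.year + REVIEW_INTERVAL_YEARS
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        return archived_on.replace(year=target_year, day=28)
```

For example, a 2 TB output archived on 15 March 2024 would count as large under these assumptions and would come up for its first review on 15 March 2029.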

Researchers interested in archiving large model outputs are encouraged to contact Arctic Data Center staff well in advance of publication or reporting deadlines with a statement on the model output’s value to researchers and the community, how the output could be reused by others, and an estimate of how long they expect the model outputs to remain useful. Our team will provide guidance on our storage capacity, file structuring and chunking strategies, and data transfer plans.

Once archived, large model output datasets will include:

Model code
Documentation
Example input and output files
Full model outputs

These datasets will undergo a review every five years to reassess their continued utility. During the review, we will contact the dataset creators for discussion and may reach out to additional experts in the field to serve as advisors. If the full model output is determined to have diminished utility, it will be removed. However, the code, documentation, and example input and output files will remain in the archived dataset to ensure reproducibility.

Please contact the Arctic Data Center (ADC) support team at support@arcticdata.io with any questions or for assistance in processing and publishing large model outputs, and explore our data submission guidelines page for additional information.

Written by the Arctic Data Center curation & outreach teams.