======================================================================================
MOSAiC Distributed Network Position Track Processing

Raw buoy GPS positions from the MOSAiC Distributed Network (DN) are processed for
quality control by removing erroneous observations (e.g., positions or dates out of
range, duplicates, etc.) and extracting buoy drift tracks from the cleaned data.

Contact:
Jenny Hutchings (jennifer.hutchings@oregonstate.edu)
Angela Bliss (angela.bliss@oregonstate.edu)

20 May 2021
======================================================================================

INPUT DATA:
Downloaded buoy observations should be compiled into single .csv files and saved in
a directory (../data/SingleBuoys_Full/). Files must have columns for latitude,
longitude, and the datetime in UTC (yyyy-mm-dd hh:mm:ss).

OUTPUT DATA:
The data files are saved as .csv files (ASCII format). Each file contains three
columns: the datetime, latitude (decimal degrees), and longitude (decimal degrees).

FILENAME CONVENTION:
DNid_IMEI_sensorID.csv
where "DNid" is an identifier unique to the DN, "IMEI" is the Iridium identification
number that is unique to the hardware, and "sensorID" is the identifier used in the
AWI sensor registry for the MOSAiC field campaign.

======================================================================================
RUNNING THE CODE

The code used to clean raw buoy tracks for the MOSAiC Distributed Network consists
of a series of IDL routines with bash script wrappers. The final time series of buoy
positions are extracted from the cleaned track output using a Python script. A
description of each script is provided at the end of this document.

Before running this code, downloaded buoy observations should be compiled into
single files and saved in a directory (../data/SingleBuoys_Full/). Make directories
to save the intermediary files (cleaned tracks and some plots) as follows:
../data/Tracks_Clean/data and ../data/Tracks_Clean/plots.

Note that very large files (> 1.5 MB) must be split before processing (e.g., use
"split -l 20000 filename"); otherwise the IDL code will hang. The processed files
can then be concatenated back together after the data are cleaned and before running
the Python script.

To process the data, start by running the process_buoys_full_v5.sh script:

$ bash process_buoys_full_v5.sh

Next, final formatting of the data files is completed by running the archive_DN.py
script. Note that the lookup table DN_buoy_list_v1.csv is required to run the Python
script.

$ python archive_DN.py

======================
Bash Scripts
======================
process_buoys_full_v5.sh
ProcessCleanTracks_v5.sh

process_buoys_full_v5.sh – This script extracts position data (latitude, longitude,
and datetime) from the raw buoy data and saves .csv files of the raw tracks in the
../Tracks_Full/ directory for further processing. The full tracks are saved with
filenames that use the DNid code and IMEI number of the buoy's first deployment
(e.g., DNid_IMEI_Full.csv). The script then calls ProcessCleanTracks_v5.sh to start
processing the data.

ProcessCleanTracks_v5.sh – This script reads a list of file names to be processed
from the ../Tracks_Full/ directory and runs the IDL code described below to clean
erroneous data from the buoy tracks. Intermediary cleaned data files are saved to
the ../Tracks_Clean/data/ folder and some quicklook plots are saved to the
../Tracks_Clean/plots/ folder.
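The range and duplicate checks applied during cleaning are implemented in the IDL
routines described in the next section. For orientation only, the sketch below
restates that logic in Python. It is not part of the distributed code; the upper
date bound and the duplicate rule are assumptions (the IDL code defines its own date
range, with day counts starting 2019-09-26).

# Rough Python illustration of the cleaning criteria; the authoritative
# implementation is the IDL code (read_track.pro / flagging.pro).
import pandas as pd

def clean_track(track):
    """Sort a track by time and drop out-of-range or duplicate fixes.

    `track` is a DataFrame with "datetime", "latitude", and "longitude"
    columns, as in the raw track files.
    """
    track = track.copy()
    track["datetime"] = pd.to_datetime(track["datetime"], utc=True)
    track = track.sort_values("datetime")

    # Valid geographic coordinates.
    ok = track["latitude"].between(-90.0, 90.0)
    ok &= track["longitude"].between(-180.0, 180.0)

    # Dates within a plausible campaign window (assumed bounds).
    lo = pd.Timestamp("2019-09-26", tz="UTC")
    hi = pd.Timestamp("2021-12-31", tz="UTC")
    ok &= track["datetime"].between(lo, hi)

    # Drop exact duplicate rows; the IDL code may use a different
    # duplicate rule.
    track = track[ok].drop_duplicates()
    return track.reset_index(drop=True)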
======================
IDL
======================
DNclean_adc.pro
read_track.pro
flag_track.pro
flagging.pro
plot_track.pro

DNclean_adc.pro – This is the main cleaning program. For each buoy, it produces two
files: a file with the cleaned track data and a quicklook plot to check the data.
The remaining IDL subroutines are called by DNclean_adc.pro:

read_track.pro – Reads in the full drift track data, converts times to Julian day
(days since 2019-09-26), sorts the data in increasing time order, flags duplicate
observations, calls subroutines to flag the data, and removes the flagged data.
Returns the cleaned track data.

flag_track.pro – Prints the number of valid observations in the track and calls the
flagging.pro routine.

flagging.pro – Checks for invalid latitudes and longitudes and for dates that are
out of range. Bad data are flagged and removed from the time series.

plot_track.pro – Makes a quicklook line plot of buoy latitude versus date to check
for data gaps.

======================
Python
======================
archive_DN.py

archive_DN.py – This script is run after the buoy tracks are cleaned. It requires
the lookup table DN_buoy_list_v1.csv to find buoy deployment datetimes and assign
the identification codes used to build descriptive filenames. The script creates a
list of clean track files in the ../Tracks_Clean/data/ directory to process and
reads in the lookup table. For each file, the code finds all buoy deployments
matching the IMEI number in the lookup table and extracts a time series for each
buoy deployment from the track data. The final processed tracks are saved to the
../DN_positions_v1/ directory.
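For reference, the following Python sketch outlines the per-file logic of
archive_DN.py described above. It is illustrative only: the lookup-table column
names (IMEI, DN_ID, Sensor_ID, Deployment_Start, Deployment_End) are hypothetical
placeholders, not the actual schema of DN_buoy_list_v1.csv.

# Illustrative sketch of the deployment-extraction step in archive_DN.py.
# Lookup-table column names below are assumptions for this example only.
from pathlib import Path
import pandas as pd

lookup = pd.read_csv("DN_buoy_list_v1.csv",
                     parse_dates=["Deployment_Start", "Deployment_End"])
out_dir = Path("../DN_positions_v1")
out_dir.mkdir(exist_ok=True)

for clean_file in sorted(Path("../Tracks_Clean/data").glob("*.csv")):
    track = pd.read_csv(clean_file, parse_dates=["datetime"])
    # Cleaned filenames are assumed to keep the DNid_IMEI_... pattern.
    imei = clean_file.stem.split("_")[1]
    deployments = lookup[lookup["IMEI"].astype(str) == imei]

    # One output file per deployment of this hardware, named
    # DNid_IMEI_sensorID.csv per the convention above.
    for _, dep in deployments.iterrows():
        in_window = track["datetime"].between(dep["Deployment_Start"],
                                              dep["Deployment_End"])
        out_name = f'{dep["DN_ID"]}_{imei}_{dep["Sensor_ID"]}.csv'
        track[in_window].to_csv(out_dir / out_name, index=False)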