“Raw” data comes from https://gcoos5.geos.tamu.edu/erddap. The data hosted there has already had some processing done, but this workflow cleans it further. After processing, the cleaned data will be in ./data/cleaned/. A version of the cleaned data is hosted by USF IMaRS here.
The code below is used for fetching the data. Cleaning is done per cruise; that code can be found alongside the visualizations in the “Cruise Reports”.
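If you only need the cleaned output rather than the raw download, the files under ./data/cleaned/ can be read back in directly. A minimal sketch, assuming the cleaned files are per-cruise CSVs (the file format is not specified here):

```r
librarian::shelf(readr, purrr, fs, here)

# list every cleaned file and stack them into one table;
# the *.csv glob is an assumption about the cleaned-file format
cleaned_files <- dir_ls(here("data", "cleaned"), glob = "*.csv")

ctd_cleaned <-
  map(cleaned_files, read_csv, show_col_types = FALSE) |>
  list_rbind(names_to = "source_file")  # keep the originating file path
```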
setup
if (!nzchar(system.file(package = "librarian"))) {
  install.packages("librarian")
}

librarian::shelf(
  quiet = TRUE,
  librarian, conflicted, ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr,
  forcats, lubridate, glue, fs, magrittr, here, rerddap, cli
)
These packages will be installed:
'rerddap'
It may take some time.
also installing the dependencies 'hoardr', 'ncdf4'
setup
conflicts_prefer(
  dplyr::filter(),
  dplyr::select()
)
[conflicted] Will prefer dplyr::filter over any other package.
[conflicted] Will prefer dplyr::select over any other package.
setup
# source(here("R", "ctd_download.R"))
list all cruises
"https://gcoos5.geos.tamu.edu/erddap/" %>%
  Sys.setenv(RERDDAP_DEFAULT_URL = .)

# search for all cruise CTD tables in ERDDAP
all_cruise_info <-
  ed_search_adv(
    query = "Walton Smith CTD",
    # maxTime = "2016-01-01T01:00:00Z",
    page_size = 1e4
  )

# find unique cruise names
all_cruise_ids <-
  str_extract(all_cruise_info$info$title, "(WS|SAV|WB|H)\\d{3,5}") %>%
  unique()

# print info about cruise IDs found
cli::cli_alert_info(
  c(
    "{col_green(\"Number of files queried\")}: ",
    "{nrow(all_cruise_info$info)} files\n",
    "{col_blue(\"N cruises:\")} {length(all_cruise_ids)}\n\n"
  )
)
ℹ Number of files queried: 3542 files
N cruises: 69
list all cruises
[1] "WS0603" "WS0612" "WS0618" "WS0623" "WS0704" "WS0718"
[7] "WS0802" "WS0807" "WS0901" "WS0906" "WS0914" "WS0919"
[13] "WS0923" "WS1004" "WS1007" "WS1015" "WS1018" "WS1102"
[19] "WS1106" "WS1113" "WS1116" "WS1202" "WS1207" "WS1212"
[25] "WS1418" "WS15103" "WS15152" "WS15208" "WS15264" "WS15320"
[31] "WS16004" "WS16074" "WS16207" "WS16263" "WS16319" "WS17030"
[37] "WS17086" "WS17170" "WS17212" "WS17282" "WS18008" "WS18120"
[43] "WS18218" "WS18285" "WS18351" "WS19028" "WS19119" "WS19210"
[49] "WS19266" "WS19322" "WS20006" "WS20231" "WS20279" "WS20342"
[55] "WS21032" "WS21093" "WS21151" "WS21212" "WS21338" "WS22022"
[61] "WS22072" "WS22141" "WS22215" "WS22281" "WS22337" "WS23010"
[67] "WS23061" "SAV1803" "SAV18173"
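With the unique IDs in hand, a small helper can pull out every ERDDAP dataset belonging to one cruise. This helper is hypothetical (not part of the original pipeline) and assumes `all_cruise_info` from the search above:

```r
# hypothetical lookup: all dataset rows whose title mentions a cruise ID
cruise_datasets <- function(cruise_id, info_tbl = all_cruise_info$info) {
  dplyr::filter(info_tbl, stringr::str_detect(title, cruise_id))
}

# cruise_datasets("WS16074")  # e.g. every CTD table from cruise WS16074
```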
get ready for download
dir_data_dwnld_save <- here("data", "raw", "ctd")       # download path
dir_data_avg_save   <- here("data", "processed", "ctd") # averaged path

dir_create(dir_data_dwnld_save)
dir_create(dir_data_avg_save)
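The download itself lives elsewhere (see the commented-out `ctd_download.R` above), but the shape of that step with rerddap might look like the sketch below. The function name and the example dataset ID are illustrative; real dataset IDs come from `all_cruise_info$info`:

```r
# hypothetical sketch: fetch one cruise's CTD table and save it as CSV
download_cruise_ctd <- function(dataset_id, dest_dir = dir_data_dwnld_save) {
  dir_create(dest_dir)            # ensure the raw-data path exists
  ctd_info <- info(dataset_id)    # dataset metadata from the ERDDAP server
  ctd_tbl  <- tabledap(ctd_info)  # download the full data table
  write_csv(ctd_tbl, path(dest_dir, paste0(dataset_id, ".csv")))
  invisible(ctd_tbl)
}

# download_cruise_ctd("WS16074_CTD")  # illustrative dataset ID
```

Because `RERDDAP_DEFAULT_URL` was set above, `info()` and `tabledap()` query the GCOOS server rather than rerddap's default ERDDAP instance.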