R Notebook
Calculating anomaly and trend with sea ice thickness time series
In this exercise, we will use the sea ice thickness data in the Arctic region, available through the PolarWatch data server, to study changes in monthly average sea ice thickness values. We will estimate both the long-term trend at each location and a changing seasonal component.
The exercise demonstrates the following techniques:
- Downloading, as a netcdf file, twice-daily sea ice thickness data for a region selected at random, for the years 1982 to late 2024.
- Calculating a monthly mean from the twice-daily data.
- Calculating the trend and changing seasonal component of the monthly sea ice thickness means using state-space models.
- Visualizing the results of the state-space analysis.
Getting the data
First we will load the packages that will be used:
#load needed libraries
library(dplyr)
library(ggplot2)
library(lubridate)
library(KFAS)
library(ncdf4)
library(tidyr)
Next is the script to download the data (it is advised not to run this, because the download can take a long time and can fail - the resulting file is included); the region, selected in projected coordinates, was chosen arbitrarily:
download_url <- "https://polarwatch.noaa.gov/erddap/griddap/ncei_polarAPPX20_nhem.nc?cdr_sea_ice_thickness[(1982-10-31T14:00:00Z):1:(2024-10-31T14:00:00Z)][(-388546.6):1:(338411.6)][(-438681.7):1:(338411.6)]"
options(timeout = 3600) # the default 60-second timeout is often too short for a file this size
download.file(download_url, "data/ncei_polar.nc", mode = "wb")
The next step is to read in the netcdf file:
root <- nc_open('data/ncei_polar.nc')
time <- ncvar_get(root, 'time')
rows <- ncvar_get(root, 'rows')
columns <- ncvar_get(root, 'columns')
ice_thick <- ncvar_get(root, 'cdr_sea_ice_thickness')
nc_close(root)
“time” needs to be converted to an ‘R’ time, and then the year and month extracted:
time <- as.POSIXlt(time, origin = '1970-01-01', tz = "GMT")
years <- year(time)
uniqueYears <- unique(years)
months <- month(time)
One solution for calculating the monthly mean time series is to “melt” the data to long form, and then use functions like ‘apply()’ or ‘tapply()’, or any of the appropriate “tidyverse” functions, to calculate the mean value for each month. The code below shows how to “melt” the data, but we will not run it, because you will quickly run out of memory and likely crash your R session.
# the array varies over rows fastest, then columns, then time,
# so expand.grid() must list the dimensions in that order
meta <- expand.grid(rows = rows, columns = columns, time_index = seq_along(time))
meta$year <- years[meta$time_index]
meta$month <- months[meta$time_index]
alldf <- cbind(meta, thickness = as.vector(ice_thick))
A simpler, less elegant but more straightforward method (and easier to debug) is to use loops:
no_years <- length(unique(years))
month_avg <- array(NA, dim = c(length(rows), length(columns), 12, no_years))
for (myMonth in 1:12){
  for (yearCount in 1:length(uniqueYears)){
    myYear <- uniqueYears[yearCount]
    data_point <- which((months == myMonth) & (years == myYear))
    temp_data <- ice_thick[, , data_point]
    month_avg[, , myMonth, yearCount] <- apply(temp_data, c(1, 2), function(x) mean(x, na.rm = TRUE))
  }
}
# re-form the array to be a time series at each point
month_avg_series <- array(month_avg, dim = c(length(rows), length(columns), 12*length(uniqueYears)))
# remove the last two months of 2024 which are missing
month_avg_series <- month_avg_series[, , 1:514]
# create a date-time object to be used with plotting and other functions
date_time <- seq.Date(from = as.Date("1982-01-01"), to = as.Date("2024-10-01"), by = "month")
In the code above, the “apply()” function applies “mean()” over the time indices at each grid point, and the ‘month_avg_series’ assignment just flattens the calculated array so that there is a single time series at each location, rather than one subscripted separately by month and year.
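As a quick sanity check on the reshaping (a sketch with arbitrary example indices, not part of the original analysis), the t-th element of the flattened series should match the month/year-subscripted array with t = (year index - 1) * 12 + month:
i <- 1; j <- 1   # arbitrary example grid cell
mo <- 7; yr <- 3 # arbitrary example month and year index
t <- (yr - 1) * 12 + mo
identical(month_avg_series[i, j, t], month_avg[i, j, mo, yr])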
In order to examine the series, we will use a state-space decomposition of the data, which separates the data into a nonparametric trend plus a seasonal component that can vary in phase and amplitude. This is implemented in the ‘R’ package ‘KFAS’. To start, a function is defined that sets up the state-space model, fits it, and extracts the smoothed components, along with a second function used by the optimization routine to update the model for new parameter values:
### define the model in KFAS
state_space_decomp <- function(dataSeries){
  # set model starting values (as log standard deviations, i.e. 0.5 * log(variance))
  irreg_init <- 0.5 * log(1)
  level_init <- 0.5 * log(.01)
  season_init <- 0.5 * log(.1)
  modelts_inits <- c(irreg_init, level_init, season_init)
  # define the state-space model in KFAS
  model_ice <- SSModel(dataSeries ~ SSMtrend(degree = 1, Q = list(NA)) +
                         SSMseasonal(12, Q = NA, sea.type = "trigonometric"), H = matrix(NA))
  # fit the model to the data
  model_ice_Fit <- fitSSM(model = model_ice, inits = modelts_inits, updatefn = update_modelts)
  # calculate the smoothed values for the optimal parameters
  smooth_ice <- KFS(model_ice_Fit$model, filtering = "state", smoothing = "state")
  # extract the estimated trend and seasonal
  level <- signal(smooth_ice, states = 'level')$signal
  season <- signal(smooth_ice, states = 'season')$signal
  return(data.frame(level = level, season = season))
}
### define the function to update the model for the optimization routine
update_modelts <- function(pars, model) {
  # bound the irregular variance away from zero for numerical stability
  finite_test <- 0.5 * log(.000001)
  if (pars[1] < finite_test) {
    pars[1] <- finite_test
  }
  # pars are log standard deviations, so exp(2 * pars) recovers the variances
  model$H[1, 1, 1] <- exp(2 * pars[1])
  # one level-disturbance variance plus a common variance for the 11 seasonal states
  temp3 <- exp(c(2 * pars[2], rep(2 * pars[3], 11)))
  diag(model$Q[, , 1]) <- temp3
  return(model)
}
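For reference (a standard formulation; the symbols below are our own notation, not names from the code), the model being fit is the local level plus trigonometric seasonal decomposition:
$$y_t = \mu_t + \gamma_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$$
$$\mu_{t+1} = \mu_t + \xi_t, \qquad \xi_t \sim N(0, \sigma_\xi^2)$$
where the seasonal $\gamma_t$ is the sum of six trigonometric harmonics (11 seasonal states for a period of 12), each with a common disturbance variance $\sigma_\omega^2$. The three variances correspond to ‘pars[1]’, ‘pars[2]’, and ‘pars[3]’ in ‘update_modelts()’ above.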
To get an idea of the output from the state-space model, we look at the first series and estimate the model:
dataSeries <- month_avg_series[1, 1, ]
# count the non-missing observations in the series
nobs <- length(na.omit(dataSeries))
ice_decomp <- state_space_decomp(dataSeries)
In order to examine the output, some ‘ggplot2’ plotting functions are defined:
plot_trend_data <- function(dataSeries, level, date_time){
  df <- data.frame(
    date_time = date_time,
    data = dataSeries,
    trend = level
  )
  p <- ggplot(df, aes(x = date_time)) +
    geom_line(aes(y = data, color = "Data"), linewidth = 0.5) +   # plot the data
    geom_line(aes(y = trend, color = "Trend"), linewidth = 0.5) + # plot the trend
    labs(title = "Data and Trend Plot",
         x = "Time",
         y = "Ice Thickness") +
    scale_color_manual(values = c("Data" = "black", "Trend" = "red")) + # customize colors
    theme_minimal()
  return(p)
}
plot_season <- function(season, date_time){
  df <- data.frame(
    date_time = date_time,
    season = season
  )
  ggplot(df, aes(x = date_time, y = season)) +
    geom_line(linewidth = 0.5) +
    labs(title = "Ice Thickness Seasonal Component", x = "Time", y = "Ice Thickness") +
    theme_minimal()
}
plot_season_month <- function(season, date_time){
  myMonth <- month(date_time)
  myYear <- year(date_time)
  df <- data.frame(
    month = myMonth,
    year = myYear,
    season = season
  )
  monthly_means <- df %>%
    group_by(month) %>%
    summarise(mean_value = mean(season))
  df <- df %>%
    left_join(monthly_means, by = "month")
  p <- ggplot(df, aes(x = factor(month), y = season, group = year, color = factor(year))) +
    geom_line() +
    geom_line(aes(y = mean_value), color = "gray") +
    labs(x = "Month", y = "Values", title = "Monthly Series and Monthly Mean") +
    theme_minimal() +
    theme(legend.title = element_blank())
  return(p)
}
plot_season_polar <- function(season, date_time){
  myMonth <- month(date_time)
  myYear <- year(date_time)
  df <- data.frame(
    month = myMonth,
    year = myYear,
    season = season
  )
  p <- ggplot(df, aes(x = factor(month), y = season, group = year)) +
    geom_line(aes(color = factor(year))) +
    coord_polar() +
    labs(x = "Month", y = "Values", title = "Polar Plot of Time Series") +
    theme_minimal() +
    theme(legend.position = "right")
  return(p)
}
Plot the trend versus the series:
p <- plot_trend_data(dataSeries, ice_decomp$level, date_time)
p
Plot the seasonal:
p <- plot_season(ice_decomp$season, date_time)
p
In interpreting the seasonal plot, note that even though in theory the seasonal and trend are independent, in practice ice thickness has a lower bound of zero, so a decreasing trend limits how far the seasonal can vary downward.
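To see this interaction in the numbers (a quick sketch; ‘ice_decomp’ is the fit from above), one can check how close the combined trend-plus-seasonal signal gets to zero:
# summarize the combined smoothed signal; the minimum shows how close it comes to the zero bound
summary(ice_decomp$level + ice_decomp$season)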
Plot each year’s season on one graph:
p <- plot_season_month(ice_decomp$season, date_time)
p
Make a polar plot of the seasonal:
p <- plot_season_polar(ice_decomp$season, date_time)
p
What can clearly be seen in the seasonal is how it has been slowly but consistently changing over roughly decadal time scales.
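One rough way to quantify this drift (a sketch, not part of the original analysis) is to compute the range of the estimated seasonal within each year and compare the early and late parts of the record:
# seasonal amplitude (max minus min of the seasonal component) by year
amp_by_year <- tapply(ice_decomp$season, year(date_time), function(x) diff(range(x)))
head(amp_by_year)
tail(amp_by_year)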
To examine another location, just change the indices in the data array; that is, set irow and jcol to the desired values and run the code below:
irow <- 5; jcol <- 10 # example indices; set these to any valid row and column
dataSeries <- month_avg_series[irow, jcol, ]
ice_decomp <- state_space_decomp(dataSeries)
p_trend <- plot_trend_data(dataSeries, ice_decomp$level, date_time)
p_season <- plot_season(ice_decomp$season, date_time)
p_season_month <- plot_season_month(ice_decomp$season, date_time)
p_season_polar <- plot_season_polar(ice_decomp$season, date_time)
To get the state-space decomposition for all of the series, just iterate over the rows and columns (this can take 5 minutes or thereabouts, so it is a good time to stretch your legs):
# create arrays of NA to store results
trends <- array(NA_real_, dim = c(length(rows), length(columns), length(date_time)))
seasons <- array(NA_real_, dim = c(length(rows), length(columns), length(date_time)))
# loop over rows and columns
for (irow in seq(1, length(rows))) {
  for (jcol in seq(1, length(columns))) {
    dataSeries <- month_avg_series[irow, jcol, ]
    # skip cells with no data at all (for example, land cells)
    if (all(is.na(dataSeries))) next
    ice_decomp <- state_space_decomp(dataSeries)
    trends[irow, jcol, ] <- ice_decomp$level
    seasons[irow, jcol, ] <- ice_decomp$season
  }
}
This analysis looks at each location separately and can be informative, but a better analysis, beyond the scope of this tutorial, would be to model all the locations jointly, both to take into account spatial correlation and because neighboring locations will most likely have similar values, so some amount of spatial smoothing or regularization would be called for.
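That said, even the independent per-location fits reveal spatial structure. A minimal sketch of one way to look at it (assuming the ‘trends’ array filled in above; ‘image()’ is base R, so nothing extra is needed):
# change in the estimated trend between the first and last month of the record
trend_change <- trends[, , length(date_time)] - trends[, , 1]
# a quick heat map of the per-cell change (rows run along the x-axis here)
image(trend_change, main = "Change in estimated trend, 1982 to 2024", axes = FALSE)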