In this post, I will show you, step by step, how to work with sea surface temperature (SST) data in R.

Install the Necessary Tools for NetCDF

Before importing NetCDF files into R, we need to install the necessary tools. Mac users need the Xcode Command Line Tools and can then use MacPorts to install NetCDF by typing the following lines into the terminal.

sudo port install netcdf
sudo port install nco
sudo port install ncview

More details can be found here; Ubuntu users can refer to the instructions here.

Download an SST Dataset

For convenience, we download a low-resolution dataset: the Kaplan Extended SST data from ESRL PSD, on a 5-degree latitude by 5-degree longitude ($5^{\circ} \times 5^{\circ}$) equiangular grid.

# set a url of the Kaplan SST data
url <- ''
# create a name for temporary files in the working directory
file <- tempfile(tmpdir = getwd()) 
# create a file with the given name
file.create(file)
## [1] TRUE
# download the file (binary mode keeps the NetCDF file intact on Windows)
download.file(url, file, mode = "wb")

Import the NetCDF File

Before importing the file, we install and load the R package ncdf4, which provides an interface to NetCDF.

install.packages("ncdf4")
library(ncdf4)

Then, we can extract the SST anomalies and their corresponding coordinates from the file.

# open the NetCDF file (stored here in an object we call nc)
nc <- nc_open(file)
# set coordinate variable: latitude
y <- ncvar_get(nc, "lat")
# set coordinate variable: longitude
x <- ncvar_get(nc, "lon")
# extract SST anomalies (the first variable in the file)
df <- ncvar_get(nc, nc$var[[1]])
# close the NetCDF file
nc_close(nc)
# delete the file
file.remove(file)
## [1] TRUE

Note that we can print the object returned by nc_open() to see more information about the file, such as its variables and dimensions.
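To try the import step without downloading anything, one option is to write a tiny NetCDF file and read it back. The sketch below assumes ncdf4 is installed; the grid values, the variable name sst, the units, and all toy_* names are made up for illustration and are not taken from the Kaplan file.

```r
library(ncdf4)

# Hypothetical toy grid: 3 longitudes x 2 latitudes x 4 time points
lon_dim  <- ncdim_def("lon", "degrees_east", c(42.5, 47.5, 52.5))
lat_dim  <- ncdim_def("lat", "degrees_north", c(-2.5, 2.5))
time_dim <- ncdim_def("time", "days since 1800-01-01", c(0, 31, 59, 90))
sst_var  <- ncvar_def("sst", "degC", list(lon_dim, lat_dim, time_dim),
                      missval = -9999, prec = "double")

# Write the toy values to a temporary NetCDF file
toy_file <- tempfile(fileext = ".nc")
toy_vals <- array(seq_len(3 * 2 * 4) / 10, dim = c(3, 2, 4))
nc_out <- nc_create(toy_file, sst_var)
ncvar_put(nc_out, sst_var, toy_vals)
nc_close(nc_out)

# Read it back the same way as in the post
nc_in      <- nc_open(toy_file)
toy_lon    <- ncvar_get(nc_in, "lon")
toy_lat    <- ncvar_get(nc_in, "lat")
toy_sst    <- ncvar_get(nc_in, "sst")
time_units <- ncatt_get(nc_in, "time", "units")$value
nc_close(nc_in)
unlink(toy_file)
```

The array comes back with dimensions ordered as longitude, latitude, time, which is the layout the indexing in the next section relies on. The units attribute of the time variable tells us how to convert the time values to dates.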

Example: Indian Ocean SST

The following example is inspired by Deser et al. (2009). The Indian Ocean region is taken to lie between latitudes $20^{\circ}$N and $20^{\circ}$S and between longitudes $40^{\circ}$E and $120^{\circ}$E.

# set the region of Indian Ocean
lat_ind <- y[which(y == -17.5):which(y == 17.5)]
lon_ind <- x[which(x == 42.5):which(x == 117.5)]

# print the total number of grid cells
length(lat_ind) * length(lon_ind)
## [1] 128
# extract the Indian Ocean SST anomalies
sst_ind <- df[which(x == 42.5):which(x == 117.5), 
              which(y == -17.5):which(y == 17.5),]

# define which locations are ocean (s2: not NA) or land (s1: NA)
s1 <- which(is.na(sst_ind[,,1]))
s2 <- which(!is.na(sst_ind[,,1]))

# print the number of grid cells on land
length(s1)
## [1] 4
# print the dimensions of sst_ind
dim(sst_ind)
## [1]   16    8 1937

Out of 8 × 16 = 128 grid cells, 4 lie on land, where no data are available. The time period is from January 1856 to April 2017. The data we use are therefore observed at $124$ grid cells and 1936 time points.
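The counts above can be checked on a stand-in grid. This is a minimal sketch, assuming the cell centres sit at 2.5, 7.5, ... degrees (an assumption about the grid, to be checked against the file's own x and y). It also selects the region by range rather than by exact equality, which avoids relying on floating-point matches such as y == -17.5. The names lon, lat, toy, land and ocean are illustrative only.

```r
# Assumed global 5-degree grid of cell centres
lon <- seq(2.5, 357.5, by = 5)
lat <- seq(-87.5, 87.5, by = 5)

# Select the Indian Ocean box by range instead of exact equality
lon_idx <- which(lon >= 40 & lon <= 120)
lat_idx <- which(lat >= -20 & lat <= 20)
n_cells <- length(lon_idx) * length(lat_idx)

# Toy field with NA on two pretend land cells
toy <- matrix(rnorm(n_cells), nrow = length(lon_idx), ncol = length(lat_idx))
toy[c(1, 2)] <- NA
land  <- which(is.na(toy))    # linear indices of land cells
ocean <- which(!is.na(toy))   # linear indices of ocean cells

# Number of monthly time points from January 1856 to April 2017 inclusive
n_months <- length(seq(as.Date("1856-01-01"), as.Date("2017-04-01"), by = "month"))
```

Under these assumptions the box contains 16 longitudes and 8 latitudes, and the stated period covers 1936 months. If the time dimension of the downloaded file is longer, the file most likely extends past the period quoted here.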

Vectorize the SST Anomalies

We reshape the data into a $1936 \times 124$ matrix by vectorizing the anomalies at each time point.

sst <- matrix(0, nrow = dim(sst_ind)[3], ncol = length(s2))

for(i in 1:dim(sst_ind)[3])
  sst[i,] <- sst_ind[,,i][-s1]
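The same reshaping can be done without a loop. The sketch below uses a small toy array in place of sst_ind (all toy_* names are illustrative) and checks that flattening the two spatial dimensions, transposing, and keeping the ocean columns gives the same matrix as the loop.

```r
# Toy array standing in for sst_ind: 4 longitudes x 3 latitudes x 5 time points
set.seed(1)
toy_ind <- array(rnorm(4 * 3 * 5), dim = c(4, 3, 5))
toy_ind[2, 1, ] <- NA   # pretend these two cells are land
toy_ind[4, 3, ] <- NA

toy_s1 <- which(is.na(toy_ind[, , 1]))    # land
toy_s2 <- which(!is.na(toy_ind[, , 1]))   # ocean

# Loop version, as in the post
sst_loop <- matrix(0, nrow = dim(toy_ind)[3], ncol = length(toy_s2))
for (i in seq_len(dim(toy_ind)[3]))
  sst_loop[i, ] <- toy_ind[, , i][-toy_s1]

# Loop-free version: one column per time slice, then transpose so that
# rows are time points, and keep only the ocean columns
n_space <- prod(dim(toy_ind)[1:2])
sst_vec <- t(matrix(toy_ind, nrow = n_space))[, toy_s2, drop = FALSE]
```

Selecting with the ocean indices has one practical advantage: if a region happens to contain no land, the land index vector is empty, and negative indexing with an empty vector returns nothing rather than everything.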

Detect the Dominant Patterns

For simplicity, we assume that the time effect is negligible. We use empirical orthogonal functions (EOFs) to represent the dominant patterns.

# Extract the EOFs of data
eof <- svd(sst)$v

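# load the R package fields, installing it first if needed
if (!require("fields")) {
  install.packages("fields")
  library(fields)
}

# load the R package RColorBrewer, installing it first if needed
if (!require("RColorBrewer")) {
  install.packages("RColorBrewer")
  library(RColorBrewer)
}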

# Define the location in ocean
loc <- as.matrix(expand.grid(x = lon_ind, y = lat_ind))[s2,]
coltab <- colorRampPalette(brewer.pal(9,"BrBG"))(2048)
# plot the first EOF
par(mar = c(5,5,3,3), oma=c(1,1,1,1))
quilt.plot(loc, eof[,1], nx = length(lon_ind), 
           ny = length(lat_ind), xlab = "longitude",
           ylab = "latitude", 
           main = "1st EOF", col = coltab,
           cex.lab = 3, cex.axis = 3, cex.main = 3,
           legend.cex = 20)
maps::map(database = "world", fill = TRUE, col = "gray", 
          ylim = c(-19.5, 19.5), xlim = c(39.5, 119.5), add = TRUE)

[Figure: map of the 1st EOF]

# plot the second EOF
par(mar = c(5,5,3,3), oma=c(1,1,1,1))
quilt.plot(loc, eof[,2], nx = length(lon_ind), 
           ny = length(lat_ind), xlab = "longitude",
           ylab = "latitude", 
           main = "2nd EOF", col = coltab,
           cex.lab = 3, cex.axis = 3, cex.main = 3,
           legend.cex = 20)
maps::map(database = "world", fill = TRUE, col = "gray", 
          ylim = c(-19.5, 19.5), xlim = c(39.5, 119.5), add = TRUE)

[Figure: map of the 2nd EOF]
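quilt.plot() works from scattered locations, so the land cells are simply absent from its input. If a full longitude-by-latitude matrix is needed instead, for example for base R's image(), the EOF values can be scattered back into the grid using the ocean indices. A minimal sketch on toy data follows; nx, ny, is_land, ocean_idx, eof_vals and eof_grid are illustrative names, not objects from the post.

```r
# Hypothetical toy setup: a 4 x 3 grid with two land cells
nx <- 4
ny <- 3
is_land <- matrix(FALSE, nrow = nx, ncol = ny)
is_land[2, 1] <- TRUE
is_land[4, 3] <- TRUE
ocean_idx <- which(!is_land)   # plays the role of s2

# Stand-in for one EOF: one value per ocean cell
eof_vals <- seq_along(ocean_idx) / 10

# Scatter the values back into the full grid, leaving land as NA
eof_grid <- matrix(NA_real_, nrow = nx, ncol = ny)
eof_grid[ocean_idx] <- eof_vals
```

With the real data, the analogous matrix has one row per longitude and one column per latitude, so it can be passed to image() together with the longitude and latitude vectors.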

The first EOF is known as a basin-wide mode, and the second one is a dipole mode.
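To judge how dominant each pattern is, a common companion to the EOFs is the fraction of total variance each one explains, computed from the squared singular values. The sketch below uses synthetic data built from a basin-wide pattern and a dipole pattern; the data, the amplitudes and all names are assumptions for illustration, not results from the Kaplan dataset.

```r
set.seed(42)
n_time  <- 200
n_space <- 6

# Two made-up spatial patterns: one basin-wide, one dipole
p1 <- rep(1, n_space) / sqrt(n_space)
p2 <- c(1, 1, 1, -1, -1, -1) / sqrt(n_space)

# Synthetic anomalies: strong basin-wide signal, weaker dipole, small noise
toy_sst <- 3 * rnorm(n_time) %o% p1 +
  1.5 * rnorm(n_time) %o% p2 +
  matrix(rnorm(n_time * n_space, sd = 0.2), n_time, n_space)

# EOFs are the right singular vectors; squared singular values give variance
dec      <- svd(toy_sst)
eof_toy  <- dec$v
var_frac <- dec$d^2 / sum(dec$d^2)
```

Two caveats apply to the real analysis as well. The sign of each EOF is arbitrary, so a map and its negative describe the same pattern. Also, svd() does not centre the data, so this works as intended only because the input consists of anomalies rather than raw temperatures.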


