Wind and Waves in Cape Town: Part 3

Dan Gray

2017/12/10

Summary

In this part of the series, I use date tools and a basic pipeline approach to aggregate and organise the data for initial analysis and plotting.

The lubridate package makes it simple to extract key time-interval units from date-time objects. These indices (i.e. year, month, day of year, etc.) become the primary grouping variables for the data aggregation.
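As a quick illustration of those extractor functions, here is a minimal sketch applied to a single timestamp that appears in the data (a Robben Island record). The variable name `stamp` is only for illustration.

```r
library(lubridate)

# Parse one timestamp from a character string into a date-time object
stamp <- ymd_hms("2014-05-04 15:00:00")

year(stamp)   # calendar year: 2014
month(stamp)  # month number: 5
day(stamp)    # day of the month: 4
yday(stamp)   # day of the year: 124
```

Each function returns a plain number, which is what makes the results convenient as grouping variables.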

setwd("C:/Users/daniel/Desktop/locstore/portfolio")
source("static/data/customTheme.r")

library(lubridate)
library(tidyverse)
library(magrittr)
library(knitr)
library(purrr)
library(kableExtra)
library(extrafont)

Wind <- read_delim("static/data/winds/Wind.tsv", "\t",
                   escape_double = FALSE,
                   col_types = cols(datetime = col_character(),
                                    visibility_distance = col_double()),
                   trim_ws = TRUE)

# extract date/time indices of interest
Wind$datetime <- ymd_hms(Wind$datetime)
Wind$yrDT <- year(Wind$datetime)
Wind$monthDT <- month(Wind$datetime)
Wind$dayDT <- day(Wind$datetime)
Wind$ydayDT <- yday(Wind$datetime)

Data Aggregation

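Using a pipeline workflow (i.e. the magrittr %>% operator), I aggregated the data into two temporal frames: by year and by month. Within each station and time group, only the observations whose wind speed exceeds that group's 95th percentile are kept.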

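# keep observations above the 95th percentile of wind speed per station and year
extreme_byYear <- Wind %>%
  group_by(yrDT, usaf_station) %>%
  arrange(usaf_station, yrDT, wind_speed) %>%
  filter(wind_speed > quantile(wind_speed, 0.95, na.rm = TRUE)) %>%
  ungroup()

# same threshold, computed per station and calendar month (pooled over all years)
extreme_byMonth <- Wind %>%
  group_by(monthDT, usaf_station) %>%
  arrange(usaf_station, wind_speed) %>%
  filter(wind_speed > quantile(wind_speed, 0.95, na.rm = TRUE)) %>%
  ungroup()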
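To see why the threshold is computed inside each group rather than once over everything, here is a minimal sketch on made-up data. The stations "A" and "B" and their wind speeds are hypothetical, not taken from the Wind data set.

```r
library(dplyr)

# Hypothetical toy data: a calm station "A" and a windy station "B"
toy <- tibble(
  station    = rep(c("A", "B"), each = 20),
  wind_speed = c(1:20, 101:120)
)

# Threshold computed within each station: every station keeps its own extremes
by_station <- toy %>%
  group_by(station) %>%
  filter(wind_speed > quantile(wind_speed, 0.95, na.rm = TRUE)) %>%
  ungroup()

# Threshold computed once over the pooled data: only the windy station survives
pooled <- toy %>%
  filter(wind_speed > quantile(wind_speed, 0.95, na.rm = TRUE))
```

Here `by_station` holds the top observation of each station, while `pooled` contains rows from station "B" only. Grouping first is what allows calm and windy stations to be compared on their own terms.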

kable(head(extreme_byMonth),"html") %>%kable_styling(font_size=12)
usaf_station  elevation  wind_direction  wind_speed  visibility_distance  datetime             location       lat    lon     yrDT  monthDT  dayDT  ydayDT
688130        3          220             67          999999               2014-05-04 15:00:00  ROBBEN ISLAND  -33.8  18.367  2014  5        4      124
688130        3          130             67          999999               2014-05-05 12:00:00  ROBBEN ISLAND  -33.8  18.367  2014  5        5      125
688130        3          130             67          999999               2014-05-05 15:00:00  ROBBEN ISLAND  -33.8  18.367  2014  5        5      125
688130        3          130             67          999999               2014-05-17 06:00:00  ROBBEN ISLAND  -33.8  18.367  2014  5        17     137
688130        3          350             67          999999               2014-05-24 03:00:00  ROBBEN ISLAND  -33.8  18.367  2014  5        24     144
688130        3          330             67          999999               2014-05-24 21:00:00  ROBBEN ISLAND  -33.8  18.367  2014  5        24     144

Explorative Plots

  1. What role does seasonality play in the distribution of the most extreme (and most damaging!) winds throughout a calendar year?
# Plot of the seasonal patterns at each location 
ggplot(extreme_byMonth,aes(factor(monthDT),fill=location)) + 
  geom_bar() + 
  facet_wrap(~location,ncol=2,scales="free",as.table=TRUE) + 
  theme_plain(base_size=10) + 
  xlab("Month of Year") + 
  ylab("Occurrence (n times)") + 
  labs(title="95th Percentile Wind Speed Occurrence Patterns across a Calendar Year (2005-2015)",
       fill="Location")

The graphic above clearly illustrates the spatial heterogeneity between stations: the monthly distribution of extreme-wind occurrences differs from one location to the next.
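The bar heights in a plot like this are simply row counts per location and month. A minimal sketch of the equivalent table, using hypothetical locations "X" and "Y" rather than the real stations:

```r
library(dplyr)

# Hypothetical extreme-wind records: one row per observation
toy_extremes <- tibble(
  location = c("X", "X", "X", "Y", "Y"),
  monthDT  = c(1, 1, 7, 7, 7)
)

# Count the rows in each location/month cell, as geom_bar() does when drawing
monthly_counts <- toy_extremes %>%
  count(location, monthDT, name = "occurrences")
```

A table of this kind is handy when exact numbers are needed alongside the visual comparison.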

  2. One can inspect the “bundling” of extreme wind periods within a year by using a daily interval. Combined with the byYear temporal frame, this begins to uncover how the most severe winds at each station are distributed by year across the entire record.
# Plot of occurrence "bundles" across all years by station
ggplot(extreme_byYear, aes(ydayDT,wind_speed,color=location)) + 
  geom_point(size=0.5) + facet_wrap(~yrDT) + 
  theme_plain(base_size=10) + theme(aspect.ratio=1,legend.position="bottom") + 
  xlab("Day of Year") + 
  ylab("Wind Speed (km/h)") + 
  labs(title="Occurrence Patterns of Most Extreme Winds",
       color="Location")
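Beyond eyeballing the plot, one possible way to make the idea of a "bundle" concrete is to treat runs of consecutive days as a single event. The sketch below is an assumption about how bundles might be defined, not something done in this analysis, and the days of year are made up.

```r
# Hypothetical days of year on which one station recorded extreme winds
extreme_days <- c(10, 11, 12, 40, 41, 200)

# Start a new bundle whenever the gap to the previous day exceeds one day
bundle_id <- cumsum(c(TRUE, diff(extreme_days) > 1))

# Number of extreme-wind days in each bundle
bundle_lengths <- table(bundle_id)
```

With these values the six days fall into three bundles of 3, 2 and 1 days. A looser definition (for example, allowing a gap of two or three days) only requires changing the threshold in the comparison.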

Leveraging purrr for power plotting

Below is a BONUS example of using purrr to deal with cramped facet_wrap layouts. Nest the data by the grouping variable, then use map2 to pass each nested data frame (filtered first if necessary) and its group label to ggplot2, so that one plot per group is stored in a list-column.

In addition, map2 is used to write the separate plots to file by passing the file names and plots to ggsave.
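Before the wind example, here is a minimal sketch of the same nest-then-map2 pattern on the built-in mtcars data, so the moving parts are easier to see. The column choices (`wt`, `mpg`, `cyl`) and the name `plots_by_cyl` are purely illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

plots_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%      # one row per cyl value, with its rows stored in `data`
  ungroup() %>%
  mutate(
    # .x is the nested data frame for a group, .y is that group's cyl value
    plot = map2(data, cyl, ~ ggplot(.x, aes(wt, mpg)) +
                  geom_point() +
                  ggtitle(paste("Cylinders:", .y)))
  )
```

The result has three rows, one per cylinder count, and the `plot` column holds a separate ggplot object for each, which can then be printed or saved individually.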

# define the locations and years of interest
location_list <- c("CAPE AGULHAS","SLANGKOP", "CAPE TOWN INTL")
date_list <- c(2000,2003,2006,2009,2012,2015)

# apply filters
small_selection <- extreme_byYear %>%
  filter(location %in% location_list) %>% filter(yrDT %in% date_list)

# set data in correct order
small_selection <- small_selection %>%
  mutate(location = factor(location, levels = location_list, 
                           ordered = TRUE))

# build one plot per location using purrr
plots <- small_selection %>%
  group_by(location) %>%
  nest() %>%
  mutate(
    plot = map2(data, location,
                ~ggplot(data = .x, aes(x = wind_speed, ..count.., colour = factor(yrDT), fill = factor(yrDT))) +
                  theme_plain(base_size = 12) +
                  geom_density(alpha = 0.2, adjust = 2, position = "fill") +
                  ggtitle(.y) + ylab("Density") + xlab("Wind Speed")))

# a tibble with list-columns holding the nested data and the plots
head(plots)
## # A tibble: 3 x 3
##   location       data                  plot    
##   <ord>          <list>                <list>  
## 1 CAPE TOWN INTL <tibble [2,367 x 12]> <S3: gg>
## 2 SLANGKOP       <tibble [319 x 12]>   <S3: gg>
## 3 CAPE AGULHAS   <tibble [430 x 12]>   <S3: gg>

# write to disk: map2 pairs each file name with its plot and passes both to ggsave
map2(paste0(plots$location, ".png"), plots$plot, ggsave)
## Saving 8 x 8 in image
## Saving 8 x 8 in image
## Saving 8 x 8 in image
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
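Because saving is done purely for its side effect, the printed list of return values above is just noise. As a possible alternative, purrr's walk2 makes the same calls as map2 but returns its input invisibly. The sketch below assumes two throwaway plots from mtcars and writes to the session's temporary directory; the plot names and file sizes are hypothetical.

```r
library(ggplot2)
library(purrr)

# Two throwaway plots built from the built-in mtcars data
toy_plots <- list(
  mpg = ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 10),
  wt  = ggplot(mtcars, aes(wt)) + geom_histogram(bins = 10)
)

# Hypothetical output location: the session's temporary directory
out_files <- file.path(tempdir(), paste0(names(toy_plots), ".png"))

# Save each plot under its file name; nothing is printed to the console
walk2(out_files, toy_plots, ~ ggsave(.x, .y, width = 4, height = 4))
```

Setting `width` and `height` explicitly also avoids depending on the size of whatever graphics device happens to be open.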