Summary
In this part of the series, I highlight the use of “date-tools” and a basic pipeline approach to aggregate and organise the data for initial analysis and plotting.
The lubridate package makes it simple to extract key time-interval units from date-time objects. These indices (i.e. year, month, day of month, day of year) become the primary grouping variables for the data aggregation.
setwd("C:/Users/daniel/Desktop/locstore/portfolio")
source("static/data/customTheme.r")
library(lubridate)
library(tidyverse)
library(magrittr)
library(knitr)
library(purrr)
library(kableExtra)
library(extrafont)
Wind <- read_delim("static/data/winds/Wind.tsv", "\t",
                   escape_double = FALSE,
                   col_types = cols(datetime = col_character(),
                                    visibility_distance = col_double()),
                   trim_ws = TRUE)
# extract date/time indices of interest
Wind$datetime <- ymd_hms(Wind$datetime)
Wind$yrDT <-year(Wind$datetime)
Wind$monthDT <-month(Wind$datetime)
Wind$dayDT <-day(Wind$datetime)
Wind$ydayDT <-yday(Wind$datetime)
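As a quick sanity check of these lubridate helpers, here is a minimal sketch on a single timestamp that also appears in the table further down (the object name ts is only illustrative and not part of the original script):

```r
library(lubridate)

# parse one timestamp; ymd_hms() assumes UTC unless told otherwise
ts <- ymd_hms("2014-05-04 15:00:00")

year(ts)   # calendar year
month(ts)  # month number (1-12)
day(ts)    # day of the month
yday(ts)   # day of the year (1-366)
```

The values returned here should agree with the yrDT, monthDT, dayDT and ydayDT columns for the same timestamp in the aggregated output.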
Data Aggregation
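Using a pipeline workflow (i.e. magrittr), I aggregated the data into two temporal-frames: by Year and by Month. In each frame, only the observations above the 95th percentile of wind speed for that station and period are kept.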
extreme_byYear <- Wind %>%
  group_by(yrDT, usaf_station) %>%
  arrange(usaf_station, yrDT, wind_speed) %>%
  filter(wind_speed > quantile(wind_speed, 0.95, na.rm = TRUE)) %>%
  ungroup()
extreme_byMonth <- Wind %>%
  group_by(monthDT, usaf_station) %>%
  arrange(usaf_station, wind_speed) %>%
  filter(wind_speed > quantile(wind_speed, 0.95, na.rm = TRUE))
kable(head(extreme_byMonth),"html") %>%kable_styling(font_size=12)
usaf_station | elevation | wind_direction | wind_speed | visibility_distance | datetime | location | lat | lon | yrDT | monthDT | dayDT | ydayDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|
688130 | 3 | 220 | 67 | 999999 | 2014-05-04 15:00:00 | ROBBEN ISLAND | -33.8 | 18.367 | 2014 | 5 | 4 | 124 |
688130 | 3 | 130 | 67 | 999999 | 2014-05-05 12:00:00 | ROBBEN ISLAND | -33.8 | 18.367 | 2014 | 5 | 5 | 125 |
688130 | 3 | 130 | 67 | 999999 | 2014-05-05 15:00:00 | ROBBEN ISLAND | -33.8 | 18.367 | 2014 | 5 | 5 | 125 |
688130 | 3 | 130 | 67 | 999999 | 2014-05-17 06:00:00 | ROBBEN ISLAND | -33.8 | 18.367 | 2014 | 5 | 17 | 137 |
688130 | 3 | 350 | 67 | 999999 | 2014-05-24 03:00:00 | ROBBEN ISLAND | -33.8 | 18.367 | 2014 | 5 | 24 | 144 |
688130 | 3 | 330 | 67 | 999999 | 2014-05-24 21:00:00 | ROBBEN ISLAND | -33.8 | 18.367 | 2014 | 5 | 24 | 144 |
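To make the grouped filter concrete, the sketch below applies the same group_by/filter pattern to a small made-up data frame (toy, station and the values are hypothetical and not part of the Wind data). Because the quantile is computed inside each group, every station is compared against its own 95th percentile rather than a pooled one.

```r
library(dplyr)

# hypothetical toy data: two stations with very different wind regimes
toy <- tibble(
  station    = rep(c("A", "B"), each = 100),
  wind_speed = c(1:100, 101:200)
)

toy_extreme <- toy %>%
  group_by(station) %>%
  # quantile() is evaluated separately for each station because of group_by()
  filter(wind_speed > quantile(wind_speed, 0.95, na.rm = TRUE)) %>%
  ungroup()

# number of observations retained per station
count(toy_extreme, station)
```

In this toy case both stations keep their own top observations. Without the group_by() step, a single pooled threshold would retain rows from the windier station B only.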
Explorative Plots
- What role does seasonality play in the distribution of extreme (and most damaging!) winds throughout a calendar year?
# Plot of the seasonal patterns at each location
ggplot(extreme_byMonth,aes(factor(monthDT),fill=location)) +
geom_bar() +
facet_wrap(~location,ncol=2,scales="free",as.table=TRUE) +
theme_plain(base_size=10) +
xlab("Month of Year") +
ylab("Occurrence (n times)") +
labs(title="95th Percentile Wind Speed Occurrence Patterns across a Calendar Year (2005-2015)",
fill="Location")
The graphic above illustrates the spatial heterogeneity between stations: the months in which extreme winds occur differ from one location to the next.
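The bar heights in a plot like this are simply the number of rows per location and month. Since scales="free" gives every panel its own y-axis, it can help to tabulate those counts directly when comparing stations. A minimal sketch with made-up data (toy_extreme, its values and the column name n_extreme are hypothetical):

```r
library(dplyr)

# hypothetical stand-in for the filtered monthly data
toy_extreme <- tibble(
  location = c("X", "X", "X", "Y", "Y"),
  monthDT  = c(1, 1, 7, 7, 7)
)

# one row per location/month combination with the number of extreme observations
monthly_counts <- toy_extreme %>%
  count(location, monthDT, name = "n_extreme")

monthly_counts
```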
- One can inspect the “bundling” of extreme wind periods within a year by using a daily interval. Combined with the byYear temporal-frame, this begins to uncover the distribution of the most severe winds for each station by year across the entire record.
# Plot of occurrence "bundles" across all years by station
ggplot(extreme_byYear, aes(ydayDT,wind_speed,color=location)) +
geom_point(size=0.5) + facet_wrap(~yrDT) +
theme_plain(base_size=10) + theme(aspect.ratio=1,legend.position="bottom") +
xlab("Day of Year") +
ylab("Wind Speed (km/h)") +
labs(title="Occurrence Patterns of Most Extreme Winds",
fill="Location") + guides(color=guide_legend(title="Location"))
Leveraging purrr for power plotting
Below is a BONUS example of using purrr to deal with cramped facet_wrap layouts. Nest the data by the grouping variable, then use map2 to build one ggplot2 plot per row of the resulting nested data frame (filtering the data first if necessary).
In addition, map2 is used to write the separate plots to file (passing the file names and plots to ggsave)!
# define selection of choice
location_list <- c("CAPE AGULHAS","SLANGKOP", "CAPE TOWN INTL")
date_list <- c(2000,2003,2006,2009,2012,2015)
# apply filters
small_selection <- extreme_byYear %>%
filter(location %in% location_list) %>% filter(yrDT %in% date_list)
# set data in correct order
small_selection <- small_selection %>%
mutate(location = factor(location, levels = location_list,
ordered = TRUE))
# build plot using purrr
plots <- small_selection %>%
group_by(location) %>%
nest() %>%
mutate(
  plot = map2(data, location,
              ~ ggplot(data = .x,
                       # y is the count-scaled density, so each year is weighted by its number of observations
                       aes(x = wind_speed, y = after_stat(count),
                           colour = factor(yrDT), fill = factor(yrDT))) +
                theme_plain(base_size = 12) +
                geom_density(alpha = 0.2, adjust = 2, position = "fill") +
                ggtitle(.y) + ylab("Density") + xlab("Wind Speed")))
# a tibble with list-columns: the nested data and the plot for each location
head(plots)
## # A tibble: 3 x 3
## location data plot
## <ord> <list> <list>
## 1 CAPE TOWN INTL <tibble [2,367 x 12]> <S3: gg>
## 2 SLANGKOP <tibble [319 x 12]> <S3: gg>
## 3 CAPE AGULHAS <tibble [430 x 12]> <S3: gg>
#file_names <- paste0(location_list, ".pdf")
# write to disk
map2(paste0(plots$location, ".png"), plots$plot, ggsave)
## Saving 8 x 8 in image
## Saving 8 x 8 in image
## Saving 8 x 8 in image
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
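Because ggsave is called only for its side effect, the list printed above is noise. One possible alternative, sketched here with made-up data and a temporary directory (toy, out_dir and the file names are assumptions, not part of the original workflow), is purrr's walk2, which calls the function for each pair of inputs and returns invisibly:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

# hypothetical toy data standing in for the filtered wind records
toy <- tibble(
  location   = rep(c("X", "Y"), each = 20),
  wind_speed = c(seq(40, 59), seq(50, 69))
)

# one row per location, with a list-column of data and a list-column of plots
plots <- toy %>%
  group_by(location) %>%
  nest() %>%
  mutate(plot = map2(data, location,
                     ~ ggplot(.x, aes(wind_speed)) +
                       geom_histogram(bins = 10) +
                       ggtitle(.y)))

# write into a temporary directory so the sketch leaves the project folder untouched
out_dir <- tempdir()
file_names <- file.path(out_dir, paste0(plots$location, ".png"))

# walk2() is the side-effect counterpart of map2(): nothing is printed
walk2(file_names, plots$plot,
      ~ ggsave(filename = .x, plot = .y, width = 4, height = 4))

file.exists(file_names)
```

Naming the filename and plot arguments explicitly also guards against the two inputs being passed in the wrong order.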