Update: the axis transforms described below are now available as the hydro_prob_trans and hydro_flow_trans functions provided in hydroutils.
What's a flood frequency curve?
In hydrology, a common plot for displaying the likelihood of a flood event is the frequency curve, a plot of the cumulative distribution function of the observed annual maximums. In the United States, the Log Pearson Type III distribution is typically fitted to the observed data to extrapolate the 1% annual chance of exceedance event, sometimes called the 100-year flood. By using a logarithmic y-axis and showing the normal deviate on the x-axis, a distribution without skew shows up as a straight line on the plot. This special x-axis is the complicated part of creating these plots in many languages, but in R, the scales package provides a transform that ggplot2 can use to create the probability axis quite easily. The plot produced by the code example below illustrates the result.
Bottom line, up front.
This is the code you're looking for if you came to this page with a deadline. The full example, explained in the next section, is available on my GitHub.

library(ggplot2)
library(scales) ## required for probability axis

## Setup breaks for plot
xbreaks = c(0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975, 0.99, 0.995)
log.range = log10(range(bwpeaks$FLOW))
lower = 10^floor(log.range[1])
upper = 10^ceiling(log.range[2])
cap = lower
ybreaks = NULL
while(cap < upper){
  ybreaks = c(ybreaks, seq(cap, cap*9, by=cap))
  cap = cap*10
}

plt = ggplot(bwpeaks, aes(x=PROB, y=FLOW)) + geom_point() + theme_bw()
plt = plt + scale_y_continuous(trans="log10", breaks=ybreaks, name="Flow (cfs)")
plt = plt + scale_x_continuous(trans=probability_trans(distribution="norm"),
                               breaks=xbreaks, labels=paste0((1-xbreaks)*100, "%"),
                               name="Annual Chance of Exceedance")
plt = plt + ggtitle("Brandywine Creek nr. Wilmington, Delaware")
plt
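The break-generating loop can also be wrapped as a small helper to show exactly what it produces; log_breaks_1_9 is a hypothetical name for this sketch, not a function from scales or the original post.

```r
## Hypothetical helper wrapping the while loop above: breaks at 1x through 9x
## of each power of 10 spanning the data, e.g. 100, 200, ..., 900, 1000, ...
log_breaks_1_9 = function(values){
  log.range = log10(range(values))
  cap = 10^floor(log.range[1])      ## lowest full log cycle
  upper = 10^ceiling(log.range[2])  ## highest full log cycle
  breaks = NULL
  while(cap < upper){
    breaks = c(breaks, seq(cap, cap*9, by=cap))
    cap = cap*10
  }
  breaks
}
log_breaks_1_9(c(800, 25000)) ## 100 200 ... 900 1000 ... 9000 10000 ... 90000
```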
The first block of code sets up the breaks, the grid lines used in ggplot2. Most of it exists to create breaks at the multiples of each power of 10, from the lowest value on the plot through the maximum, which makes the y-axis a little easier to read than showing only each power of 10.

The dataframe bwpeaks contains my dataset to plot: the annual peak flows, sorted, with the Weibull plotting position used for the frequencies. The probability_trans function from the scales package provides the fancy x-axis. Two options exist to put the annual chance of exceedance in the typical order. The first, shown above, is to store the data as the probability of non-exceedance and set the labels on the x-axis to 1-xbreaks
. The second option is to store the annual chance of exceedance and reverse the x-axis using limits=c(1,0).

But wait! How do I get data from a DSS file into a format for that code?
Part 1: Getting the data out of a DSS file.
If you've got a DSS file, it's quite easy to open, read, and write data from it into DataContainer objects. A time series container holds two fields of interest: times, which contains integers representing minutes since January 1st, 1970, and values
, containing the observations. There's also a paired data container, which contains a single set of x ordinates and one or more y ordinates, stored as xOrdinates
and yOrdinates
. The rJava
package makes it easy to access these objects as if they were S3 objects.

require(dssrip)
## Open file and get TSC
bwfile = opendss("C:/dssrt/test.dss")
bwcreek = getFullTSC(bwfile, "A=BRANDYWINE CREEK C=FLOW E=1DAY")
colnames(bwcreek) = "FLOW" ## TODO - add naming the columns by C-part

The above is an example using some helper functions I've written to read DSS files.
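Given the minutes-based epoch described above, the raw times array from a time series container can be converted to R timestamps with base R alone. This is a minimal sketch assuming a plain integer vector of minute offsets pulled from the container; dss_minutes_to_posix is a hypothetical helper, not a dssrip function.

```r
## Convert DSS-style minute offsets to POSIXct, assuming the
## minutes-since-1970-01-01 epoch described above (check your file's
## convention first; see the timestamp caveat below).
dss_minutes_to_posix = function(minutes, tz="UTC"){  ## hypothetical helper
  as.POSIXct(minutes * 60, origin="1970-01-01", tz=tz)
}
dss_minutes_to_posix(1440)  ## one day of minutes -> 1970-01-02 00:00 UTC
```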
getFullTSC allows a search of the DSS file, which might be overly broad if the user isn't careful, and returns an xts object that many R time series packages support.

Note: Be careful with the times produced by this object. I'm pretty sure they can be 24 hours off if the timestamps in the DSS file are at 2400 hours, which can be interpreted as either the start or the end of the day. I need to check how this is being handled in my code.

Part 2: Turning a daily time series into annual peaks
The alternative to this would be to just download the annual peaks from the USGS, which is easy enough to do, but I want to show off one of the cool parts of using R: the aggregate function. I also realize I could have used stat_ecdf from the ggplot2 package, but as far as I can tell it does not offer an option to put points at Weibull plotting positions. The default empirical CDF, besides not being the standard for flood frequency plots, assigns the largest observation a probability of 1, which conflicts with the infinite distance between 0 and 1 on the probability axis.

bwcreek$WY = 1900 + ifelse(.indexmon(bwcreek)+1 < 9,
                           .indexyear(bwcreek), 1+.indexyear(bwcreek))

## Get peaks for each water year, assign probabilities
bwpeaks = aggregate(FLOW ~ WY, bwcreek, max)
bwpeaks = subset(bwpeaks, WY!=2013) ## An artifact of my timestamp issue.
bwpeaks = bwpeaks[order(bwpeaks$FLOW),] ## Sort the flows
n = length(bwpeaks$FLOW)
bwpeaks$PROB = (1:n)/(1+n) ## Weibull plotting positions.