Saturday, April 13, 2013

HEC-DSS files and R

 
Update: If you've come here looking for a way to read DSS files in R, please check out my DSS-Rip library that wraps up the code required to link R to DSS, and simplifies converting DSS time series to R's xts format.

Motivation:


This post is the beginning of what I hope will become a library for reading and writing HEC-DSS files from R.  It is partly inspired by a desire to plot DSS data with tools such as ggplot2.

The process:


First, the rJava library for R allows Java code to be called from within the R environment.  It also provides a convenient R-style syntax for calling an object's methods and reading its fields with the $ operator, as I'll show later.
> library(rJava)
Next, I need to tell R where my HEC-DSSVue install lives so that I can call the Java code it contains.  The next few lines may need to be adjusted for Windows 7.
> dss_location = "C:\\Program Files\\HEC\\HEC-DSSVue\\"
> jars = c("hec", "heclib", "rma", "hecData")
> jars = paste0(dss_location, "jar\\", jars, ".jar")
> libs = paste0("-Djava.library.path=", dss_location, "lib\\")
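Because a wrong install path only shows up later as a confusing Java error, it can help to check the JAR paths before starting the JVM. The following is a minimal sketch using only base R; the install location is the same assumed default as above, and the `missing_jars` name is just for illustration.

```r
# Build the JAR paths the same way as above (install path is an assumption;
# adjust it to match your own HEC-DSSVue install).
dss_location <- "C:\\Program Files\\HEC\\HEC-DSSVue\\"
jars <- paste0(dss_location, "jar\\", c("hec", "heclib", "rma", "hecData"), ".jar")

# Report any JAR that cannot be found on disk before calling .jinit()
missing_jars <- jars[!file.exists(jars)]
if (length(missing_jars) > 0) {
  message("JAR files not found:\n", paste(missing_jars, collapse = "\n"))
} else {
  message("All JAR files found.")
}
```

If anything is reported missing, fix dss_location first; the rest of the steps depend on it.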
Now that I have the required JAR files and locations of required DLLs in some variables, I can start the JVM, passing it their locations.
> .jinit(classpath=jars, parameters=libs)
Here's where I create a new DSS file object by calling the static open function that creates a HecDss object:

> dssFile = .jcall("hec/heclib/dss/HecDss", "Lhec/heclib/dss/HecDss;", method="open", "C:\\test.dss")
Finally, I read a known pathname and plot the time series data.  The get function returns a TimeSeriesContainer object, two fields of which are the sequence of timestamps and the value at each timestamp.  This should not be confused with the read function, which returns a HecMath representation of the data; that is useful for calling HEC's built-in time series math code, but not very helpful if we want the raw numbers.

> data = dssFile$get("/RACCOON CREEK/SWEDESBORO NJ/FLOW/12APR2013/IR-DAY/USGS/")
> plot(data$times, data$values, main="Raccoon Creek - Swedesboro, NJ", xlab="Time", ylab="Flow (cfs)")
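The timestamps come back as plain integers, so the x axis above is not a real date axis. Below is a rough sketch of converting them to R date-times and collecting everything in a data frame, which is the shape ggplot2 expects. It assumes the integers are minutes counted from 31 Dec 1899 00:00 (my understanding of the HEC time convention; check it against your own data) and ignores time zones. The helper name and the sample numbers are made up for illustration.

```r
# Hypothetical helper: convert HEC-style integer minutes to POSIXct.
# Assumes an origin of 31 Dec 1899 00:00 and treats all times as UTC.
hec_minutes_to_posixct <- function(minutes) {
  # as.numeric() avoids integer overflow when multiplying large minute counts
  as.POSIXct(as.numeric(minutes) * 60, origin = "1899-12-31", tz = "UTC")
}

# Made-up sample values standing in for data$times and data$values
times  <- c(1440L, 2880L, 4320L)
values <- c(10.5, 11.2, 9.8)

flow <- data.frame(time = hec_minutes_to_posixct(times), flow_cfs = values)
print(flow)
```

With a real TimeSeriesContainer, data$times and data$values would take the place of the sample vectors.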

Conclusions:


So, it's possible to read, and potentially write, DSS data from within R.  I hope that by using the interface to the DSSVue program, I can avoid having to handle all sorts of special cases myself.  Future work will likely involve making DSS files easier to navigate from code.  This will probably require writing wrappers for the HecDss.get, HecDss.put, and HecDss.getCatalogedPathnames functions so that a file can be searched and more R-friendly versions of the data can be produced.  I focused on a Windows environment because that is what is available to me at work, but a Linux version of DSSVue exists, and a cross-platform solution would be useful.
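As a sketch of what "searchable" could look like, the function below splits DSS pathnames of the form /A/B/C/D/E/F/ into a data frame that can be filtered with ordinary R subsetting. It works on a plain character vector, however that vector was obtained; the function name and the sample pathnames are hypothetical, and nothing here calls the DSS library itself.

```r
# Hypothetical helper: split DSS pathnames (/A/B/C/D/E/F/) into their parts.
split_pathnames <- function(paths) {
  parts <- strsplit(sub("^/", "", paths), "/", fixed = TRUE)
  # Pad or trim every pathname to exactly six parts
  parts <- lapply(parts, function(p) c(p, rep("", 6))[1:6])
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(out) <- c("A", "B", "C", "D", "E", "F")
  out$pathname <- paths
  out
}

# Made-up pathnames standing in for a file's catalog
paths <- c(
  "/RACCOON CREEK/SWEDESBORO NJ/FLOW/12APR2013/IR-DAY/USGS/",
  "/RACCOON CREEK/SWEDESBORO NJ/STAGE/12APR2013/IR-DAY/USGS/",
  "/EXAMPLE RIVER/EXAMPLE GAGE/FLOW/01JAN2013/1DAY/OBS/"
)

catalog <- split_pathnames(paths)

# Find every flow record in the catalog
flow_paths <- catalog$pathname[catalog$C == "FLOW"]
print(flow_paths)
```

Keeping the original pathname as a column means a filtered row can be handed straight back to get.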

* I added "and Python" to the title because, with some luck, this future library and rpy2 may be an easy way to get data from DSS files into Python.