Saturday, April 13, 2013

HEC-DSS files and R

 
Update: If you've come here looking for a way to read DSS files in R, please check out my DSS-Rip library that wraps up the code required to link R to DSS, and simplifies converting DSS time series to R's xts format.

Motivation:


What follows is the beginning of what I hope will become a library for reading and writing data in HEC-DSS format files. It is partly inspired by a desire to plot the data with tools such as ggplot2.

The process:


First, the rJava library lets R call Java code. It also provides a convenient R-style syntax for calling an object's methods with the $ operator, as I'll show later.
> library(rJava)
Next, I need to configure the location of my HEC-DSSVue install so that I can call the Java functions it contains. The next few lines may need to be adjusted for Windows 7.
> dss_location = "C:\\Program Files\\HEC\\HEC-DSSVue\\" 
> jars = c("hec", "heclib", "rma", "hecData") 
> jars = paste0(dss_location, "jar\\", jars, ".jar")
> libs = "-Djava.library.path=C:\\Program Files\\HEC\\HEC-DSSVue\\lib\\"
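The install directory appears twice in the lines above, so the two paths have to be kept in sync by hand. A minimal sketch that derives both from one directory might look like this. `dssvue_paths` is a hypothetical helper name, not part of rJava or DSSVue, and the `jar` and `lib` subdirectory names are simply taken from the lines above.

```r
# Hypothetical helper: build the classpath and the JVM option from one directory.
dssvue_paths <- function(dss_location,
                         jar_names = c("hec", "heclib", "rma", "hecData")) {
  # Drop any trailing slash or backslash so file.path() does not double it
  dss_location <- sub("[/\\\\]+$", "", dss_location)
  list(
    jars = file.path(dss_location, "jar", paste0(jar_names, ".jar")),
    libs = paste0("-Djava.library.path=", file.path(dss_location, "lib"))
  )
}

paths <- dssvue_paths("C:\\Program Files\\HEC\\HEC-DSSVue\\")
print(paths$jars)
print(paths$libs)
```

The results could then be passed along as `.jinit(classpath = paths$jars, parameters = paths$libs)`. `file.path()` joins with forward slashes, which Windows generally accepts alongside backslashes.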
Now that I have the required JAR files and locations of required DLLs in some variables, I can start the JVM, passing it their locations.
> .jinit(classpath=jars, parameters=libs)
Here's where I create a new DSS file object by calling the static open function that creates a HecDss object:

> dssFile = .jcall("hec/heclib/dss/HecDss", "Lhec/heclib/dss/HecDss;", method="open", "C:\\test.dss")
Finally, I read a known pathname and plot the time series data. The get function returns a TimeSeriesContainer object, two properties of which are the sequence of timestamps and the value at each timestamp. This should not be confused with the read function, which returns a HecMath representation of the data; that is useful for calling HEC's built-in time series math code, but not very helpful if we want the raw numbers.

> data = dssFile$get("/RACCOON CREEK/SWEDESBORO NJ/FLOW/12APR2013/IR-DAY/USGS/")
> plot(data$times, data$values, main="Raccoon Creek - Swedesboro, NJ", xlab="Time", ylab="Flow (cfs)")
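The timestamps in the container are plain integers, so the x-axis of this plot is not yet a calendar date. A minimal sketch of a conversion follows, assuming the times are minutes counted from 31 Dec 1899 00:00 (check this against the DSSVue documentation). `hec_minutes_to_posix` is a hypothetical helper and the sample values are made up.

```r
# Assumption: TimeSeriesContainer times are minutes since 31 Dec 1899 00:00.
hec_minutes_to_posix <- function(minutes, tz = "UTC") {
  origin <- as.POSIXct("1899-12-31 00:00:00", tz = tz)
  origin + as.numeric(minutes) * 60  # POSIXct arithmetic is in seconds
}

# Made-up sample values for illustration, one day apart
sample_minutes <- c(1440, 2880, 4320)
print(hec_minutes_to_posix(sample_minutes))
```

With a real container, `hec_minutes_to_posix(data$times)` could replace `data$times` in the call to plot.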

Conclusions:


So, it's possible to read, and potentially write, DSS data from within R. I hope that by using the interface to the DSSVue program, I can avoid handling all sorts of special cases myself. Future work may include making DSS files more navigable from code. This will probably require writing wrappers for the HecDss.get, HecDss.put, and HecDss.getCatalogedPathnames functions so that a file can be searched and more R-friendly versions of the data can be produced. I focused on a Windows environment because that is what is available to me at work, but a Linux version of DSSVue exists, and a cross-platform solution would be useful.
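As a rough idea of what searching a file could look like, here is a sketch that works on a plain character vector of pathnames, such as a catalog listing might provide. Both function names are hypothetical, and the second pathname in the sample catalog is made up.

```r
# Split a DSS pathname "/A/B/C/D/E/F/" into its six named parts.
split_dss_path <- function(path) {
  pieces <- strsplit(path, "/", fixed = TRUE)[[1]]
  # The leading "/" yields an empty first element; parts A-F follow it
  parts <- pieces[2:7]
  parts[is.na(parts)] <- ""
  setNames(parts, c("A", "B", "C", "D", "E", "F"))
}

# Keep only the pathnames whose C part matches, e.g. all FLOW records.
filter_by_c_part <- function(paths, c_part) {
  keep <- vapply(paths, function(p) split_dss_path(p)[["C"]] == c_part,
                 logical(1))
  unname(paths[keep])
}

# Stand-in for a catalog listing; the second pathname is made up
catalog <- c("/RACCOON CREEK/SWEDESBORO NJ/FLOW/12APR2013/IR-DAY/USGS/",
             "/RACCOON CREEK/SWEDESBORO NJ/STAGE/12APR2013/IR-DAY/USGS/")
print(filter_by_c_part(catalog, "FLOW"))
```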

* I added the "and Python" in the title, because with some luck, this future library and Rpy2 may be an easy way to get data from DSS files into Python.

22 comments:

  1. I need to get a bunch of DSS data in a format that I can easily pull into R. This was a BIG help. Thanks for the bits to get me going!

  2. Hi

    Where am I going wrong? Please look at my code below. Thanks.
    #DSS
    #Works only with 32 bit R
    setwd("D:/Work/DBBBSverjesjoen/Model/HEC_RAS")
    #
    library(rJava)
    #configure the location of my HEC-DSSVue install, as to call the Java functions contained within. The next few lines may need to be varied for Windows 7.
    dss_location<-"C:/Program Files (x86)/HEC/HEC-DSSVue/"
    jars = c("hec", "heclib", "rma", "hecData")
    jars = paste0(dss_location, "jar\\", jars, ".jar")
    libs = "-Djava.library.path=C:\\Program Files (x86)\\HEC\\HEC-DSSVue\\lib\\"
    .jinit(classpath=jars, parameters=libs)
    dssFile = .jcall("hec/heclib/dss/HecDss", "Lhec/heclib/dss/HecDss;", method="open", "Sverjesjoen.dss")

    Error in .jcall("hec/heclib/dss/HecDss", "Lhec/heclib/dss/HecDss;", method = "open", :
    java.lang.NoClassDefFoundError: Could not initialize class hec.heclib.util.Heclib

    Replies
    1. The error you are getting is a common one. It means you don't have the rJava JVM pointed at the appropriate jar files or DLL. Make sure you're using 32-bit R and not 64-bit R as the javaHeclib.dll won't load in 64-bit R.

      First, I'd recommend trying my package, as it adds some helper functions to make the DSS interface easier to use. You can get it from github.com/eheisman, or install it with the devtools package by running:

      devtools::install_github("dss-rip","eheisman",args="--no-multiarch")

      With either the code you have above or my package, the following two questions remain.

      1) Do you have HEC-DSSVue installed, or just HEC-RAS? Did you install it in the 'default' location, the one in the 'dss_location' string? If you set dss_location or options('dss_location') before loading DSS-Rip, you can change the default to wherever it is installed on your system.

      If you are using your code, you'll need to change both it and the 'libs' string to reflect the correct location.

      2) Have you loaded rJava before or through another package such as XLConnect? Usually this causes a problem because you can't modify the java.library.path after the JVM is already instantiated.
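A few of these points can be checked from R before the JVM is started. The following is only a sketch: `check_dssvue_setup` is a hypothetical name, and the jar names and the `jar` and `lib` subdirectories are taken from the code in this thread.

```r
# Hypothetical pre-flight check, intended to run before .jinit().
check_dssvue_setup <- function(dss_location) {
  lib_dir <- file.path(dss_location, "lib")
  jars <- file.path(dss_location, "jar",
                    paste0(c("hec", "heclib", "rma", "hecData"), ".jar"))
  list(
    is_32bit     = .Machine$sizeof.pointer == 4,  # 4-byte pointers mean 32-bit R
    missing_jars = jars[!file.exists(jars)],      # empty when all jars are found
    lib_dir_ok   = dir.exists(lib_dir)
  )
}

status <- check_dssvue_setup("C:/Program Files (x86)/HEC/HEC-DSSVue")
str(status)
```

A non-empty `missing_jars` or a FALSE `lib_dir_ok` would suggest that `dss_location` does not point at a DSSVue install.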

  3. 1) I changed R to 32 bit version
    2) I restarted Rstudio
    3) I ran devtools::install_github("dss-rip","eheisman",args="--no-multiarch")
    the result:
    * installing *source* package 'dssrip' ...
    ** R
    ** demo
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    * DONE (dssrip)
    4) I checked that I have DSS-Vue. It's located in "C:\Program Files (x86)\HEC\HEC-DSSVue"
    5) Loaded library(rJava) # this went well
    6) Everything after this went well.

    Thank you

    Replies
    1. I'm glad to hear everything went well.

      You managed to install my R package! You can also use the following to simplify your code:

      myfile = opendss("path/to/my/file.dss")

      to open whatever your file is, and

      getFullTSC(myfile, "/A/B/C//E/F/") to read a path as an xts object. Leave out the D part to get the entire time series.
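For illustration, a pathname with an empty D part can be assembled from its pieces as sketched below. `dss_path` is a hypothetical helper, not a dssrip function; the parts are those of the pathname used in the post.

```r
# Hypothetical helper: assemble a DSS pathname from its six parts.
dss_path <- function(a_part = "", b_part = "", c_part = "",
                     d_part = "", e_part = "", f_part = "") {
  parts <- c(a_part, b_part, c_part, d_part, e_part, f_part)
  paste0("/", paste(parts, collapse = "/"), "/")
}

# d_part is left at its default "" so the whole record is requested
full_record <- dss_path(a_part = "RACCOON CREEK", b_part = "SWEDESBORO NJ",
                        c_part = "FLOW", e_part = "IR-DAY", f_part = "USGS")
print(full_record)
```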

  4. Hi

    Yes, using the package was the easiest way to go :) Do you mind if I share my experience on my blog? http://rscriptsandtips.blogspot.no/. That way I can link to some gists with my code and also show some pictures!

    Replies
    1. Oh yes, go ahead! Might I ask what you're using DSS files in your R projects for?

  5. Will your package be on cran at some point?

  6. I'd like to put it up there eventually, but it needs a few things done before I'd consider the quality good enough to submit to CRAN.

    A short version of the task list:
    - fix some known bugs
    - polish up the documentation.
    - add unit testing to automate testing of new builds

    A small problem is that it relies on DSS-Vue which isn't installed on CRAN's build servers, so any unit testing would fail with the 'NoClassDefFoundError' error.

  7. I am using HEC-RAS for dam break modelling! But post-processing the data in Excel was tedious for my reporting format. I love working in R, so I thought I'd try your method. I think it is a breakthrough for me.

  8. Hi Evan,
    Thanks for putting this up, it's made data post-processing so much faster for us! I've run into a problem, though, when I want to load multiple DSS files of the same name. The first time I bring in the file it works, but when I try to bring in data from a different run of our model with the same file name (but from a different directory) it crashes, presumably because it thinks it already has the file. Is there a way to close all open DSS connections? Or otherwise avoid this problem?

  9. Andrew,

    That sounds like a problem with the way DSSVue keeps track of open files. You can call "myDSSFile$close()" to tell DSSVue to close the file when you are done.

    Replies
    1. Thanks so much for responding. That doesn't seem to work, I keep getting "myDSSfile" (actually output_23.dss or output_23) not found. Any other ideas?

      Is there a way to check what is being held open by DSSVue?

    2. Not that I'm aware of. I haven't seen anything specific to what you're looking for. The place I'd recommend you look is the scripting chapter (Ch8) of the DSSVue manual. It has an API reference, but it's for Java/Jython. Once you read in a DSS file using the `opendss` function I wrote, it's essentially an R list of methods for the HecDss object described in that document, so myDSSFile$close() in R is the same as myDSSFile.close() in Jython.
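One way to make sure close() is always called, sketched under the assumption that the file object exposes a `close` method through `$` as described above. `with_dss` is a hypothetical wrapper, and the mock opener below only stands in for a real DSS file so that the sketch runs without DSSVue.

```r
# Hypothetical wrapper: open a DSS file, run some work, always close it.
# `opener` is whatever returns the file object (for example dssrip's opendss).
with_dss <- function(path, work, opener) {
  dss <- opener(path)
  on.exit(dss$close(), add = TRUE)  # runs even if work() throws an error
  work(dss)
}

# Mock opener standing in for a real DSS file; it records which paths get closed
closed <- character(0)
mock_opener <- function(path) {
  list(path = path, close = function() closed <<- c(closed, path))
}

result <- with_dss("run1/output.dss", function(dss) dss$path, mock_opener)
print(result)
print(closed)
```

Whether closing the file resolves the same-file-name crash is not confirmed anywhere in this thread; the wrapper only guarantees that the close call happens.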

  10. Thanks I'll look there and see if I can figure something out.

  11. Hi Evan,
    Is there a way to write data back into a dss file using DSSRip? I only see documentation for the getfile functions.
    Thanks

    Replies
    1. Andrew,
      The DSS file reference returned by opendss() is an rJava object that behaves just like the HecDss class used in Jython. You could create a TimeSeriesContainer object using the rJava interface, following the Jython example in the DSSVue manual (chapter 8, example 8 or so?), and use myDSSFile$put(myNewTSC) to write it. I wrote a couple of functions to help with this, including an undocumented xts.to.tsc function, but I can't promise it'll work for your case.

  12. Hi Evan,

    Great work with the package and R commands. I've gotten a lot of use out of your dssrip package reading in DSS data for post-processing. I'm looking to build on what you've done and use R to create and write to a DSS file. So far I've been able to create a new file, and I believe I have the right commands to 'put' data into it. I'm having some trouble formatting the data appropriately. According to the Jython scripting documentation, the data has to be in a TimeSeriesContainer, and I'm not sure if the rJava / rJython package would support this. If need be I can have a separate script for writing out data to DSS, but I was looking to see if I could do it from R. Have you looked at this at all, or do you have any suggestions?

    Thanks!

    Dan

    Replies
    1. Dan,

      Writing to DSS is much harder than reading from DSS, as there is a lot of required metadata to assign to the TimeSeriesContainer. There is an undocumented "xts.to.tsc" method in dssrip that does this for daily data, and you could use it as an example to start with in R. I haven't had much luck with other interval data. You can use this as a start to create a TimeSeriesContainer that you can 'put' into a DSS file object. I think Example 8 of the DSSVue manual, Chapter 8, shows what metadata is required and how to assign it in Jython. It's very similar in R, other than needing the .jnew() function from rJava to construct the empty TimeSeriesContainer object. If you already have example data in DSS like what you want to write, calling 'get' to read the TSC and then exploring its attributes may help you figure out the correct metadata.

      Good luck,

      Evan
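To give a feel for the metadata involved, here is a sketch that gathers it in a plain R list for a daily series; it does not touch DSS or rJava. `daily_tsc_fields` is a hypothetical helper. The field names are the ones recalled from the TimeSeriesContainer example in the DSSVue manual and should be verified there, and the minutes-since-31-Dec-1899 time base is likewise an assumption. The pathname and values are made up.

```r
# Hypothetical sketch: collect TimeSeriesContainer metadata in a plain list.
# Field names are assumed from the DSSVue manual example; verify before use.
daily_tsc_fields <- function(pathname, dates, values, units, type) {
  stopifnot(length(dates) == length(values))
  origin <- as.Date("1899-12-31")
  # Assumption: DSS times are whole minutes since 31 Dec 1899 00:00
  times <- as.integer(as.numeric(dates - origin) * 1440)
  list(
    fullName     = pathname,
    interval     = 1440L,            # minutes in one day
    times        = times,
    values       = as.numeric(values),
    numberValues = length(values),
    units        = units,
    type         = type
  )
}

fields <- daily_tsc_fields(
  "/RACCOON CREEK/SWEDESBORO NJ/FLOW//1DAY/EXAMPLE/",  # made-up pathname
  as.Date(c("1900-01-01", "1900-01-02")), c(10.5, 11.0),
  units = "CFS", type = "INST-VAL"
)
str(fields)
```

Each element would still have to be copied onto a real TimeSeriesContainer created through rJava before calling put.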

    2. Thanks for the quick response! I was able to piece together something that works from your code and an example DSS file I had. Thanks again!

  13. Hi! I am using your script to open my runs from HEC-HMS. Since I have a lot of runs (I work with a time step of half an hour), I needed a script like this. The script works fine; however, when I open the results there is missing data in some runs. In some runs (events of 5 days) the last hours are missing. Is there any explanation?

    Replies
    1. If you're using the script above and not the dssrip library (github.com/eheisman/dssrip), you might be seeing an artifact of how DSS stores data. It keeps time series in blocks, the length of which is determined by the E part / interval field. I don't recall what it uses for an interval of 30 minutes, but for 1 hour it's one month per block, and for daily data it is one calendar year per block. It pads the blocks with missing values when the data doesn't fill the block it is writing to.

      If you give dssrip a try, there is a function in it called 'getFullTSC' that takes a blank D part (the date for the block or range for a set of blocks) and should return only the valid values.
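If the padding is the cause, it can also be filtered out by hand. The sketch below assumes that DSS marks missing values with the most negative single-precision float (about -3.4e38); check that against the DSSVue documentation. `drop_missing` is a hypothetical helper and the sample block is made up.

```r
# Assumption: DSS marks missing values with -Float.MAX_VALUE (about -3.4e38).
# Anything at or below a slightly larger threshold is treated as missing.
DSS_MISSING_THRESHOLD <- -3.4e38

drop_missing <- function(times, values) {
  valid <- !is.na(values) & values > DSS_MISSING_THRESHOLD
  data.frame(time = times[valid], value = values[valid])
}

# Made-up block: two real values followed by padding
padded <- c(12.3, 11.8, -3.4028234663852886e38, -3.4028234663852886e38)
print(drop_missing(seq_along(padded), padded))
```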
