XLConnect 0.2-4

Mirai Solutions GmbH (http://www.mirai-solutions.com) is very pleased to announce the release of XLConnect 0.2-4, which is available from CRAN.

This newest release comes along with a number of new features:

  • Ability to read cached cell values. There is a new argument useCachedValues to methods readWorksheet and readNamedRegion. This is especially useful for cells with complex formulas that XLConnect (actually the underlying Apache POI library) cannot evaluate and so far would have resulted in NAs being returned. With this new option, XLConnect can be instructed to read the cached cell values instead of evaluating the cells – which should also result in improved runtime performance.
  • There are new arguments keep and drop to readWorksheet and readNamedRegion which allow filtering of columns before the data is imported to R. This can result in much better performance compared to importing the full data to R and then doing the subsetting in R.
  • There is a new vignette called XLConnect for the Impatient which provides a one page overview of XLConnect’s read/write methods – to get new users started quickly.
  • The bounding box detection algorithm for readWorksheet has been reworked to better deal with leading and trailing rows and columns of blank (empty) cells. In addition, new arguments autofitRow and autofitCol have been introduced that allow removal of leading and trailing rows and columns even in case boundaries have been specified by the user. This is useful in situations where the data is expected within certain given boundaries but the exact location is unknown.
  • String conversions when reading in data from Excel now respect the data format of the cells.
  • Support for numeric missing value identifiers has been added. See method setMissingValue for more information.
  • XLConnect 0.2-4 comes along with Apache POI 3.9 (see package dependency XLConnectJars)

A summary of these news can be found in the NEWS file on CRAN.

To see some of the new features in action, please find below some examples:

# Load the XLConnect package
require(XLConnect)

# Download sample data
download.file("http://miraisolutions.files.wordpress.com/2013/01/example.xlsx", destfile = "example.xlsx")

# Load the workbook
wb = loadWorkbook("example.xlsx")

# Read the data on the sheet named "data"
data = readWorksheet(wb, sheet = "data")
# Note all the NA values due to the cells with text containing
# 'some stuff'
print(head(data))

# Let's assume we only roughly know the data is located within some
# given boundaries (exact location would be startRow = 10,
# startCol = 8, endRow = 42, endCol = 20)
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23)
# However, XLConnect is capable to find the actual data region
print(head(data))
# --> This is due to the arguments autofitRow and autofitCol
# which default to TRUE

# Let's see what would happen if we put them to FALSE
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, autofitRow = FALSE,
                     autofitCol = FALSE)
# --> Well, it reads the region exactly as specified above,
# so we get NAs obviously.

# Going back to the previous version
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23)

# Did you note the column named 'complex' which is all NAs?
# This is due to the fact that the 'complex' column is computed
# using the BESSELJ function in Excel which is not yet implemented
# in Apache POI. However, we can still read the values by instructing
# XLConnect to read the _cached_ values rather than trying to evaluate
# the formula
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE)
print(head(data))

# Let's get on to filtering. Let's assume we are only interested
# in the columns "car", "mpg" and "wt". Generally, we could
# do something like this:
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE)
dataSub = data[, c("car", "mpg", "wt")]
print(head(dataSub))
# What we do here is import ALL the data and subset the data
# in R. This is fine and no problem for small data sets. For
# large data sets, however, there might be a performance penalty.
# It may be better to the subsetting on the Java side directly:
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE,
                     keep = c("car", "mpg", "wt"))
print(head(data))

# Similarly, instead of 'keeping' columns, you can also drop
# them:
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE,
                     drop = c("disp", "hp", "drat", "carb"))
print(head(data))

# Now on to missing values. Noted the values 999999 in column
# "gear"? We might would want those to read in as NAs. How can
# we achieve that?

# Set the missing value identifier
setMissingValue(wb, value = 999999)
# Read in the data
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE)
# Here we go ...
print(head(data))

XLConnect is a comprehensive and cross-platform R package for manipulating Microsoft Excel files from within R. XLConnect differs from other related R packages in that it is completely cross-platform and as such runs under Windows, Unix/Linux and Mac (32- and 64-bit). Moreover, it does not require any installation of Microsoft Excel or any other special drivers to be able to read & write Excel files. The only requirement is a recent version of a Java Runtime Environment (JRE).

The package can easily be installed from CRAN via install.packages("XLConnect") (use install.packages("XLConnect", type = "source") on Mac OS X). In order to get started have a look at the XLConnect and XLConnect for the Impatient package vignettes, the numerous demos available via demo(package = "XLConnect") or browse through the comprehensive reference manual.

We are always happy to hear from you! For general feedback write us an email to xlconnect@mirai-solutions.com. If you find any bugs or hope to see some currently missing features in the future, just send us an email to xlconnect-bugs@mirai-solutions.com. You may also want to check out our XLConnect FAQ.

About these ads

About Mirai Solutions

Mirai Solutions is an independent consulting company with broad expertise in statistics, finance and IT. Together with our customers, we design and implement smart sustainable solutions ranging from small-scale software components or prototypes to enterprise-wide applications and platforms, positioning your organization for long-term success. Our interdisciplinary team can help you cover all aspects in developing the tools and processes you need to interpret complex statistical data and transform them into better business decisions. Be it highly customized and specialized advisory services and risk management solutions for financial services institutions, or more general statistical or IT consulting, or training courses on related topics, Mirai offers the appropriate services for all cases. At Mirai, we pride ourselves in the high level of customer satisfaction among all our existing clients.
This entry was posted in R, XLConnect. Bookmark the permalink.

6 Responses to XLConnect 0.2-4

  1. Pingback: Read Excel files to R with XLConnect: run out of Java memoryQueryBy | QueryBy, ejjuit, query, query by, queryby.com, android doubt, ios question, sql query, sqlite query, nodejsquery, dns query, update query, insert query, kony, mobilesecurity, postquery,

  2. Walter says:

    Hi,
    I am currently having some issues when trying to run code that works fine in Console and/or RStudio, but when run through Rscript (R scripting front-end version 3.0.3 (2014-03-06));
    it breaks with follwoing message:
    Loading required package: XLConnect
    XLConnect 0.2-7 by Mirai Solutions GmbH
    http://www.mirai-solutions.com ,
    http://miraisolutions.wordpress.com
    Error in e$jobj$printStackTrace :
    could not find function “loadMethod”
    Calls: Some_func … tryCatch -> tryCatchList -> tryCatchOne -> -> $
    execution halted

    Anyone experienced similar problems or has a solution at hand?
    Greetings, Walter

  3. Hi Walter, do you have a small reproducible example that illustrates the problem? This would help to investigate the potential issue. Also what is the output of sessionInfo() in your case?

  4. And a final remark, before I am silently awaiting your opinion:
    R CMD BATCH is a valid workaround for using Rscript, as with that the script runs without problems.
    I remember that there was some issue with tryCatch and Rscript a year or so ago (version R-2.14.x) but I thought it had been solved prior to R-3.
    Seems that it is not the case or that something in XLConnect is “provoking” that error again.

  5. Sorry there are some typos in the last reply … here come the post again
    Here is the script:
    ————————————————————————————-
    #===========File: XLC_Test.R=========================
    require(XLConnect)
    file.copy(system.file(“demoFiles/template2.xlsx”,
    package = “XLConnect”),
    “datatype.xlsx”, overwrite = TRUE)
    wb <- loadWorkbook("datatype.xlsx")
    # Get current (existing) cell style for numerics
    cs <- getCellStyleForType(wb, XLC$"DATA_TYPE.NUMERIC")
    #Alternatively the following line results in the same error with Rscript
    #cs <- getCellStyleForType(wb, "numeric")
    wbRscript XLC_Test.R
    # below the DOS-shell output:
    Loading required package: XLConnect
    XLConnect 0.2-7 by Mirai Solutions GmbH
    http://www.mirai-solutions.com ,
    https://miraisolutions.wordpress.com
    [1] TRUE
    Error in object@jobj$getCellStyleForDataType :
    could not find function “loadMethod”
    Calls: getCellStyleForType … tryCatch -> tryCatchList -> tryCatchOne -> doTryCatch -> $
    Execution halted
    ————————————————————————————-
    Here sessionInfo():
    ————————————————————————————-
    > sessionInfo()
    R version 3.0.3 (2014-03-06)
    Platform: i386-w64-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United Kingdom.1252
    [2] LC_CTYPE=English_United Kingdom.1252
    [3] LC_MONETARY=English_United Kingdom.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    Greetings, Walter
    PS: pls delete/remove the last comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s