XLConnect – A platform-independent interface to Excel

XLConnect is a comprehensive and platform-independent R package for manipulating Microsoft Excel files from within R. XLConnect differs from other related R packages in that it is completely cross-platform and as such runs under Windows, Unix/Linux and Mac (32- and 64-bit). Moreover, it does not require any installation of Microsoft Excel or any other special drivers to be able to read & write Excel files. The only requirement is a recent version of a Java Runtime Environment (JRE). Also, XLConnect can deal with the old *.xls (BIFF) and the new *.xlsx (Office Open XML) file formats. Under the hood, XLConnect uses Apache POI (http://poi.apache.org) – a Java API to manipulate Microsoft Office documents.

XLConnect‘s main features include (as of version 0.1-5):

  • Reading & writing of worksheets (via data.frames)
  • Reading & writing of named regions/ranges (via data.frames)
  • Creating/removing/renaming/cloning worksheets
  • Adding graphics
  • (Named) cellstyles: data formats, borders, back- and foreground fill color, fill pattern, text wrapping
  • Controlling sheet visibility
  • Defining column width and row height
  • Merging/unmerging cells
  • Setting/getting cell formulas
  • Defining formula recalculation behavior (when workbooks are opened)
  • Setting auto-filters
  • Style actions: controlling application of cell styles when writing (e.g. when using templates)
  • Defining behavior when error cells are encountered

In order to make the XLConnect package a bit more accessible, this post will run through an example whose use case is the generation of a very simple Excel report showing the recent development of currency exchange rates (EUR, USD, GBP, JPY vs Swiss Franc (CHF)). Note that this post does not intend to give any investment advise!

##################################################
# Installation of Packages Required by this Demo #
##################################################

# XLConnect: Excel Connector for R (that's what this demo is about)

# fImport: Rmetrics - Economical and Financial Data Import
# forecast: Forecasting functions for time series
# zoo: S3 Infrastructure for Regular and Irregular Time Series
# ggplot2: An implementation of the Grammar of Graphics

install.packages(c("XLConnect", "fImport", "forecast", "zoo", "ggplot2"))

#################
# Load Packages #
#################

require(XLConnect)
require(fImport)
require(forecast)
require(zoo)
require(ggplot2)

####################
# Data Preparation #
####################

# Currencies we're interested in compared to CHF
currencies = c("EUR", "USD", "GBP", "JPY")

# Fetch currency exchange rates (currency to CHF) from OANDA (last 366 days)
curr = do.call("cbind", args = lapply(currencies,
               function(cur) oandaSeries(paste(cur, "CHF", sep = "/"))))
# Make a copy for later use
curr.orig = curr
# Scale currencies to exchange rate on first day in the series (baseline)
curr = curr * matrix(1/curr[1,], nrow = nrow(curr),
                     ncol = ncol(curr), byrow = TRUE) - 1
# Some data transformations to bring the data into a simple data.frame
curr = transform(curr, Time = time(curr)@Data)
names(curr) = c(currencies, "Time")
# Cyclic shift to bring the Time column to the front
curr = curr[(seq(along = curr) - 2) %% ncol(curr) + 1]

# Let's do some predictions ...

# Number of days to predict
predictDays = 20
# For each currency ...
currFit = sapply(curr[, -1], function(cur) {
  as.numeric(forecast(cur, h = predictDays)$mean)
  # Note: Normally one would obviously do some model diagnostics first...
})
# Add Time column to predictions
currFit = cbind(
  Time = seq(from = curr[nrow(curr), "Time"],
             length.out = predictDays + 1, by = "days")[-1],
  as.data.frame(currFit))

# Bind actual data with predictions
curr = rbind(curr, currFit)

###########################################
# Writing Data to Excel / Excel Reporting #
###########################################

## Let's start simple!

# Workbook filename
wbFilename = "swiss_franc.xlsx"
# Create a new workbook
wb = loadWorkbook(wbFilename, create = TRUE)

# Create a new sheet named 'Swiss_Franc'
sheet = "Swiss_Franc"
createSheet(wb, name = sheet)
# Alternatively, in a more object-oriented style:
# wb$createSheet(name = sheet)

# Create a new Excel name referring to the top left corner
# of the sheet 'Swiss_Franc' - this name is going to hold
# our currency data
dataName = "currency"
createName(wb, name = dataName, formula = paste(sheet, "$A$1", sep = "!"))
# Alternatively:
# wb$createName(name = dataName, formula = paste(sheet, "$A$1", sep = "!"))

# Write the currency data to the named region created above
# Note: the named region will be automatically redefined to encompass all
# written data
writeNamedRegion(wb, data = curr, name = dataName, header = TRUE)
# Alternatively:
# wb$writeNamedRegion(data = curr, name = dataName, header = TRUE)

# Save the workbook (this actually writes the file to disk)
saveWorkbook(wb)

# Just dumping out some data to Excel is a very common case.
# Therefore, there is a simpler way of doing this:
if(file.exists(wbFilename))
  file.remove(wbFilename)
writeNamedRegionToFile(wbFilename, data = curr, name = dataName,
                       formula = paste(sheet, "$A$1", sep = "!"),
                       header = TRUE)
# Note: writeNamedRegionToFile automatically generates the required sheet
# and Excel name!

## In the following steps we will make the currency report a bit nicer...

# In addition to writing out the data, for each currency we want to
# highlight the points in time when there was a change of more than 2%
# compared to the previous day.

# Load the workbook created above
wb = loadWorkbook(wbFilename)

# Create a date cell style with a custom format for the Time column
# (only show year, month and day without any time fields)
csDate = createCellStyle(wb, name = "date")
setDataFormat(csDate, format = "yyyy-mm-dd")
# Create a time/date cell style for the prediction records
csPrediction = createCellStyle(wb, name = "prediction")
setDataFormat(csPrediction, format = "yyyy-mm-dd")
setFillPattern(csPrediction, fill = XLC$FILL.SOLID_FOREGROUND)
setFillForegroundColor(csPrediction, color = XLC$COLOR.GREY_25_PERCENT)
# Create a percentage cell style
# Number format: 2 digits after decimal point
csPercentage = createCellStyle(wb, name = "currency")
setDataFormat(csPercentage, format = "0.00%")
# Create a highlighting cell style
csHlight = createCellStyle(wb, name = "highlight")
setFillPattern(csHlight, fill = XLC$FILL.SOLID_FOREGROUND)
setFillForegroundColor(csHlight, color = XLC$COLOR.CORNFLOWER_BLUE)
setDataFormat(csHlight, format = "0.00%")

# Index for all rows except header row
allRows = seq(length = nrow(curr)) + 1

# Apply date cell style to the Time column
setCellStyle(wb, sheet = sheet, row = allRows, col = 1, cellstyle = csDate)
# Set column width such that the full date column is visible
setColumnWidth(wb, sheet = sheet, column = 1, width = 2800)
# Apply prediction cell style
setCellStyle(wb, sheet = sheet, row = tail(allRows, n = predictDays), col = 1,
             cellstyle = csPrediction)
# Apply number format to the currency columns
currencyColumns = seq(along = currencies) + 1
for(col in currencyColumns) {
  setCellStyle(wb, sheet = sheet, row = allRows, col = col,
               cellstyle = csPercentage)
}

# Check if there was a change of more than 2% compared to the previous day
# (per currency)
idx = rollapply(curr.orig, width = 2, FUN = function(x) abs(x[2] / x[1] - 1),
                by.column = TRUE) > 0.02
idx = rbind(rep(FALSE, ncol(idx)), idx)
widx = apply(as.data.frame(idx), which)
# Apply highlighting cell style
for(i in seq(along = currencies)) {
  if(length(widx[[i]]) > 0) {
    setCellStyle(wb, sheet = sheet, row = widx[[i]] + 1, col = i + 1,
                 cellstyle = csHlight)
  }
  # Note:
  # +1 for row since there is a header row
  # +1 for column since the first column is the time column
}

saveWorkbook(wb)

## In a next step we are going to add a currency graph to our report

wb = loadWorkbook(wbFilename)

# Stack currencies into a currency variable (for use with ggplot2 below)
gcurr = reshape(curr, varying = currencies, direction = "long",
                v.names = "Value", times = currencies, timevar = "Currency")
# Also add a discriminator column to differentiate between actual and
# prediction values
gcurr[["Type"]] = ifelse(gcurr$Time %in% currFit$Time, "prediction", "actual")

# Create a png graph showing the currencies in the context of the Swiss Franc
png(filename = "swiss_franc.png", width = 800, height = 600)
ggplot(gcurr, aes(Time, Value, colour = Currency, linetype = Type)) +
  geom_line() + stat_smooth(method = "loess") + xlab("") +
  scale_y_continuous("Change to baseline", formatter = "percent") +
  opts(title = "Currencies vs Swiss Franc",
       axis.title.y = theme_text(size = 10, angle = 90, vjust = 0.3))
dev.off()

# Define where the image should be placed via a named region;
# let's put the image two columns left to the data starting in the 5th row
createName(wb, name = "graph",
           formula = paste(sheet, idx2cref(c(5, ncol(curr) + 2)), sep = "!"))
# Note: idx2cref converts indices (row, col) to Excel cell references

# Put the image created above at the corresponding location
addImage(wb, filename = "swiss_franc.png", name = "graph",
         originalSize = TRUE)

# As a short side-note: summary also works on workbooks!
summary(wb)

saveWorkbook(wb)

## Want to add formulas to your reports? Here we go...

wb = loadWorkbook(wbFilename)

# Let's add the maximum downturn (= minimum) fur the currencies at the bottom
# of the data (offset by three rows)

# Get reference coordinates (corners) of the named region containing our data
corners = getReferenceCoordinates(wb, dataName)
# Excel area references to calculate the average for
areaRefs = idx2aref(t(sapply(currencyColumns,
                    function(col) c(corners[1,1] + 1, col,
                                    corners[2,1], col))))
# Note: corners[1,1] + 1 as we do not want to include the header row in
# the formula

# Construct the required formulas
avg = paste("MIN(", areaRefs,  ")", sep = "")

# Set cell formulas
setCellFormula(wb, sheet = sheet, row = corners[2,1] + 3,
               col = currencyColumns, formula = avg)
# Add some cell styling...
csAvg = createCellStyle(wb, name = "min")
setDataFormat(csAvg, format = "0.00%")
setBorder(csAvg, side = "top", type = XLC$BORDER.MEDIUM,
          color = XLC$COLOR.BLACK)
setCellStyle(wb, sheet = sheet, row = corners[2,1] + 3,
             col = currencyColumns, cellstyle = csAvg)

# And, as we are already at it - let's also add an Excel auto-filter to
# the columns
setAutoFilter(wb, sheet = sheet, reference = aref(corners[1,], corners[2,]))

saveWorkbook(wb)

# Depending on the size and complexity of reports to create, starting from
# scratch and coding everything in R can be cumbersome (especially cell styling).
# Therefore, XLConnect supports working with templates. The application of cell
# styles when writing data to Excel can be controlled via so-called "style
# actions" (see function setStyleAction).

#######################################
# Reading / Importing Data from Excel #
#######################################

# Reading (rectangular) data from Excel is straight forward.
# There is basically two ways:
# - reading data from worksheets via readWorksheet
# - reading data from named regions via readNamedRegion

# Generally, reading from named regions is the simplest and preferred way but
# requires a named region to be set up correspondingly. If this is not the case,
# readWorksheet can be used.

## Reading from named regions

wb = loadWorkbook(wbFilename)

# Read the named region that was created in the section above
data = readNamedRegion(wb, name = dataName, header = TRUE)
# Alternatively: wb$readNamedRegion(name = dataName, header = TRUE)
# Do some calculations ...
colMeans(data[, currencyColumns])

## Reading from worksheets

data = readWorksheet(wb, sheet = sheet, header = TRUE)
# Note: When not specifying any of the boundaries (startRow, startCol, endRow,
# endCol) XLConnect will try to determine the boundaries automatically.
# If there are several data regions on a worksheet, this may not give the desired
# result and you may need to specify some boundaries.
# Actually, in our example the above won't give the desired result as the data
# read in also includes the mean values computed via the Excel formulas.
tail(data)

# Therefore we need to do the following
data = readWorksheet(wb, sheet = sheet, startRow = -1, endRow = nrow(curr) + 1,
                     startCol = -1, endCol = -1, header = TRUE)
tail(data)
# Note: -1 means that the correspondingly values will be automatically
# determined. Currently it is still necessary to specify either none or all
# boundaries. This may be changed in the future to simplify the application of
# readWorksheet and to allow something like:
# data = readWorksheet(wb, sheet = sheet, endRow = nrow(curr) + 1,
#                      header = TRUE)

# Also note that for simple cases there are the functions
# readNamedRegionFromFile and readWorksheetFromFile that prevent you having to
# go via the sequence loadWorkbook & readNamedRegion/readWorksheet:
data = readNamedRegionFromFile(wbFilename, name = dataName, header = TRUE)

data = readWorksheetFromFile(wbFilename, sheet = sheet, startRow = -1,
                             endRow = nrow(curr + 1), startCol = -1,
                             endCol = -1, header = TRUE)

## Some additional syntactic sugar:

# In case a workbook contains only a few named data regions, you can use 'with'
# to reference them directly (via the name of the named region) without having
# to read them in manually first:
with(wb, {
  colMeans(currency[, currencyColumns])
})

# Note: 'with' should only be used if the number of named regions in the workbook
# is small, since 'with' basically reads in all named regions automatically.

###########################
# Handling of Error Cells #
###########################

# XLConnect supports two different schemes on how to deal with error cells
# (cells that produce an error when trying to be read):
# - immediately stop on the first occurrence on an error cell
# - produce warnings for erroring cells and assign NA as their value

# Read example workbook containing various forms of error cells
errorCell <- system.file("demoFiles/errorCell.xlsx", package = "XLConnect")
wb = loadWorkbook(errorCell)

# Define to produce warnings on erroring cells and assume error cells to be NA
# (this is the default behavior)
onErrorCell(wb, XLC$ERROR.WARN)
# Read named region containing error cells
data = readNamedRegion(wb, name = "MyData")
print(data)

# Define to stop immediately when encountering error cells
onErrorCell(wb, XLC$ERROR.STOP)
# Read named region containing error cells
data = readNamedRegion(wb, name = "MyData")

We hope that the above example succeeded in giving a better understanding of XLConnect and its capabilities. The package can easily be installed from CRAN via install.packages(“XLConnect”). You may also want to have a look at the numerous demos available via demo(package = “XLConnect”). E.g. in order to run the “writeNamedRegion” demo simply run demo(topic = “writeNamedRegion”, package = “XLConnect”).

We are always glad to receive feedback! For general feedback write us an email to xlconnect@mirai-solutions.com. If you find any bugs or hope to see some currently missing features in the future, just send us an email to xlconnect-bugs@mirai-solutions.com. You may also want to check out our XLConnect FAQ.

Further references & resources:

About these ads

About Mirai Solutions

Mirai Solutions is an independent consulting company with broad expertise in statistics, finance and IT. Together with our customers, we design and implement smart sustainable solutions ranging from small-scale software components or prototypes to enterprise-wide applications and platforms, positioning your organization for long-term success. Our interdisciplinary team can help you cover all aspects in developing the tools and processes you need to interpret complex statistical data and transform them into better business decisions. Be it highly customized and specialized advisory services and risk management solutions for financial services institutions, or more general statistical or IT consulting, or training courses on related topics, Mirai offers the appropriate services for all cases. At Mirai, we pride ourselves in the high level of customer satisfaction among all our existing clients.
This entry was posted in R, XLConnect. Bookmark the permalink.

3 Responses to XLConnect – A platform-independent interface to Excel

  1. Marta says:

    thanks a lot, very useful indeed

  2. Pingback: XLConnect 0.2-0 « Another Word For It

  3. c497606 says:

    Thank you. This article is very helpful.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s