I am trying to write a Galaxy tool that will output an Excel file.
Currently the tool wrapper calls an R script, which uses the "xlsx" package to read and write to and from Excel files.
After being invoked by Galaxy, the script is able to successfully read an input Excel file:
suppressPackageStartupMessages(library(xlsx, quietly=TRUE));
raw.data <- read.xlsx(commandArgs(trailingOnly=TRUE)[1]), sheetName="input_records");
The script then does its work on the data just read in. Then, when the work is done, I'd like to output the results to an Excel file. Here's where I run into a problem. I first tried to output the result like this:
write.xlsx(processed.data, file=commandArgs(trailingOnly=TRUE)[2]);
but write.xlsx throws an error. Looks like ".xls" and ".xlsx" are the only legal file extensions for the write.xlsx() function. Inspecting commandArgs(trailingOnly=TRUE)[2] shows that Galaxy provided a filename extension of ".dat".
I tried a workaround using this:
write.xlsx(processed.data, file=paste(commandArgs(trailingOnly=TRUE)[2]), ".xls", sep="");
The write.xlsx function no longer throws an error (since the file name supplied as a parameter has an acceptable file extension) but now Galaxy won't display the result in the History. If I look in Galaxy's database I see two files there corresponding to my tool's output. Supposing that Galaxy assigned the result the name "dataset_87", then looking in the Galaxy database I see:
dataset_87.dat
dataset_87.dat.xls
"dataset_87.dat" is empty but visible to Galaxy (and so displayed as an empty dataset in the History window). "dataset_87.dat.xls" has the results I want but isn't displayed by Galaxy in the History window.
There might be R libraries that can write xls or xlsx files without requiring a ".xls" or ".xlsx" file extension. That's a path I'm willing to explore. That said, is there a configuration option I can set that'll let me continue to use write.xlsx()?
I've been working from the suggestions made on this helpful thread:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-December/007807.html
so the relevant entries in datatypes_conf.xml are:
<datatype extension="xls" type="galaxy.datatypes.binary.Xls" display_in_upload="true" />
<datatype extension="xlsx" type="galaxy.datatypes.binary.Xlsx" display_in_upload="true" />
and the relevant additions to binary.py are:
class Xls(Binary):
'''Class describing an Excel 2003 (xls) file'''
file_ext='xls'
class Xlsx(Binary):
'''Class describing an Excel 2007 (xlsx) file'''
file_ext='xlsx'
Any suggestions appreciated.
Thanks, Patrick