plugging R into galaxy
Dear galaxy users,
My team is involved in the NGS field. We have our own local cluster and we are looking for a workflow management system. Currently we are trying to set up and test galaxy.
For our quality control analysis pipeline, R statistical software will be used. So we want to be able to call R scripts from galaxy.
I know that it is possible and the xy_plot.xml is a really good example (conditional & when, repeat tags). Also, in this example a <configfile> tag is used. Within this tag you put your R code. And moreover you can add lines beginning with only one '#' which will be interpreted with cheetah template engine. This is helpful to create dynamic content of your R code with respect to your submitted parameters. (an old screencast introduces this example)
My question is: is there a way to avoid the <configfile> section and to let all the R code be outside the tool's xml config file? If yes, how can we specify an output file in the xml conf that will catch, for instance, a write.table() call in the R script?
Best regards,
Anthony
Absolutely. xy_plot is just an example of what is possible with config files. If you just want to call an R script that takes command line arguments, you can do that as well. Just use Rscript.
-- jt
James Taylor
Assistant Professor
Department of Biology
Department of Mathematics & Computer Science
Emory University
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
Anthony,
We had some discussion on the galaxy dev list about R. See for instance my posting of Oct 14th and several others as well (for reference, a snippet is attached below).
That example runs an R script kept outside the tool config (in this case a script that opens an input file and generates an output file, which is picked up by galaxy).
Hope this helps,
Alex
    <command>
    R --slave --vanilla -f $in_r --args $in_data $out_data
    </command>
    <inputs>
      <param name="in_data" type="data" format="tabular" label="Test data file" />
      <param name="in_r" type="data" format="text" label="R script to load and execute" />
    </inputs>
    <outputs>
      <data name="out_data" type="data" format="tabular" label="R script output" />
    </outputs>
The R script provided will grab the args from the command line as you indicated earlier:
    # R script file to grab input and output filenames from cmdline and just copy
    args <- commandArgs()
    output <- read.table(args[6], header=TRUE)
    write.table(output, sep="\t", file=args[7], row.names=FALSE)
    #end script
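As a rough illustration of the approach discussed here (not taken from the thread), the sketch below wraps the call in a small shell function: the input and output paths that galaxy substitutes for $in_data and $out_data are passed as positional arguments, and the R script is expected to write the output file itself. The function name run_r_tool and the RSCRIPT override variable are hypothetical. On the R side, commandArgs(trailingOnly = TRUE) returns only the arguments after --args (Rscript adds --args itself after the script name), which avoids depending on fixed positions such as args[6], since those shift with the number of options given before --args.

```shell
#!/bin/sh
# Hypothetical wrapper: run_r_tool SCRIPT INPUT OUTPUT
# In a tool config, galaxy would substitute real paths for $in_data and
# $out_data; here they simply arrive as positional arguments.
run_r_tool() {
    script=$1
    input=$2
    output=$3
    # RSCRIPT is an assumed override, so the wrapper can be tried without R installed.
    rscript=${RSCRIPT:-Rscript}

    # Fail early if the input file cannot be read.
    [ -r "$input" ] || { echo "cannot read input: $input" >&2; return 2; }

    # Rscript hands everything after the script name to the script as arguments.
    "$rscript" --vanilla "$script" "$input" "$output" || return $?

    # The script is expected to write OUTPUT itself, e.g. via write.table(file = ...).
    [ -s "$output" ] || { echo "no output written: $output" >&2; return 3; }
}
```

A call would then look like `run_r_tool my_script.R input.tsv output.tsv`, where all three file names are placeholders.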
Thanks to all for your comments.
Alex, I have done something similar to what you suggest, except that I use the Rscript executable wrapped in a shell script.
There is something I don't understand when reading your example. How does galaxy know that it has to store the result of your write.table() call in your $out_data variable? Didn't you forget to redirect your R output stream with ">"?
    <command>
    R --slave --vanilla -f $in_r --args $in_data > $out_data
    </command>
Anthony
Hi Anthony,
good you have it going. The example showed how to load a user-provided R script (which might be a security issue, but is OK for testing). We would normally specify the R script with its path in the tools dir. This has the advantage that you can change and test your script without having to refresh your running tools in galaxy.
The example is correct. The provided R script in this case gets both arguments, $in_data AND $out_data, into R as args[6] and args[7]. So the output of the R script is written directly to the expected file, without any mv statements in a bash shell or the like. The > would be used if the output went to STDOUT.
Freddy made a good comment on the WARNINGS! You have to capture them and either discard them to /dev/null (or &-) or append them to a log file (usually STDERR can be captured by 2>&- or 2>>./somelogfile.log). If you don't take care of this, they will give you a red history box even when all was fine and there was just a warning, so you have to deal with that... This applies to more tools than just R, by the way.
Cheers,
Alex
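The STDERR handling described above could be sketched roughly as follows. This is only an illustration under assumptions, not something posted in the thread: the helper name run_logged is hypothetical, and it assumes that any text on STDERR is what turns the history box red. Warnings are appended to a log file, and the log is replayed on STDERR only when the command actually fails.

```shell
#!/bin/sh
# Hypothetical helper: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    status=0
    # Append the command's STDERR to the log so plain warnings stay out of sight.
    "$@" 2>>"$log" || status=$?
    if [ "$status" -ne 0 ]; then
        # A real failure: replay everything captured in the log on STDERR.
        cat "$log" >&2
    fi
    return "$status"
}
```

It would be used as, for example, `run_logged ./somelogfile.log Rscript my_script.R input.tsv output.tsv`, where the script and file names are placeholders.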
participants (3)
- Anthony Ferrari
- Bossers, Alex
- James Taylor