Re: [galaxy-user] plugging R into galaxy
The out.data is simply picked up by Galaxy, as long as you define your variable capturing the output, and also define it in your tool.xml file. Make sure the output is defined as "data" in the tool.xml file.
Initially I thought the same thing, but piping with '>' is not needed; worse, it doesn't work like that.
Freddy
OK. I ran some tests this morning to figure this all out.

Indeed, the problem with ">" is that it catches every single message R sends to STDOUT, not only the object of interest, so this is definitely not a reliable solution. The solution suggested by Alex worked perfectly well.

I would also suggest using the 'trailingOnly=TRUE' option in the commandArgs() call in the R script. That way you only need to care about the arguments given after the '--args' option, and you can forget how many other options precede them on the command line (--vanilla, --slave, -f or others...). The first parameter useful to your R script is then commandArgs(trailingOnly=T)[1], and so on.

There is just one other little point that I would like to clarify. At the beginning my tests didn't work at all, and I realized that I have to give the full path to the R script (in the command tag) to make it work.

<command interpreter="bash"> r_script_wrapper.sh rcode.R --args $in_data $out_data </command> ### FAILURE
<command interpreter="bash"> r_script_wrapper.sh /full/path/to/rcode.R --args $in_data $out_data </command> ### SUCCESS

However, r_script_wrapper.sh and rcode.R are in the same directory. To shed some light on this, I put an `echo $PWD` in r_script_wrapper.sh and it returned:

/path/to/galaxy_dist/database/job_working_directory/Num

where Num is the number of the job submitted to the PBS queue. Is that a known feature?

Here is the content of r_script_wrapper.sh:
-----------------------
#!/bin/sh

# Function that writes a message to stderr and exits
fail()
{
    echo "$@" >&2
    exit 1
}

# Ensure R executable is found
which R > /dev/null || fail "'R' is required by this tool but was not found on path"

# Extract first argument
rcode=$1; shift

# Ensure the file exists
test -f $rcode || fail "R input file '$rcode' does not exist"

# Invoke R
R --vanilla --slave --file=$rcode --args $*

Cheers
Anthony

On 11 November 2010 09:07, Freddy <freddy.debree@wur.nl> wrote:
The out.data is simply picked up by galaxy: As long as you define your variable capturing the output and also in your tool.xml file.
Make sure the output is defined as "data" in the tool.xml file.
Initially I thought the same thing, but the piping is not needed with '>', worse, it doesn't work like that.
Freddy
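To illustrate the point about '>' discussed above, here is a minimal shell sketch. The function noisy_tool is a hypothetical stand-in for an R script (it is not part of Galaxy or R): it always prints a startup message to STDOUT, and it writes its result either to STDOUT or to a file given as its first argument. The file names are also made up for the illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for an R script. It always prints a diagnostic
# message to STDOUT. The result goes to the file named by $1 if given,
# otherwise to STDOUT as well.
noisy_tool() {
    echo "Loading required package: example"   # diagnostic chatter
    if [ -n "${1:-}" ]; then
        echo "result,42" > "$1"
    else
        echo "result,42"
    fi
}

# Scratch directory for the demonstration, removed when the script exits.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT

# Approach 1: capture STDOUT with '>'. The chatter ends up in the file
# together with the result.
noisy_tool > "$work_dir/via_redirect.csv"

# Approach 2: pass the output path as an argument, in the way $out_data
# is passed in the <command> tag. Only the result ends up in the file.
noisy_tool "$work_dir/via_argument.csv" > /dev/null
```

With the first approach the dataset contains both lines; with the second it contains only the result line, which is why passing the output path as an argument is the more reliable choice here.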
Erratum: there is no --args option in the <command> tag examples. They should read:
<command interpreter="bash"> r_script_wrapper.sh rcode.R $in_data $out_data </command> ### FAILURE
<command interpreter="bash"> r_script_wrapper.sh /full/path/to/rcode.R $in_data $out_data </command> ### SUCCESS
On 12 November 2010 14:09, Anthony Ferrari <ferraria@gmail.com> wrote:
On Fri, Nov 12, 2010 at 1:09 PM, Anthony Ferrari <ferraria@gmail.com> wrote:
OK. I have made some tests this morning to figure this all out. Indeed, the problem with ">" is that it catches every single message R sends to STDOUT and not only object of your interest, so this is definitely not a reliable solution. Solution suggests by Alex worked perfectly well. I would suggest also to use the 'trailingOnly=TRUE' option within the commandArgs() call in the R script. That allows you to only care about the args given after the '--args' option. You can then forget how many previous options you have in your command line (--vanilla, --slave, -f or others...). First parameter useful to your R script would then be commandArgs(trailingOnly=T)[1] and so on.
That is a very handy tip, commandArgs(trailingOnly=T), thanks!
There is just another little point that I would like to clarify. At the beginning my tests didn't work at all and I realize that I have to give the full path to the R script (in the command tag) to make it work.
What happens if you do <command> rather than <command interpreter="bash">?
Peter
P.S. Cross posted to Galaxy-dev, could we continue this there?
On 12 November 2010 14:45, Peter <biopython@maubp.freeserve.co.uk> wrote:
On Fri, Nov 12, 2010 at 1:09 PM, Anthony Ferrari <ferraria@gmail.com> wrote:
OK. I have made some tests this morning to figure this all out. Indeed, the problem with ">" is that it catches every single message R sends to STDOUT and not only object of your interest, so this is definitely not a reliable solution. Solution suggests by Alex worked perfectly well. I would suggest also to use the 'trailingOnly=TRUE' option within the commandArgs() call in the R script. That allows you to only care about the args given after the '--args' option. You can then forget how many previous options you have in your command line (--vanilla, --slave, -f or others...). First parameter useful to your R script would then be commandArgs(trailingOnly=T)[1] and so on.
That is a very handy tip, commandArgs(trailingOnly=T), thanks!
There is just another little point that I would like to clarify. At the beginning my tests didn't work at all and I realize that I have to give the full path to the R script (in the command tag) to make it work.
What happens if you do <command> rather than <command interpreter="bash">?
This is worse. Here's what I got:
An error occurred running this job: /var/spool/torque/mom_priv/jobs/77.node054.cluster.SC: line 11: ./r_script_wrapper.sh: No such file or directory
This time even the wrapper is not found, and I have to specify both paths (to r_script_wrapper.sh and to rcode.R) to make it work.
Anthony
Peter
P.S. Cross posted to Galaxy-dev, could we continue this there?
sure.
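Since the job runs in job_working_directory rather than in the tool directory, one way to avoid hard-coding full paths would be for the wrapper to resolve a relative R script name against its own location. The following is only a sketch of that idea and was not proposed in the thread. resolve_rcode is a hypothetical helper, and it assumes the wrapper is itself invoked through a path that includes its directory, so that the directory part of that path is meaningful.

```shell
#!/bin/sh
# resolve_rcode WRAPPER_PATH RCODE   (hypothetical helper)
# Prints RCODE unchanged if it is already an absolute path. Otherwise
# prints RCODE prefixed with the absolute directory that contains
# WRAPPER_PATH. Returns non-zero if that directory cannot be entered.
resolve_rcode() {
    wrapper_path=$1
    rcode=$2
    case "$rcode" in
        /*)
            # Already absolute: pass it through untouched.
            printf '%s\n' "$rcode"
            ;;
        *)
            # Relative: anchor it to the wrapper's own directory.
            wrapper_dir=$(cd "$(dirname "$wrapper_path")" && pwd) || return 1
            printf '%s\n' "$wrapper_dir/$rcode"
            ;;
    esac
}
```

Inside r_script_wrapper.sh this could be used as rcode=$(resolve_rcode "$0" "$1"); shift, placed before the existing test -f check. It does not help when the wrapper itself cannot be found, as in the error reported above.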
participants (3)
- Anthony Ferrari
- Freddy
- Peter