Hi, I'm implementing my own tools. In one tool I may have hundreds of input files (or maybe I'm doing it wrong?). I'm copying the tool conf XML code from the "concatenate dataset" tool, but this requires adding files one by one. Is there a quicker way in which I can just specify these files by job number? Say I want data from job 42 to 323, something like that. I haven't attempted to run the tool yet. I'm also curious if there is a limit to how long the command can be at the shell level. I suspect the tool won't be able to run, especially since each of the Galaxy files is designated by its absolute path. Timothy
On Thu, Sep 22, 2011 at 9:14 AM, Timothy Wu <2huggie@gmail.com> wrote:
Hi,
I'm implementing my own tools. In one tool I may have hundreds of input files.
That will be tricky in Galaxy - and unless there is some clever functionality I'm not aware of, very tedious for the end user. I think they'd have to manually select each file from a combo box (drop down select menu). Even if a single multi-select combobox could be used (can it? not sure off hand), it would be easy to make a mistake and miss out a file. How does your tool handle this at the command line (ignoring Galaxy)? Does it expect a directory name or pattern, or just a really long command line string with many many file names?
... I'm also curious if there is a limit to how long the command can be at the shell level. I suspect the tool won't be able to run, especially since each of the Galaxy files is designated by its absolute path.
Yes, there is an OS specific limit to the length of a command line string. If you do simply have hundreds of filenames in the command line string, you are likely to hit this limit - especially with absolute paths. Peter
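For anyone wanting to check the limit mentioned above: a minimal Python sketch, assuming a POSIX system where `os.sysconf` exposes `SC_ARG_MAX`. The helper names (`arg_max`, `command_length`, `fits_on_command_line`) and the dataset path pattern are made up for illustration, and the estimate is rough because the OS limit usually also covers the environment variables.

```python
import os

def arg_max():
    """Return the OS limit in bytes for arguments plus environment, or None if unknown."""
    try:
        limit = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows, and the name may be unsupported.
        return None
    return limit if limit > 0 else None

def command_length(args):
    """Rough size of a command line: each argument plus one terminating NUL byte."""
    return sum(len(os.fsencode(arg)) + 1 for arg in args)

def fits_on_command_line(args, limit=None):
    """True/False if the limit is known, None if it could not be determined."""
    if limit is None:
        limit = arg_max()
    if limit is None:
        return None
    return command_length(args) < limit

# Hypothetical absolute paths, only to get a feel for the size involved.
paths = ["/hypothetical/galaxy/files/dataset_%d.dat" % n for n in range(42, 324)]
print(command_length(["mytool"] + paths), "bytes; limit:", arg_max())
```

Even when the total fits, the margin shrinks quickly as paths get longer, which is one reason to prefer passing a single file that lists the inputs.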
On Thu, Sep 22, 2011 at 4:36 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
How does your tool handle this at the command line (ignoring Galaxy)? Does it expect a directory name or pattern, or just a really long command line string with many many file names?
Originally I have this config text file which specifies a directory. And scripts will look into this directory for specific file name patterns. Since Galaxy specifies its own file names, the pattern would not work.
I'm actually tailoring my tools for Galaxy because my original design is not flexible and it's just not well thought out. With Galaxy I'm pretty happy that I get to split my tools up to be more fine-grained to attempt to stick to the Unix tool's "Write programs that do one thing and do it well" philosophy (well, more to the "one" part than to the "well" part).
I am thinking of a few work-arounds.
1. Assuming that there is only one user, I could have the user specify the first file, and then the number of files that would also be inputs, and I can have the tool figure out the file paths from the path of the first, plus the number increments.
2. For the tool prior to this, which generates these files (actually an FTP download tool which downloads .tar.gz), I would have it also unzip and untar and then concat them.
3. For the tool prior to this, if there is any way the tool would know which file names it is writing to (as far as I know it does not, at least not according to what's specified under "Number of Output datasets cannot be determined until tool run" ( http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files )), then it can output a text file which lists the paths of the files. The subsequent tool can take this single file as input.
I don't like 1 since it requires that the service is used by a single user (otherwise the numbering could get messed up). I don't like 2, since it violates the principle of Unix tools. It doesn't seem like it's the design decision the Galaxy team would take. Furthermore, I think unzipping takes up disk space unnecessarily. My program just parses directly off the gzip, but without unzipping I don't know how to reasonably concat.
I like 3 best, but I do not seem to know the paths of the outputs since it's Galaxy which is silently moving and renaming the files behind the scenes. Any suggestions? Timothy
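To make option 3 concrete, here is a minimal Python sketch of the mechanics only: one tool writes a text file listing paths, and the next tool takes that single file as input and reads each gzip file directly. It assumes the first tool already knows where its outputs are, which is exactly the open question above. All function names (`write_manifest`, `read_manifest`, `count_lines`, `demo`) are hypothetical, and `count_lines` is just a stand-in for real parsing.

```python
import gzip
import os
import tempfile

def write_manifest(paths, manifest_path):
    """Upstream tool: record one absolute path per line."""
    with open(manifest_path, "w") as handle:
        for path in paths:
            handle.write(os.path.abspath(path) + "\n")

def read_manifest(manifest_path):
    """Downstream tool: recover the list of paths, skipping blank lines."""
    with open(manifest_path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]

def count_lines(paths):
    """Stand-in for real parsing: stream each gzip file without unzipping to disk."""
    total = 0
    for path in paths:
        with gzip.open(path, "rt") as handle:
            total += sum(1 for _ in handle)
    return total

def demo():
    """Create two small gzip files, list them in a manifest, and process them."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = []
        for index, text in enumerate(["a\nb\n", "c\nd\ne\n"]):
            path = os.path.join(workdir, "part%d.txt.gz" % index)
            with gzip.open(path, "wt") as handle:
                handle.write(text)
            paths.append(path)
        manifest = os.path.join(workdir, "inputs.txt")
        write_manifest(paths, manifest)
        return count_lines(read_manifest(manifest))

print(demo())
```

The downstream command line then carries a single argument (the manifest) however many inputs there are, so the shell length limit stops being a concern.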
On Thu, Sep 22, 2011 at 4:36 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
How does your tool handle this at the command line (ignoring Galaxy)? Does it expect a directory name or pattern, or just a really long command line string with many many file names?
On Friday, September 23, 2011, Timothy Wu <2huggie@gmail.com> wrote:
Originally I have this config text file which specifies a directory. And scripts will look into this directory for specific file name patterns. Since Galaxy specifies its own file names, the pattern would not work.
I'm actually tailoring my tools for Galaxy because my original design is not flexible and it's just not well thought out. With Galaxy I'm pretty happy that I get to split my tools up to be more fine-grained to attempt to stick to the Unix tool's "Write programs that do one thing and do it well" philosophy (well, more to the "one" part than to the "well" part).
I am thinking of a few work-arounds.
Do they need to be separate input files, or could they be concatenated? For example, rather than feeding in 10 individual FASTA files (each with multiple sequences), could it take a single concatenated FASTA file? That may not make sense for your tool, but it seems a more elegant solution if it does. Then the question becomes how best to prepare the big FASTA input within Galaxy ;-) Peter
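On concatenating without unzipping: the gzip format allows several compressed members back to back, so plain .gz files can be joined byte for byte and still be read as one stream. A minimal Python sketch, assuming the inputs are gzip-compressed text such as FASTA (for .tar.gz archives the tar layer would still need separate handling); the names `concat_gzip` and `demo` are made up for illustration.

```python
import gzip
import os
import shutil
import tempfile

def concat_gzip(parts, combined_path):
    """Append the raw bytes of each gzip file; nothing is decompressed."""
    with open(combined_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

def demo():
    """Build two tiny gzipped FASTA files, join them, and read the result back."""
    with tempfile.TemporaryDirectory() as workdir:
        parts = []
        for index, record in enumerate([">seq1\nACGT\n", ">seq2\nGGCC\n"]):
            path = os.path.join(workdir, "part%d.fasta.gz" % index)
            with gzip.open(path, "wt") as handle:
                handle.write(record)
            parts.append(path)
        combined = os.path.join(workdir, "combined.fasta.gz")
        concat_gzip(parts, combined)
        # Python's gzip module reads all members of a multi-member file in order.
        with gzip.open(combined, "rt") as handle:
            return handle.read()

print(demo())
```

This keeps disk usage close to the sum of the compressed inputs, since no uncompressed copy is ever written.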
participants (2)
- Peter Cock
- Timothy Wu