So one of my colleagues has a script he wants to turn into a Galaxy tool. The twist is that the script:
1. Looks for files with a fixed name (e.g. “params.txt”)
2. Accepts other file names as command-line arguments, but the names of those files have arguments embedded in them (e.g. “nuc_100iter_b.fasta” for nucleotide data in FASTA format, to be run against model b for 100 iterations).
I know, awkward and clumsy. But hardly unique among older bioinformatics tools. Anyway, the challenge for me is to pick the easiest path to port this script to a tool, and it seems fairly awkward under the Galaxy model as I understand it. Possibilities:
1. Rewrite the script's argument parsing and invocation. Obviously, there will be resistance to this, and with some justification (“I thought you said this could wrap any command-line program …”)
2. Write a wrapper script that moves and renames files according to the desired arguments, then calls the original script. Any problems with a two-script (or two-executable) tool like this? How do I specify the interpreter for both parts?
3. Use config files for the fixed-name files. But config files seem to be given a random rather than a fixed name, correct?
4. For the file names with semantic content, extract that information from the dataset metadata. Of course, it then still has to be passed to the original script somehow.
5. Use <code>
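For what it's worth, option 2 can be fairly small. Below is a minimal sketch in Python, assuming (1) the naming pattern generalises from the single example “nuc_100iter_b.fasta” to `<datatype>_<iterations>iter_<model>.fasta`, (2) the original script reads `params.txt` from its working directory, and (3) it takes the data file name as its only argument. All function names and flags (`legacy_input_name`, `stage_inputs`, `--interpreter`, `--script`, etc.) are hypothetical, chosen for illustration only.

```python
#!/usr/bin/env python3
"""Hypothetical Galaxy wrapper: stage inputs under the file names a legacy
script expects, then run that script unchanged."""
import argparse
import os
import shutil
import subprocess
import sys


def legacy_input_name(datatype, iterations, model, ext="fasta"):
    """Encode run settings in a file name, e.g. ('nuc', 100, 'b') -> 'nuc_100iter_b.fasta'.

    The pattern is an assumption extrapolated from a single example.
    """
    return f"{datatype}_{iterations}iter_{model}.{ext}"


def stage_inputs(workdir, params_path, data_path, datatype, iterations, model):
    """Copy Galaxy's arbitrarily named datasets into workdir under the names the
    legacy script looks for; return the semantic data file name."""
    os.makedirs(workdir, exist_ok=True)
    # Fixed-name file the legacy script reads implicitly (assumed: params.txt).
    shutil.copy(params_path, os.path.join(workdir, "params.txt"))
    # Data file whose name carries the run settings.
    data_name = legacy_input_name(datatype, iterations, model)
    shutil.copy(data_path, os.path.join(workdir, data_name))
    return data_name


def main(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--interpreter", required=True,
                        help="interpreter for the legacy script, e.g. perl or python")
    parser.add_argument("--script", required=True, help="path to the legacy script")
    parser.add_argument("--params", required=True, help="dataset to stage as params.txt")
    parser.add_argument("--data", required=True, help="input dataset to rename")
    parser.add_argument("--datatype", required=True, help="e.g. nuc")
    parser.add_argument("--iterations", type=int, required=True)
    parser.add_argument("--model", required=True)
    parser.add_argument("--workdir", default=".")
    args = parser.parse_args(argv)

    data_name = stage_inputs(args.workdir, args.params, args.data,
                             args.datatype, args.iterations, args.model)
    # Run the untouched legacy script inside the staging directory so it finds
    # params.txt; the script path is made absolute because cwd changes.
    cmd = [args.interpreter, os.path.abspath(args.script), data_name]
    return subprocess.run(cmd, cwd=args.workdir).returncode


# Only run when invoked with arguments, e.g. from the tool's command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

With this arrangement the tool's command line only invokes the wrapper, and the wrapper hands the legacy script to whatever interpreter it needs, which sidesteps the two-interpreter question. A Galaxy config file, whatever temporary name it receives, could be passed as `--params` and copied to `params.txt`. Collecting the legacy script's output files is not covered here.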
Ideas, suggestions? Obviously a rewrite is the “best” solution, but in this case we might be looking for the quickest …
----
Paul Agapow (paul-michael.agapow@hpa.org.uk)
Bioinformatics, Centre for Infections, Health Protection Agency