Getting (or setting) physical file name
So one of my colleagues has a script he wants to turn into a Galaxy tool. The twist is that the script:
1. Looks for files with a fixed name (e.g. "params.txt")
2. Accepts other file names as commandline arguments, but the actual names of those files have arguments embedded in them (e.g. "nuc_100iter_b.fasta" for nucleotide data in fasta format to be run against model b for 100 iterations.)
I know, awkward and clumsy. But hardly unique for many historical bioinformatic tools. Anyway, the challenge for me is to pick the easiest path to port this script to a tool. And it seems to be fairly awkward under the Galaxy model as I understand it. Possibilities:
1. Rewrite the script argument parsing and invocation. Obviously, there will be resistance to this and with some justification ("I thought you said this could wrap any command line program ...")
2. Write a script that calls the original script after moving and renaming files according to desired arguments. Any problems with a two-script/executable tool like this? How do I specify the interpreter for both parts of the script?
3. Use config files for the fixed name files. But configuration files seem to be given a random, not fixed, name, correct?
4. For the file names with semantic content, extract that from the dataset metadata. Of course, then it still has to be passed to the original script somehow.
5. Use <code>
Ideas, suggestions? Obviously a rewrite is the "best" solution, but in this case we might be looking for the quickest ...
----
Paul Agapow (paul-michael.agapow@hpa.org.uk)
Bioinformatics, Centre for Infections, Health Protection Agency
On Thu, Sep 15, 2011 at 6:11 PM, Paul-Michael Agapow <Paul-Michael.Agapow@hpa.org.uk> wrote:
So one of my colleagues has a script he wants to turn into a Galaxy tool. The twist is that the script:
1. Looks for files with a fixed name (e.g. “params.txt”)
2. Accepts other file names as commandline arguments, but the actual names of those files have arguments embedded in them (e.g. “nuc_100iter_b.fasta” for nucleotide data in fasta format to be run against model b for 100 iterations.)
I know, awkward and clumsy. But hardly unique for many historical bioinformatic tools. Anyway, the challenge for me is to pick the easiest path to port this script to a tool. And it seems to be fairly awkward under the Galaxy model as I understand it. Possibilities:
1. Rewrite the script argument parsing and invocation. Obviously, there will be resistance to this and with some justification (“I thought you said this could wrap any command line program …”)
If this is your own tool, this is the cleanest solution and helps beyond just using it within Galaxy.
2. Write a script that calls the original script after moving and renaming files according to desired arguments. Any problems with a two-script/executable tool like this?
That's what I'd go for - a wrapper script which takes command line arguments like a sane command line tool, and uses them to prepare the input files for the weird script. Your tool should automatically be called from a temp working directory so you can probably just make the specially named files right there, and try using links to alias the input files rather than copying them (faster as less IO).
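A minimal sketch of such a wrapper, in case it helps. Everything specific here is an assumption: the option names (--script, --params, --input, --datatype, --iterations, --model) are made up for illustration, and the "<datatype>_<N>iter_<model>.fasta" pattern is guessed from the single "nuc_100iter_b.fasta" example, so adjust it to whatever the real script expects. It symlinks the inputs into the current working directory under the names the legacy script wants, then calls that script.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: stage inputs under the names a legacy script expects."""
import argparse
import os
import subprocess


def staged_name(datatype, iterations, model, ext="fasta"):
    """Build a name like 'nuc_100iter_b.fasta' (pattern assumed from the thread's example)."""
    return f"{datatype}_{iterations}iter_{model}.{ext}"


def link_as(source, target_name, workdir="."):
    """Symlink `source` as workdir/target_name instead of copying (less IO)."""
    target = os.path.join(workdir, target_name)
    if os.path.lexists(target):
        os.remove(target)  # replace a stale link left by an earlier run
    os.symlink(os.path.abspath(source), target)
    return target


def main(argv=None):
    parser = argparse.ArgumentParser(description="Wrap a legacy script (sketch).")
    parser.add_argument("--script", required=True, help="path to the executable legacy script")
    parser.add_argument("--params", required=True, help="file to expose as params.txt")
    parser.add_argument("--input", required=True, help="sequence file to expose under the encoded name")
    parser.add_argument("--datatype", default="nuc")
    parser.add_argument("--iterations", type=int, required=True)
    parser.add_argument("--model", required=True)
    args = parser.parse_args(argv)

    # The fixed-name file the legacy script looks for in its working directory.
    link_as(args.params, "params.txt")
    # The input file whose name encodes the run arguments.
    name = staged_name(args.datatype, args.iterations, args.model)
    link_as(args.input, name)

    # The legacy script is run via its own hash bang; return its exit code.
    return subprocess.call([os.path.abspath(args.script), name])
```

To use this as a standalone script, end the file with "raise SystemExit(main())" and mark it executable. Symlinks are used rather than copies for the IO reason given above; on a platform without symlink support, shutil.copy would be the fallback.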
How do I specify the interpreter for both parts of the script?
If your script is marked as executable with a suitable hash bang, it doesn't even need the Galaxy interpreter in the XML file. For the internal script it doesn't matter at all - Galaxy doesn't need to know.
Peter
participants (2)
- Paul-Michael Agapow
- Peter Cock