
On Thu, Feb 17, 2011 at 8:07 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Thu, Feb 17, 2011 at 12:37 PM, Sean Davis wrote:
On Thu, Feb 17, 2011 at 5:48 AM, Peter wrote:
Once in Galaxy all the data files have the extension .dat on disk, so I would try using a wrapper script that creates a symbolic link from the input.dat file to something like input.pdb or input.ent (and if that doesn't work, copy the file) before running the compiled code and then remove it afterwards.
Hi, Peter. I ended up doing just that. The hack in all its messiness is here: https://gist.github.com/831017
I would be wary of using ${input.name} like that - test with things like renaming the dataset in Galaxy, and pasting in a PDB file rather than uploading one. Also I suspect you can get filenames with spaces in them, which will probably cause trouble. You'll notice that Galaxy generates its own *.dat filename, which avoids spaces.
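If the dataset name really does have to end up in the filename, one way to reduce the risk from spaces and renamed datasets is to sanitise it first. The following is only a sketch: safe_pdb_name is a hypothetical helper, not part of Galaxy, and the choice of allowed characters and the "input" fallback are assumptions.

```python
import os
import re


def safe_pdb_name(dataset_name, default="input"):
    """Turn a user-visible dataset name into a filename without spaces.

    Hypothetical helper: keeps letters, digits, '_' and '-', and makes
    sure the result ends in .pdb or .ent.
    """
    # Drop any directory components a user might have typed or pasted.
    base = os.path.basename(dataset_name.strip())
    stem, ext = os.path.splitext(base)
    if ext.lower() not in (".pdb", ".ent"):
        # Unknown or missing extension: keep the whole name as the stem.
        stem, ext = base, ".pdb"
    # Replace every run of other characters (spaces included) with '_'.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return (stem or default) + ext.lower()
```

The wrapper could then link the *.dat file to safe_pdb_name(name) in a scratch directory instead of using the raw name directly.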
Personally I would generate the *.pdb or *.ent filename within the wrapper script based on the input file name (*.dat). Try:
Unfortunately, the command-line executable assumes that the filename contains the ID of the PDB record, so I actually need this right now. I'm going to have a chat with the command-line tool developer about designing a more robust interface.
    os.symlink(fname, fname + ".pdb")
    ...
    symdcmd = "SymD %s.pdb" % fname
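Putting the pieces of the suggestion together (link, fall back to a copy, run, clean up), a wrapper might look roughly like this. It is a sketch under assumptions: run_with_pdb_link is a made-up name, "SymD" is assumed to be on the PATH and to take the file as its only argument, and fname + ".pdb" is assumed not to exist beforehand.

```python
import os
import shutil
import subprocess


def run_with_pdb_link(dat_path, command=("SymD",)):
    """Expose dat_path under a .pdb name, run command on it, then clean up.

    Returns the exit code of the command.
    """
    pdb_path = dat_path + ".pdb"
    try:
        # Preferred: a symbolic link, so no data is duplicated.
        os.symlink(os.path.abspath(dat_path), pdb_path)
    except (OSError, NotImplementedError):
        # Fallback when linking is not possible: copy the file.
        shutil.copy(dat_path, pdb_path)
    try:
        # Passing a list (no shell) keeps odd characters in paths harmless.
        return subprocess.call(list(command) + [pdb_path])
    finally:
        # Remove the link or copy whether or not the tool succeeded.
        os.remove(pdb_path)
```

From a Galaxy tool this would be called with the *.dat path that Galaxy passes on the command line, for example run_with_pdb_link(sys.argv[1]).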
Separately from this, you may need to extend Galaxy to define pdb as a new file format (ideally with a data type sniffer).
This kind of question is better asked on the dev list (CC'd).
Thanks. That is the next step.
I haven't done this myself yet (but I may well need to before long).
I extended based on filename extension and added the datatype to data.py. This works like a charm, but it isn't foolproof, obviously (no sniffer yet). The PDB format isn't too complicated, but it is flexible, so I need to find out exactly what is required as opposed to "possible". I see that Biopython has a class and parser for it, so I might be able to use that rather directly.

Sean
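For the missing sniffer, a first pass could simply check that the leading lines start with PDB record names. This is a heuristic sketch only, not the format's actual requirements and not Galaxy's datatype API: looks_like_pdb is a hypothetical standalone function, the record list is a partial selection, and the 50-line limit is arbitrary.

```python
# Partial list of PDB record names; matched as prefixes, so "HET" also
# covers HETATM/HETNAM and "SCALE" covers SCALE1..SCALE3.
PDB_RECORDS = (
    "HEADER", "TITLE", "COMPND", "SOURCE", "KEYWDS", "EXPDTA", "AUTHOR",
    "REVDAT", "JRNL", "REMARK", "DBREF", "SEQRES", "HET", "HELIX",
    "SHEET", "CRYST1", "ORIGX", "SCALE", "MODEL", "ATOM", "TER", "END",
)


def looks_like_pdb(filename, max_lines=50):
    """Guess whether a file is PDB by inspecting its first non-blank lines."""
    checked = 0
    # errors="replace" avoids decode errors if the file is not text at all.
    with open(filename, errors="replace") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            if not line.startswith(PDB_RECORDS):
                return False
            checked += 1
            if checked >= max_lines:
                break
    # An empty file is not treated as PDB.
    return checked > 0
```

Because the match is on prefixes only, some non-PDB text could slip through; a stricter check, or a trial parse with an existing PDB parser, would be needed for anything resembling validation.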