On Thu, Feb 17, 2011 at 8:07 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Thu, Feb 17, 2011 at 12:37 PM, Sean Davis wrote:
>
> On Thu, Feb 17, 2011 at 5:48 AM, Peter wrote:
>>
>> Once in Galaxy all the data files have the extension .dat on disk, so
>> I would try using a wrapper script that creates a symbolic link from the
>> input.dat file to something like input.pdb or input.ent (and if that
>> doesn't work, copy the file) before running the compiled code and then
>> remove it afterwards.
>>
>
> Hi, Peter.  I ended up doing just that.  The hack in all its messiness is
> here:
> https://gist.github.com/831017

I would be wary of using ${input.name} like that - test with things
like renaming the dataset in Galaxy, and pasting in a PDB file
rather than uploading one. Also I suspect you can get filenames
with spaces in them which will probably cause trouble. You'll
notice that Galaxy generates its own *.dat filename which avoids
spaces.

Personally I would generate the *.pdb or *.ent filename within
the wrapper script based on the input file name (*.dat). Try:


Unfortunately, the command-line executable assumes that the filename contains the ID of the PDB record, so I actually need this right now.  I'm going to have a chat with the command-line tool developer about designing a more robust interface.

 
os.symlink(fname,fname+".pdb")
...
symdcmd = "SymD %s.pdb" % fname
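
A fuller sketch of the symlink-then-copy wrapper idea, for illustration only. The helper names link_with_extension and run_with_extension are made up here and are not from the gist or from Galaxy. Note that the linked name is derived from the *.dat path, so it does not carry the PDB ID that the executable currently expects.

```python
import os
import shutil
import subprocess


def link_with_extension(dat_path, extension=".pdb"):
    """Expose dat_path under a name ending in `extension`; return that name."""
    linked = dat_path + extension
    try:
        os.symlink(dat_path, linked)
    except (OSError, NotImplementedError, AttributeError):
        # Symlinks unavailable or refused: fall back to copying the file.
        shutil.copyfile(dat_path, linked)
    return linked


def run_with_extension(dat_path, command, extension=".pdb"):
    """Run `command` (a list of arguments) with the linked file name appended."""
    linked = link_with_extension(dat_path, extension)
    try:
        # An argument list avoids shell quoting trouble with odd file names.
        return subprocess.call(command + [linked])
    finally:
        # Removes the link (or the copy), never the original *.dat file.
        os.remove(linked)
```

Under those assumptions the tool would be called as run_with_extension(fname, ["SymD"]), and the temporary name is cleaned up even if the tool fails.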


>>
>> Separately from this, you may need to extend Galaxy to define pdb
>> as a new file format (ideally with a data type sniffer).
>>
>> This kind of question is better asked on the dev list (CC'd).
>>
>
> Thanks.  That is the next step.

I haven't done this myself yet (but I may well need to before long).


I extended based on filename extension and added the datatype to data.py.  This works like a charm, but it isn't foolproof, obviously (no sniffer yet).  The PDB format isn't too complicated, but it is flexible, so I need to find out exactly what is required as opposed to merely "possible".  I see that Biopython has a class and parser for it, so I might be able to use that rather directly.
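
For what it's worth, a very rough content check could look like the sketch below. This is only a guess at a starting point, not a Galaxy sniffer: the function name looks_like_pdb and the LEADING_RECORDS list are illustrative assumptions, and the list is deliberately incomplete. It relies only on the fact that PDB record names occupy the first six columns of a line.

```python
# Record names that commonly open a PDB file (illustrative, not exhaustive).
LEADING_RECORDS = (
    "HEADER", "TITLE", "COMPND", "SOURCE", "REMARK",
    "CRYST1", "MODEL", "ATOM", "HETATM",
)


def looks_like_pdb(filename):
    """Guess whether a file is PDB format from its first non-blank line."""
    with open(filename) as handle:
        for line in handle:
            if line.strip():
                # Columns 1-6 hold the record name, padded with spaces.
                return line[:6].strip() in LEADING_RECORDS
    # Empty file (or only blank lines): nothing to identify.
    return False
```

A stricter check would need the list of records that are actually required, which is the open question above.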

Sean