Tool needs a particular file extension
Hi,

I'm wrapping a tool that needs its input to have a known file extension (an audio file, e.g. .wav). Since Galaxy stores all data as .dat files, the tool falls over because it doesn't know what .dat is.

I thought I'd be able to get around this by hard linking the .dat file to the same name with a .wav extension (dataset_1.dat.wav). This works when I try it with the tool on the command line, but within Galaxy it fails. Here's my <command>:

ln $signal ${signal}.wav & /home/maus/maus OUTFORMAT=TextGrid LANGUAGE=$language BPF=$bpf INSKANTEXTGRID=$inskantextgrid INSORTTEXTGRID=$insorttextgrid MODUS=$modus MAUSSHIFT=$mausshift MINPAUSLEN=$minpauslen WEIGHT=$weight INSPROB=$insprob NOINITIALFINALSILENCE=$noinitialfinalsilence OUTSYMBOL=$outsymbol OUT=$output SIGNAL=${signal}.wav

resulting in the job command line:

ln /tmp/tmp7AZvx7/files/000/dataset_2.dat /tmp/tmp7AZvx7/files/000/dataset_2.dat.wav & /home/maus/maus OUTFORMAT=TextGrid LANGUAGE=aus BPF=/tmp/tmp7AZvx7/files/000/dataset_1.dat INSKANTEXTGRID=false INSORTTEXTGRID=false MODUS=standard MAUSSHIFT=10 MINPAUSLEN=5 WEIGHT=7.0 INSPROB=0.0 NOINITIALFINALSILENCE=no OUTSYMBOL=sampa OUT=/tmp/tmp7AZvx7/files/000/dataset_3.dat SIGNAL=/tmp/tmp7AZvx7/files/000/dataset_2.dat.wav

I'm getting an error message from the tool:

sox FAIL formats: can't open input file `/tmp/tmp7AZvx7/files/000/dataset_2.dat.wav': WAVE: RIFF header not found

This suggests that the hard link didn't get made. I tried copying the file instead but got the same result.

I could go in and patch the tool script to be more forgiving, but it would be good to find a solution that doesn't require that if possible.

Any pointers appreciated.

Steve
Department of Computing, Macquarie University
http://web.science.mq.edu.au/~cassidy
Hi Steve,

Try doing it like this example [1].

[1] https://github.com/leobiscassi/autodock_vina_tools/blob/dev/tools/prepare_li...

Cheers,
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
-- Best regards, *Léo Biscassi*
Using a soft link for this is a common pattern. The link command should be followed by && (ideally inside an XML CDATA section, to avoid having to escape characters like &), and the filenames should be quoted in case they contain spaces, e.g.

https://github.com/galaxyproject/tools-iuc/blob/master/tools/trinity/run_de_...

For reference, in tools-iuc there are over 400 soft link examples:

$ grep "ln -s" tools/*/*.xml | wc -l
446

Peter
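The pattern described above (soft link, quoted filenames, && instead of &) can be tried outside Galaxy with a minimal Python sketch. A single & puts ln in the background, so the tool may start before the link exists, whereas && starts the tool only after ln has succeeded. The helper name link_then_run, the file names, and the use of cat as a stand-in for the wrapped tool are all assumptions made for illustration; this is not Galaxy code.

```python
import os
import shlex
import subprocess
import tempfile


def link_then_run(dataset_path, link_path, tool_argv):
    """Symlink dataset_path to link_path, then run tool_argv only if ln succeeded.

    Hypothetical helper: it builds the same kind of shell line a Galaxy
    <command> would contain, with every filename quoted and && between steps.
    """
    command = "ln -s {src} {dst} && {tool}".format(
        src=shlex.quote(dataset_path),
        dst=shlex.quote(link_path),
        tool=" ".join(shlex.quote(arg) for arg in tool_argv),
    )
    return subprocess.run(command, shell=True, check=True, capture_output=True)


with tempfile.TemporaryDirectory() as workdir:
    # A dataset name containing a space, to show why the quoting matters.
    dataset = os.path.join(workdir, "dataset 2.dat")
    with open(dataset, "wb") as handle:
        handle.write(b"RIFF\x24\x00\x00\x00WAVEfmt ")

    link = os.path.join(workdir, "signal.wav")
    # "cat" stands in for the real tool; it just reads through the link.
    result = link_then_run(dataset, link, ["cat", link])
    linked_bytes = result.stdout
    was_symlink = os.path.islink(link)
```

Because the tool only ever sees the .wav name, the .dat file itself is left untouched. If the dataset behind the link is empty, though, the tool will still fail, so it is worth checking the file size first.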
Thanks all,

It seems that my real problem is that the audio file (.wav) is not being identified as a valid datatype and is ending up as a zero-length text file. So I need to start exploring the world of datatypes.

Following the docs (https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes), I can modify datatypes_conf.xml in my Galaxy sources and add a new datatype for wav files:

<datatype extension="wav" type="galaxy.datatypes.binary:Binary" display_in_upload="true" mimetype="audio/wav" subclass="True"/>

but I get the message "The uploaded binary file contains inappropriate content" and a zero-length file, just as I did before adding this, although the datatype is now set to 'wav'. I didn't add a sniffer for this, and I set the datatype explicitly on upload.

Also, this doesn't seem like a modular way to add datatypes. How do I include datatypes in my tool definition? I can see from some other tools that I can include a datatypes_conf.xml in my tool folder. When I try that and test with planemo, the new type isn't found.

Pointers welcome.

Thanks,
Steve
Department of Computing, Macquarie University
http://web.science.mq.edu.au/~cassidy
Hi Steve,

You are on the right track, but something in the WAV file has triggered one of Galaxy's security protections, which try to block uploading of potentially dangerous files. There may be some settings here you can relax; I've not had to deal with this myself.

Peter
Hi Steve,

Galaxy tries to sniff the data to guess the appropriate datatype. For binaries, if no datatype (sniffer) from https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/binary.py matches, you get the message "The uploaded binary file contains inappropriate content".

For your wave file, you will have to add a class (https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/binary.py#L1334) for the wave format and implement a sniffer (https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/binary.py#L1354) with an if/else test (https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/binary.py#L1358).

Typically for a binary format, you can read the first n bytes, which are usually text, and check that they are equal to, I hope, "wave". There are a bunch of examples in the file.

And finally, open a Pull Request on https://github.com/galaxyproject/galaxy :)

Good luck,
Gildas

-----------------------------------------------------------------
Gildas Le Corguillé - Bioinformatician/Bioanalyste
Plateform ABiMS (Analyses and Bioinformatics for Marine Science)
http://abims.sb-roscoff.fr
Member of the Workflow4Metabolomics project
http://workflow4metabolomics.org
Station Biologique de Roscoff - UPMC/CNRS - FR2424
Place Georges Teissier 29680 Roscoff FRANCE
tel: +33 2 98 29 23 81
-----------------------------------------------------------------
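The header test suggested above could look roughly like the following standalone sketch. It relies on the standard WAV layout: the file starts with the four bytes "RIFF", followed by a four-byte size, and the text "WAVE" sits at byte offset 8 rather than at the very start. The function name looks_like_wav is hypothetical; in Galaxy the same check would presumably go into the sniff method of a new Binary subclass in binary.py, which is not reproduced here.

```python
import os
import tempfile


def looks_like_wav(filename):
    """Return True if the file starts with a RIFF header whose form type is WAVE."""
    with open(filename, "rb") as handle:
        header = handle.read(12)
    # Bytes 0-3: chunk id "RIFF"; bytes 4-7: chunk size; bytes 8-11: "WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def _check(data):
    """Write data to a temporary file, sniff it, and clean up (demo helper)."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(data)
        path = handle.name
    try:
        return looks_like_wav(path)
    finally:
        os.remove(path)


wav_result = _check(b"RIFF\x24\x00\x00\x00WAVEfmt " + b"\x00" * 16)
empty_result = _check(b"")          # a zero-length dataset, as seen in this thread
text_result = _check(b"just some plain text, not audio")
```

A zero-length dataset fails this check, which is consistent with the sox "RIFF header not found" error reported earlier in the thread.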
Participants (4): Gildas Le Corguillé, Léo Biscassi, Peter Cock, Steve Cassidy