Limits for enumerate for multiple input files
Dear all

I am using this control to allow the user to input multiple files:

    <param name="input_vcfs" type="data" multiple="true" format="vcf" label="Input VCF file(s)" />

and I am using this for loop in the cheetah code to access the control:

    <command interpreter="bash">
        script.sh ...
        #for $i, $input_vcf in enumerate( $input_vcfs ):
        "${input_vcf}",
        #end for
    </command>

It appears that when a user selects many files (25 in this case) the bash command in the command tag never gets executed. Therefore the job is never queued. The history item shows 'Waiting to run' indefinitely. Calling the script.sh manually with 25 input files works fine.

Any hint as to how to debug this would be greatly appreciated.

Thanks a lot
Ulf
On Tue, Apr 22, 2014 at 3:02 PM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Dear all
I am using this control to allow the user to input multiple files:
<param name="input_vcfs" type="data" multiple="true" format="vcf" label="Input VCF file(s)" />
and I am using this for loop in the cheetah code to access the control:

    <command interpreter="bash">
        script.sh ...
        #for $i, $input_vcf in enumerate( $input_vcfs ):
        "${input_vcf}",
        #end for
    </command>
The $i and enumerate seem unnecessary here.
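For illustration, the loop without the index might look like this (a sketch mirroring the snippet quoted above, not the actual tool XML):

    <command interpreter="bash">
        script.sh ...
        ## plain iteration - no counter needed
        #for $input_vcf in $input_vcfs:
        "${input_vcf}",
        #end for
    </command>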
It appears that when a user selects many files (25 in this case) the bash command in the command tag never gets executed. Therefore the job is never queued. The history item shows 'Waiting to run' indefinitely. Calling the script.sh manually with 25 input files works fine.
Any hint as to how to debug this would be greatly appreciated.
Thanks a lot Ulf
Can you see anything in the log about the job, and in particular the command line it would attempt to run?

Peter
Hi Peter

I removed the unnecessary code.

If I run the tool with just a couple of inputs I see entries in the log files, either from galaxy.jobs.runners.drmaa or from galaxy.jobs.runners.local, that the job is being dispatched as normal. Unfortunately there is no sign of the job in the log files when using more input files.

The command line that is supposed to be run is:

    bash home/galaxy/galaxy-dist/tools/vcf_processing/vcf_to_fasta.sh /galaxy/database/files/042/dataset_42275.dat 40 10 50 0 40 0.9 20 /galaxy/database/files/041/dataset_41720.dat, /galaxy/database/files/041/dataset_41980.dat,

The first .dat file is the output and the ones at the end are a comma-separated list of the input files. On the command line this command works with much longer input file lists.

Any ideas? Or is there a better practice to pass a large number of input files to a bash script?

Thanks
Ulf
On Tue, Apr 22, 2014 at 4:34 PM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Hi Peter
I removed the unnecessary code.
If I run the tool with just a couple of inputs I see entries in the log files either from galaxy.jobs.runners.drmaa or from galaxy.jobs.runners.local that the job is being dispatched as normal.
Unfortunately there is no sign of the job in the log files when using more input files.
The command line that is supposed to be run is:
bash home/galaxy/galaxy-dist/tools/vcf_processing/vcf_to_fasta.sh /galaxy/database/files/042/dataset_42275.dat 40 10 50 0 40 0.9 20 /galaxy/database/files/041/dataset_41720.dat, /galaxy/database/files/041/dataset_41980.dat,
the first dat file being the output and the ones at the end being a comma separated list of the input files. On the command line this command works with much longer input files lists.
I wouldn't bother with the commas - that is just wasting characters and eating into the maximum command line string length.
Any ideas?
Check the limit with "xargs --show-limits" or "getconf ARG_MAX"; our CentOS server reports:

    $ getconf ARG_MAX
    2621440
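As a rough gauge (a sketch only - the dataset_*.dat wildcard just stands in for whatever list of input paths the tool would actually build), you could compare the size of the argument list against that limit:

    # kernel limit on the total size of the argument list, in bytes
    getconf ARG_MAX
    # approximate bytes the input file paths alone would contribute
    printf '%s ' /galaxy/database/files/041/dataset_*.dat | wc -c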
Or is there a better practice to pass a large number of input files to a bash script?
Thanks Ulf
If there is any chance of your constructed command line string exceeding the system limit, I would construct an input file containing the filenames (e.g. one per line). That might be a practical solution anyway.

For the file-based approach, I would use the Galaxy <configfile> tag. Some of the tools bundled with Galaxy also use this (find them with grep), or for example one of mine:

https://github.com/peterjc/pico_galaxy/blob/master/tools/mira4/mira4_de_novo...

Regards,

Peter
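For illustration, a minimal sketch of that idea (the configfile name "input_list" and the way script.sh consumes it are assumptions made up for this example, not taken from Peter's tool or from the original wrapper):

    <command interpreter="bash">
        script.sh ... "$input_list"
    </command>
    <configfiles>
        <!-- Galaxy renders this Cheetah template to a temporary file and
             substitutes that file's path for $input_list in the command -->
        <configfile name="input_list">#for $input_vcf in $input_vcfs
${input_vcf}
#end for</configfile>
    </configfiles>

This keeps the command line short no matter how many datasets are selected; script.sh would then read the paths from the file, for example with a "while read -r vcf; do ... done" loop, instead of taking them as arguments.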
Don't have a ton to add to Peter's suggestion, I just wanted to chime in that there is definitely not some inherent Galaxy limit on the backend for the number of selections allowed in a multiple input widget. I just tried a test with 4 times this number of inputs and it worked fine.

This could well be reaching some sort of maximum command line length, but 20-some files seems a very low limit; in the past when I have hit such limits it has been on hundreds or thousands of files, not dozens.

It is difficult to work with a large number of inputs because select2 takes over in the UI and provides a very frustrating data input experience for these inputs (see the conversation here: http://dev.list.galaxyproject.org/HTML-form-of-select-parameter-tt4663142.ht...). It may be worth applying the patch (https://bitbucket.org/galaxy/galaxy-central/pull-request/379/disable-select2...) suggested by Jeremy in that thread to simplify the tool UX and try again.

-John
participants (3)
- John Chilton
- Peter Cock
- Ulf Schaefer