On Wed, Jun 18, 2014 at 12:14 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> On Wed, Jun 18, 2014 at 12:04 PM, Jan Kanis <jan.code@jankanis.nl> wrote:
>> I am not using job splitting, because I am implementing this for a client with a small (one machine) Galaxy setup.
>
> Ah - this also explains why a job size limit is important for you.
>
>> Implementing a query limit feature in Galaxy core would probably be the best idea, but that would also require an admin screen to edit those limits, and I don't think I can sell the required time to my boss under the contract we have with the client.
>
> The wrapper script idea I outlined to you earlier would be the least invasive (although it might cause trouble if BLAST is run at the command line outside Galaxy), while your idea of inserting the check script into the Galaxy Tool XML just before running BLAST itself should also work well.
While looking at Jan's pull request to insert a query size limit before running BLAST, https://github.com/peterjc/galaxy_blast/pull/43, I realised that this will not work so well if job splitting is enabled. With the job-splitting parallelism setting in Galaxy, the BLAST query FASTA file is broken up into chunks of 1000 sequences, so the new check would be made at the chunk level. In effect it could catch extremely long query sequences (e.g. chromosomes), but could not block anyone submitting one query FASTA file containing many thousands of moderate length query sequences (e.g. genes).

John - regarding that Trello issue you logged, https://trello.com/c/0XQXVhRz
"Generic infrastructure to let deployers specify limits for tools based on input metadata (number of sequences, file size, etc...)"
Would it be fair to say this is not likely to be implemented in the near future? i.e. Should we consider implementing the BLAST query limit approach as a short-term hack?

Thanks,

Peter
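P.S. To make the short-term hack concrete, here is a rough, untested sketch of what the per-file check could look like. The script name, the limit values, and the plain-Python FASTA counting are all just illustrative - this is not what Jan's pull request actually does:

#!/usr/bin/env python
"""Sketch of a BLAST query limit check (illustrative only).

Usage: check_query_limits.py query.fasta max_seqs max_total_bp
Exits non-zero (so Galaxy treats the job as failed) if either limit is exceeded.
"""
import sys

def fasta_stats(path):
    """Count the sequences and total residues in a FASTA file."""
    count = 0
    total = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
            else:
                total += len(line.strip())
    return count, total

if __name__ == "__main__":
    fasta = sys.argv[1]
    max_seqs = int(sys.argv[2])
    max_bp = int(sys.argv[3])
    count, total = fasta_stats(fasta)
    if count > max_seqs or total > max_bp:
        sys.exit("Query file too big: %i sequences, %i bp (limits: %i sequences, %i bp)"
                 % (count, total, max_seqs, max_bp))

Something like that could be chained into the tool XML's <command> just before the BLAST call, e.g. check_query_limits.py '$query' 1000 10000000 && blastn ... (again, the limit numbers are placeholders). But as noted above, with job splitting enabled such a check only ever sees one 1000-sequence chunk at a time, so it cannot enforce a whole-file limit.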