On Wed, Jun 18, 2014 at 12:14 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> On Wed, Jun 18, 2014 at 12:04 PM, Jan Kanis <jan.code@jankanis.nl> wrote:
>> I am not using job splitting, because I am implementing this for a client with a small (one machine) Galaxy setup.
>
> Ah - this also explains why a job size limit is important for you.
>
>> Implementing a query limit feature in Galaxy core would probably be the best idea, but that would also require an admin screen to edit those limits, and I don't think I can sell the required time to my boss under the contract we have with the client.
>
> The wrapper script idea I outlined to you earlier would be the least invasive (although it might cause trouble if BLAST is run at the command line outside Galaxy), while your idea of inserting the check script into the Galaxy Tool XML just before running BLAST itself should also work well.
While looking at Jan's pull request to insert a query size limit before running BLAST, https://github.com/peterjc/galaxy_blast/pull/43, I realised that this will not work so well if job splitting is enabled. With the job-splitting parallelism setting in Galaxy, the BLAST query FASTA file is broken up into chunks of 1000 sequences, so the new check would be made at the chunk level. In effect it could catch extremely long query sequences (e.g. chromosomes), but could not block anyone submitting one query FASTA file containing many thousands of moderate length query sequences (e.g. genes).

John - regarding that Trello issue you logged, https://trello.com/c/0XQXVhRz
"Generic infrastructure to let deployers specify limits for tools based on input metadata (number of sequences, file size, etc...)"
Would it be fair to say this is not likely to be implemented in the near future? i.e. Should we consider implementing the BLAST query limit approach as a short-term hack?

Thanks,

Peter
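P.S. To make the short-term hack concrete, here is a rough, untested sketch of what the per-file check could look like. The script name, the limit values, and the plain-Python FASTA counting are all just illustrative - this is not what Jan's pull request actually does:

#!/usr/bin/env python
"""Sketch of a BLAST query limit check (illustrative only).

Usage: check_query_limits.py query.fasta max_seqs max_total_bp
Exits non-zero (so Galaxy treats the job as failed) if either limit is exceeded.
"""
import sys

def fasta_stats(path):
    """Count the sequences and total residues in a FASTA file."""
    count = 0
    total = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
            else:
                total += len(line.strip())
    return count, total

if __name__ == "__main__":
    fasta = sys.argv[1]
    max_seqs = int(sys.argv[2])
    max_bp = int(sys.argv[3])
    count, total = fasta_stats(fasta)
    if count > max_seqs or total > max_bp:
        sys.exit("Query file too big: %i sequences, %i bp (limits: %i sequences, %i bp)"
                 % (count, total, max_seqs, max_bp))

Something like that could be chained into the tool XML's <command> just before the BLAST call, e.g. check_query_limits.py '$query' 1000 10000000 && blastn ... (again, the limit numbers are placeholders). But as noted above, with job splitting enabled such a check only ever sees one 1000-sequence chunk at a time, so it cannot enforce a whole-file limit.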