On Tue, Jun 17, 2014 at 4:57 PM, Jan Kanis <jan.code@jankanis.nl> wrote:
> Too bad there aren't any really good options. I will use the environment variable approach for the query size limit.
Are you using the optional job splitting (parallelism) feature in Galaxy? That seems to me to be a good place to insert a Galaxy-level job size limit, e.g. BLAST+ jobs are split into 1000-query chunks, so you might wish to impose a 25-chunk limit? Long term, being able to set limits on the input file parameters of each tool would be nicer, e.g. limit BLASTN to at most 20,000 queries, limit MIRA to at most 50GB FASTQ files, etc.
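To make the chunk limit idea concrete, here is a rough sketch of such a check done in a wrapper script. It assumes FASTA input and 1000-query chunks as in the BLAST+ example; the environment variable name GALAXY_MAX_QUERY_CHUNKS and both function names are made up for illustration and are not existing Galaxy settings.

```python
import os

CHUNK_SIZE = 1000  # queries per chunk, as in the BLAST+ splitting example


def count_fasta_queries(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def check_job_size(lines, default_max_chunks=25,
                   env_var="GALAXY_MAX_QUERY_CHUNKS"):
    """Return (queries, chunks), or raise ValueError if over the limit.

    The limit is read from a hypothetical environment variable, falling
    back to default_max_chunks when the variable is not set.
    """
    max_chunks = int(os.environ.get(env_var, default_max_chunks))
    queries = count_fasta_queries(lines)
    # Ceiling division: a partly filled final chunk still counts as a chunk.
    chunks = (queries + CHUNK_SIZE - 1) // CHUNK_SIZE
    if chunks > max_chunks:
        raise ValueError(
            "%d queries need %d chunks, but the limit is %d chunks"
            % (queries, chunks, max_chunks)
        )
    return queries, chunks
```

With the default of 25 chunks this caps a job at 25,000 queries. Any iterable of lines works, so an open file handle can be passed directly.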
> For the gene bank links I guess modifying the .loc file is the least bad way. Maybe it can be merged into galaxy_blast, that would at least solve the interoperability problems.
It would have to be sufficiently general, and backward compatible. FYI, other people have also looked at extending the BLAST *.loc files (e.g. adding a category column to help filter down a very large BLAST database list).
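As a sketch of what backward compatible could mean here: a reader that accepts both the existing three-column, tab-separated entries (id, description, path) and entries with one extra trailing column. The fourth "category" column and the function name parse_loc_lines are assumptions for illustration only, not an agreed format.

```python
def parse_loc_lines(lines):
    """Parse tab-separated .loc lines, tolerating an optional 4th column.

    Old three-column entries get category None, so existing files keep
    working unchanged. The 'category' column is hypothetical.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # malformed entry, ignore it
        entries.append({
            "value": fields[0],
            "name": fields[1],
            "path": fields[2],
            "category": fields[3] if len(fields) > 3 else None,
        })
    return entries
```

Appending the new column at the end, rather than inserting it in the middle, is what lets older files be read without changes.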
> @Peter: One potential problem in merging my blast2html tool could be that I have written it in python3, and the current tool wrapper therefore installs python3 and a host of its dependencies, making for a quite large download.
Without seeing your code, it is hard to say, but actually writing Python code which works unmodified under Python 2.7 and Python 3 is quite doable (and under Python 2.6 with a few more provisos). Both NumPy and Biopython do this if you wanted some reassurance. On the other hand, Galaxy itself will need to move to Python 3 at some point, and certainly individual tools will too. This will probably mean (as with Linux Python packages) having double entries on the ToolShed (one for Python 2, one for Python 3), e.g. a ToolShed package for NumPy under Python 2 (done) and under Python 3 (needed).

Peter
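P.S. For illustration, a small sketch of the single-source style meant above, i.e. one file that should run unmodified under Python 2.7 and Python 3. The helper names to_text and read_text are made up for this example and are not taken from blast2html, NumPy or Biopython.

```python
from __future__ import division, print_function, unicode_literals

import io
import sys

PY3 = sys.version_info[0] >= 3

# Name of the text (unicode) type on each major version. The 'unicode'
# branch is only evaluated under Python 2, where that name exists.
text_type = str if PY3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def read_text(path, encoding="utf-8"):
    """Read a whole file as text; io.open behaves the same on 2.7 and 3."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


if __name__ == "__main__":
    # print is a function and / is true division on both versions,
    # thanks to the __future__ imports at the top.
    print(to_text(b"BLAST"), 1 / 2)
```

The __future__ imports cover most of the day-to-day differences; the remaining care is mostly about keeping bytes and text apart.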