On Tue, Jun 17, 2014 at 4:57 PM, Jan Kanis <jan.code@jankanis.nl> wrote:
> Too bad there aren't any really good options. I will use the environment
> variable approach for the query size limit.
Are you using the optional job splitting (parallelism) feature in Galaxy?
That seems to me to be a good place to insert a Galaxy level
job size limit. e.g. BLAST+ jobs are split into 1000 query chunks,
so you might wish to impose a 25 chunk limit?
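As a rough sketch of what such a check could look like (not how Galaxy
implements job splitting): count the FASTA queries, work out how many
1000-query chunks that gives, and compare against a limit. The
function names and the MAX_BLAST_CHUNKS environment variable are
hypothetical, chosen only for illustration.

```python
import os

CHUNK_SIZE = 1000  # queries per chunk, as in the BLAST+ example above


def count_fasta_queries(lines):
    """Count FASTA records by counting header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def chunks_needed(n_queries, chunk_size=CHUNK_SIZE):
    """Number of chunks a job with n_queries would be split into."""
    # Integer ceiling division, so 1001 queries count as 2 chunks.
    return (n_queries + chunk_size - 1) // chunk_size


def within_limit(n_queries, max_chunks=None):
    """Return True if the job stays at or below the chunk limit."""
    if max_chunks is None:
        # MAX_BLAST_CHUNKS is a made-up variable name, not a Galaxy setting.
        max_chunks = int(os.environ.get("MAX_BLAST_CHUNKS", "25"))
    return chunks_needed(n_queries) <= max_chunks
```

With a 25 chunk limit this accepts up to 25,000 queries and rejects
anything larger.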
Long term being able to set limits on the input file parameters
of each tool would be nicer - e.g. Limit BLASTN to at most
20,000 queries, limit MIRA to at most 50GB FASTQ files, etc.
It would have to be sufficiently general, and backward compatible.
> For the gene bank links I guess modifying the .loc file is the least
> bad way. Maybe it can be merged into galaxy_blast, that would at
> least solve the interoperability problems.
FYI other people have also looked at extending the blast *.loc
files (e.g. adding a category column for helping filter down a
very large BLAST database list).
> @Peter: One potential problem in merging my blast2html tool
> could be that I have written it in python3, and the current tool
> wrapper therefore installs python3 and a host of its dependencies,
> making for a quite large download.
Without seeing your code, it is hard to say, but actually writing
Python code which works unmodified under Python 2.7 and
Python 3 is quite doable (and under Python 2.6 with a few
more provisos). Both NumPy and Biopython do this if you
wanted some reassurance.
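For illustration, a minimal sketch of the usual single-source idioms,
using only the standard library. The helper names here are made up
for the example and have nothing to do with blast2html itself.

```python
# The __future__ imports make Python 2.6/2.7 behave like Python 3 here.
from __future__ import division, print_function, unicode_literals

import io


def half(n):
    """True division on both Python 2 and 3 (thanks to __future__)."""
    return n / 2


def format_counts(counts):
    """Format a dict as 'key=value' pairs in sorted key order."""
    # sorted(d.items()) works on both; avoid Python 2 only d.iteritems().
    # Explicit {0}/{1} indices keep str.format usable on Python 2.6.
    return ", ".join("{0}={1}".format(k, v) for k, v in sorted(counts.items()))


def read_text(path):
    """Read a UTF-8 text file; io.open matches Python 3's open()."""
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # print is a function on both versions after the __future__ import.
    print(format_counts({"hits": 3, "queries": 2}))
```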
On the other hand, Galaxy itself will need to move to Python 3
at some point, and certainly individual tools will too. This will
probably mean (as with Linux Python packages) having double
entries on the ToolShed (one for Python 2, one for Python 3),
e.g. a ToolShed package for NumPy under Python 2 (done)
and under Python 3 (needed).
Peter