On Mon, Jun 16, 2014 at 4:18 AM, John Chilton <jmchilton@gmail.com> wrote:
> Hello Jan,
>
> Thanks for the clarification. Not quite what I was expecting so I am
> glad I asked - I don't have great answers for either case so hopefully
> other people will have some ideas.
>
> For the first use case - I would just specify some default limit to
> supply to the tool wrapper - let's call this N - add a parameter to
> the tool wrapper "--limit-size=N" - test that and then allow it to be
> overridden via an environment variable - so in your command block use
> "--limit-size=\${BLAST_QUERY_LIMIT:N}". This will use N is not limit
> is set, but deployers can set limits. There are a number of ways to
> set such variables - DRM specific environment files, login rc files,
> etc.... Just this last release I added the ability to define
> environment variables right in job_conf.xml
> (https://bitbucket.org/galaxy/galaxy-central/pull-request/378/allow-specification-of-environment/diff).
> I thought the tool shed might have a way to collect such definitions
> as well and insert them into package files - but Google failed to find
> this for me.
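To check I follow: in the tool's <command> block that would look
something like the line below, and a deployer could then override the
default per destination using the new <env> tag from your pull request.
(BLAST_QUERY_LIMIT, the helper script name and the default of 1000 are
just placeholders I have made up here.)

    check_blast_query.py --limit-size=\${BLAST_QUERY_LIMIT:-1000} ...

    <destination id="blast_cluster" runner="drmaa">
        <env id="BLAST_QUERY_LIMIT">5000</env>
    </destination>

i.e. the shell's \${VAR:-default} syntax means the deployer's value
wins when it is set, and the wrapper's built-in default applies
otherwise.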
Hmm. Jan emailed me off list earlier about this. We could insert
a pre-BLAST script to check the size of the query FASTA file,
and abort if it is too large (e.g. number of queries, total sequence
length, perhaps scaled according to the database size if we want
to get clever?).
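Something along these lines, say (an untested sketch - the limits, how
they are passed in, and any scaling by database size would all need
more thought):

    #!/usr/bin/env python
    # Untested sketch: count the queries and total residues in a FASTA
    # file, and abort (message to stderr, non-zero exit) if either
    # exceeds the given limits, so Galaxy fails the job before calling
    # BLAST itself.
    import sys

    fasta = sys.argv[1]
    max_queries = int(sys.argv[2])   # e.g. from $BLAST_QUERY_LIMIT
    max_residues = int(sys.argv[3])

    queries = 0
    residues = 0
    with open(fasta) as handle:
        for line in handle:
            if line.startswith(">"):
                queries += 1
            else:
                residues += len(line.strip())

    if queries > max_queries or residues > max_residues:
        sys.exit("Query FASTA too large: %i sequences, %i residues "
                 "(limits %i and %i)"
                 % (queries, residues, max_queries, max_residues))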
I was hoping there was a more general mechanism in Galaxy -
after all, BLAST is by no means the only computationally
expensive tool ;)
We have had query files of 20,000 and more genes against NR
(both BLASTP and BLASTX), but our Galaxy has task-splitting
enabled so this becomes 20 (or more) individual cluster jobs
of 1000 queries each. This works fine apart from the occasional
glitch with the network drive when the data is merged afterwards.
(We know this failed once shortly after the underlying storage
had been expanded, and would have been under heavy load
rebalancing the data across the new disks.)
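(For anyone reading this in the archives: the splitting is driven by
the <parallelism> tag in the wrapper XML, with tasked jobs enabled in
the Galaxy config. From memory it is roughly the following - check the
wrapper source and the config samples for the exact attribute names:)

    <!-- in the blastp/blastx tool XML -->
    <parallelism method="multi" split_inputs="query"
                 split_mode="to_size" split_size="1000"
                 merge_outputs="output1"></parallelism>

    # and in universe_wsgi.ini
    use_tasked_jobs = True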
> Not sure about how to proceed with the second use case - extending the
> .loc file should work locally - I am not sure it is feasible within
> the context of the existing tool shed tools, data manager, etc.... You
> could certainly duplicate this stuff with your modifications - this
> has down sides in terms of interoperability though.

Currently the BLAST wrappers use the *.loc files directly, but
this is likely to switch to the newer "Data Manager" approach.
That may or may not complicate local modifications like adding
extra columns...
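(For context, each BLAST database list is currently a plain
tab-separated *.loc file declared in tool_data_table_conf.xml - going
from memory rather than the sample files, roughly this, with a made-up
example line:)

    <!-- tool_data_table_conf.xml entry for protein databases -->
    <table name="blastdb_p" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/blastdb_p.loc" />
    </table>

    # tool-data/blastdb_p.loc: unique id <TAB> display name <TAB> path
    # prefix of the formatted database (no file extension)
    nr_05Jun2014<TAB>NCBI non-redundant (nr) 05 Jun 2014<TAB>/data/blast/nr

Adding an extra column locally means editing both the *.loc file and
that <columns> line, which the Tool Shed copies of the wrappers and
data tables would know nothing about - hence the interoperability
concern.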
> Sorry I don't have great answers for either question,
> -John

Thanks John,

Peter