On Mon, Jun 16, 2014 at 4:18 AM, John Chilton <jmchilton@gmail.com> wrote:
Hello Jan,
Thanks for the clarification. Not quite what I was expecting so I am glad I asked - I don't have great answers for either case so hopefully other people will have some ideas.
For the first use case - I would just specify some default value to supply to the tool wrapper - let's call this N - add a parameter to the tool wrapper "--limit-size=N" - test that and then allow it to be overridden via an environment variable - so in your command block use "--limit-size=\${BLAST_QUERY_LIMIT:-N}". This will use N if no limit is set, but deployers can set limits. There are a number of ways to set such variables - DRM specific environment files, login rc files, etc. Just this last release I added the ability to define environment variables right in job_conf.xml (https://bitbucket.org/galaxy/galaxy-central/pull-request/378/allow-specifica...). I thought the tool shed might have a way to collect such definitions as well and insert them into package files - but Google failed to find this for me.
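To illustrate the fallback behaviour, here is a minimal shell sketch. The default of 1000 and the helper name resolve_limit are placeholders made up for this example; only the BLAST_QUERY_LIMIT variable name comes from the suggestion above.

```shell
#!/bin/sh
# Hypothetical default standing in for N; 1000 is an arbitrary placeholder.
DEFAULT_LIMIT=1000

# resolve_limit is a made-up helper for illustration only.
resolve_limit() {
    # ${VAR:-default} expands to the default when VAR is unset or empty.
    echo "${BLAST_QUERY_LIMIT:-$DEFAULT_LIMIT}"
}

unset BLAST_QUERY_LIMIT
resolve_limit            # no limit set, so the default is used

BLAST_QUERY_LIMIT=50
export BLAST_QUERY_LIMIT
resolve_limit            # the deployer's value takes precedence
```

Note the ":-" form: a plain ":" followed by a number would be read by bash as a substring offset rather than a default.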
Hmm. Jan emailed me off list earlier about this. We could insert a pre-BLAST script to check the size of the query FASTA file, and abort if it is too large (e.g. number of queries, total sequence length, perhaps scaled according to the database size if we want to get clever?). I was hoping there was a more general mechanism in Galaxy - after all, BLAST is by no means the only computationally expensive tool ;) We have had query files of 20,000 or more genes against NR (both BLASTP and BLASTX), but our Galaxy has task-splitting enabled so this becomes 20 (or more) individual cluster jobs of 1000 queries each. This works fine apart from the occasional glitch with the network drive when the data is merged afterwards. (We know this failed once shortly after the underlying storage had been expanded, when the storage would have been under heavy load rebalancing the data across the new disks.)
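A rough sketch of such a pre-BLAST check, counting queries only (not total sequence length or database size). The function name check_query_limit, the file name and the default of 1000 are all hypothetical; none of this is part of the existing BLAST wrappers.

```shell
#!/bin/sh
# check_query_limit FASTA LIMIT
# Hypothetical helper: returns 0 if the FASTA file holds at most LIMIT
# sequences, 1 if it holds more, 2 if the file cannot be read.
check_query_limit() {
    fasta="$1"
    limit="$2"
    if [ ! -r "$fasta" ]; then
        echo "Cannot read query file: $fasta" >&2
        return 2
    fi
    # Each FASTA record starts with a ">" header line, so count those.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    count=$(grep -c '^>' "$fasta" || true)
    if [ "$count" -gt "$limit" ]; then
        echo "Query file has $count sequences; the limit is $limit." >&2
        return 1
    fi
    return 0
}

# Possible use before the BLAST call (placeholder file name and default):
# check_query_limit query.fasta "${BLAST_QUERY_LIMIT:-1000}" || exit 1
```

A non-zero exit with a message on stderr should be enough for Galaxy to mark the job as failed, but how errors are detected depends on the tool's configuration, so treat that as an assumption to verify.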
Not sure about how to proceed with the second use case - extending the .loc file should work locally - I am not sure it is feasible within the context of the existing tool shed tools, data manager, etc. You could certainly duplicate this stuff with your modifications - this has downsides in terms of interoperability though.
Currently the BLAST wrappers use the *.loc files directly, but this is likely to switch to the newer "Data Manager" approach. That may or may not complicate local modifications like adding extra columns...
Sorry I don't have great answers for either question, -John
Thanks John, Peter