Hi David,
The NCBI BLAST+ wrappers have a <parallelism> tag set up,
which becomes active if you set use_tasked_jobs = True in
your config/galaxy.ini file (aka universe_wsgi.ini).
Specifically, the wrappers use this:
<!-- If job splitting is enabled, break up the query file into parts -->
<parallelism method="multi" split_inputs="query" split_mode="to_size"
split_size="1000" merge_outputs="output1" />
This is hard-coded to break up the query FASTA file into batches
of 1000 sequences (e.g. a transcriptome of 20k genes becomes
20 jobs), which has worked nicely on our cluster.
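To make the batching arithmetic concrete, here is a rough Python sketch of
what splitting "to_size" with split_size="1000" amounts to. This is only an
illustration and not Galaxy's actual splitter code; the function names
iter_fasta_records and batch_records are made up for this example.

```python
from itertools import islice


def iter_fasta_records(lines):
    """Yield each FASTA record as a list of lines (header plus sequence lines)."""
    record = []
    for line in lines:
        # A new header line closes the record collected so far.
        if line.startswith(">") and record:
            yield record
            record = []
        if line.strip():
            record.append(line)
    if record:
        yield record


def batch_records(records, split_size=1000):
    """Group records into batches of at most split_size records each."""
    records = iter(records)
    while True:
        batch = list(islice(records, split_size))
        if not batch:
            return
        yield batch


# Hypothetical input: a fake transcriptome of 20,000 short sequences.
fake_lines = []
for i in range(20000):
    fake_lines.append(">gene%d\n" % i)
    fake_lines.append("ACGT\n")

batches = list(batch_records(iter_fasta_records(fake_lines), split_size=1000))
print(len(batches))  # 20 batches, i.e. 20 jobs
```

Each batch would then be run as its own BLAST job, and the per-batch
outputs merged back into the single output file.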
Separately, each job uses -num_threads "\${GALAXY_SLOTS:-8}"
in the command line string, i.e. uses the $GALAXY_SLOTS
environment variable (set via the Galaxy job configuration), or
if not set, defaults to using 8 threads.
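In case the shell syntax is unfamiliar, the fallback behaves roughly like
the Python sketch below. This is only an illustration of the
${GALAXY_SLOTS:-8} expansion, not code from the wrapper; resolve_threads is
a hypothetical helper name.

```python
import os


def resolve_threads(environ=None, default=8):
    """Mimic ${GALAXY_SLOTS:-8}: fall back to the default if unset or empty."""
    if environ is None:
        environ = os.environ
    value = environ.get("GALAXY_SLOTS", "")
    return value if value else str(default)


print(resolve_threads({"GALAXY_SLOTS": "4"}))  # 4
print(resolve_threads({}))                     # 8
```

Note that with job splitting enabled, each of the split jobs requests this
many threads.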
I've essentially rephrased the README file here - did you see
that, or does it need more information added?
Thanks,
Peter
On Tue, May 3, 2016 at 6:58 PM, David Kovalic <kovalic@analome.com> wrote:
> Hello,
>
> We would like to split fasta query files and run multiple concurrent jobs to
> minimize our processing wall clock time for large jobs.
>
> After chatting with folks at GCC 2015 I understand this is possible. My
> problem is I can't find instructions on how to configure
> CloudMan/ncbi_blast_plus to do this. For those of you who know me it
> probably goes without saying that I can't figure it out myself ;)
>
> Peter/Enis/others, can you help us out with this question?
>
> Thanks,
>
> David