We initially had issues with collections containing thousands of datasets. They were related to the limit on the number of jobs in the Slurm queue, and greatly increasing this limit fixed our issue.


Cheers, 
Mo Heydarian



On Thu, Aug 30, 2018 at 11:18 AM Peter Cock <p.j.a.cock@googlemail.com> wrote:
There is a sweet spot for splitting your BLAST query fasta file
by sequence - one big file with 25000 sequences is not great,
but one sequence per file is the worst possible option.

This is due to all the extra overheads: you would have 25000
jobs submitted to the cluster, each of which would load the
BLAST binary and database off disk, etc. There are also
going to be Galaxy overheads with a large collection.

I would suggest somewhere around 500 to 1000 gene sequences
per FASTA query file is likely a safe choice. If you have very
long sequences (e.g. chromosomes or contigs), then use fewer.
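
As a rough illustration of that chunking, here is a minimal Python
sketch that splits one multi-sequence FASTA file into files holding at
most a fixed number of records. The function names, the output file
naming and the default of 500 records per file are assumptions made for
this example; they are not part of Galaxy or BLAST+.

```python
from itertools import islice
from pathlib import Path


def read_fasta_records(lines):
    """Yield each FASTA record as a list of lines, header line first."""
    record = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield record
            record = []
        if line.strip():  # skip blank lines
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield record


def split_fasta(path, out_dir, per_file=500):
    """Write chunks of at most `per_file` records; return the chunk paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(path) as handle:
        records = read_fasta_records(handle)
        while True:
            # Take the next `per_file` records without loading the whole file.
            chunk = list(islice(records, per_file))
            if not chunk:
                break
            # Hypothetical naming scheme: chunk_0000.fasta, chunk_0001.fasta, ...
            chunk_path = out_dir / f"chunk_{len(chunk_paths):04d}.fasta"
            with open(chunk_path, "w") as out:
                for record in chunk:
                    out.writelines(record)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

With 25000 sequences and per_file=500 this gives 50 query files, so
50 cluster jobs rather than 25000.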

As to the number of threads for each BLAST job, more is better,
but what to pick will depend on your cluster and how often there
are threads free on nodes. I would suggest trying 4, 8 or 16 threads.
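
For reference, outside Galaxy the thread count is passed to BLAST+ with
the -num_threads option. The sketch below only builds the command line
for one query file. The helper name, the file names, the tabular output
format and the choice of blastp (which assumes protein queries) are
assumptions for illustration.

```python
import shlex


def blast_command(query, out, db="nr", program="blastp", threads=8):
    """Return a BLAST+ command line (as a list) for one query file."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        program,                        # assumed: blastp, i.e. protein queries
        "-query", str(query),
        "-db", db,
        "-out", str(out),
        "-outfmt", "6",                 # tabular output, chosen for the example
        "-num_threads", str(threads),   # e.g. 4, 8 or 16
    ]


# Print the command for one hypothetical query file instead of running it.
print(shlex.join(blast_command("chunk_0000.fasta", "chunk_0000.tsv")))
```

The same list can be passed to subprocess.run() on a machine where
BLAST+ and the database are installed.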

I hope that helps.

Peter


On Thu, Aug 30, 2018 at 3:50 PM Jochen Bick <jochen.bick@usys.ethz.ch> wrote:
>
> Thanks Peter,
>
> so my idea was to split my problem into single blast jobs and run them
> only on one core...
> So my file has 25000 sequences and I'm blasting them against all NCBI
> proteins (nr). This just takes too long. I guess because the database
> is also very big? I tested this on the first 10 sequences and it took
> about 10 minutes. But maybe this is still not faster than running all at once?
> How many cores would you give such a job?
>
> Cheers Jochen
>
> On 30.08.2018 16:44, Peter Cock wrote:
> > If there are any limits, it would be down to the Galaxy Admin's job
> > settings - something generic with collections.
> >
> > Personally I've not done this - I tend to concatenate FASTA files
> > to make large files with multiple sequences instead.
> >
> > (And then we have the optional task splitting enabled so that Galaxy
> > breaks up the multiple-sequence FASTA file into chunks which
> > get shared out on our cluster for better throughput before
> > concatenating the output back into a single file.)
> >
> > Peter
> > On Thu, Aug 30, 2018 at 3:37 PM Jochen Bick <jochen.bick@usys.ethz.ch> wrote:
> >>
> >> Hi,
> >>
> >> is there any limit on running BLAST jobs from a collection of single FASTA
> >> files? I started a job but it does not get executed... it's just sending
> >> for about an hour.
> >>
> >> Cheers Jochen
> >>
> >> --
> >> ETH Zurich
> >> *Jochen Bick*
> >> Animal Physiology
> >> Institute of Agricultural Sciences
> >> Postal address: Universitätstrasse 2 / LFW B 58.1
> >> Office: Tannenstrasse 1 / TAN D 6.2
> >> 8092 Zurich, Switzerland
> >>
> >> Phone +41 44 632 28 25
> >> jochen.bick@usys.ethz.ch <mailto:jochen.bick@usys.ethz.ch>
> >> www.ap.ethz.ch
> >> ___________________________________________________________
> >> Please keep all replies on the list by using "reply all"
> >> in your mail client.  To manage your subscriptions to this
> >> and other Galaxy lists, please use the interface at:
> >>   https://lists.galaxyproject.org/
> >>
> >> To search Galaxy mailing lists use the unified search at:
> >>   http://galaxyproject.org/search/