Hello all,
Continuing the search for slowness in my local Galaxy server (see
http://lists.bx.psu.edu/pipermail/galaxy-dev/2009-December/001549.html ):
the datatypes/sequence.py file also scans and parses the entire file when creating a
new FASTA/FASTQ file.
It's nice and fun and informative for small files, but with a 2.7GB FASTA file the
Python process stays at 100% CPU for a long, long time, causing everything else to be very
slow.
The offending code is in sequence.py, method "set_meta", lines 30-39.
I think Illumina expects 25x coverage of the human genome in a single run by the end of
the year. This roughly translates to 8 FASTQ files of more than 8GB each => FASTA
files of 4GB each... Galaxy will not be able to just casually scan these files.
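
One possible way around this, as a rough sketch only and not a patch against the actual
set_meta code: check the file size first and skip the full scan above some limit. The
function name count_fasta_sequences and the MAX_SCAN_BYTES limit below are made up for
illustration; they are not existing Galaxy names or settings.

```python
import os

# Hypothetical limit (50 MB); not an existing Galaxy configuration option.
MAX_SCAN_BYTES = 50 * 1024 * 1024


def count_fasta_sequences(path, max_bytes=MAX_SCAN_BYTES):
    """Count FASTA records in a file, or return None if the file is too big to scan.

    Returning None lets the caller leave the sequence count unset
    instead of blocking on a multi-gigabyte file.
    """
    # Cheap size check: this only reads file metadata, not the file contents.
    if os.path.getsize(path) > max_bytes:
        return None

    count = 0
    with open(path, "rb") as handle:
        for line in handle:
            # Each FASTA record starts with a '>' header line.
            if line.startswith(b">"):
                count += 1
    return count
```

With something like this, small files would still get their sequence count, and huge files
would simply show no count rather than pinning the CPU.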
-gordon