Local instance is running way too slow!!!
Dear All,
I successfully installed Galaxy on my new MBP with 16 GB of RAM, but when I tried to use Galaxy, it was painfully slow! The first test I did was to create an admin account and import data (an RNA-seq FASTQ file, about 6 GB in size) into the database and then into a history, and that worked fine. The second test was to run FASTQ Groomer on this FASTQ file, and it took forever (3+ hours).
Does anybody have an idea why it is so slow? Is it possible that Galaxy was set up to run as a single process instead of using all 8 cores of the processor? If that is the case, how do I fix it?
Please help!
Di Nguyen Postdoc, U of W, Seattle, WA
Galaxy just wraps existing tools, so it's probably not Galaxy that is slow per se, but the FASTQ Groomer tool. Each tool has its own performance characteristics.
I don't use FASTQ Groomer, so I don't know how it can be expected to perform.
Are you sure you need it?
If you know that your quality scores are already scaled in Sanger units (Ion Torrent and CASAVA 1.8 FASTQs are), then you may not.
If you look at Activity Monitor, you can see whether CPU or disk is the limiting factor for the work you are doing.
Brad
On Jul 24, 2012, at 3:41 PM, Di Nguyen wrote:
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Brad Langhorst langhorst@neb.com 978-380-7564
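One way to act on Brad's question of whether grooming is needed at all is to look at the quality characters in the file. The following is only a rough sketch, not part of Galaxy or FASTQ Groomer: the function name `guess_quality_offset` and the `max_records` parameter are made up for illustration, and the code assumes plain four-line FASTQ records. It relies on the fact that quality characters below `;` (ASCII 59) cannot occur in the older Phred+64 or Solexa+64 encodings.

```python
import itertools


def guess_quality_offset(path, max_records=10000):
    """Guess the quality offset (33 or 64) from the lowest quality character.

    Hypothetical helper; assumes four lines per FASTQ record.
    Returns None if no quality lines were found.
    """
    lowest = None
    with open(path, "rb") as handle:
        # The 4th line of every record (index 3, 7, 11, ...) holds the qualities.
        quality_lines = itertools.islice(handle, 3, max_records * 4, 4)
        for line in quality_lines:
            line = line.rstrip(b"\r\n")
            if line:
                smallest = min(line)  # iterating bytes yields integers
                lowest = smallest if lowest is None else min(lowest, smallest)
    if lowest is None:
        return None
    if lowest < 59:
        # Characters below ';' only appear with the Sanger (Phred+33) offset.
        return 33
    # Could be an older +64 encoding, or a Sanger file with only high scores.
    return 64
```

A result of 33 suggests the file is already Sanger-scaled. A result of 64 is not conclusive, since a Sanger file whose sampled reads all have high quality scores would give the same answer, so it is worth checking where the data came from as well.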
Hello all,
we were having the same issue, with the groomer taking up to 12 hours for large files. I had a look at the code and saw that it was only using a single core. I changed the code to split the FASTQ input into multiple file parts, process them in parallel, and reassemble the results. It also reassembles the aggregator data (which prints the final summary).
Using 8 cores, we saw a 7x improvement. Naturally, the data output is identical. One limitation is that it does not support FASTQ files that have multiple lines per sequence. I have read that this practice is discouraged anyway because it was problematic (though it was in the original spec), and I haven't seen it occur in our data so far.
I believe there is still room for improvement, as Python's readline has suboptimal performance: it does too much file I/O without enough buffering.
I'm new to bioinformatics, though I come from a background in R&D computer engineering. If anyone is at the Chicago Galaxy conference, you can talk to Warren Kaplan about this. I can provide the code.
regards Kenny
------ Bioinformatics Architect Garvan Institute
On Wed, Jul 25, 2012 at 5:54 AM, Langhorst, Brad <Langhorst@neb.com> wrote:
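The split, process in parallel, and reassemble idea that Kenny describes can be sketched roughly as follows. This is not his code, which was not posted in the thread. It is a minimal illustration that streams chunks of records to worker processes instead of writing intermediate file parts, and it only converts Phred+64 qualities to Sanger (Phred+33). All function names and the `records_per_chunk` parameter are assumptions made for the example, and like his version it assumes four lines per record.

```python
import itertools
from multiprocessing import Pool

RECORD_LINES = 4  # assumes unwrapped FASTQ: header, sequence, '+', quality

# Map Phred+64 quality characters ('@'..'~') down to Phred+33 ('!'..'_').
_TO_SANGER = {code: code - 31 for code in range(64, 127)}


def phred64_to_sanger(record):
    """Convert the quality line of one four-line record to Sanger scaling."""
    header, sequence, plus, quality = record
    return (header, sequence, plus, quality.translate(_TO_SANGER))


def groom_chunk(records):
    """Worker function: convert a list of records in a separate process."""
    return [phred64_to_sanger(record) for record in records]


def read_chunks(handle, records_per_chunk):
    """Yield lists of four-line records, records_per_chunk at a time."""
    while True:
        lines = [
            line.rstrip("\n")
            for line in itertools.islice(handle, records_per_chunk * RECORD_LINES)
        ]
        if not lines:
            return
        yield [
            tuple(lines[i:i + RECORD_LINES])
            for i in range(0, len(lines), RECORD_LINES)
        ]


def groom_parallel(in_path, out_path, processes=8, records_per_chunk=50000):
    """Convert in_path to Sanger scaling using several worker processes."""
    with open(in_path) as src, open(out_path, "w") as dst, Pool(processes) as pool:
        # imap returns results in input order, so the output is in the same
        # order as a single-process run would produce.
        for groomed in pool.imap(groom_chunk, read_chunks(src, records_per_chunk)):
            for record in groomed:
                dst.write("\n".join(record) + "\n")
```

To try it, call something like `groom_parallel("reads.fastq", "reads.sanger.fastq", processes=8)` from inside an `if __name__ == "__main__":` block, which `multiprocessing` needs on platforms that start workers by spawning a new interpreter. Reading many records per chunk also reduces the number of small reads, which touches on the buffering point above. The speedup will depend on how much of the time is spent on disk I/O rather than on the conversion itself.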
If your problem really is just that FASTQ Groomer is slow, I implemented several small optimizations for FASTQ Groomer that I think resulted in a big improvement in performance. It seems it is not really used at my institution any more, so I never pushed the changes out to our production server or pushed too hard on the pull request. But I did some testing as I was making the changes, and none of the changes broke the functional tests, so there is some chance they don't break anything. You can pull my changes from here if you are interested: https://bitbucket.org/galaxy/galaxy-central/pull-request/20/fastq_groomer-op...
-John
------------------------------------------------ John Chilton Senior Software Developer University of Minnesota Supercomputing Institute Office: 612-625-0917 Cell: 612-226-9223
On Tue, Jul 24, 2012 at 6:32 PM, Kenny Sabir <traksewt@gmail.com> wrote:
participants (4)
- Di Nguyen
- John Chilton
- Kenny Sabir
- Langhorst, Brad