Add library to dataset performance metric: developer vs production instances
Hello,

Today I was routinely adding a 27GB Illumina lane on my galaxy instance running on a cluster node. Just the regular cloned-from-hg type of instance with set_metadata_externally, no more tuning.

It took more than 10 minutes to have the dataset imported into a data library via the filesystem path upload method... not copying it into galaxy, just "linking".

galaxy.jobs INFO 2011-09-19 18:05:08,641 job 120 dispatched
(...)
galaxy.jobs DEBUG 2011-09-19 18:16:52,822 job 120 ended
galaxy.datatypes.metadata DEBUG 2011-09-19 18:16:52,824 Cleaning up external metadata files

Since I cannot add datasets to libraries in usegalaxy.org and compare, I was wondering if someone can state an approximated average time *for a production* galaxy installation to do that operation.

I would like to have some empirical number to show on how a production deployment[1] could speed things up, as opposed to having individual galaxy instances per user in a cluster (as per IT policies):

http://blogs.nopcode.org/brainstorm/2011/08/22/galaxy-on-uppmax-simplified/

Thanks in advance!
Roman

[1] http://usegalaxy.org/production
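For reference, the exact duration behind "more than 10 minutes" can be worked out from the two galaxy.jobs timestamps in the log excerpt above. The following is only a small sketch: the timestamps are copied from the log, while the function name and format string are illustrative choices and not part of Galaxy.

```python
from datetime import datetime

# Timestamps copied from the "dispatched" and "ended" log lines above.
DISPATCHED = "2011-09-19 18:05:08,641"
ENDED = "2011-09-19 18:16:52,822"

# Log timestamps use a comma before the milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


seconds = elapsed_seconds(DISPATCHED, ENDED)
minutes, remainder = divmod(seconds, 60)
print(f"job 120 took {int(minutes)} min {remainder:.1f} s")
```

So job 120 ran for a little under 12 minutes between dispatch and completion.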
Hi Roman,

This is a good question for the development community to provide feedback on, so I'll cross-post your question over to that list.

Best,
Jen
Galaxy team

On 9/19/11 2:30 PM, Roman Valls wrote:
Since I cannot add datasets to libraries in usegalaxy.org and compare, I was wondering if someone can state an approximated average time *for a production* galaxy installation to do that operation.

___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:

http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
We routinely put large compressed fastq files into data libraries by that method (linking, no copy) and it is very fast, since the patch that stopped it decompressing the files.

You should probably make sure you specify the file format (fastqsanger) so Galaxy does not attempt to sniff the file to learn its datatype.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jduddy@illumina.com

-----Original Message-----
From: galaxy-dev-bounces@lists.bx.psu.edu [mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of Jennifer Jackson
Sent: Thursday, September 29, 2011 12:13 PM
To: Roman Valls; Galaxy-Dev
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-dev] [galaxy-user] Add library to dataset performance metric: developer vs production instances

Hi Roman,

This is a good question for the development community to provide feedback on, so I'll cross-post your question over to that list.

Best,
Jen
Galaxy team
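To illustrate the distinction John draws between linking a file and reading it: creating a symlink costs about the same regardless of file size, whereas any step that has to read or decompress the whole file scales with its size. The sketch below is a generic Python illustration and not Galaxy's actual code; link_dataset, read_whole_file and the tiny lane.fastq test file are all hypothetical names made up for the example.

```python
import os
import tempfile


def link_dataset(source_path: str, library_dir: str) -> str:
    """Symlink source_path into library_dir and return the link path."""
    link_path = os.path.join(library_dir, os.path.basename(source_path))
    os.symlink(source_path, link_path)  # no file data is read or copied
    return link_path


def read_whole_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Read every byte of path, as a full-file pass would, and return the byte count."""
    total = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            total += len(chunk)
    return total


with tempfile.TemporaryDirectory() as workdir:
    # Small stand-in for a large FASTQ file (hypothetical content).
    source = os.path.join(workdir, "lane.fastq")
    with open(source, "wb") as handle:
        handle.write(b"@read1\nACGT\n+\nIIII\n" * 1000)

    library = os.path.join(workdir, "library")
    os.mkdir(library)

    linked = link_dataset(source, library)
    linked_is_symlink = os.path.islink(linked)
    bytes_read = read_whole_file(linked)  # cost grows with file size
```

On a 27GB file the first function would still return almost immediately, while the second would take as long as the storage needs to deliver 27GB, which is consistent with the advice to avoid any unnecessary full-file pass during import.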
participants (3)
- Duddy, John
- Jennifer Jackson
- Roman Valls